CLAMP - Concur Labs AI and Machine Learning Prototypes

Making machine learning more reproducible and scalable

Machine learning projects face challenges beyond model development itself: they are often hard to replicate and reproduce, and the trained models and generated data sets can be difficult to validate. To complicate things further, data scientists regularly update deployed machine learning models by hand. This time-consuming work prevents machine learning engineers and data scientists from applying machine learning to new business problems. All of these shortcomings result in a low return on investment for data science projects. In 2020, Concur Labs investigated and implemented solutions to increase data science ROI and to reduce the uncertainty around machine learning projects.

Machine Learning Pipelines

With the help of machine learning pipelines, we were able to update models automatically. As soon as new data landed in a cloud storage bucket, the pipeline validated the data set, performed feature engineering, trained a machine learning model, validated and evaluated it, and, once all validation checks passed, deployed the model automatically.
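The stage sequence above can be sketched in plain Python. This is a minimal illustration of the control flow only, not our actual pipeline: the function names, the toy majority-label "model", and the accuracy threshold are all hypothetical stand-ins (a production pipeline would use a framework such as TFX or Kubeflow).

```python
# Illustrative sketch of the pipeline stages described above.
# All names, data shapes, and thresholds are hypothetical.

def validate_data(rows):
    """Reject the run early if the new data looks malformed."""
    return all("text" in r and "label" in r for r in rows)

def engineer_features(rows):
    """Toy feature: word count per example, paired with its label."""
    return [(len(r["text"].split()), r["label"]) for r in rows]

def train_model(features):
    """Toy 'model': always predict the majority label seen in training."""
    labels = [label for _, label in features]
    majority = max(set(labels), key=labels.count)
    return lambda _example: majority

def evaluate_model(model, features):
    """Fraction of examples the model classifies correctly."""
    correct = sum(model(x) == y for x, y in features)
    return correct / len(features)

def run_pipeline(rows, accuracy_threshold=0.5):
    """Run every stage in order; deploy only if all checks pass."""
    if not validate_data(rows):
        return "rejected: data validation failed"
    features = engineer_features(rows)
    model = train_model(features)
    accuracy = evaluate_model(model, features)
    if accuracy >= accuracy_threshold:
        return f"deployed (accuracy={accuracy:.2f})"
    return "rejected: model below accuracy threshold"

new_data = [
    {"text": "great movie", "label": "pos"},
    {"text": "terrible plot and acting", "label": "neg"},
    {"text": "loved it", "label": "pos"},
]
print(run_pipeline(new_data))  # deploys: 2 of 3 examples match the majority label
```

The key design point the sketch mirrors is that deployment is gated behind every earlier stage: a failed data validation or a below-threshold evaluation stops the run before any model reaches production.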

Machine Learning Pipeline Evangelism

We shared our learnings around machine learning pipelines internally and at a variety of external conferences in 2019 and 2020, including the SAP Conference on Machine Learning, the Apache Beam 2020 Summit, and PyData Global. Hannes and Catherine, two of our teammates, also joined Laurence Moroney from Google during his “TensorFlow meets ...” YouTube session to talk about machine learning pipelines. Beyond these talks, they also published their learnings in the recent O’Reilly Media book Building Machine Learning Pipelines.

BERT Projects

In addition to the pipeline work, we also built prototypes around novel machine learning applications using transformer architectures. These applications included:

  • Question-answering models using BERT
  • Named entity recognition using transformer architectures
  • Sentiment classification of text using BERT

You can play with our sentiment classification model below. Type in a fake movie review, and Hugo will try to classify it for you.

For effective model training and deployment, we collaborated with engineers from Google’s TensorFlow team and shared the results on Google’s TensorFlow blog. You can follow our work in Part One and Part Two of the blog.

To learn more about everything else we have been up to this year and to stay up to date with what we do next, keep an eye on our events page.