Today, the hype around Machine Learning/Artificial Intelligence (ML/AI) often focuses on potentially breakthrough algorithms such as GPT-3 (and its upcoming successor GPT-4), which promise to transform the efficacy of Artificial Intelligence.
There is, however, a “picks-and-shovels” play in ML/AI that investors need to pay more attention to, as demonstrated by the accelerating volume of both investments and acquisitions over the past year.
The reality is it’s not the algorithms holding back AI. The quality of the data used to train and retrain algorithms is the problem, and MLOps is rapidly emerging as the solution. Ultimately, by helping businesses keep data quality consistently high across the ML project lifecycle, MLOps is set to take AI into the mainstream.
Machine learning models depend on high-quality datasets
ML/AI is a hot topic for both operating businesses and investors. However, there’s a stark reality behind all the hype: 87% of ML projects don’t get past the experiment phase and thus never make it to full-scale production.
That’s a staggering statistic, albeit not a surprising one. Scaling ML models is immensely challenging because it relies on high-quality datasets with enough volume to cope with real-world users and ever-shifting conditions.
Vast amounts of high-quality data are required to train and retrain algorithms to achieve peak performance.
Companies often opt to create their own industry-specific dataset, using a team to manually collect, review, and edit the data, though this is labour-intensive and a sub-optimal use of high-value data scientists’ and ML engineers’ time. Once data is collected, for example, the laborious task of ensuring it is of high quality begins. Without high-quality data, it is merely a case of garbage in, garbage out. The collected data must be unified and transformed from an unstructured form into standardised formats so that it can be interpreted consistently. It must be cleaned and checked for corrupt, duplicate, and missing data points, because even minor errors in training data can lead to wide-scale mistakes in a model’s output. Each data point must also be labelled to assign meaning, so the algorithm can learn what the information represents.
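To make the cleaning steps described above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production pipeline: the record fields (id, price, label), the sample values, and the function names clean_record and clean_dataset are all hypothetical. It standardises each record, rejects missing or corrupt values, and drops duplicates.

```python
from typing import Optional

# Hypothetical raw records; the field names and values are illustrative only.
RAW_RECORDS = [
    {"id": "1", "price": "9.99", "label": "shoe"},
    {"id": "1", "price": "9.99", "label": "shoe"},     # duplicate of the first record
    {"id": "2", "price": "", "label": "hat"},          # missing price
    {"id": "3", "price": "abc", "label": "coat"},      # corrupt price
    {"id": "4", "price": " 24.50 ", "label": "Coat"},  # valid, but not standardised
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a standardised record, or None if a field is missing or corrupt."""
    try:
        record_id = int(record["id"])
        price = float(record["price"].strip())
        label = record["label"].strip().lower()
    except (KeyError, ValueError, AttributeError):
        return None
    if not label:
        return None
    return {"id": record_id, "price": price, "label": label}


def clean_dataset(records: list) -> tuple:
    """Return (cleaned records, number of rejected records)."""
    seen_ids = set()
    cleaned = []
    rejected = 0
    for record in records:
        standardised = clean_record(record)
        # Reject unusable records and any record whose id was already kept.
        if standardised is None or standardised["id"] in seen_ids:
            rejected += 1
            continue
        seen_ids.add(standardised["id"])
        cleaned.append(standardised)
    return cleaned, rejected


if __name__ == "__main__":
    cleaned, rejected = clean_dataset(RAW_RECORDS)
    print(f"kept {len(cleaned)} records, rejected {rejected}")
```

Even in this toy case, three of the five records are unusable, which hints at why manual data preparation consumes so much specialist time.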
In addition, because data is a reflection of the real world, it naturally changes over time. As a result, there’s a need to continuously update datasets and to track and retrain ML models to prevent model staleness as real-world conditions change and new edge cases occur. Take the recent impact of COVID-19 on retailers as an example. In recent years, ML models have become a valuable tool for retailers to forecast demand and save money on unsold inventory. These models learn from past sales data and buying patterns to estimate how much of a particular product to stock. In 2020, almost overnight, the coronavirus pandemic completely changed consumer behaviours and upended the typical sales cycle. This meant the historical data the ML models relied on was no longer relevant. The models were essentially rendered useless as they simply didn’t reflect reality.
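One simple way to spot this kind of staleness is to compare recent data against the data a model was trained on. The sketch below is an assumption-laden illustration rather than a description of any particular MLOps product: the function name has_drifted, the threshold of three standard errors, and the weekly sales figures are all hypothetical, and real monitoring tools typically use richer statistical tests.

```python
from statistics import mean, stdev


def has_drifted(training: list, recent: list, threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean sits more than `threshold`
    standard errors away from the training mean (a simple z-style check)."""
    baseline_mean = mean(training)
    baseline_sd = stdev(training)
    if baseline_sd == 0:
        # No variation in the training data: any change counts as drift.
        return mean(recent) != baseline_mean
    standard_error = baseline_sd / len(recent) ** 0.5
    z_score = abs(mean(recent) - baseline_mean) / standard_error
    return z_score > threshold


if __name__ == "__main__":
    # Hypothetical weekly unit sales for one product.
    historical_sales = [100, 102, 98, 101, 99, 103, 97]
    normal_weeks = [101, 99, 100, 102]
    pandemic_weeks = [40, 35, 50, 45]

    print(has_drifted(historical_sales, normal_weeks))
    print(has_drifted(historical_sales, pandemic_weeks))
```

A check like this does not fix the model, but it can trigger the retraining step before forecasts based on outdated patterns cause costly stocking decisions.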
Ultimately, managing and monitoring diverse datasets, retraining models, and building shareable and repeatable processes across departments is a significant headache for even the most well-resourced and AI-savvy company.
Translating ML applications from experimental projects into scalable, commercial business applications that deliver real value is still far too complicated.
MLOps is an approach to ML lifecycle management focused on streamlining the machine learning production process. It applies to the entire lifecycle, from data gathering, cleaning, and model creation to training, validation, monitoring, and retraining. Much as DevOps transformed the development and deployment of software, MLOps seeks to do the same for ML.
While MLOps is a relatively new field, investments are rising as more organisations seek to use ML solutions to optimise operations and meet growing consumer expectations. We’ve seen an accelerating number of transactions over the last year, including:
- Scale AI’s $325m round at a $7bn valuation
- Labelbox’s $110m round led by SoftBank
- TaskUs’s IPO (at c.$3bn market cap)
The ecosystem has also begun to show signs of maturation with deals like Databricks’ $1.6bn Series H funding round ($38bn valuation) and DataRobot’s $300m Series G ($6.3bn valuation). By 2025, the global market for MLOps solutions is predicted to surpass $4bn.
MLOps provides the missing piece of the AI puzzle
The world of ML/AI has made some incredible advancements over the past few decades. One needs only to look at the capability of ML/AI in AlphaGo, radiology, accelerating drug discovery, or self-driving cars to get a glimpse into what’s possible.
But these examples are the outliers, the headline makers. Deploying new ML models remains an overwhelming challenge for most businesses. The process usually takes months and is fraught with difficulties, and the vast majority of models never even make it into production.
As more companies look to develop machine learning-powered products and services in the coming years, the demand for MLOps tools that make it easier to deploy new models will only continue to grow. At DAI Magister, we have worked extensively on these “picks-and-shovels” plays in the AI/ML space, with transactions including CloudFactory’s $65m equity raise led by FTV Capital, Acunu’s acquisition by Apple, and most recently Bright Computing’s acquisition by Nvidia.
With lucrative new market opportunities on the horizon, we’re excited to continue engaging in this emergent and rapidly evolving ecosystem. For us, MLOps looks set to provide the breakthrough required to take AI to the next level. The implications for business and society will be profound.