
A guide to Machine Learning Operations (MLOps)

Vani Dhandapani, Prabhakar Rallabhandi & Harika Maruboyina
Published: September 29, 2021

Fewer than half of ML models go into production! Why?

As Machine Learning technologies continue to evolve, ML-based solutions are moving into the mainstream and production rollouts at scale are gaining momentum. However, a Gartner study reveals that only 47% of enterprise AI/ML models go into production. Part of the reason lies in underestimating the fundamental realities of machine learning, which include:

  1. The manual and experimental nature of ML (one must try out different features, algorithms, parameters, and hyperparameters by hand)
  2. Manual tracking of data dependencies
  3. The complexity of versioning and tracking model experiments, training, and deployment
  4. The accumulation of hidden ML technical debt
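
To make the versioning and tracking problem in item 3 concrete, here is a minimal sketch of an experiment record in Python. It is an illustration under stated assumptions rather than the API of any particular tool: the names `ExperimentRecord`, `fingerprint_data`, and `run_id` are hypothetical, and the idea is simply to derive a run identifier from the data and settings so that identical runs can be recognized.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint_data(raw: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw training data."""
    return hashlib.sha256(raw).hexdigest()


@dataclass
class ExperimentRecord:
    """One training run: which data, which settings, which result.

    Hypothetical structure for illustration only.
    """
    data_sha256: str
    algorithm: str
    hyperparameters: dict
    metrics: dict = field(default_factory=dict)

    def run_id(self) -> str:
        # Hash only the inputs (not the metrics), so that re-running the
        # same configuration on the same data yields the same identifier.
        inputs = {
            "data_sha256": self.data_sha256,
            "algorithm": self.algorithm,
            "hyperparameters": self.hyperparameters,
        }
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


data_id = fingerprint_data(b"feature_a,feature_b\n1,2\n")
run = ExperimentRecord(data_id, "gradient_boosting", {"depth": 3, "lr": 0.1})
print(run.run_id())
```

Two runs with the same data fingerprint, algorithm, and hyperparameters share a `run_id` regardless of key order, which makes duplicate or untracked experiments easier to spot.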

Another reason lies in the lack of a streamlined process to execute ML projects. ML initiatives are a coordinated effort of multiple functions, and code is only one of them.

Figure: Machine Learning Operations (MLOps)

Research shows that data science teams spend most of their time on data wrangling, data preparation, software package and framework management, infrastructure configuration, and component integration, most of which can be described as supporting tasks. Even ML development itself has so far been chiefly manual, driven by the Data Science and Data Engineering community.

Together, these factors determine the pace at which ML projects move from pilot to production. To bring repeatability and reliability to ML-based solutions, enterprises need a mature MLOps ecosystem. This includes the right tools, processes, and appropriately skilled talent.

 

Decoding MLOps: A game-changer!

According to Boston Consulting Group (BCG), companies that have pioneered AI at scale across the business have achieved meaningful value from their investments. These companies typically dedicate 10% of their AI investment to algorithms, 20% to technologies, and up to 70% to embedding AI into business processes and new ways of working. In other words, they invest more than twice as much in people and processes as they do in algorithms and technologies combined (70% versus 30%).

Organizations striving to move beyond experimentation and embed ML into their business processes will find MLOps to be a game-changer. MLOps is the emerging software engineering discipline that helps shorten the development cycle and accelerate the deployment of ML-based solutions at scale. It operates at the intersection of DevOps, Data Engineering, and Machine Learning, as described in the illustration below.

Figure: Machine Learning Operations (MLOps)

  • DevOps has helped software engineering teams overcome fragmented toolchains, the lack of a big-picture perspective, silos between development and operations teams, and long release cycles. Similarly, MLOps addresses ML lifecycle challenges across Feature Store, Model Creation, Model Training and Evaluation, Model Versioning, Model Stage Transition, Model Serving, Monitoring, and Governance. It does so through Continuous Integration (CI), Continuous Delivery (CD), and Continuous Training & Monitoring.

  • While CI/CD addresses the challenges of Data Collection and Model Creation, Continuous Training caters to the experimental nature of the ML lifecycle. As mentioned earlier, ML applications are highly experimental: many aspects, such as algorithms and hyperparameters, need to be tweaked continuously to achieve deployable accuracy levels. Once deployed, ML solutions are known to drift, that is, their performance degrades over time. Drift has multiple causes, including changes in data profiles and differences between the training pipeline and the serving pipeline. Deployed models therefore need constant monitoring and governance, which Continuous Monitoring provides.
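
One way to picture the data-profile side of drift monitoring is a Population Stability Index (PSI) check that compares a feature's training-time distribution with what the served model currently sees. The sketch below is a minimal, standard-library illustration and not a description of any specific platform. The function name, the ten equal-width bins, and the 0.25 alert threshold are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
) -> float:
    """PSI between a training-time sample and a serving-time sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    p_exp = proportions(expected)
    p_act = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))


ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb value, tune per feature

training = [i / 100 for i in range(100)]
serving = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
psi = population_stability_index(training, serving)
if psi > ALERT_THRESHOLD:
    print(f"Drift suspected (PSI={psi:.2f}): consider retraining")
```

A check like this would typically run on a schedule per feature. A sustained high PSI is a signal to investigate the data and possibly trigger Continuous Training, not proof that model accuracy has dropped.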

MLOps adoption: Journey from foundation to excellence

CDOs and CIOs planning to invest in MLOps will need to focus on People, Processes, and Tools/Accelerators. A sensible first step is to assess the organization's current MLOps maturity.

Figure: Machine Learning Operations (MLOps)

People:

  • Just as DevOps requires Dev and Ops teams to collaborate, MLOps requires collaboration across multiple competencies, such as Data Engineers, ML Engineers, and business logic development teams. Each of them needs to see different aspects of the ML application.

Processes:

  • The process needs to be tailored to the results of the MLOps maturity assessment.
  • The MLOps process must integrate with existing DevOps processes while also delivering the additional capabilities required to manage ML.
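
As an illustration of how an ML-specific step might plug into an existing DevOps pipeline, the sketch below shows a promotion gate that a CI job could run before a model moves to a higher stage. It is a hypothetical example: the function name, the metric key `accuracy`, and both thresholds are assumptions made for illustration, and a real gate would use the metrics that matter for the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateResult:
    promote: bool
    reasons: List[str]


def promotion_gate(
    candidate: dict,
    production: dict,
    min_accuracy: float = 0.80,    # assumed absolute quality bar
    max_regression: float = 0.01,  # assumed tolerated drop vs. production
) -> GateResult:
    """Decide whether a candidate model may move to the next stage."""
    reasons = []
    if candidate["accuracy"] < min_accuracy:
        reasons.append(
            f"accuracy {candidate['accuracy']:.3f} is below the minimum {min_accuracy:.3f}"
        )
    if production["accuracy"] - candidate["accuracy"] > max_regression:
        reasons.append(
            f"accuracy regressed by more than {max_regression:.3f} against production"
        )
    return GateResult(promote=not reasons, reasons=reasons)


result = promotion_gate({"accuracy": 0.85}, {"accuracy": 0.88})
for reason in result.reasons:
    print("BLOCKED:", reason)
```

In a CI job, a blocked result would typically translate into a non-zero exit code so that the existing pipeline stops the release in the same way it would for a failing unit test.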

Tools/Accelerators:

The MLOps toolchain needs to provide visibility, managed access control, and collaboration features for the teams involved. The main categories of MLOps platforms available today are:

  • Big cloud platforms such as AWS, Microsoft, and Google have built-in MLOps capabilities across their machine learning services.

  • Open-source MLOps frameworks such as Kubeflow or MLflow manage many aspects of the lifecycle of machine learning solutions.

  • Offerings from AI-focused players such as Databricks, DataRobot, and Verta cover either the total lifecycle or individual parts of machine learning pipelines, such as training, monitoring, and governance.

  • In-house accelerators built on top of open-source tools streamline the end-to-end ML lifecycle and, in some instances, provide advanced capabilities such as monitoring and governance.

The MLOps market is likely to see a fair degree of consolidation in the near term.

Optimal practices for common issues

Challenges | Best Practices
Skilled Resources Availability and Collaboration | MLOps needs diverse but related skills (Data Engineers, Data Scientists, and DevOps Engineers). Cross-skilling and teaming need careful planning from the beginning.
Data Quality and ML Production Risks | Data Deviation Detection, Canary Pipelines, etc., should be part of the ML applications.
Scalable Infrastructure | The ML application needs to be sized and mapped to specific hardware (e.g. GPUs) and analytical engines as necessary for the pipelines.
Governance and Compliance | A comprehensive production governance mechanism is critical to ensure that ML applications are tracked for reproducibility, audits, and explainability.
RoI | KPIs for business applications should be defined, tracked, and correlated with the ML application's performance.
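
To illustrate the canary idea mentioned under Data Quality and ML Production Risks, the sketch below routes a small, stable share of requests to a new model version while the rest continue to use the current one. It is a simplified assumption of how such routing could work. The function name `route_to_canary` and the hash-based bucketing are illustrative choices, not a reference to any specific product.

```python
import hashlib


def route_to_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically send roughly canary_percent% of requests to the canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the request to a stable bucket in 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


# Hypothetical usage: 10% of traffic exercises the candidate model.
model_version = "candidate" if route_to_canary("customer-1234", 10) else "current"
print(model_version)
```

Because the decision depends only on the request identifier, the same caller always reaches the same model version, which keeps comparisons between the canary and the current model consistent while the canary share is gradually increased.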

Competitive insight into the market trends

Our internal (secondary) research into how organizations are applying MLOps revealed the following insights:

Figure: Machine Learning Operations (MLOps) - Market Trends

The US Banking sector is an early mover: 9 of the top 10 US banks have designated roles assigned to establish and execute MLOps. The US Automotive industry is a close second, with 7 of the top 10 companies engaging with MLOps. Pharma, P&C Insurance players, and Healthcare Payers are lagging in this area, even though they appear to be actively investing in ML for business innovation. MLOps is relevant and necessary for businesses that want to make a leap with AI/ML. According to Forbes, the market for MLOps solutions is expected to reach $4 billion by 2025.

Integrate/Operationalize your ML models with Virtusa's MLOps Platform

Virtusa offers an MLOps platform that can help accelerate ML development and deployments at scale. The platform serves as a single place for model development, lifecycle management, and monitoring. Model deployments to higher environments, including production, are simplified. The platform also offers monitoring and governance capabilities that enable constant evaluation and continuous learning, helping to avoid surprise changes in model performance.

Figure: Virtusa's Machine Learning Operations (MLOps) Platform

Conclusion:

To build a complete MLOps model, every organization must evaluate the best tools, frameworks, and methodologies. Understanding the requirements for a successful MLOps implementation, as explained above, will help generate optimal results. While an MLOps platform helps improve the organization's ML maturity, accelerated adoption of MLOps improves the success of AI/ML program implementations by 50-60%. The downstream impact in terms of monetization opportunities is substantial. Virtusa's MLOps platform helps businesses create efficient workflows and improve customer experience by leveraging data analytics to accelerate decision-making.

Learn more about Virtusa's solutions on ML

 

References

  • Sasu Mäkinen, Henrik Skogström, Eero Laaksonen, and Tommi Mikkonen, "Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help?", published on Cornell University's portal.
  • Danilo Sato, Arif Wider, and Christoph Windheuser, "Continuous Delivery for Machine Learning – Automating the End-to-End Life Cycle of Machine Learning Applications", September 2019, https://martinfowler.com/articles/cd4ml.html
  • Forbes, "MLOps: What You Need to Know", August 1, 2020, https://www.forbes.com/sites/tomtaulli/2020/08/01/mlops-what-you-need-to-know/