What is MLOps?

Jun 23rd 2021


Derrick Mwiti

Data Scientist @ Layer

Machine learning applications have become the de facto solution to many problems in our daily lives. Building these applications is a little different from building standard software for several reasons. Some of these differences include:

  • You need to constantly monitor the performance of models because they may degrade with time.
  • Bringing a machine learning model to production involves many people, including data scientists, data engineers, and business people.
  • Deploying and operating machine learning models at scale is a big challenge.

For a machine learning project to be successful, a set of practices, tools, and techniques needs to be in place. These practices fall under a broad term known as MLOps. In this article, let’s take a look at various aspects of machine learning operations.

What is MLOps?

Building a machine learning model involves creating the model, training it, tuning, and deploying it. This process should be:

  • Scalable
  • Collaborative
  • Reproducible

For instance, it would be regrettable to build an excellent model but fail to reproduce its results in a production environment. The set of principles, tools, and techniques that ensures building machine learning models is scalable, collaborative, and reproducible is referred to as MLOps. In the world of software engineering, the equivalent practices are referred to as DevOps.

[Figure inspired by this paper.]

DevOps vs. MLOps vs. DataOps

DevOps is a set of principles that ensures continuous delivery of high-quality software. In the machine learning realm, these practices are referred to as MLOps. DataOps involves a set of rules that ensures high-quality data is available for analysis and for training machine learning models. DataOps can be thought of as tightly integrated with MLOps.

What problems does MLOps solve?

The set of tools and techniques defined in MLOps is geared towards making the lives of data scientists and machine learning practitioners easier. Let’s take a look at some of the problems that MLOps solves.


Versioning

Versioning is a common practice in software engineering, where tools such as Git and GitHub are used to version code. In machine learning, other things need to be versioned apart from code. These items include:

  • Data used in model training
  • Model artifacts

Versioning models and data ensures that machine learning experiments are reproducible.
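
As a minimal sketch of the idea, and not how any particular MLOps tool does it, data and model artifacts can be versioned by hashing their contents. The helper name `content_version`, the sample rows, and the placeholder code version are assumptions made for illustration.

```python
import hashlib
import json


def content_version(payload: bytes) -> str:
    """Return a short, deterministic version id derived from the bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical training data and model artifact, serialized to bytes.
train_rows = [{"age": 31, "churned": 0}, {"age": 52, "churned": 1}]
data_bytes = json.dumps(train_rows, sort_keys=True).encode("utf-8")
model_bytes = json.dumps({"weights": [0.4, -1.2], "bias": 0.1}, sort_keys=True).encode("utf-8")

# An experiment record ties the code, data, and model versions together.
experiment = {
    "code_version": "git-commit-id-goes-here",  # placeholder for a real commit id
    "data_version": content_version(data_bytes),
    "model_version": content_version(model_bytes),
}
print(experiment)
```

Because the version id depends only on the content, the same data always maps to the same version, and any change to the data produces a new one.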

Monitoring model performance

Models placed in production can degrade over time. A common cause is that the data the model sees in production differs from the data it was trained on, which is usually referred to as data drift. By monitoring the performance of a model, these issues can be identified and addressed quickly.

Feature generation

Creating features can be a time-consuming and compute-intensive task. Applying proper MLOps techniques ensures that features that are generated once can be reused as many times as needed. This frees up the data scientist to focus on designing and testing the model.

What skills do you need for MLOps?

MLOps is quite a broad field and requires quite a few skills. Fortunately, you are not expected to have all of them. Specializing in a couple of areas makes more sense. However, here are the skills needed by an MLOps team to deliver a machine learning project successfully:

  • Articulate the business problem and the objectives
  • Collect the data needed to solve the identified problem
  • Prepare and process the data so that machine learning models can consume it
  • Create features that are important to the problem in question
  • Build and train machine learning models
  • Develop a pipeline for ingesting data, generating features, training, and evaluating the model
  • Deploy the model so that the actual users can use it. This can also be part of the above pipeline
  • Monitor how the model performs in the real world

With those basics out of the way, let’s now take a look at the main components of MLOps.

Parts of MLOps

Despite the field being quite broad, a couple of parts come together to make it one piece. In this section, we’ll explore those parts.

Feature store

A feature store, also referred to as a feature factory, stores the features used in training a machine learning model. It is a critical part of MLOps because it ensures that features are not created more than once. If necessary, features can also be fetched and used for building other models or for general analysis. Features are also versioned while in the feature store, ensuring that one can revert to a particular feature version that resulted in a better model.
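
The store-once, reuse, and revert behaviour can be sketched with a small in-memory class. This is an illustrative toy under stated assumptions: the class name `InMemoryFeatureStore` and its `put` and `get` methods are hypothetical and do not mirror the API of any real feature store.

```python
class InMemoryFeatureStore:
    """Toy feature store that keeps every version of each feature."""

    def __init__(self):
        self._history = {}  # feature name -> list of stored versions

    def put(self, name, values):
        """Store a new version of a feature and return its version number."""
        versions = self._history.setdefault(name, [])
        versions.append(list(values))
        return len(versions)  # version numbers start at 1

    def get(self, name, version=None):
        """Fetch a specific version, or the latest one when version is None."""
        versions = self._history[name]  # raises KeyError for unknown features
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise ValueError(f"unknown version {version} for feature {name!r}")
        return versions[version - 1]


store = InMemoryFeatureStore()
v1 = store.put("days_since_signup", [10, 42, 7])
v2 = store.put("days_since_signup", [11, 43, 8])

latest = store.get("days_since_signup")                # reused by any model
pinned = store.get("days_since_signup", version=v1)    # revert to an earlier version
```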

Data versioning

Apart from versioning features, the entire dataset used to create a certain model can also be versioned. Versioning data ensures that there is reproducibility in the process of creating models. It is also essential during auditing since it makes it easier to identify the datasets used to develop various models.

ML metadata store

To get rid of the magic involved in creating machine learning models, one has to log everything. Logging is critical for reproducibility. Some of the essential items to log include:

  • The seed used in splitting the data. This ensures that you are using the same split when creating a training and testing set
  • The random state used to initialize the model. The random state affects the reproducibility of model training
  • Model metrics
  • Hyperparameters
  • Learning curves
  • Training code and configuration files
  • Code used to generate features
  • Hardware logs

Storing model metadata is vital for various reasons, such as:

  • Building dashboards that compare different models
  • Enabling the searchability of models based on hyperparameters
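
To make the role of the two seeds concrete, here is a small sketch that logs them together with the hyperparameters. The function `run_experiment`, the 80/20 split, and the toy "model initialization" are assumptions for illustration only; a real metadata store would also capture metrics, learning curves, code, and hardware logs.

```python
import json
import random


def run_experiment(split_seed: int, model_seed: int, learning_rate: float) -> dict:
    """Run a toy experiment and return the metadata worth logging."""
    row_ids = list(range(100))
    random.Random(split_seed).shuffle(row_ids)  # seeded, so the split is repeatable
    train_ids, test_ids = row_ids[:80], row_ids[80:]

    # Stand-in for seeded model initialization.
    initial_weight = random.Random(model_seed).uniform(-1.0, 1.0)

    return {
        "split_seed": split_seed,
        "model_seed": model_seed,
        "hyperparameters": {"learning_rate": learning_rate},
        "train_size": len(train_ids),
        "test_size": len(test_ids),
        "first_test_ids": test_ids[:5],
        "initial_weight": initial_weight,
    }


first = run_experiment(split_seed=42, model_seed=7, learning_rate=0.01)
second = run_experiment(split_seed=42, model_seed=7, learning_rate=0.01)

# One JSON line per run is easy to append to a log and to search later.
print(json.dumps(first, sort_keys=True))
```

Rerunning with the logged seeds gives the same split and the same initial state, which is what makes the run reproducible.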

Model versioning

Versioning models is important because it enables switching between models in real-time. Apart from that, multiple models can be served to users at the same time to monitor performance. For instance, once a new model is available, it can be served to a few users to ensure that it performs as expected before rolling it out to everyone. Versioning is also critical from a compliance, governance, and historical point of view.
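
Serving a new version to a few users first can be sketched as a deterministic split on the user id. The function name `pick_model_version`, the version labels, and the 5% share are hypothetical choices; real serving platforms offer their own traffic-splitting mechanisms.

```python
import hashlib


def pick_model_version(user_id: str, stable_version: str,
                       canary_version: str, canary_percent: int) -> str:
    """Route a fixed share of users to the new (canary) model version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99 per user
    return canary_version if bucket < canary_percent else stable_version


# Roughly 5% of users would see version "v2"; everyone else stays on "v1".
for user in ["alice", "bob", "carol"]:
    print(user, pick_model_version(user, "v1", "v2", canary_percent=5))
```

Hashing the user id, rather than picking at random on each request, keeps a given user on the same model version between requests.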

Model registry

Once a model has been trained, it is stored in a model registry. Every model in the registry will have a version for reasons already mentioned above. Each model should also be coupled with its:

  • Hyperparameters
  • Metrics
  • Feature version used to create the model
  • Dataset version used in training the model

to mention a few.

The model metadata mentioned above is important for:

  • Compliance with regulations
  • Management of the models
  • Identifying the endpoints of models in production

Model artifacts will usually be saved automatically depending on the MLOps tool you are using. You can also instruct the tool to save the best model checkpoints and upload them to the registry.
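
A registry entry that couples a model version with its metadata might look like the sketch below. The names `RegisteredModel` and `ModelRegistry`, and the example metric values, are invented for illustration and are not the interface of any specific registry.

```python
from dataclasses import dataclass


@dataclass
class RegisteredModel:
    name: str
    version: int
    hyperparameters: dict
    metrics: dict
    feature_version: int
    dataset_version: str


class ModelRegistry:
    """Toy registry that assigns increasing versions per model name."""

    def __init__(self):
        self._models = {}  # model name -> list of RegisteredModel

    def register(self, name, hyperparameters, metrics, feature_version, dataset_version):
        entries = self._models.setdefault(name, [])
        entry = RegisteredModel(name, len(entries) + 1, hyperparameters,
                                metrics, feature_version, dataset_version)
        entries.append(entry)
        return entry

    def best(self, name, metric):
        """Return the registered version with the highest value of a metric."""
        return max(self._models[name], key=lambda entry: entry.metrics[metric])


registry = ModelRegistry()
registry.register("churn", {"max_depth": 3}, {"accuracy": 0.81}, 1, "data-2021-05")
registry.register("churn", {"max_depth": 5}, {"accuracy": 0.86}, 2, "data-2021-06")
best = registry.best("churn", "accuracy")
print(best.version, best.dataset_version)
```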

Model serving

Once a model is in the registry, it can be deployed and served to users. Serving a model means creating endpoints that can be used to run inference on the model. The model artifact can also be downloaded and packaged with an application. However, deploying API endpoints makes it easier to use the model in various applications. That said, there is a case for packaging the model inside applications, such as mobile applications, to reduce inference latency.
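
As a rough sketch of what an inference endpoint involves, the example below wraps a toy linear model in a handler built on Python's standard `http.server` module. The weights, the JSON shape `{"features": [...]}`, and the port are assumptions; a production deployment would normally rely on a dedicated serving framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters standing in for an artifact from the registry.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one example with a toy linear model."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"features": [2.0, 4.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; this call blocks until the process is stopped."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Any application that can send an HTTP POST request can then use the model without bundling the artifact itself.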

Model monitoring

Once machine learning models have been deployed, they have to be monitored for model drift and production skew. Model drift occurs when the statistical properties of the inference data diverge from those of the training data in unexpected ways. The performance of the model thus degrades. You can catch these problems early by monitoring the statistical properties of the training and prediction data.

Production skew occurs when the served model performs dismally compared to the offline model. This can be caused by bugs during the training process, serving bugs, and discrepancies in training and inference data.

Model drift and production skew should be monitored to ensure that the model is behaving as expected.
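
One way to compare the statistical properties of training and prediction data is the population stability index (PSI), computed per feature over a shared set of bins. The article does not prescribe this metric, so treat it as one possible choice; the bin count, the floor used to avoid taking the logarithm of zero, and the sample data are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample (expected) and a production sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def fractions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    expected_frac = fractions(expected)
    actual_frac = fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_frac, actual_frac))


training = [i / 100 for i in range(100)]          # feature values seen in training
production = [0.5 + i / 100 for i in range(100)]  # the same feature, shifted upward

print(population_stability_index(training, training))
print(population_stability_index(training, production))
```

A PSI of zero means the two binned distributions match. Larger values indicate a bigger shift, and the alerting threshold is something each team has to choose for itself.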

Model retraining

Machine learning models can be retrained for two main reasons:

  • To improve the performance of the model
  • When new training data becomes available

Your machine learning pipeline should detect the availability of new data or the dismal performance of the model and trigger retraining of the model. The system should also detect and deprecate models that would not benefit from retraining.
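
The decision logic described above can be sketched as a single function. Every threshold and name here (`retraining_decision`, `min_new_rows`, `max_failed_retrains`, and so on) is a hypothetical parameter; in particular, counting failed retrains is only one possible way to decide that a model would not benefit from retraining.

```python
def retraining_decision(new_rows, current_accuracy, failed_retrains,
                        min_new_rows=1000, min_accuracy=0.8, max_failed_retrains=3):
    """Return "deprecate", "retrain", or "keep" for a deployed model."""
    # Repeated retraining that did not help suggests the model should be retired.
    if failed_retrains >= max_failed_retrains:
        return "deprecate"
    # Retrain on poor performance or when enough new data has arrived.
    if current_accuracy < min_accuracy or new_rows >= min_new_rows:
        return "retrain"
    return "keep"


print(retraining_decision(new_rows=50, current_accuracy=0.91, failed_retrains=0))
print(retraining_decision(new_rows=5000, current_accuracy=0.91, failed_retrains=0))
print(retraining_decision(new_rows=50, current_accuracy=0.62, failed_retrains=3))
```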


Continuous integration and continuous deployment

Continuous integration and continuous deployment in machine learning ensure that high-quality models are created and deployed often. Continuous integration ensures that code is frequently merged into a central repository where automated builds and tests are run. In machine learning, this would involve not only testing the code but also the resulting models. It also entails packaging the models in readiness for use by actual users.

Continuous delivery involves automatically deploying code changes to a staging or production environment. In a machine learning pipeline, this would involve deploying a model to test or production servers. Frequent deployments are significant because they ensure that code and models are tested rigorously and often before they reach production.
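
Testing the resulting model, and not only the code, can be expressed as an automated check that a CI job runs before a model is packaged. The sketch below is an assumption-laden toy: the holdout examples, the two threshold "models", and the gate `passes_quality_gate` are all invented for illustration.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) examples the model gets right."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def passes_quality_gate(candidate_accuracy, production_accuracy, min_accuracy=0.8):
    """A candidate must clear a fixed bar and not be worse than production."""
    return candidate_accuracy >= min_accuracy and candidate_accuracy >= production_accuracy


# Hypothetical holdout set and models: each model is a simple threshold rule.
holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]


def production_model(x):
    return int(x > 0.8)


def candidate_model(x):
    return int(x > 0.5)


def test_candidate_meets_quality_gate():
    # A test runner would collect this function and fail the build if it raises.
    assert passes_quality_gate(accuracy(candidate_model, holdout),
                               accuracy(production_model, holdout))
```

A model that fails the gate never reaches the packaging step, in the same way that code failing its tests is not merged.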

How to implement MLOps

You can create a system to implement the items we have mentioned above. Alternatively, you can use a machine learning pipeline orchestration platform that will make your workflow easier. Let’s discover some of the best tools that you can use for ML pipeline orchestration.

MLOps solutions

The choice of a machine learning orchestration tool will depend on a couple of factors, including:

  • The skills of your team
  • Your budget
  • Whether you want to automate part of your pipeline or the entire pipeline
  • The ease of integrating the new tool

Just to mention a few.

Let’s now mention some of the best tools you can use to orchestrate your machine learning pipeline.

  • MLflow is an open-source platform for managing a machine learning life cycle. The platform can be used for ML experiment tracking, deployment as well as a central model registry.
  • Sacred is an open-source library that can be used to organize, log and reproduce machine learning experiments. It doesn’t ship with a web UI. Omniboard is a popular front-end library for Sacred.
  • ModelDB is an open-source tool for versioning models, storing model metadata, and managing machine learning experiments. It can be used for making ML pipelines reproducible as well as displaying model performance dashboards.

MLOps best practices

MLOps is a relatively new field; however, some best practices will lead to the success of your machine learning orchestration process when adhered to. Let’s mention some of them:

  • Use tools that are collaborative. This makes it easier for everyone on the team to access code, data, and information about the project, for example, the generated features. It also makes it easier to raise and track issues.
  • Start with a simple model. Starting with a simple model gives you adequate time to ensure that the infrastructure is right. A complex model means that you have to debug a complicated model and optimize the infrastructure it will run on.
  • Just launch. Don’t spend months on end building and deploying the machine learning model. It is better to launch the model as soon as possible to start testing it on actual users. You can serve the model to a small number of users to start getting initial feedback. That feedback can be used to iterate the model and infrastructure as necessary.
  • Perform automated regression tests. Regression testing ensures that new code doesn’t break existing functionality. Code that fails the tests is not merged into the main source code.
  • Automate model deployments. This ensures that new models that pass certain tests become automatically available to users. It also frees up engineers from the manual process of packaging models for production. The process involves automated packaging of models together with their dependencies and delivering them to production or staging environments. The models should be monitored constantly and automatically rolled back when they perform dismally.
  • Attach predictions to model versions and data. This makes it easier to track every prediction to a specific model and data. This is important for traceability, reproducibility, and compliance reasons. Logging predictions with data and model versions also makes it easy to debug models in the event of unexpected behavior.
  • Measure training and serving skew. Machine learning models may not always perform as expected when served for use with unseen data. As a result, it is crucial to measure the difference between performance on training and unseen data. If the difference is not acceptable by your standards, you will have to implement a way to alleviate that, for instance, reworking the features.
  • Implement shadow production. This involves using production data to make predictions on a model. The predictions are, however, not used to make real-world decisions. They are compared with the decisions made by the existing decision-making system, even if that system is manual. When the decisions made by the model are acceptable, it can be promoted to making real decisions.
  • Automate hyperparameter tuning. Selecting and searching for the best model parameters by hand can be a pain in the neck. Automating the process will hasten your machine learning experimentation process. The results of hyperparameter optimization can then be compared, and the best algorithms and parameters chosen for production.
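
Two of the practices above, shadow production and attaching predictions to model versions, can be sketched together. In this illustrative example the existing rule, the shadow model, the version label, and the request values are all made up; only the pattern matters, in which the existing system's decision is the one served while the shadow prediction is merely logged.

```python
def run_in_shadow(requests, current_system, shadow_model, model_version):
    """Serve the current system's decisions and log shadow predictions beside them."""
    served, log = [], []
    for request in requests:
        decision = current_system(request)         # this decision is acted upon
        shadow_prediction = shadow_model(request)  # this one is only recorded
        served.append(decision)
        log.append({
            "request": request,
            "model_version": model_version,  # ties each prediction to a model version
            "served": decision,
            "shadow": shadow_prediction,
        })
    agreement = sum(entry["served"] == entry["shadow"] for entry in log) / len(log)
    return served, agreement, log


def existing_rule(amount):
    """Hypothetical existing system: send large transactions for review."""
    return "review" if amount > 1000 else "approve"


def shadow_model(amount):
    """Hypothetical candidate model with a stricter cutoff."""
    return "review" if amount > 800 else "approve"


served, agreement, log = run_in_shadow([200, 900, 1500, 50], existing_rule,
                                       shadow_model, model_version="v2")
print(served, agreement)
```

The agreement rate, together with a review of the logged disagreements, gives a basis for deciding whether the model can be promoted to making real decisions.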

You have learned a lot from this article. Let’s summarize it in the table below.

A summary overview of MLOps

Item: What is MLOps?
Description: MLOps encompasses a set of best practices that ensure that a machine learning project is completed successfully. These practices include automated testing of ML code, data validation, and automated deployment and rollback of ML models.

Item: How is MLOps different from DevOps?
Description: DevOps is a set of standards for ensuring the success of a software project. In the machine learning space, there are several other items that one needs to be concerned about. For instance, a machine learning model’s performance can degrade over time as a result of changes in data or errors during deployment. MLOps is therefore also concerned with ensuring that an ML model keeps performing well over time and that it is retrained in the event of dismal performance.

Item: Which problems does MLOps solve?
Description: MLOps practices are geared towards solving the following ML problems:

  • Versioning of data and models to ensure reproducibility and traceability
  • Tracking the performance of the model to ensure that dismal model performance is addressed immediately
  • Generating, versioning, and storing features so that they can be reused in the future

Item: Which are the most crucial elements of an MLOps system?
Description: An end-to-end MLOps system should have the following elements:

  • A feature store where all the generated features are stored
  • Ability to version features so that each model is linked to its features
  • An ML metadata store to log all model metadata such as test and training size, model hyperparameters, hardware metrics, code used to generate the features, etc.
  • A model registry for automatically storing trained models and serving them
  • Ability to version all models to enable easy switching from one model to another
  • Monitoring of the model in production to tell if it’s performing as required

Item: Why should you use an end-to-end MLOps platform?
Description: Many solutions solve different parts of the machine learning life cycle. To ensure that you don’t get bogged down with multiple tools, you should select a solution covering every aspect of the ML life cycle. An end-to-end solution should allow you to:

  • Perform machine learning experimentation
  • Generate features easily
  • Store and version the features
  • Manage and version multiple machine learning models
  • Serve and run inferences on the trained models
  • Monitor the machine learning models in production for concept and data drift
  • Retrain dismally performing models
  • Roll back to a previous model version if the current version performs poorly

Final thoughts

This article has been a primer on the world of machine learning operations, popularly known as MLOps. We have covered various aspects of MLOps as well as a couple of best practices. We have also looked at some tools that you can use to automate the MLOps process. More specifically, you have learned:

  • What is MLOps
  • The difference between MLOps and DevOps
  • Problems that MLOps solves
  • Skills needed to operate in the MLOps space
  • Various components of MLOps
  • End-to-end MLOps solutions

Just to mention a few.



