Operationalizing machine learning
Once a machine learning model has been built, the next step is to share it with the world so that other people can benefit from it. Creating a working model is the beginning of a more extensive process where machine learning is operationalized. But what exactly is operationalization?
Operationalization in machine learning is the process by which a model transitions from development into production. Without this process, data science teams become stuck in a silo of continuous modeling without providing any practical value to a business or organization. It does not matter how good your models get; if you cannot get them out the door and into the real world, they are useless.
Productionalizing machine learning is the challenge being faced by many data science teams today. Although teams are comfortable building models in a development setting, most models still never make it into production. Deploying applications powered by machine learning requires synchronized deployment involving the code, the model, and the data used to train the model. This complicates related processes like logging, version control, and reproducibility.
We will break down the operationalization process into two main parts:
- Data and modeling
- Deployment and monitoring
A lot of the conversation in machine learning revolves around the models being used. New optimization methods, complex architectures, and all the things that inspire awe and wonderment from outsiders always seem to take center stage. However, if you have been in the industry for long enough, you will know that the data a model is trained on is just as important as the model itself.
Data looks boring compared to the models it trains, and perhaps because of this, its value and impact have mostly been underestimated. In a recent talk, Andrew Ng explored the concept of a data-centric approach to machine learning and compared it with a model-centric approach. Essentially, a machine learning system is composed of two things: code and data. If we want to improve the performance of our system, we can improve either of these components. But which one do we focus on?
In the ideal environment, there has to be a balance between the two approaches. In the model-centric approach, the data we have is more or less fixed after preprocessing. We then focus on iteratively improving the model. Any noise we find in the data is fixed by tuning the model to deal with it.
| Model-Centric Approach | Data-Centric Approach |
| --- | --- |
| Collect what data you can. | Consistency of the data is paramount. |
| Optimize the model so it can deal with the noise in the data. | Focus on data quality rather than modeling. |
| The data is fixed after preprocessing. | Iteratively improve data quality. |
| Model is iteratively improved. | Allows multiple models to perform well. |
In contrast to this, the data-centric approach holds the code/model fixed. The focus then becomes iteratively improving the quality of the data. Take note that the quality of the data should be our focus rather than quantity. People may say that more data is always better, but that is not always the case. If there is no variety in our data, it does not matter how much we have; the model’s performance will not improve.
Once the data and model have been taken care of, we turn our attention to what separates machine learning systems from the one-off experiments that have become so common in the industry. Deployment here is the method by which we put a machine learning model into production. We will look at different deployment strategies as well as how to monitor a pipeline once it is up and running.
Here we will look at different deployment strategies for machine learning models and the benefits of each. A deployment strategy is a way to move a model from development into production or change an existing one. In our case, we assume that some version of our model is already running, and we want to update it with a newer version.
A good deployment strategy aims to switch between different versions of a model with as little downtime as possible. This way, a user cannot see the change, and their experience with the application or service is uninterrupted. By analyzing different strategies and their strengths, we will see which one best suits different scenarios.
Shadow mode is a deployment strategy where traffic from a production server runs through a newer version of a model without its responses or predictions being returned. Meanwhile, an older version of the model, which has already been validated, continues to run and return responses to the traffic.
Shadow Mode Deployment – Image by Author
The value in this approach comes in being able to test the model with live data without actually affecting the system’s current performance. A data science team will see what the model would have predicted, given real input from their system. This will quickly reveal the flaws of their model and allow them to fix any problems they see. After fixing these problems, they can iteratively run the updated version in shadow mode and fix any problems until they are satisfied with its performance.
The moment they are satisfied, they can easily switch from the old version to the new model. A lot less can go wrong at this point, as the team already knows how the model will perform before deploying it.
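The request flow above can be sketched in a few lines. This is a minimal illustration, not a production router: `predict_old` and `predict_new` are hypothetical stand-ins for the validated and candidate models, and the threshold values are made up.

```python
def predict_old(features):
    # Hypothetical stand-in for the validated production model.
    return sum(features) > 1.0

def predict_new(features):
    # Hypothetical stand-in for the candidate model running in shadow.
    return sum(features) > 0.8

shadow_log = []  # predictions recorded for offline comparison

def handle_request(features):
    """Serve the old model's prediction; run the new model in shadow."""
    live_prediction = predict_old(features)
    shadow_prediction = predict_new(features)
    # The shadow prediction is only logged, never returned to the caller.
    shadow_log.append({"live": live_prediction, "shadow": shadow_prediction})
    return live_prediction
```

Comparing the `live` and `shadow` columns of the log offline is how the team would judge the new model before promoting it.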
Canary deployment is a strategy where a new version of a model runs on a subset of production requests/servers. This usually starts as a small percentage and gradually increases as the model is validated and confidence in its ability grows. Using this deployment strategy poses a lower risk to current systems because, in the worst case, only a tiny percentage of the system’s traffic would be affected by a misbehaving model.
Canary Deployment – Image by Author
The term “canary deployment” actually comes from an old coal mining technique. A coal mine can be a very dangerous place filled with lethal gases like carbon monoxide. Miners came up with a way to protect themselves by using canary birds. These canaries were highly sensitive to these lethal gases, so miners would use them to detect their presence.
Due to their high sensitivity, the canaries would often perish from the gas before it reached the miners. This approach ensured the miners’ safety, with a single bird being able to save multiple human lives. A similar effect is seen when using this type of deployment with new models. The small number of requests a new model receives will detect problems before they reach the rest of the system.
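A canary rollout can be expressed as probabilistic routing. The sketch below, with an assumed 5% starting fraction and placeholder model functions, shows the core idea; the `rng` parameter is only there to make the routing decision controllable.

```python
import random

CANARY_FRACTION = 0.05  # start by sending 5% of traffic to the new model

def predict_stable(features):
    # Hypothetical stand-in for the current production model.
    return "stable"

def predict_canary(features):
    # Hypothetical stand-in for the new model under evaluation.
    return "canary"

def route_request(features, rng=random.random):
    """Send a small fraction of requests to the canary model."""
    if rng() < CANARY_FRACTION:
        return predict_canary(features)
    return predict_stable(features)
```

As confidence grows, `CANARY_FRACTION` would be ramped up gradually until the new model receives all traffic.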
In blue-green deployment, two versions of a model are run simultaneously on identical environments. Traffic then quickly gets rerouted (“swapped”) from the old version to the new version. The old version is normally called the blue version, while the new version is called the green version.
Blue-Green Deployment – Image By Author
In this deployment strategy, the blue version is usually kept on standby in case something goes wrong with the green version. The benefit here is the ability to roll back to your old environment rapidly. This is done by simply routing traffic back to the blue version, which does not have the new changes.
For blue-green deployment to work well, both environments must be identical. They may be running on different hardware entirely or be different virtual environments running on the same hardware. Another advantage to having these two similar environments running simultaneously is the ability to test your system’s disaster recovery plan. Ideally, if your green environment has any sort of failure, your system would seamlessly switch to the blue environment with minimal delay.
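The swap at the heart of blue-green deployment is just a change of routing state. A toy sketch, with lambdas standing in for the two deployed model versions:

```python
environments = {
    "blue": lambda x: "blue-prediction",    # current production version
    "green": lambda x: "green-prediction",  # new version on identical infra
}
live = "blue"  # router state: which environment receives traffic

def handle(x):
    """All production traffic goes to whichever environment is live."""
    return environments[live](x)

def swap():
    """Reroute traffic to the other environment; the old one stays on standby."""
    global live
    live = "green" if live == "blue" else "blue"
```

Rolling back is simply calling `swap()` again, which is exactly why this strategy makes recovery so fast.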
Once your system is up and running, it is crucial to ensure that everything is working as intended and stays that way. Identifying problems within the system (and preempting them) can save you a lot of time down the road when making new changes. Pipeline monitoring ensures that our machine learning system is working correctly by detecting and dealing with problems in production.
When working with complex machine learning systems, a lot can go wrong. But how do we know when something is going wrong? This is where metrics come into play. What are they?
A metric is some form of measurement. In our case, metrics are quantitative and are used to track the performance of a system in production. In the same way that we have metrics for checking the performance of a machine learning model, we also need metrics for ensuring that our system continues to function as intended given certain constraints.
Which metrics should we be tracking? When thinking about this problem, it is good to meet with your team and brainstorm everything that can go wrong with your system. Once you have a list of these failure points, you should try to figure out what metrics or statistics will help you detect these problems.
Input metrics check if the inputs going into our model are unusual in any way. These metrics will differ depending on the system or application. Say, for example, we have a face recognition system designed to recognize human faces. We may notice that the images being passed to it start to become darker. Maybe our users have begun taking pictures at night or in low-light settings.
Noticeable decrease in image brightness – Image by Author
If our model was not trained on low brightness images, its performance would surely suffer. The moment this starts to happen, we should have some trigger that alerts us to the change. Brightness here measures the inputs to our model; hence it is classified as an input metric.
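The brightness check described above could be sketched as follows. The threshold of 60 is an arbitrary illustrative value; in practice it would be derived from the brightness distribution of the training set.

```python
def mean_brightness(image):
    """Average pixel value of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def brightness_alert(images, threshold=60.0):
    """Trigger when the average brightness of a batch drops below threshold."""
    batch_mean = sum(mean_brightness(img) for img in images) / len(images)
    return batch_mean < threshold
```

Running `brightness_alert` on each incoming batch gives the early warning trigger the paragraph describes.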
The same thing applies to output metrics. These metrics might not directly be the output of our model (e.g., model prediction) but instead may indicate that something is wrong with our system. Going back to the example of the face recognition system, we might notice that a user keeps feeding the same image to our model. In this scenario, the model may be working fine, but it may indicate that something is wrong with another part of the system (maybe the interface for uploading images or how we render prediction results).
Increase in Number of Identical Image Uploads – Image by Author
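Detecting repeated uploads of the same image can be done by fingerprinting each upload. A minimal sketch, assuming images arrive as raw bytes and using an arbitrary repeat limit of 3:

```python
import hashlib
from collections import Counter

upload_counts = Counter()

def record_upload(image_bytes):
    """Fingerprint each upload and count repeats of identical images."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    upload_counts[digest] += 1
    return upload_counts[digest]

def repeated_upload_alert(max_repeats=3):
    """Fire when any single image has been uploaded too many times."""
    return any(n > max_repeats for n in upload_counts.values())
```

When the alert fires, the model itself may be fine; the signal points at some other part of the system, as discussed above.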
The last type of metric we will talk about is the software metric. We must keep track of system constraints that do not have anything to do with the model itself but with the system that runs it. If the machine hosting the model does not function properly, the model’s performance will not matter as it will be impossible for other people to use it.
Say we have a set of servers that host our model. We start noticing that the servers crash often. If we had a metric like our server’s load, we might see that the number of requests increases during certain times of the day. This could be a probable cause of the frequent crashes. Though not related to the model itself, it is still important to track metrics like these so that our system can handle incoming traffic.
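A simple load metric like the one just described could be tracked as below. The window limit and history length are placeholder values chosen for illustration.

```python
from collections import deque

class LoadMonitor:
    """Track request counts per time window and flag overload spikes."""

    def __init__(self, window_limit=100, history=24):
        self.window_limit = window_limit
        self.counts = deque(maxlen=history)  # one entry per time window

    def record_window(self, request_count):
        """Record the number of requests seen in the latest window."""
        self.counts.append(request_count)

    def overloaded(self):
        """True if the most recent window exceeded the capacity limit."""
        return bool(self.counts) and self.counts[-1] > self.window_limit
```

Plotting the stored windows over a day would reveal the recurring peak hours the paragraph mentions.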
In this section, let’s look at some of the main challenges data science teams may face during the operationalization of a machine learning system. Knowing about these challenges can help you and your team preempt them and develop a plan beforehand to deal with them.
The only constant in life is change. The same principle applies to our machine learning models, especially the data used to train them. Over time, the performance of any model in production will inevitably decrease. This can be its accuracy, error rate, or a non-technical business KPI that relies on the model. What causes this degradation?
Data drift, also known as covariate shift, is quite simple to understand. Essentially, the input data we are training on has changed. Its distribution is no longer the same as when we first trained the model. Due to this change, the model’s relevance degrades. It may still work, but not to the standard it achieved when it was trained on fresh data.
Static vs retrained models – Image by Author
The rate of model decay can vary greatly. Some models can last several years without any major update. For example, a computer vision model trained on images of people will likely last longer than a model trained on stock market data. This is simply due to the nature of the data.
If we notice that the data our model has been trained on has changed significantly, this is a sign of data drift. Once a model’s performance starts to decrease, be sure to consider data drift as a likely cause. To alleviate the problem of data drift, we need to retrain our model on the new data or build an entirely new one for the changed data.
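One crude but common way to flag drift is to measure how far the current input mean has moved from the reference (training-time) mean, in units of the reference standard deviation. This is a simplified sketch; real systems often use distribution-level tests, and the threshold of 2.0 is an illustrative choice.

```python
import statistics

def drift_score(reference, current):
    """Shift of the current mean, in units of the reference std deviation."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(current) - ref_mean) / ref_std

def has_drifted(reference, current, threshold=2.0):
    """Flag drift when the mean has shifted by more than the threshold."""
    return drift_score(reference, current) > threshold
```

When the flag fires, retraining on the new data (or building a new model) is the remedy described above.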
Related to data drift is concept drift. How do they differ? While data drift describes changes in our input data, concept drift occurs when the relationships between our input and output data have changed. An example of this would be if we have a model to forecast the sales of our existing product and the competition launches a new product to compete with ours. With this new product on the market, the behavior of consumers will change, and so will the relationship our model has learned between its inputs and our sales.
Our input data may remain the same in this case, but we see that the model’s performance is still decreasing. Once we identify concept drift as the reason for model degradation, there are a few ways to fix it.
The first way is similar to data drift; we need to retrain the model on new data. It may be the case that we need to retrain on both old and new data for the model to adjust to new trends. Here, it may also be a good idea to have a higher weight assigned to newer data, so the model prioritizes it. Once enough of the new data is collected, we can drop the old data.
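Weighting newer data more heavily can be done with a simple exponential decay scheme. The sketch below shows the weighting idea in isolation, applied to a weighted mean rather than a full model; the decay rate is an assumption to be tuned.

```python
def recency_weights(n, decay=0.9):
    """Exponentially decaying weights: the newest observation gets weight 1."""
    return [decay ** (n - 1 - i) for i in range(n)]

def weighted_mean(values, weights):
    """Average that lets recent values dominate via their larger weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

In a real pipeline these weights would be passed to the training procedure (most libraries accept per-sample weights) so the model prioritizes recent behavior.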
Sometimes, the first approach may not be enough. If the problem we are predicting has inherently evolved, we may need to tune the model’s scope, collect a new type of data, or change the way we process our existing data. In the example of the model to forecast product sales, we might change from a weekly or seasonal forecast to a daily one to account for less predictable behavior from our consumers.
One last important note to talk about is the difference between gradual and sudden changes in data. This seemingly small difference may pose significant challenges in maintaining your system if you are not prepared for it.
Gradual change, as the name implies, is a change that happens slowly over an extended period. Changes like inflation are good examples of gradual change as they happen over the years rather than days or weeks.
Sudden change, on the other hand, occurs within a shorter period. Think of events like the coronavirus pandemic, which forced people into quarantine in a matter of days to weeks. This caused consumer behavior to change almost overnight, leaving most models with stale data from the pre-pandemic period.
It is much harder to prepare for sudden change, and many times, the best we can do is try to work quickly enough to get our models back on track with minimal disruption. For gradual change, it is good practice to periodically retrain our models, with the frequency depending on how fast our data distributions change.
If we know the change is “recurring”, we may want to include this change in the model itself as a form of seasonality. An example could be higher sales of a product during the holiday season or on a specific day like Black Friday. Gradual change and sudden change are inherently different, and the strategy for dealing with one may not work for the other. It is vital to have systems in place to deal with each of them.
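Encoding recurring change as seasonality can be as simple as deriving calendar features for the model. A minimal sketch; the feature names and the choice of November/December as the "holiday season" are illustrative assumptions.

```python
from datetime import date

def seasonal_features(d):
    """Encode recurring calendar effects as model features."""
    return {
        "month": d.month,
        "day_of_week": d.weekday(),  # 0 = Monday
        "is_holiday_season": d.month in (11, 12),
    }
```

Features like these let the model learn, for example, that Black Friday sales are high every year rather than treating the spike as drift.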
A machine learning platform is any service or tool that allows the user to automate different parts of the machine learning operationalization process. To reduce the time it takes to bring models into production and effectively manage their life cycles, we should always try to use a machine learning platform. Here we will look at some of the main benefits of such a platform and how they can help us with our work.
One all too common problem that occurs in operationalization is the silo. Teams become stuck in their world and are unable to contribute to the company’s bottom line. Though they may develop interesting models, these models become what-ifs rather than actual value-adding projects.
A good machine learning platform comes with tools that help ease the process of deployment. Not all data scientists have a background in software engineering, so having automated deployment will quickly overcome the silo problem. Once a model has been tested in a development environment, it will not take as much effort to productionalize it.
Being able to automate your pipeline is crucial to operationalizing machine learning. There are simply too many small, repetitive tasks that would be prohibitively expensive if a human did them. That is why we should try to automate as much of our pipeline as we can. This gives your team more time to work on the most pressing challenges while leaving the repetitive tasks to computers.
Having a dedicated machine learning platform is the perfect solution to this and why every company working with machine learning should have one. It makes the entire company function more efficiently, allowing data to move more easily across different teams and systems. With this ease of movement, the rate at which your data is utilized in other projects will inevitably increase. If the data is easy to reach, people will be more willing to use it.
Another benefit of a machine learning platform is improved collaboration. Having a central platform will allow everyone on the team to work more cohesively. Rather than working individually, team members will be working on the same platform, introducing a more collaborative environment where the tasks of each member are unified under a common goal.
Employees always want to feel like the work they are doing is meaningful and is helpful to others. This point is related to overcoming silos within a company where people work in isolated environments. Working in these environments reduces morale as employees think their work does not matter. If everyone works on a centralized platform, the effects of their work are much more transparent, and their motivation to work and collaborate is increased.
Operationalizing machine learning effectively has a lot of different challenges associated with it. We need to create data processing pipelines, think about deployment, and scale up in the future during this process. Using a dedicated machine learning platform makes this process a lot easier. The overhead that comes with setting up the platform will be well worth it in the future as other processes become simplified and automated.
This inherently generates more business value as employee productivity increases. Since a machine learning platform can automate so much of your deployment pipeline, time is spent on more critical tasks than repetitive work.
The process of operationalizing machine learning has a lot of steps involved. It can be a complex process with many moving parts. To help you in operationalizing your machine learning models, here are some best practices to follow. These will cover several areas where difficulties often occur.
This might sound counterintuitive since this article is about operationalizing machine learning, but machine learning is not always the answer, especially when you are just starting. One of the biggest downsides to machine learning systems is that they require data—lots of it. If you’re just starting on a new project, chances are you won’t have much data at all. Without enough data, any model you hope to create will most likely fall short of your expectations.
In this case, your best bet would be to use basic heuristics. A simple set of heuristics should be able to get you started while you collect more data. For example, if you’re building a system for recommending products to your customers, you could simply recommend the products that have been selling the most or have the most views. It’s better to wait before building your model instead of building a model that does not meet expectations.
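The "recommend the best sellers" heuristic is a one-liner over the purchase log. A minimal sketch, assuming the log is simply a list of purchased product IDs:

```python
from collections import Counter

def top_selling(purchase_log, k=3):
    """Recommend the k most purchased products — no model required."""
    counts = Counter(purchase_log)
    return [product for product, _ in counts.most_common(k)]
```

A baseline like this also gives you a benchmark: any model you build later should beat it before it earns a place in production.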
At the beginning of a machine learning project, we may be tempted to jump straight into the latest cutting-edge models available to us. Most of the time, this is a bad idea as the complexity involved brings unforeseen challenges when bringing the model to production. The early models need not be anything fancy; what is important is that we get them into production quickly.
Instead, focus on integrating the model into your infrastructure and getting a reliable pipeline up and running. This will help tremendously with future deployments. The same goes for your input data. Keep your features simple to avoid having to preprocess your data too much before training. This also helps with debugging in the early stages of development. Once your infrastructure is set up properly, most of the work has already been done. Always remember that your model is just a small part of a much larger system.
This practice is related to the problems of data and concept drift we discussed earlier. Silent failures are very subtle, especially when working with machine learning models. Imagine that your training data involves a join to a table that has gone stale. Your model still works fine, but its performance slowly decreases over time. There is no error message, no warnings on your terminal, just a model that gradually becomes useless.
Updating tables and data like this can have a huge impact on the performance of your models. How will you know when your data needs an update? It is a good idea to keep separate statistics on any data you are using to train models. Setting some sort of simple notification system to tell you when the data has deviated by a significant margin will save you a lot of time down the road. Manually doing inspections on your data from time to time can also help ensure that your data and notification systems work correctly.
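A freshness check for upstream tables is one concrete form of the notification system described above. This sketch assumes refresh times are tracked as Unix timestamps and uses an arbitrary one-week staleness window.

```python
table_last_updated = {}  # table name -> last refresh timestamp (seconds)

def record_refresh(table, timestamp):
    """Record when a source table was last refreshed."""
    table_last_updated[table] = timestamp

def stale_tables(now, max_age_seconds=7 * 24 * 3600):
    """Return tables that have not been refreshed within the allowed window."""
    return [t for t, ts in table_last_updated.items()
            if now - ts > max_age_seconds]
```

Wiring `stale_tables` into a scheduled job that alerts the team turns a silent failure into a loud one.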
Operationalization is an iterative process. Do not expect the first model you deploy to be the last. As you iterate through your pipeline the first few times, you may start thinking about adding complexity to your models. When doing so, consider whether the complexity you add can slow down future deployments. There is always a trade-off to model complexity, and that extra 0.5% boost may not be worth it if it means slowing down your launch cycle by a large margin.
When do we deploy new models in the first place? It usually happens when we develop new features, tune hyperparameters, or tune the target objective to reflect new changes. Regardless of the cause, think about how the changes will impact your work in the future. How easy will it be to add, remove, or recombine features later on? Will you be able to create a copy of this pipeline and verify its performance? Will its results be reproducible? Keep questions like this in mind, and be cautious when adding complexity to your system.
Declarative MLOps helps us solve operationalization problems in a much simpler and more efficient way. A lot of the legwork and boilerplate will usually be done by a declarative system, leaving you and your team free to work on the more important challenges in deployment.
Operationalizing machine learning comes with a lot of different challenges
Layer is a Declarative MLOps platform that helps streamline the process of operationalizing machine learning models. What exactly is Declarative MLOps, and how does it help us? In Declarative MLOps, a lot of the unimportant details are abstracted from the user. Instead of defining how to operationalize a model and each of the steps involved, you only need to define what you want to accomplish.
The key components of your machine learning system: datasets, features, and models, can all be abstracted as first-class entities. This allows you to manage and monitor them more easily. Also, since each of your components is an entity, reusability becomes extremely simple. Reusing a component like a dataset or ML model would be like instantiating a new object or calling a pre-existing function.
One key pain point often seen when operationalizing machine learning is the issue of version control. In a non-ML system, a lot of the work of version control falls just on the codebase. When you add ML into the mix, you have to worry about versioning the exact dataset you used for training, the preprocessed features, and the model with all its parameters. A slight hiccup in any one of these components results in a system where the output is not reproducible. In ML, this is a big no-no.
Declarative MLOps systems like Layer allow you to combine your data, ML model, and code into one tightly coupled version, saving you the headache of having to do it yourself. This is a big deal as teams will typically spend a good portion of their initial effort trying to version every component correctly.
In a previous section, we talked about pipeline monitoring and how important it is to track metrics relevant to the performance of your system. With a system like Layer, processes are traceable, allowing you to monitor your entities’ life cycle even after they have been deployed to production. You can track different input and output metrics and the business impact that your model has on your bottom line.
In this piece, we have discussed the process of operationalizing machine learning and all the work involved when putting a machine learning system into production. We looked at two approaches to data and modeling and saw how we should balance both to launch an effective ML system. Once a model has proven its worth in development, the next logical step is to deploy it to your system for consumption.
We saw different deployment strategies and how they allow you to deal with different problems in your pipeline. After deployment, we need to make sure our model stays up-to-date. Through pipeline monitoring, we can track various metrics related to our system, allowing us to see problems as they arise and anticipate issues that are likely to occur in the future.
Data is constantly changing, and we have to make sure the assumptions we made when we created our model stay the same as time goes on. We looked at the types of changes that occur as our model ages and ways to keep our model relevant. We also looked at some best practices involved in the operationalization process to prevent many commonly occurring problems.
Lastly, we saw how Declarative MLOps platforms like Layer can help streamline a lot of the operationalization process, allowing us to focus our time and energy on the most critical parts of our system.