Why We Need DevOps for ML Data


Neetika Khandelwal

Software Engineer

Machine learning is a subfield of artificial intelligence that uses data and algorithms to mimic the way humans learn: the machine identifies patterns and makes decisions with minimal human intervention. The many steps involved in building an accurate model can become hectic, and getting ML models into production is harder than deploying traditional software. Teams find it difficult to build and deploy ML models at the expected scale, largely because companies haven’t been able to bring DevOps practices to machine learning.

Before DevOps entered the ML world, the development and deployment of machine learning applications was as challenging and slow as the development and deployment of software 20 years ago. The evaluation and feedback loops used to be too long, and by the time the application was released, the requirements and designs you started with became obsolete. To solve this issue, MLOps comes to the rescue. In this blog, we will discuss:

  • DevOps fundamentals
  • How DevOps differs from the traditional Software Development Life Cycle (SDLC)
  • DevOps usage in the ML world
  • Model-centric and data-centric approaches
  • Why there is a need to shift from model-centric to data-centric

What is DevOps?

DevOps is a portmanteau of “Development” and “Operations.” One must not think of DevOps as a technology, tool, or programming language; it is a set of working practices for taking software from development through to production. The development and operations teams stay in sync to deliver the end product to users efficiently.

There is little problem when dealing with small-scale applications, because only a few people manage the software. Things change with large-scale applications like Swiggy, Zomato, and YouTube: different teams handle different parts of the product and must collaborate to assemble them into a mid-sized or large application.

Typically, the development team is responsible for writing code (using technologies such as React, Java, or Node.js), designing new features, and testing them. The operations team, on the other hand, is responsible for scaling servers, maintaining bandwidth, managing security, and keeping backups. In DevOps, both groups sit together and discuss everything side by side. If needed, they exchange roles and responsibilities so that everyone on the team has visibility into every stage of the development cycle. With the DevOps approach, there is no “wall of confusion” between the teams.

The infinity symbol of DevOps signifies that it is a continuous process of constant activity and improving efficiency. There are various phases in DevOps; let’s have a look.

[Image: the DevOps infinity loop and its phases]

  • Planning phase: The development team creates a plan, keeping in mind the objectives of the application to be delivered.
  • Coding phase: The team works on the programming of new features, keeping different versions of the code with tools like Git and merging branches when required.
  • Build phase: The code is made executable with tools like Maven and Gradle.
  • Testing phase: The code is now tested for errors and bugs. Selenium is a popular tool for automated testing.
  • Deployment phase: Once the code passes its test cases, it is ready to be deployed and is handed to the operations team, who deploy it to the production environment. Docker and Kubernetes are popular tools for automating this process.
  • Monitoring phase: The product is continuously monitored, and the feedback gathered is fed back into the planning phase, creating an infinite cycle.
  • Integration phase: This is the core of DevOps. Every change that passes its tests is automatically built and merged into the shared codebase; this practice is called continuous integration.
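
The test-and-integrate gate at the heart of CI is easy to picture in miniature. Below is a toy sketch of the idea in Python, assuming pytest is installed and a tests/ directory exists; real pipelines delegate this to CI servers such as Jenkins or GitLab CI rather than a hand-rolled script.

```python
# A toy CI gate: run the test suite, and only "deploy" if everything passes.
import subprocess
import sys

def run_tests() -> bool:
    # pytest exits with code 0 when every test passes.
    result = subprocess.run([sys.executable, "-m", "pytest", "tests/"], check=False)
    return result.returncode == 0

def deploy() -> None:
    # Placeholder for a real deployment step, e.g. pushing a Docker image.
    print("All tests passed -- deploying.")

if __name__ == "__main__":
    if run_tests():
        deploy()
    else:
        print("Tests failed -- deployment blocked.")
        sys.exit(1)
```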

Traditional workflow vs. DevOps workflow

There are two popular approaches to software development: the traditional workflow and the DevOps workflow. According to a GitLab study, 84% of IT practitioners believe that code releases and delivery have accelerated significantly in the last few years thanks to DevOps practices.

Now let’s compare both approaches.

| Traditional Workflow | DevOps Workflow |
| --- | --- |
| Lack of communication between the development and operations teams. | Smooth interdepartmental collaboration. |
| The goals of the two teams differ: the development team focuses on completing the development work, while the operations team ensures the IT infrastructure stays functional. | Both teams support each other and share a common goal: delivering value to customers through continuous, fast delivery. |
| Software releases often get postponed. | Well-organized scheduled and on-demand releases. |
| A significant amount of time is required to fix post-release defects. | The latest updates can be rolled back automatically, so users are not affected. |
| There is a lack of testing. | Most testing is automated as part of the release procedure. |
| Teams have a risk-averse mindset. | Teams have a risk-aware mindset: they are prepared to fail and recover early with pre-planned strategies. |
| The traditional way of working does not increase the productivity of businesses and IT teams. | The DevOps way of working increases the productivity of businesses and IT teams. |
| Maintenance and upgrade costs are higher. | Maintenance and upgrade costs are lower. |
| Merge conflicts are likely to appear after deployment to production, causing irregularities and a higher chance of post-production defects. | Software changes are automatically merged, tested, and deployed with less effort. |
| Traditional workflows are biased towards planning big projects involving a lot of code, with bundled releases and congested production. | DevOps favors smaller steps, since large batches become complex and risky. With small changes, releases are frequent and more responsive to customers. |

Benefits Of DevOps

The following are some of the benefits of adopting the DevOps way of working:

  • DevOps accelerates innovation: According to the 2018 DORA (DevOps Research and Assessment) report, companies that adopted DevOps spent almost 66% less time on customer support issues and about 50% less time on customer-identified defects and security issues. As a result, they spend more time on new work and innovation, scale up in their markets, and provide better products to their customers.
  • DevOps accelerates the business: According to McKinsey Velocity research, top DevOps performers have increased their revenue fourfold and raised customer satisfaction.
  • Transparency: DevOps brings transparency to work, which eases communication between teams and lets each member stay focused on their specialty.
  • Balanced work environment: DevOps practices help stabilize the work environment by reducing the tension around new releases and fixes that would otherwise hurt overall productivity.
  • Improved product quality: Good collaboration between the development and operations teams, plus continuous user feedback, leads to improved product quality. A better product not only drives business growth but also improves customer satisfaction.
  • Automated tasks: Unlike the traditional model, DevOps detects and fixes problems efficiently. Code is repeatedly and automatically tested for defects, so teams get the bandwidth to work on new ideas.
  • Minimal cost: DevOps provides a collaborative work environment that cuts management and production costs, since maintenance and new updates are brought under a single, broader umbrella.
  • Sufficient testing and fewer errors: Traditional models lack continuous testing during the development phase. If the product is functionally complex, the tests conducted in a traditional model are insufficient to detect flaws that affect its quality. In DevOps, the product is tested continuously at every stage to avoid post-release bugs.

Challenges: Why is DevOps not suitable for ML Pipelines?

Although DevOps has ample benefits, standard DevOps tools alone cannot operationalize machine learning. Some requirements are specific to machine learning:

  • Versioning for machine learning: DevOps relies on code version control to precisely document every change or adjustment made to the software. In machine learning, however, code isn’t the only changing input: the data must be versioned as well, as must parameters, metadata, logs, and finally the model itself (see the sketch after this list).
  • Continuous monitoring: Monitoring is another essential part of good DevOps practice. In recent years, site reliability engineering (SRE) has highlighted the significance of monitoring in the software development lifecycle. But where traditional software doesn’t degrade once deployed, machine learning models do.

After a model is deployed into production, it generates predictions on new data arriving from the real world. This data keeps changing with the business environment, which results in model degradation. MLOps provides procedures that facilitate continuous monitoring and re-training so that the models can remain in production.

  • Hardware requirements: Build times for software projects are short, so the hardware they run on hardly matters. Training machine learning models, especially deep learning models, demands heavy computation: larger models can take anywhere from hours to weeks to train, even on the GPU machines cloud vendors offer. An MLOps setup therefore needs to be much more sophisticated about the kinds of machines it can manage.
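
To make the versioning point concrete, here is a minimal, standard-library-only sketch that records a dataset fingerprint, parameters, metrics, and the model artifact together. The file names and record fields are illustrative, not a standard; real setups typically lean on tools such as DVC or MLflow.

```python
# Version the data and model alongside the code: hash the dataset and store
# parameters, metrics, and the model artifact in one auditable record.
import hashlib
import json
import pickle
from datetime import datetime, timezone

def file_sha256(path: str) -> str:
    """Fingerprint a dataset file so any change to the data is detectable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def register_run(model, data_path: str, params: dict, metrics: dict) -> None:
    """Save the model plus everything needed to reproduce or audit the run."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data_sha256": file_sha256(data_path),
        "params": params,
        "metrics": metrics,
    }
    with open("model.pkl", "wb") as f:  # the model artifact
        pickle.dump(model, f)
    with open("run_metadata.json", "w") as f:  # params, metrics, data hash
        json.dump(record, f, indent=2)
```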

MLOps: DevOps for the ML world

MLOps is becoming one of the hottest concepts in the world of Artificial Intelligence. It stands for Machine Learning Operations and involves data scientists and operations teams working in parallel to produce efficient models.

MLOps is DevOps in the context of machine learning. The idea has added value to businesses by making them faster and more reliable than ever before. Previously, projects often got stuck in the experimentation phase and never made it into production, making it difficult for companies to deliver their ML products on time.

MLOps has automated the lifecycle of ML algorithms in production, i.e., from initial training to deployment to re-training the model with new data. Teams can combine their skills, tools, and techniques in machine learning, data engineering, and DevOps. In this way, firms are using the power of MLOps to optimize the performance of ML models.

Let’s discuss how to implement MLOps.

There are three ways to implement MLOps. The choice among them depends on the organization’s size and the number of ML models it runs.

They are as follows:

1. Manual process (MLOps level 0): As the name suggests, ML workflows are entirely manual here. This practice is typical for companies that have just started using machine learning.

This way of implementing MLOps is adequate for less tech-centric organizations, such as insurance agencies and banks, that update their models only once a year or in response to an event like a financial crisis.

Below are some characteristics of level 0 MLOps:

  • As discussed above, every step is executed manually, including data preparation, data analysis, model training, and model validation. Even the transition from one phase to another is manual.
  • There is a disconnect between the machine learning and operations teams. The ML team does all the work from initial data extraction to the final model registry stage, then hands the model over to the engineering team, who deploy it on their API infrastructure.
  • The manual process makes it hard to iterate quickly on model training, which leads to infrequent releases.
  • Because changes and new model versions are deployed infrequently, continuous integration and delivery get little consideration.
  • It is challenging to determine when to re-train the model, since model performance is not tracked.

2. ML pipeline automation (MLOps level 1): Machine learning pipelines are automated at this level, with the aim of continuously training the ML models. This includes automating training on new data, re-training models in production, automating data and model validation, introducing triggers to initiate pipelines, and storing machine learning model metadata (a minimal sketch of such a pipeline follows the list below).

Below are some of the characteristics of level 1 MLOps:

  • ML steps are orchestrated and executed automatically.
  • Models are automatically re-trained for production using new data.
  • Code for components and pipelines is modularized, so components are reusable and shareable across ML pipelines.
  • Unlike level 0, the whole training pipeline is deployed to production, where it trains on fresh production data.
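
As promised above, here is a minimal sketch of such a continuous-training pipeline, assuming scikit-learn is available. The sanity checks, accuracy gate, and promotion step are simplified stand-ins for real data-validation and model-registry components.

```python
# A level 1 pipeline in miniature: validate fresh data, re-train, and only
# promote the candidate model if it beats the current one.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def validate_data(X: np.ndarray, y: np.ndarray) -> None:
    # Cheap sanity checks standing in for a real data-validation component.
    assert len(X) == len(y) and len(X) > 0, "empty or misaligned dataset"
    assert not np.isnan(X).any(), "NaNs in features"

def training_pipeline(X, y, current_best_accuracy: float = 0.0):
    validate_data(X, y)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    if acc > current_best_accuracy:  # the model-validation gate
        print(f"Promoting model (accuracy={acc:.3f})")
        return model
    print(f"Keeping current model (candidate accuracy={acc:.3f})")
    return None
```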

3. CI/CD pipeline automation (MLOps level 2): For quick and reliable updates to the ML pipeline, you need a robust automated CI/CD pipeline that lets data scientists explore new ideas around hyperparameter tuning, feature engineering, and model architecture.

This level is suitable for companies that re-train their model daily and re-deploy on several servers simultaneously. It will be difficult for these kinds of companies to survive without the practice of MLOps.

The components included in this setup are a feature store, a model registry, source control, test and build services, deployment services, a metadata store, and a pipeline orchestrator.

Below are some of the characteristics of level 2 MLOps:

  • Experimentation and development: You repeatedly try out new ML algorithms and models, with the experimentation steps orchestrated as a pipeline.
  • Continuous Integration (CI) pipeline: You build the source code and run various tests; the output of this step is the pipeline components (executables, packages, and artifacts) to be deployed later.
  • Continuous Delivery (CD) pipeline: The artifacts produced in the CI stage are deployed to the target environment. The result is a deployed pipeline running the new implementation of the model.
  • Automated triggering: The ML pipeline executes automatically in production, on a schedule or in response to a trigger.
  • Monitoring: Statistics on the model’s performance are collected from live data, and the output of this step can kick off a new experiment cycle (see the drift-detection sketch below).
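
Here is the drift-detection sketch referenced in the monitoring item, assuming NumPy and SciPy. It compares each live feature’s distribution against the training distribution and triggers re-training on divergence; the per-feature Kolmogorov-Smirnov test and the 0.05 threshold are illustrative choices, not a prescribed method.

```python
# Monitor live data for distribution drift and trigger the retraining pipeline.
import numpy as np
from scipy.stats import ks_2samp

def drifted(train_feature: np.ndarray, live_feature: np.ndarray,
            alpha: float = 0.05) -> bool:
    """Flag drift when a two-sample KS test rejects 'same distribution'."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha

def monitor(train_X: np.ndarray, live_X: np.ndarray) -> None:
    for col in range(train_X.shape[1]):
        if drifted(train_X[:, col], live_X[:, col]):
            print(f"Drift detected in feature {col}; triggering retraining.")
            # e.g. kick off the level 1 training pipeline here
            return
    print("No drift detected.")
```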

MLOps is an emerging trend in the world of Artificial Intelligence. Let’s look at some of its benefits:

  • Fast innovation through an efficient machine learning lifecycle: An MLOps solution, or DevOps for machine learning, enables collaboration among data processing teams, ML professionals, and IT engineers. It speeds up model development and deployment with monitoring, validation, and management systems for machine learning models.
  • Optimized team productivity: MLOps integrates with ongoing workflows to define clear roles and reduce wasted time and friction between project groups. It provides constant access to monitoring and reporting on existing projects so teams can make timely decisions.
  • Reproducible workflows and models: MLOps uses dataset registries and advanced model registries to track resources, and it improves traceability by tracking code, data, and metrics in the execution log. It also helps in creating machine learning pipelines to design, deploy, and administer reproducible model workflows for consistent model delivery.
  • Easy model deployment: MLOps allows high-precision models to be deployed quickly and confidently, using automatic scaling and managed clusters of CPUs and GPUs with distributed training in the cloud. It also packages models quickly, ensuring quality at every step through profiling and model validation.

Model-centric vs. data-centric approach

The AI system is the combination of code (model/algorithm) and data. These components go hand in hand to produce the desired results. Based on this, the data science community is split into two approaches: model-centric and data-centric. Let’s first discuss what they are.

Model-centric approach: As the name says, it focuses on how we can change the model to improve the performance of the AI system. This includes choosing the correct model architecture among a vast set of possibilities.

Data-centric approach: In this approach, the dataset is modified to improve the AI system. This avenue of improvement is usually overlooked and treated as a side task.

| Model-centric approach | Data-centric approach |
| --- | --- |
| The way to a solution is through the model. | The way to a solution is through the dataset. |
| Depending on the problem, model-centric scientists make the data fit the model using feature engineering. | Data-centric scientists aim to find better data rather than a more robust model or better feature engineering. |
| If the existing model fails, scientists develop a new model to solve the problem. | Scientists analyze where the data came from and what might be missing from the existing datasets. |
| You collect as much data as you can. | Data consistency is the key. |
| You optimize the model to deal with noisy data. | You invest in data-quality tooling rather than collecting more data. |
| The data is fixed after standardized pre-processing; the model is iterated on. | The code and algorithm are fixed; the data is iterated on. |
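
To make the contrast concrete, here is a toy experiment, assuming scikit-learn: the model-centric loop swaps in a stronger model while keeping noisy labels, and the data-centric loop keeps one simple model and cleans part of the labels (simulated by restoring ground truth for some flipped rows, standing in for a human re-annotation pass). It is an illustration of the two loops, not a benchmark.

```python
# Model-centric vs. data-centric iteration on a synthetic noisy-label dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Simulate annotation noise: flip 15% of the training labels.
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.15
noisy[flip] = 1 - noisy[flip]

# Model-centric loop: keep the noisy data, try stronger models.
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    model.fit(X_tr, noisy)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))

# Data-centric loop: keep one model, clean half of the bad labels
# (restoring ground truth stands in for a human re-annotation pass).
fixed = noisy.copy()
bad_idx = np.where(flip)[0][: flip.sum() // 2]
fixed[bad_idx] = y_tr[bad_idx]
model = LogisticRegression(max_iter=1000).fit(X_tr, fixed)
print("LogisticRegression + cleaned labels",
      accuracy_score(y_te, model.predict(X_te)))
```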

Big data or good data: Why we need to shift from model to data-centric

While working on complex machine learning problems, data scientists face a question: does data quality matter more than quantity?

There are various data science problems, in domains like healthcare and agriculture, where a lot of data simply isn’t available. That doesn’t mean these use cases are off-limits to data science; it all depends on how meaningful the data is. Sometimes good data quality beats sheer quantity, depending on your use case.

There are scenarios where more data only adds noise, while the right hyperparameters and model selection applied to good-quality data can achieve generalizable results, though this requires deep knowledge of the domain.

The choice between big data and good data depends on various parameters. Big data needs more space and processing power, which increases cost. On the other hand, a small amount of good data is sometimes insufficient to train a model for the problem at hand.

A well-known AI pioneer, Andrew Ng, recently talked about the need to shift from a model-centric to a data-centric approach. He added that this approach would have a dramatic impact on ML models.

Data has a significant stake in AI/ML development, and quality data is scarce, noisy, and very expensive to obtain. For a long time, the focus has been on the model training steps and far less on the state of the data itself.

To adopt the data-centric approach, data scientists must answer some crucial questions: Do you have complete data to work on? Is the existing data suitable for your use case? If labels are available, are they consistent? How good is the data quality? Should you prefer big data or good data?

One cannot answer the questions mentioned above in a single stage of ML development. They will be answered as you proceed with the ML steps.

For practicing the data-centric approach, one needs to focus on three significant aspects of data; let’s discuss them one by one:

  1. Volume: The amount of data you have is essential. Yann LeCun, a French computer scientist, once asked, “How can humans learn to drive a car in about 20 hours of practice with very little supervision, while fully autonomous driving still eludes our best AI systems trained with thousands of hours of data from human drivers?” Deep networks have low bias and high variance, and we expect more data to solve the variance issue. But a high volume of data can be costly, so before rushing to collect more, it is essential to consider what kind of data needs to be added.
  2. Consistency: This is another crucial aspect of data. Inconsistent data can derail your model and make the results unreliable. Andrew Ng showed an example in his talk, shown below:

[Image: an example of inconsistent labeling from Andrew Ng’s talk]

The example highlights how inconsistency in human labeling can hurt the dataset. The annotations are not wrong; they are inconsistent, and that can confuse the ML algorithm. Here are some recommendations:

  • Annotate a small sample of the dataset yourself before formulating instructions, to better understand the errors an annotator might make.
  • Review an arbitrary batch of annotated data to make sure everything is as expected.
  • If you still find inconsistency after revising the instructions, have several people annotate the same data and use the majority vote as ground truth (a minimal sketch follows).
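
A minimal sketch of that majority-vote aggregation follows; the image IDs and labels are hypothetical, and ties are surfaced for manual review rather than guessed.

```python
# Aggregate labels from several annotators by majority vote.
from collections import Counter

def majority_vote(labels: list[str]) -> str | None:
    """Return the majority label, or None on a tie (send it back for review)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

annotations = {  # hypothetical annotator labels per image
    "img_001": ["iguana", "iguana", "lizard"],
    "img_002": ["iguana", "lizard"],
}
for image_id, labels in annotations.items():
    label = majority_vote(labels)
    print(image_id, label if label is not None else "tie -- needs review")
```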

  3. Quality: The quality of data directly affects the performance of the machine learning algorithm. The data you have must fulfill your expectations by covering all the variations that production data will present. Some common flaws in datasets that you need to take care of are (see the sketch after this list):

  • Lack of variation: When a data attribute fails to vary sufficiently, a neural network can overfit to that attribute’s narrow distribution and fail to generalize.
  • False correlation: A neural network may fail to learn the intended solution when two data attributes are spuriously correlated. For example, if photos of an animal almost always show it standing in grass, a model might learn to associate the grassy background with that animal’s label.
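
Both flaws can be screened for automatically. Below is a minimal sketch, assuming a pandas DataFrame of numeric feature columns plus a label column; the 0.99 dominance and 0.95 correlation thresholds are illustrative.

```python
# Screen a dataset for low-variation attributes and suspicious correlations.
import pandas as pd

def low_variation_columns(df: pd.DataFrame, threshold: float = 0.99) -> list[str]:
    """Flag attributes where one value dominates: a lack-of-variation signal."""
    flagged = []
    for col in df.columns:
        top_freq = df[col].value_counts(normalize=True).iloc[0]
        if top_freq >= threshold:
            flagged.append(col)
    return flagged

def suspicious_correlations(df: pd.DataFrame, label: str,
                            threshold: float = 0.95) -> list[str]:
    """Flag attributes almost perfectly correlated with the label, which are
    candidates for false correlations like the grass-background example."""
    corr = df.corr(numeric_only=True)[label].drop(label)
    return corr[corr.abs() >= threshold].index.tolist()
```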

How MLOps helps us to have a data-centric environment

Studies show that between 85% and 95% of machine learning projects never make it to production. What accounts for this failure rate? Despite broad agreement that data is vital, much of the research in AI still focuses on the latest algorithms and technologies.

Andrew Ng did an informal survey of recently published papers on arXiv and found that approximately 99% of the articles were about machine learning models and only 1% about ML data. He asserts that a data-centric approach leads to faster and better improvements in model performance. He shared the example of a model detecting defects in steel sheets: a model-centric focus on the code did not increase performance at all, while a data-centric effort to improve data quality raised it from 76.2% to 93.1%.

MLOps helps establish this data-centric approach. An essential task of MLOps is to make high-quality data available through all stages of the machine learning lifecycle. Ng sees MLOps as an emerging field for data scientists, machine learning engineers, software engineers, and domain experts, with many more MLOps roles to be created in the coming years. The quality of data will fuel the ML revolution.

He also believes that the growth of the machine learning operations (MLOps) field will be critical to popularizing efficient, systematic, data-centric AI practices. Whether it is identifying the correct dataset for the current problem statement, enforcing standards for data labeling, deciding when to gather additional data for training, or refining the datasets when machine learning projects reach actual production, MLOps teams should ensure that high-quality data is available at every step of a machine learning project cycle.

MLOps tools that support a data-centric approach are fundamental to significant advancements; they allow companies to extract valuable insights from their datasets, no matter how small. The MLOps tools that do this well will be critical in providing a standard for companies of all sizes to use machine learning to make the world a better place.

Conclusion

DevOps for ML has become a necessity for any real-world data science project that drives business value in production. Standard software engineering DevOps tools are not enough, because DevOps for ML has to track several intrinsically new artifacts, and the ML development lifecycle itself differs from that of traditional software. For the smooth functioning of a pipeline, MLOps is the right practice to adopt. It is an emerging area that lets data scientists and other experts explore new ideas, and companies have started adopting this kind of workflow to build successful businesses.
