What is a model registry?

Oct 13th 2021

Neetika Khandelwal

We share blogs from our research team.

A model registry, a core part of MLOps, helps track, govern, and monitor ML artifacts at different stages of the machine learning lifecycle. It acts as a central hub where data science teams can collaborate on model development. A model registry improves workflow performance and standardizes deployments.

The model registry has streamlined how MLOps works, from the experimentation phase through to production, and can manage the details of model artifacts at any stage. Several model registry tools exist; we discuss them in this blog, along with general background on model registries.

What is MLOps?

MLOps (Machine Learning Operations) is a set of practices for delivering machine learning models reliably. It helps organizations scale model development to get faster results and generate valuable insights for business growth.

You have probably heard the term DevOps; its primary goal is to coordinate developers and the operations team. It enables software to be shipped quickly and monitored smoothly afterward, and its tooling lets developers focus on the root cause of any issue.

Data in machine learning is real-time data that keeps changing, and the challenge is to connect two moving parts: data and code. MLOps sits at the intersection of DevOps (aligning with the operations team), machine learning (the ML models themselves), and data engineering (managing real-time data). Its purpose is to support continuous integration, continuous delivery, and deployment of machine learning models into production at scale. Thus, MLOps facilitates the journey of an ML model from experimentation to production.

Using MLOps, data scientists get the benefit of an organized way of working with measurable benchmarks. The practices that come under MLOps are:

  • Hybrid Teams,
  • ML Pipeline,
  • Model and Data Versioning,
  • Model Validation, Data Validation, and
  • Monitoring.

Since machine learning models are advancing daily, there is a need to streamline how they are operationalized. We can achieve this using Machine Learning Operations (MLOps). The model registry, a crucial part of MLOps, is the key focus of this piece.

Model registry

Traditional development of an ML model comes with challenges around:

  • version control,
  • collaboration, and
  • standardized deployment.

Developers try to solve these problems with ad-hoc tools that may only be temporarily effective. What if there were a permanent solution?

This is where the model registry comes in: ML models are registered and governed through it, using tools such as Layer, MLflow, SageMaker, and KubeFlow. We will discuss these in detail later.

Therefore, the model registry is a collaboration center where a company:

  • shares ML models,
  • runs experiments for online testing and production, and
  • monitors model deployments and their ongoing performance.

The model registry manages the whole lifecycle of the machine learning artifact. Therefore, it is one of the most critical components of MLOps.
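To make the idea concrete, here is a minimal, illustrative sketch of what a registry does at its core: it stores artifacts under a name, auto-increments a version on each registration, and tracks a lifecycle stage per version. All names here (`ToyModelRegistry`, `churn_model`) are hypothetical; real registries such as MLflow or Layer add persistent storage, metadata, and access control on top of this core idea.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    artifact: object          # the trained model (or a path/URI to it)
    stage: str = "None"       # e.g. None -> Staging -> Production -> Archived

@dataclass
class ToyModelRegistry:
    _models: dict = field(default_factory=dict)

    def register(self, name, artifact):
        """Store an artifact under `name`, auto-incrementing the version."""
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, artifact=artifact)
        versions.append(mv)
        return mv

    def transition(self, name, version, stage):
        """Move one version of a model to a new lifecycle stage."""
        mv = self._models[name][version - 1]
        mv.stage = stage
        return mv

    def latest(self, name):
        return self._models[name][-1]

registry = ToyModelRegistry()
registry.register("churn_model", artifact="weights-v1.bin")
registry.register("churn_model", artifact="weights-v2.bin")
registry.transition("churn_model", version=2, stage="Production")
print(registry.latest("churn_model").version)  # 2
print(registry.latest("churn_model").stage)    # Production
```

Everything else a registry offers, such as lineage, comparisons, and deployment hooks, builds on this name-version-stage bookkeeping.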

Why do we need a model registry?

The model registry reduces the effort of governing ML models and helps data scientists manage and track ML artifacts.

Below are some of the reasons why we need a model registry:

  1. Model registries provide an aerial view of all the ML models. This view is typically a dashboard showing details such as model version and model stage, making it easy to filter models based on your preferences.
  2. Suppose the team replaces an existing model with a new one without updating the version; tracking the model then becomes difficult. With a model registry, every change to a model carries a specific version number.
  3. A model registry allows you to associate structured and unstructured metadata, such as model descriptions and annotations, with each model. This helps in comparing ML models.
  4. Different models may have different dependencies. A model registry tracks these dependencies, making it easy to deploy each model into the correct environment.
  5. Tracing a problem becomes difficult when something goes wrong. With a model registry, however, it is possible to track:
  • what happened with each generated prediction,
  • which model generated the prediction, and
  • the version of the model when the prediction was generated.
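That last point, prediction lineage, can be sketched in a few lines: every prediction is logged together with the model name and version that produced it, so a suspicious prediction can later be traced back. This is an illustrative sketch; the function and field names are made up for the example.

```python
import json

prediction_log = []

def predict_and_log(model_name, model_version, model_fn, features):
    """Run a prediction and record which model/version produced it."""
    prediction = model_fn(features)
    prediction_log.append({
        "model": model_name,
        "version": model_version,
        "input": features,
        "prediction": prediction,
    })
    return prediction

def flag_large(x):
    """A stand-in model: flag transactions over a threshold."""
    return x["amount"] > 1000

predict_and_log("fraud_model", 3, flag_large, {"amount": 2500})
predict_and_log("fraud_model", 3, flag_large, {"amount": 40})

# Later, an auditor can answer: which model produced this prediction?
print(json.dumps(prediction_log[0], indent=2))
```

Real registries attach this lineage automatically at serving time rather than requiring an explicit wrapper.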

What problems does the model registry solve?

The model registry has proved advantageous in handling ML models, solving several pitfalls that affected the process and led to costly mistakes. These are listed below:

  1. It allows model artifacts to be labeled. Previously, models were unlabeled, so it was hard to track which model came from which training job, and model details were shared over email and messages, risking leaks of confidential model information.
  2. The model registry makes it easier to compare different models. A model goes through many iterations, each producing a new version. Without a model registry, performance results end up stored in different locations, making comparisons between versions difficult.
  3. The model registry maintains records mapping datasets to models. Previously, teams had no record of which dataset was used to train which model.
  4. Sometimes a model produces inaccurate results and the team loses track of the source code and version used to train it, making the faulty model hard to analyze and doubling the effort. With a model registry, it becomes easy to understand the issue caused by an erroneous model and fix it.

Model registry best practices

A model registry is a kind of data warehouse for all ML models and artifacts. It has simplified research and development for machine learning engineers and data scientists.

Let’s now take a look at some model registry best practices:

  1. Sanity-check all externally sourced data: Data collected from various sources may be ill-formatted or incomplete, so it is crucial to verify its quality.
  2. Ensure that the data is appropriately labeled, balanced, and well distributed: Labelling the data is a vital step in the proper functioning of the algorithm. Incorrect labeling may lead to noise and won’t provide optimal results.
  3. Machine learning code should be peer-reviewed to ensure quality: Bugs can quickly occur in the code during the development phase. Peer review is a technique where team members review the code among themselves. To avoid errors and make debugging easier, use this practice.
  4. Use a collaborative development environment: Collaborative environments include GitHub, GitLab, and Azure DevOps Server. They handle storage of large datasets, versioning, and model deployments. For instance, you can deploy and serve your models using the Layer Model Catalog.
  5. Use continuous integration and continuous delivery pipelines: Sometimes, small code changes can cause problems in the application as a whole. Running automated build scripts each time code is committed to the version-control repository helps detect these issues early.
  6. Prevent discriminatory attributes in model features: If this practice is not adhered to, the resulting models may base their decisions on these attributes and ultimately affect results.
  7. Share a clear objective with the team members to understand the ML model: Any misunderstanding and miscommunication may diverge the team from the aim.
  8. Automate model deployment: This includes packaging models with their dependencies and shipping them to the production server, instead of connecting to the production server manually.
  9. Perform continuous checks of models: It is better to resolve errors early in the training process; this also avoids wasted cost and resources.
  10. Log the model’s predictions with the input data and model versions: This practice helps trace decisions back to the input data and model versions.
  11. Conduct internal risk assessments to avoid future discrepancies: It allows identifying the errors and negative impacts as early as possible.
  12. Store all the previous information so that previous experiments can be re-assessed: You may need to access old information to get the current things resolved; it is, therefore, essential to have prior records.
  13. Monitor the behavior of deployed models: Performance of the training data and the data in production may vary significantly, so it’s crucial to continuously monitor deployed models’ behavior.
  14. Use secure channels for discussions: There should be a secure communication channel between users and developers for raising and handling concerns.
  15. Ensure the security of the ML models: Applications are exposed through external interfaces and may contain sensitive data, so it is essential to keep this data safe from attackers.
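As an example of practice 1, a sanity check on externally sourced data can be as simple as validating required fields and value ranges before the data enters the pipeline. The schema below is purely illustrative.

```python
def sanity_check(records, required_fields, numeric_ranges):
    """Return (clean, rejected) after basic completeness and range checks."""
    clean, rejected = [], []
    for rec in records:
        ok = all(f in rec and rec[f] is not None for f in required_fields)
        if ok:
            for f, (lo, hi) in numeric_ranges.items():
                if not (lo <= rec.get(f, lo) <= hi):
                    ok = False
                    break
        (clean if ok else rejected).append(rec)
    return clean, rejected

records = [
    {"age": 34, "income": 52000},
    {"age": -5, "income": 48000},   # impossible age
    {"income": 61000},              # missing required field
]
clean, rejected = sanity_check(
    records,
    required_fields=["age", "income"],
    numeric_ranges={"age": (0, 120)},
)
print(len(clean), len(rejected))  # 1 2
```

In practice, dedicated validation libraries add typed schemas and reporting on top of checks like these.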

Top model registry tools

Various tools help manage a machine learning model. No tool is perfect; the secret is to choose wisely. Below are some of the top model registry tools.

MLflow model registry


MLFlow Model Registry | Source

MLflow consists of a set of API and UI tools for managing the lifecycle of ML models. Here are some of the concepts that govern the MLflow model lifecycle:

  1. Model: An MLflow model is produced from an experiment or run and logged with a model flavor’s `mlflow.<model-flavor>.log_model` method. Once logged, a model can then be registered with the model registry.
  2. Registered model: A registered model has a unique name, version, transition stages, and respective metadata.
  3. Model version: Every registered model has a version associated with it. This version gets updated each time a new model is added to the repository with the same name as an existing model.
  4. Model stage: MLflow has three stages by default: Staging, Production, and Archived. A model version is in one stage at a time, but it can be transitioned from one stage to another.
  5. Description: You can annotate a model version with a description and other relevant information, such as the methodology and datasets used.
  6. Activities: The activities of registered models, such as stage transitions, are recorded, making it easy to trace a model’s evolution from experimentation to production.

ModelDB model registry

ModelDB is an open-source library that versions ML models and all their elements.


ModelDB Model Registry | Source

Below are some points about the use of ModelDB in the context of the model registry.

  1. It tracks the whole lifecycle of an ML model very efficiently.
  2. Apart from version control, it also provides logging and a central dashboard.
  3. It has a robust backend that can run in Docker containers, work with different storage systems, and integrate with other ML tools.
  4. ModelDB provides an end-to-end productive way of handling your ML experiment, including real-time monitoring, model developments, and deployments.
  5. With continuous monitoring, it becomes easy to gain valuable insights and get productive results.
  6. It has a rich user interface, making it easy to use.

Azure model registry

Azure Machine Learning is a cloud platform that helps automate the lifecycle of machine learning models.


Azure model registry | Source

Here are the characteristics of the Azure model registry:

  1. It creates logical ML pipelines for your model.
  2. Software settings can be reused for training and deploying models.
  3. Model registration and deployment are performed efficiently.
  4. Data is maintained for the whole lifecycle of an ML model. The data can be used for monitoring the operations and any future issues.

Amazon SageMaker

Amazon SageMaker is a fully managed machine learning service. It allows data scientists to build and train ML models and deploy them directly into a production-ready hosted environment.


SageMaker Model Registry | Source

Amazon SageMaker comes with Jupyter notebooks so that SageMaker resources can be integrated easily. It also provides optimized machine learning algorithms and runs effectively on large datasets in distributed environments.

Amazon SageMaker model registry has the following characteristics:

  • Categorize models for production
  • Version management of models
  • Stores metadata associated with each model
  • Allows deploying of ML models to production
  • Automates the deployment process
  • Enables the viewing of deployment history

Models are organized into model groups, each holding multiple versions. Each trained model gets registered, and the model registry adds it to a model group as a new model version. Below are the steps for registering a model with Amazon SageMaker:

  • Create a model
  • Create an ML pipeline to train the model
  • With each run of the ML pipeline, a new version of the model is added by the model registry to model groups.
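Concretely, registering a version in SageMaker happens through the `create_model_package` API. The sketch below only builds the request body; the container image, S3 path, and group name are placeholders, and with AWS credentials configured you would pass the dict to `boto3.client("sagemaker").create_model_package(**request)`.

```python
# Build a CreateModelPackage request for the SageMaker model registry.
# All resource names and URIs below are placeholders for illustration.
request = {
    "ModelPackageGroupName": "churn-models",          # the model group
    "ModelPackageDescription": "churn model, retrained weekly",
    "ModelApprovalStatus": "PendingManualApproval",   # gate before deploy
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": "<account>.dkr.ecr.<region>.amazonaws.com/churn:latest",
                "ModelDataUrl": "s3://my-bucket/models/churn/model.tar.gz",
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
}

# With AWS credentials configured, registration would be:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model_package(**request)
print(request["ModelApprovalStatus"])
```

The `ModelApprovalStatus` field is what lets SageMaker gate deployments until a human or pipeline approves the new version.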

KubeFlow model registry

KubeFlow is an open-source platform for deploying machine learning projects on Kubernetes. It makes deployment scalable, simple, and portable.


KubeFlow Model Registry | Source

KubeFlow scales ML models and takes them smoothly from the starting phase, experimentation, to production. Deployments are portable and straightforward, run on diverse infrastructure, and scale on demand, and model versions are easy to maintain. The following KubeFlow components contribute to its model registry:

Central Dashboard is the central user interface of KubeFlow. It provides quick access to the other components.

Notebook Servers enable the use of Jupyter notebooks in KubeFlow, making it easy to share notebooks across the organization and integrate them with the rest of the toolkit’s infrastructure. Users can create notebooks directly in the cluster instead of locally.

KubeFlow Pipelines help in tracking and managing jobs and enable scheduling multi-step ML workflows.

KFServing is used for model deployments.

Katib is a Kubernetes-native project for AutoML. It supports hyperparameter tuning and early stopping.
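For example, KFServing describes a deployed model as a Kubernetes `InferenceService` resource. The fragment below is a minimal sketch assuming a scikit-learn model stored at a hypothetical bucket path:

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: churn-model
spec:
  predictor:
    sklearn:
      storageUri: "gs://my-bucket/models/churn"
```

Applied with `kubectl apply`, KFServing pulls the model artifact from storage and exposes a prediction endpoint for it.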

Model registry on Layer

Layer is an MLOps platform that helps ML teams produce machine learning applications based on code.

Layer is declarative: you describe what to accomplish rather than how to achieve it. Layer only needs dataset, feature, and ML model definitions as input to build your entities, so the focus stays on designing, developing, and deploying models rather than worrying about infrastructure.

ML models are first-class entities in Layer. They are integral to and built within a Layer Project, and they are versioned and stored in the Layer Model Catalog.


Layer Model Registry | Source

The model registry on Layer is implemented by the Layer Model Catalog, which provides centralized, indexed storage for ML artifacts. It versions model artifacts securely, allowing data teams to continuously manage and monitor the lifecycle of ML models in production.

Layer Model Catalog can perform the following actions:

  1. Training at scale: You can leverage the Layer Data Catalog to reuse high-quality training data from FeatureSets and Datasets to train your models at scale.
  2. Auto versioning: Built-in ML model versioning enables faster and more effective model experimentation.
  3. Model testing: Automate model testing by writing your own unit tests and backtests.
  4. Performance monitoring: Get powerful observability of your ML models throughout their lifecycle. Track parameters and advanced metrics, or define your own business KPIs to measure how your model impacts your business.
  5. Hyperparameter tuning: Improve your models using Layer’s parallelized, intuitive hyperparameter tuning.
  6. Deep integration: Layer can deploy your trained models to your model hosting solution, whether AWS SageMaker or a Kubernetes cluster.

Advantages of using Layer Model Catalog:

  • Quick experimentations
  • Reproducible pipelines
  • Powerful observability
  • Robust and reliable productionization

ML Models in Layer

Each model is defined in a directory, with a `model.yml` file at the root linked to one or more Python source files. Models have the following basic layout:

models/
└── churn_model/
    ├── model.yml
    └── model_source_code.py

Models are configured in a `model.yml` file, which looks like this:

# required. this is used to make sure backward-incompatible changes
# in config format do not break layer CLI
apiVersion: 1

# required.
name: my_layer_model

# optional.
description: "My layer model description"

# required. used to determine how to train this model
training:
  - name: my_layer_model_training
    description: "My Layer Model Training"
    entrypoint: my_layer_model.py
    environment: requirements.txt

You can train your models using the Python files referenced from the `model.yml` file.

This Python code defines a `train_model` function that takes a `train` argument and a series of `Featureset` arguments. You can train your model within this function.

While training your model, you can use `train.log_parameter` and `train.log_metric` to save parameters and metrics of your training runs, which can then be viewed in the Layer Model Catalog UI.

You can also use `train.register_input` and `train.register_output` to define the model signature, which can then be used for determining the data lineage of this model.

Here’s an example of a model training code outline:

from typing import Any
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score
import xgboost as xgb
from layer import Featureset, Train

def train_model(train: Train, tf: Featureset("transaction_features")) -> Any:
    # We create the training and label data
    train_df = tf.to_pandas()
    X = train_df.drop(["unused_features"], axis=1)
    Y = train_df["labeled_data"]

    random_state = 13
    test_size = 0.2
    train.log_parameter("random_state", random_state)
    train.log_parameter("test_size", test_size)
    trainX, testX, trainY, testY = train_test_split(
        X, Y, test_size=test_size, random_state=random_state
    )

    # Here we register the input & output of the train. Layer will use
    # these registers to extract the signature of the model and calculate
    # the drift
    train.register_input(trainX)
    train.register_output(trainY)

    max_depth = 3
    objective = 'binary:logitraw'
    train.log_parameter("max_depth", max_depth)
    train.log_parameter("objective", objective)

    # Train model
    param = {'max_depth': max_depth, 'objective': objective}
    dtrain = xgb.DMatrix(trainX, label=trainY)
    model_xg = xgb.train(param, dtrain)
    dtest = xgb.DMatrix(testX)
    preds = model_xg.predict(dtest)

    # Since the data is highly skewed, we use the area under the
    # precision-recall curve (AUPRC) rather than the conventional area
    # under the receiver operating characteristic curve (AUROC), because
    # AUPRC is more sensitive to differences between algorithms and their
    # parameter settings (see Davis and Goadrich, 2006:
    # http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf)
    auprc = average_precision_score(testY, preds)
    train.log_metric("auprc", auprc)

    # Return the model
    return model_xg

This is how you define your model with Layer.

Final thoughts

This blog focused on how a model registry improves the management of ML models. It is a critical component of the MLOps framework and has improved, through various tools, the way models are managed and deployed collaboratively. Every tool has its own features, and the goal is to choose one based on your requirements.

The model registry has changed the way data scientists and developers deal with ML models, shifting processes from manual to automatic and optimizing them as part of MLOps. But learning does not stop here: try implementing a model registry yourself, for instance using the Layer platform. Keep advancing!

Resources
The following resources can help you build a deeper understanding of model registries:

  1. https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html
  2. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli
  3. https://docs.databricks.com/applications/mlflow/model-registry.html#model-registry-concepts
  4. https://opendatascience.com/simplifying-mlops-with-model-registry/
  5. https://www.mlflow.org/docs/latest/model-registry.html#model-registry-workflows
  6. https://medium.com/@ODSC/what-are-mlops-and-why-does-it-matter-8cff060d4067
  7. https://mlinproduction.com/model-registries-for-ml-deployment-deployment-series-06/
  8. https://adatis.co.uk/mlflow-introduction-to-model-registry/
  9. https://neptune.ai/blog/best-mlops-tools
  10. https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning