How to package and distribute ML models with MLflow

Oct 13th 2021

Fernando López

One of the fundamental activities during each stage of the ML model life cycle development is collaboration. Taking an ML model from its conception to deployment requires participation and interaction between different roles involved in constructing the model. In addition, the nature of ML model development involves experimentation, tracking of artifacts and metrics, model versions, etc., which demands an effective organization for the correct maintenance of the ML model life cycle.

Fortunately, there are tools for developing and maintaining a model’s life cycle, such as MLflow. In this article, we will break down MLflow, its main components, and its characteristics. We’ll also offer examples showing how MLflow works in practice.

What is MLflow?

MLflow is an open-source tool for developing, maintaining, and collaborating on each phase of the life cycle of an ML model. Furthermore, MLflow is framework-agnostic, so any ML/DL framework can quickly adapt to the ecosystem that MLflow proposes.

MLflow emerges as a platform that offers tools for tracking metrics, artifacts, and metadata. It also provides standard formats for packaging, distributing, and deploying models and projects.

MLflow also offers tools for managing model versions. These tools are encapsulated in its four main components:

  • MLflow Tracking,
  • MLflow Projects,
  • MLflow Models, and
  • MLflow Registry.

MLflow Tracking

MLflow Tracking is an API-based tool for logging metrics, parameters, model versions, code versions, and files. MLflow Tracking is integrated with a UI for visualizing and managing artifacts, models, files, etc.

Each MLflow Tracking session is organized and managed under the concept of runs. A run refers to an execution of code in which artifact logging is performed explicitly.

MLflow Tracking allows you to generate runs through MLflow’s Python, R, Java, and REST APIs. By default, the runs are stored in the directory where the code session is executed. However, MLflow also allows storing artifacts on a local or remote server.
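As a sketch of what a run records, a locally stored run produces a directory layout like the following under `mlruns` (the experiment ID, run ID, and file names here are illustrative):

```text
mlruns/
└── 0/                          # experiment ID
    └── 39c46969dc7b.../       # run ID
        ├── meta.yaml           # run metadata
        ├── params/
        │   └── tree_depth      # one file per logged parameter
        ├── metrics/
        │   └── accuracy        # one file per logged metric
        └── artifacts/
            └── model/          # logged files and models
```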

MLflow Models

MLflow Models allow packaging machine learning models in a standard format to be consumed directly through different services such as REST API, Microsoft Azure ML, Amazon SageMaker, or Apache Spark. One of the advantages of the MLflow Models convention is that the packaging is multi-language or multi-flavor.

For packaging, MLflow generates a directory with the serialized model and an `MLmodel` file that specifies the packaging and loading details of the model. For example, the following snippet shows an `MLmodel` file where the flavor loader is specified, as well as the `conda.yaml` file that defines the environment.

artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.2
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.2
run_id: 39c46969dc7b4154b8408a8f5d0a97e9
utc_time_created: '2021-05-29 23:24:21.753565'
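To make the role of the `MLmodel` file concrete, the short sketch below pulls the flavor's loader module out of such a file using only string handling. A real client would simply call `mlflow.pyfunc.load_model`; the `find_value` helper here is our own illustration, not part of the MLflow API.

```python
# Minimal sketch: extract a field from an MLmodel-style YAML snippet.
# `find_value` is an illustrative helper, not part of the MLflow API.
MLMODEL_TEXT = """\
artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
run_id: 39c46969dc7b4154b8408a8f5d0a97e9
"""

def find_value(text: str, key: str) -> str:
    """Return the value of the first line whose key matches `key`."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(key + ":"):
            return stripped.split(":", 1)[1].strip()
    raise KeyError(key)

print(find_value(MLMODEL_TEXT, "loader_module"))  # mlflow.sklearn
```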

MLflow Projects

MLflow Projects provides a standard format for packaging, sharing, and reusing machine learning projects. Each project can be a remote repository or a local directory. Unlike MLflow Models, MLflow Projects aims at the portability and distribution of machine learning projects.

An MLflow Project is defined by a YAML manifest called `MLproject`, where the project's specifications are declared.

The key features for running the model are specified in the `MLproject` file. These include:

  • the input parameters that the model receives,
  • the data type of the parameters,
  • the command for executing the model, and
  • the environment in which the project runs.

The following code snippet shows an example of an `MLproject` file where the model to implement is a decision tree whose only parameter is the depth of the tree, with a default value of 2.

name: example-decision-tree
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      tree_depth: {type: int, default: 2}
    command: "python main.py {tree_depth}"  # entry-script name illustrative

Likewise, MLflow provides a CLI to run projects located in a local directory or a remote repository. The following code snippet shows how a project is run from each:

$ mlflow run model/example-decision-tree -P tree_depth=3
$ mlflow run <remote-repository-uri> -P tree_depth=3

In both examples, the environment is generated from the `MLproject` file specification, and the command that triggers the model is executed with the arguments passed on the command line. Since the model accepts input parameters, these are assigned through the `-P` flag. In both examples, the parameter refers to the maximum depth of the decision tree.
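Conceptually, the `-P` flag works like template substitution into the entry-point command. The toy sketch below mimics that substitution; the command template and script name are assumptions for illustration, not MLflow's real resolution logic.

```python
# Toy sketch of how -P parameters are substituted into the MLproject
# entry-point command; MLflow's real parameter handling is more involved.
command_template = "python main.py {tree_depth}"
params = {"tree_depth": 3}  # as passed via: -P tree_depth=3

command = command_template.format(**params)
print(command)  # python main.py 3
```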

By default, a run like the one shown in the example stores the artifacts in the `mlruns` directory.

How to store artifacts in an MLflow Server?

One of the most common use cases when implementing MLflow is using MLflow Server to log metrics and artifacts. The MLflow Server is responsible for managing the artifacts and files generated by an MLflow Client. These artifacts can be stored in different schemes, from a file directory to a remote database. For example, to run an MLflow Server locally, we type:

$ mlflow server

The above command starts an MLflow service at `http://127.0.0.1:5000` by default. To store artifacts and metrics, the tracking URI of the server is set in the client session.

In the following code snippet, we will see the basic implementation of artifact storage in an MLflow Server:

import mlflow

# Point the client at the tracking server started above
remote_server_uri = "http://127.0.0.1:5000"
mlflow.set_tracking_uri(remote_server_uri)

with mlflow.start_run():
    mlflow.log_param("test-param", 1)
    mlflow.log_metric("test-metric", 2)

The `mlflow.set_tracking_uri()` command sets the location of the server.

How to run authentication in an MLflow Server?

Exposing a server with no authentication can be risky. Therefore, it is convenient to add authentication. Authentication will depend on the ecosystem in which you will deploy the server:

  • on a local server, it is enough to add basic authentication based on a username and password,
  • on a remote server, credentials must be configured together with the respective proxies.

For illustration, let’s look at an example of an MLflow Server deployed with basic authentication (username and password). We will also see how to configure a client to make use of this server.

Example: MLflow Server authentication

In this example, we apply basic user and password authentication to the MLflow Server through an Nginx reverse proxy.

Let’s start with the installation of Nginx, which we can do in the following way:

# For darwin based OS

$ brew install nginx

# For debian based OS

$ apt-get install nginx

# For redhat based OS

$ yum install nginx

For Windows, Nginx has to be installed and run through the native Win32 API; see the Nginx documentation for detailed instructions.

Once installed, we generate a user with its respective password using the `htpasswd` command, as follows:

$ sudo htpasswd -c /usr/local/etc/nginx/.htpasswd mlflow-user

The above command generates credentials for the user `mlflow-user` in the `.htpasswd` file of the Nginx service. Next, to define the proxy under the created credentials, modify the configuration file `/usr/local/etc/nginx/nginx.conf`, which by default has the following content:

server {
    listen       8080;
    server_name  localhost;

    # charset koi8-r;
    # access_log  logs/host.access.log  main;

    location / {
        root   html;
        index  index.html index.htm;
    }
}

which has to look like this:

server {
    # listen       8080;
    # server_name  localhost;

    # charset koi8-r;
    # access_log  logs/host.access.log  main;

    location / {
        proxy_pass http://localhost:5000;
        auth_basic "Restricted Content";
        auth_basic_user_file /usr/local/etc/nginx/.htpasswd;
    }
}

We are defining an authentication proxy for localhost on port 5000, the address and port where MLflow Server is deployed by default. When using a cloud provider, you must configure the credentials and proxies required by that platform. Now initialize the MLflow Server as shown in the following code snippet:

$ mlflow server --host localhost

When trying to access http://localhost in the browser, authentication will be requested through the username and password created.


Figure 1. Login

Once you have entered the credentials, you will be directed to the MLflow Server UI.


Figure 2. MLflow Server UI

To store data in MLflow Server from a client, you have to:

  • define the environment variables that will contain the credentials to access the server and
  • set the URI where the artifacts will be stored.

So, for the credentials, we are going to export the following environment variables:

$ export MLFLOW_TRACKING_USERNAME=mlflow-user
$ export MLFLOW_TRACKING_PASSWORD=mlflow-password
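Under the hood, the client turns these credentials into a standard HTTP basic-auth header on every request to the tracking server. The stdlib sketch below shows how such a header is formed; the username and password are the illustrative values from above.

```python
import base64

# Sketch: building the Authorization header that HTTP basic auth sends.
username, password = "mlflow-user", "mlflow-password"
token = base64.b64encode(f"{username}:{password}".encode("ascii")).decode("ascii")
auth_header = f"Basic {token}"

print(auth_header)  # Basic bWxmbG93LXVzZXI6bWxmbG93LXBhc3N3b3Jk
```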

Once you have defined the environment variables, you only need to define the server URI for the artifact storage.

import mlflow

# Define the MLflow Server URI
remote_server_uri = "http://localhost"
mlflow.set_tracking_uri(remote_server_uri)

with mlflow.start_run():
    mlflow.log_param("test-param", 2332)
    mlflow.log_metric("test-metric", 1144)

When executing the code snippet above, we can see the test metric and parameter reflected on the server.


Figure 3. Metrics and parameters stored from a client service with authentication on the server.

How to register an MLflow Model?

A common need when developing machine learning models is maintaining order among the versions of the models. For this, MLflow offers the MLflow Registry.

The MLflow Registry is an extension that helps to:

  • manage the versions of each MLModel, and
  • record the evolution of each model across three different stages: Archived, Staging, and Production. It is very similar to a version control system such as Git.

There are four alternatives for registering a model:

  • through the UI,
  • as an argument to `mlflow.<flavor>.log_model()`,
  • with the `mlflow.register_model()` method, or
  • with the `create_registered_model()` client API.

In the following example, the model is registered using the `mlflow.<flavor>.log_model()` method:

with mlflow.start_run():
    model = DecisionTreeModel(max_depth=max_depth)

    mlflow.log_param("tree_depth", max_depth)
    mlflow.log_metric("precision", model.precision)
    mlflow.log_metric("recall", model.recall)
    mlflow.log_metric("accuracy", model.accuracy)

    # Register the model
    mlflow.sklearn.log_model(model.tree, "MyModel-dt",
                             registered_model_name="Decision Tree")

If it is a new model, MLflow initializes it as Version 1. If the model is already registered under that name, registration creates the next version (Version 2, and so on).
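The numbering rule can be pictured with a tiny toy model; this is only an illustration of the behavior, not the registry API.

```python
# Toy model of the registry's numbering rule: registering a model under an
# existing name yields the next version number. Not the real MLflow API.
registry = {}

def register(name):
    registry[name] = registry.get(name, 0) + 1
    return registry[name]

print(register("Decision Tree"))  # 1  (new model -> Version 1)
print(register("Decision Tree"))  # 2  (existing model -> Version 2)
```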

By default, when a model is registered, the assigned stage is None. To assign a stage to a registered model, we can do it in the following way:

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="Decision Tree",
    version=2,
    stage="Staging"
)

In the above code snippet, version 2 of the Decision Tree model is moved to the Staging stage. In the server UI, we can see the stages as shown in Figure 4:


Figure 4. Registered Models

To serve the model, we will use the MLflow CLI. For this, we only need the server URI, the model name, and the model stage, as shown below:

$ export MLFLOW_TRACKING_URI=http://localhost
$ mlflow models serve -m "models:/MyModel-dt/Production"

Once the model is served, POST requests can be sent to make predictions. For this example, it would be as follows:

$ curl http://localhost/invocations -H 'Content-Type: application/json' -d '{"inputs": [[0.39797844703998664, 0.6739875109527594, 0.9455601866618499, 0.8668404460733665, 0.1589125298570211]]}'


In the previous code snippet, a POST request is made to the address where the model is served. The request passes an array of five elements, which is what the model expects as input data for inference. The prediction in this case turned out to be 1.
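For reference, the JSON body of that request can also be built programmatically. The sketch below assembles the same five-feature payload with the standard library; the feature values are the illustrative ones from the curl call.

```python
import json

# Build the /invocations request body: a single row of five features.
row = [0.39797844703998664, 0.6739875109527594, 0.9455601866618499,
       0.8668404460733665, 0.1589125298570211]
body = json.dumps({"inputs": [row]})

# A client would POST `body` to the /invocations endpoint with the
# Content-Type: application/json header set.
print(body)
```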

However, it is important to mention that MLflow allows defining the expected input data structure in the `MLmodel` file through the implementation of signatures. Likewise, the data passed in the request can be of different types, which can be consulted in the MLflow documentation.
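As a sketch, a signature recorded in the `MLmodel` file looks like the following; the dtypes and the five-column input shape are assumptions chosen to match the example above.

```yaml
# Illustrative signature section of an MLmodel file (values assumed).
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1, 5]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]'
```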


MLflow Plugins

Due to the framework-agnostic nature of MLflow, MLflow Plugins emerged. Their primary function is to extend the functionality of MLflow so it can adapt to different frameworks and platforms.

MLflow Plugins allow customization and adaptation of the deployment and storage of artifacts for specific platforms.

For example, there are plugins for platform-specific deployment targets.

On the other hand, for the management of MLflow Projects, there is MLflow-yarn, a plugin for running MLProjects on a Hadoop/Yarn backend. For the customization of MLflow Tracking, there is MLflow-elasticsearchstore, which manages the MLflow Tracking extension under an Elasticsearch environment.

Likewise, specific plugins are offered for deployment in AWS and Azure.

It is essential to mention that MLflow provides the ability to create and customize plugins according to needs.

MLflow vs. Kubeflow

Due to the increasing demand for tools to develop and maintain the life cycle of machine learning models, different alternatives such as MLflow and Kubeflow have emerged.

As we have already seen throughout this article, MLflow is a tool that allows collaboration in developing the life cycle of machine learning models, mainly focused on tracking artifacts (MLflow Tracking), collaboration, maintenance, and versioning of the project.

On the other hand, there is Kubeflow, which, like MLflow, is a tool for developing machine learning models, with some specific differences.

Kubeflow is a platform that works on top of a Kubernetes cluster; that is, Kubeflow takes advantage of the containerized nature of Kubernetes. It also provides tools such as Kubeflow Pipelines, which generate and automate pipelines (DAGs) through an SDK extension.

Kubeflow also offers Katib, a tool for large-scale hyperparameter optimization, and provides a service for management and collaboration from Jupyter notebooks.


Specifically, MLflow is a tool focused on management and collaboration for the development of machine learning projects. On the other hand, Kubeflow is a platform focused on developing, training, and deploying models through a Kubernetes cluster and the use of containers.

Both platforms offer significant advantages and are alternatives for developing, maintaining, and deploying machine learning models. However, it is vital to consider the barrier to entry for the use, implementation, and integration of these technologies in development teams.

Since Kubeflow is linked to a Kubernetes cluster for its implementation and integration, it is advisable to have an expert for managing this technology. Likewise, developing and configuring pipeline automation is also a challenge that demands a learning curve, which under specific circumstances may not be beneficial for companies.

In conclusion, MLflow and Kubeflow are platforms focused on specific stages of the life cycle of machine learning models. MLflow is collaboration-oriented, while Kubeflow is oriented toward leveraging a Kubernetes cluster to run machine learning tasks. However, Kubeflow requires MLOps experience, since one needs to know how to deploy services in Kubernetes, which can be a barrier when approaching it.

The new paradigm: Layer

Layer emerges as a disruptive platform for the maintenance and development of the life cycle of machine learning models. Layer is a Declarative MLOps (DM) platform that reduces the entry barrier for Data Scientists, Machine Learning Engineers, and Data Analysts to MLOps tasks. Layer is a platform where you define what you want to obtain and not how you want to obtain it.

One significant challenge when developing large-scale machine learning models is the definition and automation of pipelines. With Layer, that barrier is eliminated since it takes care of the orchestration and deployment of the components of the machine learning model.

The development environment in Layer revolves around the definition of a Layer project. A Layer Project is a directory that contains the project definition, data, and model. Each component of a Layer Project is described in YAML files. These files are consumed by the Layer infrastructure for the orchestration, automation, and deployment of the model.

To start a Layer Project, just run:

$ layer clone

This will clone an empty Layer project:

├── .layer
│   └── project.yaml          # Main project configuration file
├── data
│   ├── features
│   │   └── dataset.yaml      # Definition of your featureset
│   └── dataset
│       └── dataset.yaml      # Definition of the source data
└── models
    └── model
        ├── model.yaml        # Training directives of your model
        ├── model.py          # Definition of your model
        └── requirements.txt  # Environment config file, if required

Here, `project.yaml` contains the specifications, definitions, and metadata of the project. The `data/features/dataset.yaml` file contains the specification of each feature that makes up the dataset, while `data/dataset/dataset.yaml` specifies where the source data lives, in either an external or a local repository. Finally, `model.yaml` is the general description of the model.

Once each component of Layer Project has been specified, to launch it, just execute:

$ layer start

The above command builds an automatic pipeline and executes the entities in the right order. From this, the model can be deployed, served, and queried for inference through an API. In the next section, we will see how to generate a project, execute it, and deploy and serve a model with Layer.

Deploying and Serving an ML Model with Layer

Deploying and serving a machine learning model with Layer is quite simple. Layer offers a CLI for creating, managing, and executing a Layer Project.

So let’s look at the following example, where we develop an ML model with Layer. Developing a Layer Project requires the `layer-sdk`, which is installed as shown in Figure 5.


Figure 5. Installation of layer-sdk

It is important to mention that the `layer-sdk` requires Python 3.8 or higher.

Once the `layer-sdk` is installed, we will have access to the layer command with its respective options.


Figure 6. Command layer and options

To upload all the components of the Layer Project, we need to log in, as shown in Figure 7. The login process takes us to the Layer web application, where we log in with our previously created user.


Figure 7. Login

Once logged in, we are ready to upload our project to the Layer infrastructure. For practical purposes, we will use a predefined example based on the well-known Titanic dataset. To initialize the Layer Project, use the `layer clone` command:

layer clone 
cd examples/titanic

The above command will clone a Layer Project with the following structure:


Figure 9. Layer Project structure

The project is organized into two main sections, `data` and `models`. The description of each feature and of the dataset are defined in YAML manifests within the `data` folder. The model is described in `model.yaml`, where the main model file and the environment are defined.

Once the model and data are specified within the Layer Project structure, run the project. This will create a Data and Model catalog in Layer.


To deploy the model in Layer, just press the `Deploy` button. In this step, the model is prepared and served through an API. The deployment process takes only a few seconds. In Figure 11, we can see the deployment process.


Figure 11. Deploying

Once the model is deployed, Layer will generate a link through which the model is served. To make an inference, just send requests to the API, passing the data to be inferred as arguments, as shown in Figure 12.


Figure 12. API Request

In this case, the inference request returns class 0 as a result.

As we saw in this example, developing a machine learning project, from its conception to the deployment and serving of the model, is relatively simple. One of the great advantages of Layer is that it allows Data Scientists, Machine Learning Engineers, and others to focus on the fundamental parts: the model and the data. Likewise, Layer addresses the entire MLOps process, which in many cases is a great barrier for developers.

And that’s it. As we can see, implementing models in Layer is very easy, since the infrastructure takes care of the entire MLOps process; the user only defines each component of the model, and Layer does the rest.

Final thoughts

Today, there are many frameworks and platforms for developing and maintaining the life cycle of machine learning models. Some tools like MLflow are primarily geared towards collaboration and tracking the different components of the models. On the other hand, there are tools created to use a specific platform for model development, such as Kubeflow with Kubernetes.

However, the barrier to entry to these technologies can be difficult for machine learning model developers, given the complexity of the MLOps process. Fortunately, platforms such as Layer are emerging that cover the complete MLOps process, streamlining the development, deployment, and serving of the model in a friendly and easy-to-use infrastructure.

The growing demand for ML model developers also demands technologies that facilitate each phase of the ML model life cycle. Today Layer is an exciting alternative that streamlines this process.
