
Registering MLflow Models to SAS Model Manager using sasctl: A Comprehensive Guide


Introduction

As companies continue to embrace machine learning and data science, managing and deploying machine learning models at scale has become a significant challenge. One of the most popular open-source tools for managing machine learning workflows is MLflow. It provides a platform-agnostic way to manage and deploy machine learning models across different platforms and languages. SAS Model Manager, on the other hand, is an enterprise-grade model management platform that provides comprehensive capabilities for model governance, deployment, and monitoring.

In this blog, we will explore how to register MLflow models to SAS Model Manager using sasctl, a Python package that provides an interface to SAS Viya for model deployment and management. We will cover the necessary steps to install and configure sasctl, register an MLflow model to SAS Model Manager, and deploy the model to SAS Viya for scoring. The goal is to give data scientists and engineers a practical, end-to-end reference for integrating their MLflow models with SAS Model Manager.

Prerequisites

  • Install the required libraries. In a terminal, run
pip install mlflow
pip install sasctl
  • Launch the MLflow server. In a terminal, run
mlflow server --backend-store-uri sqlite:///backend.db --default-artifact-root ./mlruns

This command starts an instance of the MLflow server with the following configurations:

  • --backend-store-uri sqlite:///backend.db: specifies the backend store URI where the MLflow server should persist metadata related to experiments, runs, parameters, metrics, and artifacts. In this case, the backend store uses an SQLite database file named backend.db.
  • --default-artifact-root ./mlruns: specifies the default artifact store location where the MLflow server should store artifacts generated by runs. In this case, the default artifact store location is the ./mlruns directory relative to the current working directory.
  • Configure MLflow
    ## set up the MLflow experiment
    import mlflow
    mlflow.set_tracking_uri("http://127.0.0.1:5000")  # connect to the tracking server
    mlflow.set_experiment("digits-classification-experiment_sasctl")  # set the active experiment

This code snippet is used to configure the MLflow client to connect to a tracking server running at the specified URL and to set the active experiment to “digits-classification-experiment_sasctl”.

The mlflow.set_tracking_uri function specifies the tracking server URI that the client will use to communicate with the tracking server. In this case, it sets the tracking URI to "http://127.0.0.1:5000", which is a local server running on the same machine as the code.

The mlflow.set_experiment function is used to set the active experiment for this client session. Experiments are used to group runs and artifacts in MLflow, making it easier to organize and track experiments. The set_experiment function takes an experiment name as a parameter, and in this case, it sets the experiment name to "digits-classification-experiment_sasctl". This means that all subsequent runs and artifacts created by this MLflow client will be associated with this experiment.
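If you want to confirm that the client can reach the tracking server and that the experiment exists, a quick check along the lines of the sketch below works. It only uses the standard mlflow.get_experiment_by_name API and is not part of the original notebook.

## optional sanity check: confirm the experiment is visible on the tracking server
exp = mlflow.get_experiment_by_name("digits-classification-experiment_sasctl")
if exp is not None:
    print("Experiment ID:", exp.experiment_id, "| artifact location:", exp.artifact_location)
else:
    print("Experiment not found -- check that the MLflow server is running on port 5000")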

Load data

### import libraries
from mlflow.models.signature import infer_signature
import mlflow
from sklearn import datasets
from sklearn import metrics
import requests
import json
import numpy as np 
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from pathlib import Path
# sasctl interface for importing models
import sasctl.pzmm as pzmm
from sasctl import Session
import warnings
import getpass
warnings.filterwarnings("ignore")


#### load the digits dataset
digits = datasets.load_digits()   # load the hand-written digits dataset
x = digits.data                   # features (64 pixel values per image)
y = digits.target                 # target labels (0-9)

df = pd.DataFrame(data= np.c_[digits['data'], digits['target']],
                     columns= digits['feature_names'] + ['target'])
df.head()

x_train, x_test, y_train, y_test = train_test_split(df[digits['feature_names']], df['target'], test_size=0.2, random_state=42)

This code loads the ‘digits’ dataset from the sklearn library, a collection of hand-written digit images that have already been flattened into arrays. The dataset contains 64 features (8x8 image pixels) and 10 classes (digits 0 to 9).

The ‘digits’ dataset is then split into input features (x) and target variables (y), and a pandas dataframe is created from them. The ‘train_test_split’ function from the sklearn library splits the data into training and testing datasets: the training dataset is used to fit the machine learning model, while the testing dataset is used to evaluate its performance.

The ‘train_test_split’ function takes in the following arguments:

  • df[digits[‘feature_names’]]: the input features from the pandas dataframe
  • df[‘target’]: the target variables from the pandas dataframe
  • test_size=0.2: the percentage of data to use for testing (in this case, 20%)
  • random_state=42: a random seed to ensure that the same results can be reproduced.
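As a quick sanity check (not part of the original notebook), you can inspect the shapes of the resulting splits and the class distribution before training:

## optional: confirm the 80/20 split
print(x_train.shape, x_test.shape)          # roughly (1437, 64) and (360, 64)
print(y_train.value_counts().sort_index())  # samples per digit class in the training set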

Build the model and register it to MLflow

## define random forest model
model = RandomForestClassifier(n_estimators=300).fit(x_train, y_train)

## model signature defines the schema of the model's input and output
signature = infer_signature(x_train, model.predict(x_train))

## log model score to mlflow
score = model.score(x_test, y_test)
print("Score: %s" % score)
mlflow.log_metric("score", score)

### log model 
mlflow.sklearn.log_model(model, "model", signature=signature)
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)

This code defines a Random Forest classification model using the RandomForestClassifier algorithm from Scikit-learn. The n_estimators parameter is set to 300, which determines the number of trees in the random forest.

After training the model with the training dataset (x_train and y_train), the code uses the infer_signature function to define a model signature that specifies the schema of the model's input and output. The signature is later used when logging the model to the MLflow experiment.

The code then calculates the model score on the test dataset (x_test and y_test) using the score method of the trained model. The score is then logged in MLflow as a metric with the name "score".

Finally, the code logs the trained model in the MLflow experiment using the mlflow.sklearn.log_model method. The model is saved with the name "model" and the signature defined earlier. The code then prints the ID of the run that contains the saved model, which allows you to track the model's performance and the history of changes made to it.
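To double-check that the artifact was logged correctly, you can reload the model from the tracking server and re-score it. This is a minimal sketch using the standard mlflow.sklearn.load_model API and is not part of the original notebook:

## optional: reload the logged model and confirm it scores the same
run_id = mlflow.active_run().info.run_uuid
reloaded = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
print("Reloaded model score: %s" % reloaded.score(x_test, y_test))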

Open http://127.0.0.1:5000 in your browser, and you will find the digits-classification-experiment_sasctl experiment created.

(Screenshots: the MLflow UI showing the digits-classification-experiment_sasctl experiment and the logged run.)

Register the model to SAS Model Manager

mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')

## get info about the model variables, inputs, and outputs
varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)

This code reads information about the MLflow model that was saved in the local mlruns directory so that it can be registered in SAS Model Manager.

mlPath is a path to the MLflow model saved in the specified mlruns directory for the active run.

pzmm.MLFlowModel is a class from the sasctl package that provides functionality for working with MLflow models in SAS Model Manager. read_mlflow_model_file is a method of the MLFlowModel class that takes the path to the MLflow model as an input and returns three dictionaries:

  • varDict: a dictionary of model-level details read from the MLflow model files (for example, how the model was serialized), which is passed along when pickling the model
  • inputsDict: a dictionary that maps the input variable names to their types
  • outputsDict: a dictionary that maps the output variable names to their types

These dictionaries provide information about the structure of the MLflow model and its inputs and outputs, which is necessary for registering the model in SAS Model Manager.
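If you want to see what was extracted before moving on, printing the three dictionaries is a simple way to inspect them. The exact keys depend on the sasctl version, so treat this as an illustrative check rather than a fixed contract:

## optional: inspect the metadata extracted from the MLflow model files
print(varDict)       # model-level details parsed from the MLflow model files
print(inputsDict)    # input variable names and types
print(outputsDict)   # output variable names and types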

## pickle model 
modelPrefix = 'RandomForestClassifier'
zipFolder = Path.cwd() / f'MLFlowModels/{modelPrefix}'
pzmm.PickleModel.pickle_trained_model(trained_model=model, model_prefix=modelPrefix, pickle_path=zipFolder, mlflow_details=varDict)

## serialize the inputs and outputs to JSON
J = pzmm.JSONFiles()
J.writeVarJSON(inputsDict, isInput=True, jPath=zipFolder)
J.writeVarJSON(outputsDict, isInput=False, jPath=zipFolder)

J.writeModelPropertiesJSON(modelName=modelPrefix,
                            modelDesc='MLFlow Model ',
                            targetVariable='',
                            modelType='RandomForestClassifier',
                            modelPredictors='',
                            targetEvent=1,
                            numTargetCategories=1,
                            eventProbVar='tensor',
                            jPath=zipFolder,
                            modeler='sasdemo')

# Write model metadata to a json file
J.writeFileMetadataJSON(modelPrefix, jPath=zipFolder)

This code block performs the following steps:

  1. Pickles the trained model using the pickle_trained_model function from the PickleModel class in pzmm module. The pickled model is saved to a folder specified by zipFolder variable.
  2. Serializes the inputs and outputs dictionaries to JSON files using the writeVarJSON function from the JSONFiles class in pzmm module. These JSON files are saved to the same folder specified by zipFolder.
  3. Writes the model properties to a JSON file using the writeModelPropertiesJSON function from the JSONFiles class in pzmm module. The model properties include the name, description, type, predictors, target event, target categories, and event probability variable.
  4. Writes the model metadata to a JSON file using the writeFileMetadataJSON function from the JSONFiles class in pzmm module. The model metadata includes the name of the pickled model and the folder where it is saved.

You will find the generated files in the zipFolder path.
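An optional way to confirm this from the notebook itself, using only standard pathlib (not part of the original code):

## optional: list the files written to the zipFolder directory
for f in sorted(zipFolder.glob('*')):
    print(f.name)    # expect the pickled model plus the JSON files created above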

(Screenshot: the pickle and JSON files created in the zipFolder directory.)

## get the username, password, and host for the SAS Viya server
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
host = getpass.getpass("Hostname: ")

sess = Session(host, username, password, verify_ssl=False)

This code snippet is used to create a Session object to connect to a SAS Viya server. It prompts the user to enter their username, password, and hostname (or IP address) of the SAS Viya server.

The getpass.getpass method is used to prompt the user for sensitive information like the username and password without echoing it back to the console. The values entered by the user are assigned to the username, password, and host variables respectively.

The Session object is created with these inputs to authenticate the user's credentials and establish a connection to the SAS Viya server. The verify_ssl=False parameter is used to disable SSL verification for cases where the SAS Viya server has a self-signed SSL certificate. This is generally not recommended for production environments.
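Note that Session can also be used as a context manager, so that sasctl calls made inside the with block use it as the current session. A minimal sketch, assuming the same credentials collected above:

## alternative: use the Session as a context manager
with Session(host, username, password, verify_ssl=False):
    pass  # sasctl/pzmm calls placed inside this block use this session by default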

## register the model to SAS Model Manager
I = pzmm.ImportModel()
I.pzmmImportModel(zipFolder, modelPrefix, 'MLFlowTest', inputsDict, None, '{}.predict({})', metrics=['tensor'], force=True)

 

This code is responsible for registering the MLFlow model to SAS Model Manager using the pzmm library.

The first step is to create an instance of the ImportModel class from the pzmm library by calling pzmm.ImportModel(). Then, the pzmmImportModel() method is called on the ImportModel instance, which takes the following parameters:

  • zipFolder: A path to the folder containing the model files to be zipped and imported.
  • modelPrefix: A prefix string for the name of the registered model in SAS Model Manager.
  • projectName: The name of the SAS Model Manager project to which the model should be added.
  • inputsDict: A dictionary containing information about the input variables for the model.
  • outputsDict: A dictionary containing information about the output variables for the model. In this case, it is set to None.
  • codeTemplate: A string template that specifies the code for invoking the model. Here, it is set to '{}.predict({})', which means that the predict() method of the model will be called with the input data.
  • metrics: The names of the output variables used in the generated score code. Here, it is set to ['tensor'].
  • force: A boolean value indicating whether to overwrite an existing model with the same name.

This code essentially imports the MLFlow model into SAS Model Manager and sets up the necessary information about inputs, outputs, and metrics.
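After the import completes, you can also confirm the registration from Python using sasctl's model_repository service. The MLFlowTest project name and the modelPrefix variable come from the code above; this check is not part of the original notebook:

## optional: confirm the model is registered in SAS Model Manager
from sasctl.services import model_repository as mr

project = mr.get_project('MLFlowTest')
registered = mr.get_model(modelPrefix)
print(project.name, '->', registered.name)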

Open SAS Model Manager, and you will find the RandomForestClassifier model registered in the MLFlowTest project.

(Screenshots: SAS Model Manager showing the registered RandomForestClassifier model and its contents in the MLFlowTest project.)

Conclusion

In conclusion, registering MLflow models to SAS Model Manager using sasctl is an efficient and powerful way to manage machine learning models within an enterprise environment. By leveraging the sasctl Python library, users can deploy and manage their models on the SAS Viya platform, allowing for greater collaboration and efficiency in model deployment. The step-by-step guide in this blog covers the process end to end, from launching an MLflow tracking server and logging a model to registering that model in SAS Model Manager. By following it, data scientists can integrate their MLflow models with SAS Model Manager and streamline the deployment and management of models in production environments.

You can find the complete notebook on GitHub.
