As companies continue to embrace machine learning and data science, managing and deploying machine learning models at scale has become a significant challenge. One of the most popular open-source tools for managing machine learning workflows is MLflow. It provides a platform-agnostic way to manage and deploy machine learning models across different platforms and languages. SAS Model Manager, on the other hand, is an enterprise-grade model management platform that provides comprehensive capabilities for model governance, deployment, and monitoring.
In this blog, we will explore how to register MLflow models to SAS Model Manager using sasctl, a Python package that provides an interface to SAS Viya for model deployment and management. We will cover the necessary steps to install and configure sasctl, register an MLflow model to SAS Model Manager, and deploy the model to SAS Viya for scoring. The goal is a practical, end-to-end walkthrough for data scientists and engineers who want to integrate their MLflow models with SAS Model Manager.
pip install mlflow
pip install sasctl
mlflow server --backend-store-uri sqlite:///backend.db --default-artifact-root ./mlruns
This command starts an instance of the MLflow server with the following configurations:
- --backend-store-uri sqlite:///backend.db: specifies the backend store URI where the MLflow server persists metadata related to experiments, runs, parameters, metrics, and artifacts. In this case, the backend store is a SQLite database file named backend.db.
- --default-artifact-root ./mlruns: specifies the default artifact store location where the MLflow server stores artifacts generated by runs. In this case, it is the ./mlruns directory relative to the current working directory.

## setup mlflow experiment
import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:5000") # connects to a tracking URI.
mlflow.set_experiment("digits-classification-experiment_sasctl")
This code snippet is used to configure the MLflow client to connect to a tracking server running at the specified URL and to set the active experiment to “digits-classification-experiment_sasctl”.
The mlflow.set_tracking_uri function specifies the tracking server URI that the client will use to communicate with the tracking server. In this case, it sets the tracking URI to "http://127.0.0.1:5000", the local server started in the previous step.

The mlflow.set_experiment function sets the active experiment for this client session. Experiments are used to group runs and artifacts in MLflow, making it easier to organize and track them. The function takes an experiment name as a parameter, and in this case, it sets the experiment name to "digits-classification-experiment_sasctl". This means that all subsequent runs and artifacts created by this MLflow client will be associated with this experiment.
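Before moving on, it can be worth confirming that the client can actually reach the server and that the experiment exists. The following is a minimal sketch under the assumption that the server from the previous step is still running locally; the /health endpoint returning "OK" is an assumption about the MLflow tracking server's REST interface.

import mlflow
import requests

# Sanity check (sketch): is the tracking server reachable?
resp = requests.get("http://127.0.0.1:5000/health")
print(resp.status_code, resp.text)  # expected: 200 OK

# Confirm the experiment was created; returns None if it does not exist.
exp = mlflow.get_experiment_by_name("digits-classification-experiment_sasctl")
print(exp.experiment_id if exp else "experiment not found")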
### import libraries
from mlflow.models.signature import infer_signature
import mlflow
from sklearn import datasets
from sklearn import metrics
import requests
import json
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from pathlib import Path
# sasctl interface for importing models
import sasctl.pzmm as pzmm
from sasctl import Session
import warnings
import getpass
warnings.filterwarnings("ignore")
#### load dataset
## split data to train and test
digits = datasets.load_digits()  # load the dataset
x = digits.data  # features (flattened 8x8 pixel arrays)
y = digits.target  # class labels (0 to 9)
df = pd.DataFrame(data=np.c_[digits['data'], digits['target']],
                  columns=digits['feature_names'] + ['target'])
df.head()
x_train, x_test, y_train, y_test = train_test_split(df[digits['feature_names']], df['target'], test_size=0.2, random_state=42)
This code is loading the ‘digits’ dataset from the sklearn library which is a dataset of hand-written digits that are already flattened into an array. The dataset contains 64 features (8x8 image pixels) and 10 classes (0 to 9).
The ‘digits’ dataset is then split into input features (x) and target variables (y). A pandas dataframe is created from the input features (x) and the target variables (y). The ‘train_test_split’ function from the sklearn library is used to split the data into training and testing datasets. The training dataset is used to fit a machine learning model while the testing dataset is used to evaluate the performance of the model.
The ‘train_test_split’ function takes the feature columns and the target column as inputs, along with test_size=0.2, which holds out 20% of the data for testing, and random_state=42, which makes the split reproducible.
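As a quick sanity check, you can inspect the shapes of the resulting splits. The digits dataset has 1,797 samples with 64 features, so with test_size=0.2 you should see roughly 1,437 training rows and 360 test rows:

# Verify the split proportions
print(x_train.shape, x_test.shape)  # expected: (1437, 64) (360, 64)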
## define randomforest model
model = RandomForestClassifier(n_estimators=300).fit(x_train, y_train)
## model signature defines the schema of the model's input and output
signature = infer_signature(x_train, model.predict(x_train))
## log model score to mlflow
score = model.score(x_test, y_test)
print("Score: %s" % score)
mlflow.log_metric("score", score)
### log model
mlflow.sklearn.log_model(model, "model", signature=signature)
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
This code defines a Random Forest classification model using the RandomForestClassifier algorithm from Scikit-learn. The n_estimators parameter is set to 300, which determines the number of trees in the random forest.

After training the model with the training dataset (x_train and y_train), the code uses the infer_signature function to define a model signature that specifies the schema of the model's input and output. The signature is later used to log the model in the MLflow experiment.

The code then calculates the model score on the test dataset (x_test and y_test) using the score method of the trained model. The score is logged in MLflow as a metric with the name "score".

Finally, the code logs the trained model in the MLflow experiment using the mlflow.sklearn.log_model method. The model is saved with the name "model" and the signature defined earlier. The code prints the ID of the run that contains the saved model, which lets you track the model's performance and the history of changes made to it.
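If you want to confirm programmatically what was logged, a minimal sketch using MLflow's client API (assuming the run from above is still active) might look like this:

from mlflow.tracking import MlflowClient

# Fetch the metrics and artifacts recorded for the active run (sketch)
client = MlflowClient()
run_id = mlflow.active_run().info.run_id
print(client.get_run(run_id).data.metrics)              # e.g., {'score': 0.97...}
print([a.path for a in client.list_artifacts(run_id)])  # e.g., ['model']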
Open http://127.0.0.1:5000 in your browser, and you will find the digits-classification-experiment_sasctl experiment created.
mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')  # path to the logged model artifacts (experiment ID 1 in this setup)
## get info about the model's variables, inputs, and outputs
varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)
This code reads information about the MLflow model that was saved during the active run, in preparation for registering it in SAS Model Manager.
mlPath is the path to the MLflow model saved in the mlruns directory for the active run.

pzmm.MLFlowModel is a class from the sasctl package that provides functionality for working with MLflow models in SAS Model Manager. read_mlflow_model_file is a method of the MLFlowModel class that takes the path to the MLflow model as input and returns three dictionaries:
- varDict: a dictionary that contains the names of the input and output variables and their types
- inputsDict: a dictionary that maps the input variable names to their types
- outputsDict: a dictionary that maps the output variable names to their types

These dictionaries provide information about the structure of the MLflow model and its inputs and outputs, which is necessary for registering the model in SAS Model Manager.
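To see exactly what was extracted from the MLmodel file, you can simply print the three dictionaries; the exact keys depend on the sasctl version and the model signature, so treat the output as illustrative:

# Inspect the metadata read from the MLflow model (contents are version-dependent)
print(varDict)
print(inputsDict)
print(outputsDict)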
## pickle model
modelPrefix = 'RandomForestClassifier'
zipFolder = Path.cwd() / f'MLFlowModels/{modelPrefix}'
pzmm.PickleModel.pickle_trained_model(trained_model=model, model_prefix=modelPrefix, pickle_path=zipFolder, mlflow_details=varDict)
## jsonify inputs and outputs
J = pzmm.JSONFiles()
J.writeVarJSON(inputsDict, isInput=True, jPath=zipFolder)
J.writeVarJSON(outputsDict, isInput=False, jPath=zipFolder)
J.writeModelPropertiesJSON(modelName=modelPrefix,
                           modelDesc='MLFlow Model',
                           targetVariable='',
                           modelType='RandomForestClassifier',
                           modelPredictors='',
                           targetEvent=1,
                           numTargetCategories=1,
                           eventProbVar='tensor',
                           jPath=zipFolder,
                           modeler='sasdemo')
# Write model metadata to a json file
J.writeFileMetadataJSON(modelPrefix, jPath=zipFolder)
This code block performs the following steps:
1. Pickles the trained model using the pickle_trained_model function from the PickleModel class in the pzmm module. The pickled model is saved to the folder specified by the zipFolder variable.
2. Writes the model's input and output variables to JSON files using the writeVarJSON function from the JSONFiles class in the pzmm module. These JSON files are saved to the same folder specified by zipFolder.
3. Writes the model properties to a JSON file using the writeModelPropertiesJSON function from the JSONFiles class in the pzmm module. The model properties include the name, description, type, predictors, target event, target categories, and event probability variable.
4. Writes the model metadata to a JSON file using the writeFileMetadataJSON function from the JSONFiles class in the pzmm module. The model metadata includes the name of the pickled model and the folder where it is saved.

You will find the created files in the zipFolder path (see the sketch below).
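A minimal way to list them, using the zipFolder Path object from the code above:

# List the files generated for the model (file names depend on the sasctl version)
for f in sorted(zipFolder.glob('*')):
    print(f.name)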
## get the username, password, and host for the SAS Viya server
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
host = getpass.getpass("Hostname: ")
sess = Session(host, username, password, verify_ssl=False)
This code snippet creates a Session object to connect to a SAS Viya server. It prompts the user to enter their username, password, and the hostname (or IP address) of the SAS Viya server.

The getpass.getpass method prompts the user for sensitive information like the username and password without echoing it back to the console. The values entered by the user are assigned to the username, password, and host variables respectively.

The Session object is created with these inputs to authenticate the user's credentials and establish a connection to the SAS Viya server. The verify_ssl=False parameter disables SSL verification for cases where the SAS Viya server has a self-signed SSL certificate. This is generally not recommended for production environments.
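If you prefer not to keep a long-lived session object around, sasctl also supports using the session as a context manager, which closes the connection cleanly when the block exits. A minimal sketch:

# Sketch: scope the connection to a with-block so it is closed automatically
with Session(host, username, password, verify_ssl=False):
    pass  # perform the model registration inside this block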
## register the model in SAS Model Manager
I = pzmm.ImportModel()
I.pzmmImportModel(zipFolder, modelPrefix, 'MLFlowTest', inputsDict, None, '{}.predict({})', metrics=['tensor'], force=True)
This code is responsible for registering the MLflow model in SAS Model Manager using the pzmm library.

The first step is to create an instance of the ImportModel class from the pzmm library by calling pzmm.ImportModel(). Then, the pzmmImportModel() method is called on the ImportModel instance, which takes the following parameters:
- zipFolder: a path to the folder containing the zipped model artifact files.
- modelPrefix: a prefix string for the name of the registered model in SAS Model Manager.
- projectName: the name of the SAS Model Manager project to which the model should be added ('MLFlowTest' here).
- inputDict: a dictionary containing information about the input variables for the model.
- outputDict: a dictionary containing information about the output variables for the model. In this case, it is set to None.
- codeTemplate: a string template that specifies the code for invoking the model. Here, it is set to '{}.predict({})', which means that the predict() method of the model will be called with the input data.
- metrics: a list of metrics to be logged for the model. Here, it is set to ['tensor'].
- force: a boolean value indicating whether to overwrite an existing model with the same name.

This code essentially imports the MLflow model into SAS Model Manager and sets up the necessary information about inputs, outputs, and metrics. You can also verify the registration programmatically, as sketched below.
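A minimal verification sketch using sasctl's model_repository service, assuming the session created earlier is still active:

from sasctl.services import model_repository as mr

# Sketch: look up the project and model that were just created
project = mr.get_project('MLFlowTest')
model = mr.get_model(modelPrefix)
print(project.name if project else 'project not found')
print(model.name if model else 'model not found')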
Open SAS Model Manager, and you will find the RandomForestClassifier model registered in the MLFlowTest project.
In conclusion, registering MLflow models to SAS Model Manager using sasctl is an efficient and powerful way to manage machine learning models within an enterprise environment. By leveraging the sasctl Python library, users can easily deploy and manage their models on the SAS Viya platform, allowing for greater collaboration and efficiency in model deployment. The step-by-step guide in this blog covers the whole process, from setting up an MLflow tracking server and training a model to registering that model in SAS Model Manager. By following it, data scientists can integrate their MLflow models with SAS Model Manager and streamline the process of deploying and managing models in production environments.
You can find the complete notebook on GitHub.