SAS Scoring code translator for Python (and R)

Hello everyone,

Analytic models come in many shapes, sizes, and complexities. A common SAS use case is to run through the entire Model Management lifecycle with SAS software; however, we recognize that this is not always the case. You may need to work with SAS objects outside of SAS, whether that means taking models created in Python or R and deploying them to SAS, or creating a model in SAS and deploying it elsewhere.

To assist with the latter use case, I am here to introduce the new open source packages for Python, sas-scoring-translator-python (pysct), and for R, sas-scoring-translator-r (rsct), which translate SAS scoring code into those languages. This makes it easy to call SAS models from your application (or to learn a bit more about how the SWAT package works). The tools are available on the sassoftware GitHub.

Get the SAS Scoring Code

As you might know by now, SAS has a lot of interfaces where you can build models (it can even build one for you with Auto ML). After you create your great models, you want to use them, so what can you do? Export the scoring code, of course.

First, in Model Studio go to Pipeline comparison as seen below.

Next, select your model (which doesn't have to be published to MAS), and export its scoring code.

This results in a zip file with your SAS scoring code (keep the file name dmcas_epscorecode.sas in mind; we'll use it later on).
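If you would like to peek inside the archive from Python without extracting it, the standard zipfile module is enough. This is only a minimal sketch: the helper names list_score_code_files and read_score_code are made up for this illustration and are not part of pysct, and the utf-8 default is an assumption based on the encoding reported in the score code header.

```python
import zipfile


def list_score_code_files(zip_path):
    """Return the names of the .sas files stored inside a score code zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".sas")]


def read_score_code(zip_path, member_name, encoding="utf-8"):
    """Read one member of the archive as text without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member_name).decode(encoding)
```

For example, list_score_code_files("C:/score_code_Gradient Boosting.zip") should list the score code file name for your export, and read_score_code() with that name returns its text.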

You don't have to unzip the file, but if you take a look inside, you will find something like the example below, depending on which model you are using.

/*
* This score code file references one or more analytic stores that are located in the caslib "Models".
* This score code file references the following analytic-store tables:
* _28LWD4IVTS9I294A6F893FNBY_ast
*/
/*----------------------------------------------------------------------------------*/
/* Product: Visual Data Mining and Machine Learning */
/* Release Version: V2021.1.1 */
/* Component Version: V2020.1.5 */
/* CAS Version: V.04.00M0P05162021 */
/* SAS Version: V.04.00M0P051621 */
/* Site Number: 70180938 */
/* Host: sas-cas-server-default-client */
/* Encoding: utf-8 */
/* Java Encoding: UTF8 */
/* Locale: en_US */
/* Project GUID: f8b37e46-c893-4cf5-8183-0802c729b1b0 */
/* Node GUID: 25d24897-fad4-4e1d-bc94-1a7d93340ade */
/* Node Id: 28LWD4IVTS9I294A6F893FNBY */
/* Algorithm: Gradient Boosting */
/* Generated by: sasdemo */
/* Date: 26JUL2021:19:12:08 */
/*----------------------------------------------------------------------------------*/
data sasep.out;
dcl package score _28LWD4IVTS9I294A6F893FNBY();
dcl double "P_BAD1" having label n'Predicted: BAD=1';
dcl double "P_BAD0" having label n'Predicted: BAD=0';
dcl nchar(32) "I_BAD" having label n'Into: BAD';
dcl nchar(4) "_WARN_" having label n'Warnings';
dcl double EM_EVENTPROBABILITY;
dcl nchar(12) EM_CLASSIFICATION;
dcl double EM_PROBABILITY;
varlist allvars [_all_];

method init();
_28LWD4IVTS9I294A6F893FNBY.setvars(allvars);
_28LWD4IVTS9I294A6F893FNBY.setkey(n'C08002467175AC235F1C68321869975F6170F229');
end;

method post_28LWD4IVTS9I294A6F893FNBY();
dcl double _P_;

if "P_BAD0" = . then "P_BAD0" = 0.8005033557;
if "P_BAD1" = . then "P_BAD1" = 0.1994966443;
if MISSING("I_BAD") then do ;
_P_ = 0.0;
if "P_BAD1" > _P_ then do ;
_P_ = "P_BAD1";
"I_BAD" = ' 1';
end;
if "P_BAD0" > _P_ then do ;
_P_ = "P_BAD0";
"I_BAD" = ' 0';
end;
end;
EM_EVENTPROBABILITY = "P_BAD1";
EM_CLASSIFICATION = "I_BAD";
EM_PROBABILITY = MAX("P_BAD1", "P_BAD0");

end;

method run();
set SASEP.IN;
_28LWD4IVTS9I294A6F893FNBY.scoreRecord();
post_28LWD4IVTS9I294A6F893FNBY();
end;

method term();
end;

enddata;
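For readers who do not read DS2, the post-processing method in the listing above can be paraphrased in plain Python. This is an illustrative sketch only, not something pysct generates: the function names are hypothetical, Python's None and NaN stand in for the SAS missing value, and the class labels are returned without the leading padding that the DS2 code uses. The actual scoring (scoreRecord) is done by the analytic store and is not reproduced here.

```python
import math

# Fallback probabilities copied from the DS2 score code above.
PRIOR_P_BAD0 = 0.8005033557
PRIOR_P_BAD1 = 0.1994966443


def is_missing(value):
    """Treat None and NaN as the SAS numeric missing value '.'."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def post_process(p_bad1=None, p_bad0=None, i_bad=None):
    """Mirror the DS2 post method: fill missing values, then classify."""
    if is_missing(p_bad0):
        p_bad0 = PRIOR_P_BAD0
    if is_missing(p_bad1):
        p_bad1 = PRIOR_P_BAD1
    # Only assign a class when the model did not already return one.
    if i_bad is None or str(i_bad).strip() == "":
        best = 0.0
        if p_bad1 > best:
            best, i_bad = p_bad1, "1"
        if p_bad0 > best:
            best, i_bad = p_bad0, "0"
    return {
        "P_BAD1": p_bad1,
        "P_BAD0": p_bad0,
        "I_BAD": i_bad,
        "EM_EVENTPROBABILITY": p_bad1,
        "EM_CLASSIFICATION": i_bad,
        "EM_PROBABILITY": max(p_bad1, p_bad0),
    }
```

In other words, a record with no prediction falls back to the stored probabilities and is classified into the more likely class.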

This DS2 score code looks fine, as long as you know enough SAS. However, if you don't, or if you want to score tables with the distributed power of SAS Viya using Python or R, it doesn't really help you. Of course, you could use SWAT and translate all of this by hand, but that would be neither efficient nor quick. This is why we are here! pysct, the Python Scoring Code Translator (and its R counterpart, rsct), will help you: it reads the zip file and translates it for you. Let's look at it.

We will start with Python, but if you are interested only in R, feel free to jump down to the R section.

Python Scoring Code Translator

First, let's install the package from SAS GitHub:

## Install directly from git if you don't have it
pip install git+https://github.com/sassoftware/sas-scoring-translator-python.git

The tool is quite easy to use. Look at the reference table and check where your model came from and the scoring code type. In our example, we've got a model from Model Studio, and the scoring code type is given by the file name I asked you to remember earlier, dmcas_epscorecode.sas. So we use the EPS_translate() function. Other model and scoring code combinations are listed in the following table.

Interface	Code Type	Base File Name	Translation Function
Model Studio	DataStep	dmcas_scorecode.sas	`pysct.DS_translate()`
Model Studio	DS2	dmcas_epscorecode.sas	`pysct.EPS_translate()`
Visual Text Analytics	Sentiment - CAS Procedure	scoreCode.sas	`pysct.nlp_sentiment_translate()`
Visual Text Analytics	Categories - CAS Procedure	scoreCode.sas	`pysct.nlp_category_translate()`
Visual Text Analytics	Topics - CAS Procedure	AstoreScoreCode.sas	`pysct.nlp_topics_translate()`
Visual Text Analytics	Concepts - CAS Procedure	ScoreCode.sas	`pysct.nlp_concepts_translate()`
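If you handle several kinds of score code, the table above can be written down as a small lookup. This is just a sketch of one way to do it; the TRANSLATORS dictionary and pick_translator helper are hypothetical and not part of pysct. The lookup is keyed on interface and code type rather than file name, because scoreCode.sas appears in more than one row.

```python
# (interface, code type) -> (base file name, pysct function name),
# copied from the reference table above.
TRANSLATORS = {
    ("Model Studio", "DataStep"): ("dmcas_scorecode.sas", "DS_translate"),
    ("Model Studio", "DS2"): ("dmcas_epscorecode.sas", "EPS_translate"),
    ("Visual Text Analytics", "Sentiment - CAS Procedure"): ("scoreCode.sas", "nlp_sentiment_translate"),
    ("Visual Text Analytics", "Categories - CAS Procedure"): ("scoreCode.sas", "nlp_category_translate"),
    ("Visual Text Analytics", "Topics - CAS Procedure"): ("AstoreScoreCode.sas", "nlp_topics_translate"),
    ("Visual Text Analytics", "Concepts - CAS Procedure"): ("ScoreCode.sas", "nlp_concepts_translate"),
}


def pick_translator(interface, code_type):
    """Return the name of the pysct function listed for this combination."""
    try:
        return TRANSLATORS[(interface, code_type)][1]
    except KeyError:
        raise ValueError(f"No translator listed for {interface!r} / {code_type!r}") from None
```

With pysct installed, getattr(pysct, pick_translator("Model Studio", "DS2")) would then give you the function to call.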

A single function call then gives you everything you need. In my case, it looks like the code below.

import pysct

out = pysct.EPS_translate(
            in_file = "C:/score_code_Gradient Boosting.zip",  ##  path to your file (yes, zipped, you don't have to worry)
            out_caslib = "casuser",                           ## the caslib of the output table (after data scored)
            out_castable = "hmeq",                            ## the table name of the output table (after data scored)
            in_caslib = "public",                             ## the caslib table you want to score
            in_castable = "hmeq",                             ## the table name of the table you want to score
            copyVars="ALL",                                   ## by default SAS only returns the scored output, use "ALL" if you want to copy all table vars, or just omit if you don't want to copy
            out_file="gradientBoosting.py"                    ## the output file path
)

out.keys()

Sample response:

The file was successfully written to gradientBoosting.py

dict_keys(['ds2_raw', 'py_code', 'out_caslib', 'out_castable', 'out_file'])
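The returned dictionary can be inspected like any other Python dict. Below is a small sketch of a helper that prints a summary; summarize_translation is a hypothetical name, and it assumes that the value stored under 'py_code' is a single string holding the generated script, which the sample response does not confirm.

```python
def summarize_translation(out, preview_lines=5):
    """Build a short text summary of a translation result dictionary."""
    lines = [
        f"output table: {out['out_caslib']}.{out['out_castable']}",
        f"written to:   {out['out_file']}",
        "preview of generated code:",
    ]
    # Assumption: 'py_code' holds the generated script as one string.
    lines.extend(out["py_code"].splitlines()[:preview_lines])
    return "\n".join(lines)
```

Calling print(summarize_translation(out)) after the translation shows where the scored table will land and the first lines of the generated code.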

By default, pysct writes the file to your current working directory. All of the code is also available in the out object in case you want to inspect it, but let's take a look at the output file, gradientBoosting.py.

## SWAT package needed to run the codes, below the packages in pip and conda
# documentation: https://github.com/sassoftware/python-swat/
# pip install swat
# conda install -c sas-institute swat

import swat

## Defining tables and models variables
in_caslib = "public"
in_castable = "hmeq"
out_caslib = "casuser"
out_castable = "hmeq"
astore_name = "_28LWD4IVTS9I294A6F893FNBY_ast"
astore_file_name = "_28LWD4IVTS9I294A6F893FNBY_ast.sashdat"

## Connecting to SAS Viya
conn = swat.CAS(hostname = "myserver.com", ## change if needed
                port = 8777,
                protocol='http',  ## change protocol to cas and port to 5570 if using binary connection (unix)
                username='username', ## use your own credentials
                password='password') ## we encourage using .authinfo


## Loading model to memory
## assuming the model is already inside the viya server

conn.table.loadTable(caslib= "Models",
                      path = astore_file_name, #case sensitive
                      casOut = {"name": astore_name,
                                "caslib": "Models"}
                                )

score_table = conn.CASTable(name = in_castable,
                            caslib = in_caslib
                                )

column_names = score_table.columns.tolist()


## loading astore actionset and scoring
conn.loadActionSet("astore")

conn.astore.score(table = {"caslib": in_caslib, "name": in_castable},
                   out = {"caslib": out_caslib, "name": out_castable, "replace": True},
                   copyVars = column_names,
                   rstore = {"name": astore_name, "caslib": "Models"}
              )

## Obtaining output/results table
scored_table = conn.CASTable(name = out_castable,
                              caslib = out_caslib)
                              
scored_table.head()

And the magic is done. You just have to edit the connection (swat.CAS) with your credentials and server name, and your code is ready to use in Python.
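When you edit the connection, you may prefer not to hardcode credentials in the generated script. One possible approach, sketched below, is to build the connection arguments from environment variables. The variable names (SCORING_CAS_HOST and so on) and the helper cas_connection_kwargs are invented for this example; only the hostname, port, protocol, username, and password arguments come from the generated code above. As far as I know, SWAT looks for an .authinfo file when no username and password are passed, so the sketch leaves them out unless both are set.

```python
import os


def cas_connection_kwargs(env=None):
    """Collect swat.CAS keyword arguments from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    kwargs = {
        "hostname": env.get("SCORING_CAS_HOST", "myserver.com"),
        "port": int(env.get("SCORING_CAS_PORT", "8777")),
        "protocol": env.get("SCORING_CAS_PROTOCOL", "http"),
    }
    # Pass credentials only when both are provided; otherwise omit them
    # so the connection can rely on an .authinfo file instead.
    if env.get("SCORING_CAS_USERNAME") and env.get("SCORING_CAS_PASSWORD"):
        kwargs["username"] = env["SCORING_CAS_USERNAME"]
        kwargs["password"] = env["SCORING_CAS_PASSWORD"]
    return kwargs


# Usage (needs the swat package and a reachable server):
# conn = swat.CAS(**cas_connection_kwargs())
```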

Even though it uses some default values (or copies them from your scoring code file), you are free to change things as you see fit. At this point, you have a good starting point for better integration.

R Scoring Code Translator

If you read the Python section, there is not much to change, only the syntax. Get your <- keys ready.

First, we will install the package from SAS GitHub (it is not available on CRAN).

# Since the package is not available on cran, you have to install from our git
# we recommend using the remotes package
# install.packages("remotes") # uncomment if you don't have it yet

remotes::install_github("sassoftware/sas-scoring-translator-r")

To know exactly which function to use, take a look at the reference table and check where you got your model from and the scoring code type. In our example, we've got a model from Model Studio, and the scoring code type is given by the file name I asked you to remember earlier, dmcas_epscorecode.sas. So we use the EPS_translate() function. Other model and scoring code combinations are listed in the following table.

Interface	Code Type	Base File Name	Translation Function
Model Studio	DataStep	dmcas_scorecode.sas	`DS_translate()`
Model Studio	DS2	dmcas_epscorecode.sas	`EPS_translate()`
Visual Text Analytics	Sentiment - CAS Procedure	scoreCode.sas	`nlp_sentiment_translate()`
Visual Text Analytics	Categories - CAS Procedure	scoreCode.sas	`nlp_category_translate()`
Visual Text Analytics	Topics - CAS Procedure	AstoreScoreCode.sas	`nlp_topics_translate()`
Visual Text Analytics	Concepts - CAS Procedure	ScoreCode.sas	`nlp_concepts_translate()`

With a couple of lines, we can translate our code. We don't even need to unzip the scoring code archive.

## load the package
library("rsct")

output_infos <- EPS_translate(in_file = "C:/score_code_Gradient Boosting.zip",  ##  path to your file (yes, zipped, you don't have to worry)
                              out_caslib = "casuser",                     ## the caslib of the output table (after data scored)
                              out_castable = "hmeq_scored",               ## the table name of the output table (after data scored)
                              in_caslib = "public",                       ## the caslib table you want to score
                              in_castable = "hmeq",                       ## the table name of the table you want to score
                              copyVars = "ALL",                           ## by default SAS only returns the scored output, use "ALL" if you want to copy all table vars, or just omit if you don't want to copy
                              out_file = "gb_translated.R"                ## the output file path
)

names(output_infos)

Sample response:

File successfully written to gb_translated.R

[1] "r_code"       "out_file"     "out_caslib"   "out_castable"

The file gb_translated.R was written to your working directory, but you could also set a full path. The output_infos object is a list with details in case you need to use the results somewhere else. Look at the output code:

## install swat package from github if needed, uncomment OS version
# install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.6.1/R-swat-1.6.1-linux64.tar.gz',repos=NULL, type='file') ## linux
# install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.6.1/R-swat-1.6.1-win64.tar.gz',repos=NULL, type='file') ## windows
# install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.6.1/R-swat-1.6.1-REST-only-osx64.tar.gz',repos=NULL, type='file') ## osx

## Load library
library("swat")

## Defining tables and models variables
in_caslib <- "public"
in_castable <- "hmeq"
out_caslib <- "casuser"
out_castable <- "hmeq_scored"
astore_name <- "_28LWD4IVTS9I294A6F893FNBY_ast"
astore_file_name <- "_28LWD4IVTS9I294A6F893FNBY_ast.sashdat"

## Connecting to SAS Viya
conn <- CAS(hostname = "myserver.com", ## change if needed
            port = 8777,
            protocol = 'http',  ## change protocol to cas and port to 5570 if using binary connection (unix)
            username = 'sasusername', ## use your own credentials
            password = 'password') ## we encourage using .authinfo

## Loading model to memory
cas.table.loadTable(conn,
                      caslib= "Models",
                      path = astore_file_name , #case sensitive
                      casOut = list(name = astore_name,
                                    caslib = "Models")
              )

## Defining scoring table obtaining column names
score_table <- defCasTable(conn,
                             tablename = in_castable,
                             caslib = in_caslib)

column_names <- names(score_table)

## loading astore actionset and scoring
loadActionSet(conn, "astore")

cas.astore.score(conn,
                   table = list(caslib= in_caslib, name = in_castable),
                   out = list(caslib = out_caslib, name = out_castable, replace = TRUE),
                   copyVars = column_names,
                   rstore = list(name = astore_name, caslib = "Models")
              )

## Obtaining output/results table
scored_table <- defCasTable(conn,
                            tablename = out_castable,
                            caslib = out_caslib)

head(scored_table)

As you can see, it is almost ready to use. You just have to edit the CAS connection (CAS) with your credentials and server name, and your code is ready to be used in your R environment. You are free to change it as you please. This is a good way to start understanding how SAS and open source can work together.

Conclusion

The sooner we agree that it's not a question of Python, R, or SAS, but rather of Python and R with SAS, the sooner we can move on to creating integrated solutions. I hope that these tools are useful and interesting for you. We will keep improving these packages, especially if I know you are using them. If you find any bugs or have requests, please feel free to open an issue on GitHub. Let's keep moving all integration capabilities forward.

SAS Communities Library