
How to score big data with your model developed in SAS Model Studio

Started 09-22-2020 by
Modified 09-22-2020 by

The analytical lifecycle has many steps, and each one matters if you want to develop a predictive model that automatically makes intelligent decisions as part of your business processes. Arguably, the most important step is executing your model in a decision process and acting on its predictions automatically. This is the step in which the often-significant investment in model development pays off: a model that is not deployed creates no business benefit.

 

While many applications of predictive models, especially artificial intelligence models, target real-time use, many use cases today depend on executing predictive models in batch. Organizations apply predictive models to large databases for customer journey applications such as credit application scoring, next best actions, and churn prediction.

 

For these batch applications, it is just as important to move a model quickly and easily from the development environment to the production environment. In this blog, I would like to show how easy it is to score, in batch, a predictive model that was developed in SAS Model Studio, which is part of SAS Visual Data Mining and Machine Learning (SAS VDMML). Since the earliest design of SAS VDMML, integrating model development with model execution has been a key product development goal.

 

So, let’s assume you have developed your favorite predictive model in SAS VDMML using SAS Model Studio. Using the pipeline comparison facility, you decided which model won your model tournament and will be used to batch score new data. The video “How to compare models in SAS” steps through this process. The figure below shows the selected champion model in the Pipeline Comparison tab of SAS Model Studio.

 

Blog1_Fig1.png

 

How to score your favorite model

You now have three options to score your favorite model in the SAS Viya environment:

  1. Download the batch scoring code
  2. Publish the model to the Cloud Analytics Server (CAS)
  3. Download an API endpoint to score the model in SAS Viya

 

Let’s briefly describe each option. For this showcase, we use a model that provides its scoring assets in a so-called analytic store, or Astore. An Astore is a binary file that captures the state of a predictive analytic procedure, such as a random forest or gradient boosting, as produced by the training phase of model development. Astores can be created from predictive models developed in SAS VDMML or in SAS Enterprise Miner.
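Because an Astore is a binary file, it helps to ask SAS what it contains before scoring with it. The sketch below uses the DESCRIBE statement of PROC ASTORE to print basic information about an Astore, such as its input and output variables. It assumes an active CAS session with caslib librefs assigned, and that the Astore has already been loaded into an in-memory table; the name public.my_Astore is hypothetical.

```sas
/* Assumes an active CAS session and caslib librefs (caslib _all_ assign;). */
/* public.my_Astore is a hypothetical name for an Astore already in memory. */
proc astore;
  /* Print information about the Astore, including its input and output variables */
  describe rstore=public.my_Astore;
quit;
```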

 

  1. Using batch scoring code

In the SAS Model Studio interface, download the Astore batch scoring code by clicking on the overflow menu in the Pipeline Comparison tab (see figure below).

 

miner40_1-1600675705054.png

 

The downloaded ZIP file contains a program called dmcas_epscorecode.sas. This is a DS2 program that calls the Astore to score new data. The download action also copies the model’s Astore file into the CAS Models caslib.
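To find the Astore file that was copied for your model, you can list the files in that caslib. This is a minimal sketch that assumes an active CAS session and that the caslib is named Models, as in the reader comments on this article; your site may use a different caslib. Astore files typically carry an _ast.sashdat suffix.

```sas
/* Assumes an active CAS session. "Models" is the caslib name used in the */
/* comments on this article; adjust it if your site stores Astores elsewhere. */
proc casutil;
  /* List the source files (for example, *_ast.sashdat Astore files) in the caslib */
  list files incaslib="Models";
quit;
```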

 

In the SAS Viya developer’s environment, SAS Studio, you can now open the program dmcas_epscorecode.sas. You will have to provide a few parameters to the program to adapt it to your selected execution environment, such as the name and the port number of your CAS server.
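As an illustration of those connection parameters, the sketch below points the SAS session at a CAS server and starts a CAS session. The host name is hypothetical, and 5570 is the usual default CAS port but may differ at your site; dmcas_epscorecode.sas may expose these settings differently, so treat this only as an example of the underlying statements.

```sas
/* Hypothetical host name: replace it with your CAS controller's host.   */
/* 5570 is the usual default CAS port; your site may use another port.   */
options cashost="cas-controller.example.com" casport=5570;

/* Start a CAS session and assign SAS librefs for all existing caslibs */
cas mySession;
caslib _all_ assign;
```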

Then load the Astore file into memory and use the provided program, dmcas_epscorecode.sas, to apply it to new data, as shown in the figure below.

 

miner40_2-1600675705063.png

 

The result will be a CAS table that contains the scores for every record.
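The two steps can be sketched in code as follows, mirroring the PROC CASUTIL and PROC ASTORE pattern that readers describe in the comments below. This is a sketch under stated assumptions, not the exact content of the downloaded program: it assumes an active CAS session with caslib librefs assigned, and the Astore file name, table names, and file path are all hypothetical placeholders.

```sas
/* Assumes an active CAS session with caslib librefs assigned.           */
/* The Astore file name, table names, and path are hypothetical.         */

/* Step 1: load the Astore file from the Models caslib into memory */
proc casutil;
  load casdata="MY_MODEL_ID_ast.sashdat" incaslib="Models"
       casout="my_Astore" outcaslib="public" replace;
quit;

/* Step 2: score new data with the Astore and the downloaded DS2 score code */
proc astore;
  score data=public.new_data
        rstore=public.my_Astore
        epcode="/path/to/dmcas_epscorecode.sas"
        out=public.new_data_scored;
quit;
```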

 

  2. Publish the model to the Cloud Analytics Server (CAS)

Another method to score Astore models in batch is to use the publishing facility in SAS Model Studio. In the Pipeline Comparison tab, select Publish Model from the overflow menu as shown in the figure below.

 

Blog1_Fig4.png

 

In the publishing wizard, select CAS as the publishing destination for batch scoring and provide a name for the published model. Publishing creates an entry for the model in the destination’s model table, which by default is called SAS_MODEL_TABLE. Publishing destinations are usually defined by your SAS administrator in SAS Environment Manager. For more details, please refer to the online documentation.
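To check that the published model has an entry in the model table, you can fetch a few rows from it. This minimal sketch assumes an active CAS session, that the destination uses the Public caslib (an assumption; use the caslib configured for your CAS publishing destination), and that the model table is available in memory.

```sas
/* Assumes an active CAS session and that SAS_MODEL_TABLE is in memory in */
/* the Public caslib; use your publishing destination's caslib instead.   */
proc cas;
  /* Show the first rows of the model table to confirm the published entry */
  table.fetch / table={caslib="public", name="sas_model_table"} to=5;
run;
quit;
```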

 

Back in SAS Studio, we can now use the published model for batch scoring with the CAS actions “runModel” or “runModelLocal”.

 

Blog1_Fig5.png
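A call to runModelLocal from the modelPublishing action set might look like the sketch below. It assumes an active CAS session; the model name, caslib, and table names are hypothetical, and modelName must match the name entered in the publishing wizard. Check the parameter details against the CAS action documentation for your SAS Viya release.

```sas
/* Assumes an active CAS session. All names below are hypothetical;        */
/* modelName must match the name given in the publishing wizard.           */
proc cas;
  /* Run the published model on the input table inside the CAS session */
  modelPublishing.runModelLocal /
    modelName="my_published_model"
    modelTable={caslib="public", name="sas_model_table"}
    inTable={caslib="public", name="new_data"}
    outTable={caslib="public", name="new_data_scored"};
run;
quit;
```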

 

 

  3. Download an API endpoint to score the model in SAS Viya

Finally, we can use an API that is created automatically for batch scoring and can be called from different front ends, such as SAS, Python, or REST. In the Pipeline Comparison tab of SAS Model Studio, select Download score API from the overflow menu, and then choose SAS as the front end.

 

Blog1_Fig6.png

 

Upload the provided SAS program to SAS Studio and set the following required parameters before running it:

  • datasourceUri: the SAS Viya link to the input CAS table for the batch scoring
  • outputCasLib: the name of the output CAS library for the scoring output table 
  • outputTableName: the name of the scoring output table in CAS

 

Blog1_Fig7.png

 

Running this code triggers the scoring in SAS Viya and creates the scoring output table in the CAS environment.
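To confirm the result, you can list the tables in the output caslib and print a few scored rows. This sketch assumes an active CAS session with caslib librefs assigned; public and scored_output are hypothetical stand-ins for the values you supplied as outputCasLib and outputTableName.

```sas
/* Hypothetical names: substitute your outputCasLib and outputTableName values. */
proc casutil;
  /* Confirm that the scoring output table exists in the output caslib */
  list tables incaslib="public";
quit;

/* Inspect the first few scored records */
proc print data=public.scored_output(obs=5);
run;
```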

 

I hope the examples in this blog have demonstrated how easy it is to score Astore models in CAS using different batch scoring interfaces.

In an upcoming blog, I will describe how to score Astore models created in SAS Model Studio from a Jupyter notebook using Python.

Finally, I would like to thank my colleagues at SAS who helped review and publish this blog.

Comments

Hello,

 

I have a forecasting model in which I first select the task "Diagnose" and then "Fit", and then save the results as a table named "X" in my caslib. I have not found anything in the documentation about automating these manual steps (select Diagnose and run, then select Fit and run). Is that possible? If so, in which part of the code can I do that?

 

Thank you

Hello Experts,

 

I have a query regarding the procedure to use dmcas_epscorecode.sas for scoring in SAS Viya.

Suppose I download the ZIP file from Model Manager. I usually see only one .sas file in it, named dmcas_epscorecode.sas. This program usually contains an ID (e.g. _7IXE0CW8VV1DTXTFXT8SRMZET). Its only job is to score a model (note that no data transformation pipeline is involved in the dmcas code yet). I then go to SAS Viya and declare all the required CAS sessions and the permanent libraries where my scoring data resides.

I use PROC CASUTIL as below to load the Astore file (.sashdat) into the Public caslib:

proc casutil;
  /* Load the Astore file from the Models caslib into memory in Public */
  load casdata="_7IXE0CW8VV1DTXTFXT8SRMZET_ast.sashdat"
       incaslib="Models"
       casout="my_Astore"
       outcaslib=public replace;
run;

This loads the file into Public. Once I place the dmcas code in the designated folder via WinSCP, I can use the code below:

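proc astore;
  /* Score the input table with the in-memory Astore; epcode points to */
  /* the downloaded DS2 score code file                                 */
  score data=public.my_score_data_input
        rstore=public.my_Astore
        epcode="/folder1/folder1.1/folder1.1.1/folder1.1.1.1/dmcas_epscorecode.sas"
        out=public.scored_output;
quit;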

Now I have the scored output, with all my independent attributes and the predicted y_hat, in public.scored_output.

------------------------------------------------------------------------------------------------------------------------------------------------------

In a different scenario, my pipeline includes several data transformations or combined predictive models. This also lets me download a dmcas_epscorecode.sas, but this file contains several IDs instead of the single ID in the example above.

In this scenario, how do I load the Astore file into the Public caslib? That is, what would be the equivalent of the code below?

proc casutil;
load casdata="?????????_ast.sashdat"

incaslib="Models"
casout="my_Astore"
outcaslib=public replace;
run;

My dmcas code now contains several IDs, including a node ID. Which ID should go here?

Also, what would be the equivalent PROC ASTORE code?

Any help would be much appreciated. We keep getting the error below and cannot work out the equivalent PROC CASUTIL and PROC ASTORE code.

error.png

@Sascha_Schubert great article.

I successfully implemented the second option, but runModel takes a long time to score the table because it runs on a single thread.

Do you have any idea how to adapt the code to run it on multiple threads?

Thanks

Hello acordes,
Thanks for reaching out. I have to admit that I have not worked much on scoring Astores with multiple threads.
I found this information in the SAS online support documentation. I hope this is helpful.
https://go.documentation.sas.com/doc/en/pgmsascdc/v_025/caspg/n0kk5ezhtskpncn1vgfxrv6eobgt.htm

Thanks
Sascha

Version history
Last update:
09-22-2020 10:52 AM
Updated by:
