This article shows how to generate score code for unsupervised learning nodes in Model Studio, such as the Clustering node, and how to use that code to score data in SAS Studio. Currently, in a Data Mining and Machine Learning project in Model Studio, you can deploy score code only for a predictive model (that is, a branch of the pipeline that includes a Supervised Learning node). But you might want the score code from the Clustering node or the Anomaly Detection node, which use unsupervised learning methods (methods that do not involve a target variable).
After you run the Clustering node and settle on the clusters, you often want to apply the resulting "rules" to a different data set. For example, if the amount of data makes it computationally infeasible to run the cluster analysis on the whole population, you can score all the observations and assign them directly to the preliminary clusters found in the first stage. Or you might want to deploy your cluster analysis on an altogether different data set. In all these cases, cluster membership is assigned without running any clustering iterations, which greatly reduces the computing resources and time required.
The narrative that follows assumes that you have already created a pipeline in Model Studio. Pipelines are structured flows of analytic actions, and each action is represented as an individual node in the pipeline. (To learn more, see "Building Models with SAS Model Studio | SAS Viya Quick Start Tutorial.")
You can score your data in Model Studio or outside of it. The choice might depend on the size of your scoring table, your scoring environment, and whether you prefer GUI-based scoring or writing your own program.
To score data in Model Studio, you don’t necessarily need the score code. Just connect a Score Data node to the Clustering node as shown below:
The Score Data node is a Miscellaneous node that enables you to score a data table with the score code that was generated by the predecessor nodes in the pipeline. The scored table can be saved or promoted to a CAS library. This is a straightforward approach and doesn’t require any prior knowledge of coding.
If you want to score data outside of Model Studio, for example in SAS Studio, you first need the score code. Currently, in Viya 4, the Clustering node results do not include the score code. To obtain score code from the Clustering node, you have the following options:
The first option is to use a SAS Code node to simulate a column of target predictions and move that SAS Code node to the Supervised Learning group. The steps that follow guide you through extracting the score code and then scoring the data in SAS Studio.
Now in the newly added SAS Code node, you can include code to simulate a column of target predictions.
Note: If you do not have a true target in your data, you can either create a pseudo target or use another variable in your data set that is not used as an input for the Clustering node. The PVA data set used in this example already contains a target variable (Target_B).
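As a rough illustration of the kind of statements that could simulate the prediction columns, here is a minimal sketch. It assumes the binary target Target_B with levels 1 and 0, and it assumes that the posterior probability variables follow the P_<target><level> naming convention. The variable names and the constant 0.5 are placeholders for illustration, not values taken from the original pipeline, and the statements are assumed to be placed in the node's scoring code.

```sas
/* Hypothetical sketch: DATA step statements that create placeholder */
/* posterior probabilities for the binary target Target_B.           */
/* The names P_Target_B1 and P_Target_B0 are assumptions based on    */
/* the P_<target><level> naming convention.                          */
P_Target_B1 = 0.5;   /* placeholder probability that Target_B = 1 */
P_Target_B0 = 0.5;   /* placeholder probability that Target_B = 0 */
```

The simulated predictions carry no predictive meaning. They exist only so that the node can be treated as a supervised model and its score code, including the upstream clustering logic, can be deployed.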
You now can deploy your score code in various ways (register, publish, download from the Pipeline Comparison tab), just as you would for a supervised model.
When you download the score code from the SAS Code node, the resulting zip file is saved on the client computer, in a location that depends on the browser. For example, Google Chrome saves the zip file to the system download folder on the client machine. The zip file contains the dmcas_epscorecode.sas file, which is referenced when you score a data set in SAS Studio.
Note that the score code is also accessible in the Path EP score code window. If analytic stores are generated in the pipeline, this window displays the SAS code that the node created. The score code can be used outside the Model Studio environment to score new data. The setKey call in the method init block contains a string that identifies an analytic store. In this case, the astore file '_B85BU2NJNVFZH8XF74QX4G6O5' can be found in the Models library of your CAS server. The string will be different in your case.
In the Models library, it is saved as '_B85BU2NJNVFZH8XF74QX4G6O5_AST.sashdat'.
Note: The astore file is saved in the Models library automatically only when you choose to download the score code.
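Before scoring, it can help to confirm that the astore file really is in the Models library. The following is a minimal sketch that lists the files in that caslib. It assumes that your CAS server exposes the caslib under the name "Models", as in this example.

```sas
/* Start a CAS session */
cas;

/* List the source files in the Models caslib and look for the */
/* <key>_AST.sashdat file named in the score code              */
proc casutil;
   list files incaslib="Models";
quit;
```

If the file is not listed, download the score code again, because the astore is saved to the Models library only at that point.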
This analytic store binary table is combined with data in PROC ASTORE to perform scoring in SAS Studio.
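/* Display the home directory; the epcode= path below points to a file under it */
%let homedir=%sysget(HOME);
%put &homedir;

/* Start a CAS session and assign librefs for all caslibs */
cas;
caslib _all_ assign;

/* Load the analytic store from the Models caslib into CASUSER */
proc casutil;
   load casdata="_B85BU2NJNVFZH8XF74QX4G6O5_AST.sashdat"
        incaslib="Models" casout="cluster_astore" outcaslib="casuser";
quit;

/* Score the PVA table with the analytic store and the downloaded EP score code */
proc astore;
   score data=casuser.pva
         rstore=casuser.cluster_astore
         epcode='/greenmonthly-export/ssemonthly/homes/a.b@sas.com/dmcas_epscorecode.sas'
         out=casuser.cluster_scored;
run;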
A snapshot of the output table is shown below. It contains the _CLUSTER_ID_ column, which holds the cluster membership of each record. Also note the IMP_DemAge column, which holds the imputed values of the DemAge variable. Data preprocessing steps such as this imputation are carried out through the epscore code file when new data is scored.
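To go beyond a visual snapshot, you can inspect the analytic store and summarize the scored table. The following is a minimal sketch that assumes the table names used in the program above (casuser.cluster_astore and casuser.cluster_scored) and that the casuser libref is already assigned.

```sas
/* Show the contents of the analytic store, such as its input */
/* and output variables                                       */
proc astore;
   describe rstore=casuser.cluster_astore;
run;

/* Count how many records were assigned to each cluster; */
/* the MISSING option also reports unassigned records    */
proc freq data=casuser.cluster_scored;
   tables _CLUSTER_ID_ / missing;
run;
```

A frequency table is a quick sanity check: every record should carry a cluster ID, and the cluster sizes should look plausible next to the Clustering node results in Model Studio.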
The second option for obtaining the score code of a Clustering node in Viya 4 is to connect a Score Data node to the Clustering node and run your pipeline. Your completed pipeline should resemble the following:
The steps that follow show how to extract the score code from the Score Data node and perform scoring in SAS Studio.
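/* Display the home directory; the epcode= path below points to a file under it */
%let homedir=%sysget(HOME);
%put &homedir;

/* Start a CAS session and assign librefs for all caslibs */
cas;
caslib _all_ assign;

/* Copy the analytic store from the project caslib to CASUSER and promote it */
proc cas;
   table.copyTable /
      casout={promote=true, name="cluster_ast", caslib="CASUSER"}
      table={name="_B85BU2NJNVFZH8XF74QX4G6O5_AST",
             caslib="Analytics_Project_6c4541e7-b4d1-412c-ad79-ffa8617ab294"};
run;
quit;

/* Score the PVA table with the copied analytic store and the saved EP score code */
proc astore;
   score data=casuser.pva
         rstore=casuser.cluster_ast
         epcode='/greenmonthly-export/ssemonthly/homes/a.b@sas.com/Path EP Score Code.txt'
         out=casuser.cluster_scored1;
run;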
A snapshot of the output table is produced below:
Of the two options discussed for scoring data in SAS Studio, I find the first to be more efficient and less error prone. The second method requires the name of the project caslib, which is tricky to find. There is also a risk that users inadvertently delete files within the project’s caslib directory.
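If you do use the second method, one way to track down the project caslib is to list the caslibs that are visible to your session and look for names that begin with Analytics_Project_. The following is a minimal sketch. The caslib name in the second step is the one from this example and will be different in your environment.

```sas
/* Start a CAS session and write the available caslibs to the log; */
/* project caslibs have names that begin with Analytics_Project_   */
cas;
caslib _all_ list;

/* List the in-memory tables of a candidate project caslib and */
/* look for the <key>_AST table named in the score code        */
proc casutil;
   list tables incaslib="Analytics_Project_6c4541e7-b4d1-412c-ad79-ffa8617ab294";
quit;
```

This only reads metadata, so it does not modify or delete anything in the project caslib.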