In this tip we cover scoring, which is key for model deployment. The goal of any model-building process in which you pick a champion model using training data is to make accurate predictions on new, unseen data. To achieve this goal, we need to collect the score code from all the nodes in the flow (those that perform imputation, transformation, modeling and so on) and use it to score new data.
Let us back up a little and start by defining "scoring": in supervised learning, it is the process of computing the predicted value of the target on new data, using a previously built model. When working with big data and High-Performance Data Mining (HPDM) nodes, remember that model training happens in the high-performance analytics (HPA) environment, and so does scoring of new data.
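To make the idea of accumulated score code concrete, here is a minimal Python sketch (not SAS code, and not what SAS Enterprise Miner generates). It assumes a hypothetical three-step flow of imputation, transformation and a linear model; the function names, the `income` column, the fill value, the coefficients and the `P_target` output name are all invented for illustration.

```python
import math

def make_imputer(column, fill_value):
    """Scoring step that replaces a missing value with a value learned in training."""
    def step(row):
        out = dict(row)
        if out.get(column) is None:
            out[column] = fill_value
        return out
    return step

def make_log_transform(column):
    """Scoring step that adds a log-transformed copy of a column."""
    def step(row):
        out = dict(row)
        out["log_" + column] = math.log(out[column])
        return out
    return step

def make_linear_model(column, intercept, slope):
    """Scoring step that writes the predicted target (hypothetical coefficients)."""
    def step(row):
        out = dict(row)
        out["P_target"] = intercept + slope * out[column]
        return out
    return step

def score(row, steps):
    """Apply every step of the flow, in order, to one new observation."""
    for step in steps:
        row = step(row)
    return row

# The "accumulated score code" is the ordered list of steps from the flow.
flow = [
    make_imputer("income", fill_value=50000.0),
    make_log_transform("income"),
    make_linear_model("log_income", intercept=1.0, slope=2.0),
]

new_row = {"income": None}
scored = score(new_row, flow)
```

The point of the sketch is the ordering: the model step only works because the imputation and transformation steps from earlier in the flow ran first on the new observation.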
When a process flow diagram in SAS Enterprise Miner includes an HPDM modeling node, applying the flow's score code to the data that resides in the distributed environment is supported by SAS Scoring Accelerator, either directly or via SAS Model Manager. There are several ways to accomplish this when you attach a Score node to the end of a flow:
The table below lists the types of score code created by each HPDM node in SAS Enterprise Miner 14.1.
| Node | SAS DATA Step | SAS Program | PMML | C | Java | Database (Teradata, Greenplum, DB2, Netezza, Oracle, Hadoop) |
| --- | --- | --- | --- | --- | --- | --- |
| HP BN Classifier | Y | N | N | N | N | Y |
| HP Cluster | Y | N | N | N | N | Y |
| HP Data Partition | * | * | * | * | * | * |
| HP Explore | * | * | * | * | * | * |
| HP Forest | N | Y | N | N | N | N |
| HP GLM | Y | N | N | N | N | Y |
| HP Impute | Y | N | N | N | N | Y |
| HP Neural | Y | N | N | N | N | Y |
| HP Principal Components | Y | N | N | N | N | Y |
| HP Regression | Y | N | N | N | N | Y |
| HP SVM | Y** | N** | N | N | N | Y** |
| HP Text Miner | N | Y | N | N | N | N |
| HP Transform | Y | N | N | N | N | Y |
| HP Tree | Y | N | N | N | N | Y |
| HP Variable Selection | * | * | * | * | * | * |
* The node does not produce this type of score code
** The HP SVM node produces SAS DATA Step score code when the Optimization Method property is set to Interior Point. Otherwise, for Active Set optimization, it produces non-DATA step (SAS Program) score code.
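If you want to check a planned flow against the table, the rows can be transcribed into a small lookup. The Python sketch below does that; the helper name `nodes_without` and the example flow are hypothetical, the HP SVM entry reflects the Interior Point case only, and treating nodes that produce no score code as harmless to scoring is an assumption of this sketch rather than something the table states.

```python
# Score code types per HPDM node, transcribed from the table above.
# Nodes marked "*" in the table produce no score code and map to an empty set.
DATA_STEP, SAS_PROGRAM, DATABASE = "SAS DATA Step", "SAS Program", "Database"

NODE_SCORE_CODE = {
    "HP BN Classifier": {DATA_STEP, DATABASE},
    "HP Cluster": {DATA_STEP, DATABASE},
    "HP Data Partition": set(),
    "HP Explore": set(),
    "HP Forest": {SAS_PROGRAM},
    "HP GLM": {DATA_STEP, DATABASE},
    "HP Impute": {DATA_STEP, DATABASE},
    "HP Neural": {DATA_STEP, DATABASE},
    "HP Principal Components": {DATA_STEP, DATABASE},
    "HP Regression": {DATA_STEP, DATABASE},
    "HP SVM": {DATA_STEP, DATABASE},  # Interior Point; Active Set gives SAS Program code
    "HP Text Miner": {SAS_PROGRAM},
    "HP Transform": {DATA_STEP, DATABASE},
    "HP Tree": {DATA_STEP, DATABASE},
    "HP Variable Selection": set(),
}

def nodes_without(flow, code_type):
    """Return nodes in the flow that emit score code, but not of the requested type."""
    return [
        node for node in flow
        if NODE_SCORE_CODE[node] and code_type not in NODE_SCORE_CODE[node]
    ]

# Hypothetical flow: which nodes lack database score code?
flow = ["HP Data Partition", "HP Impute", "HP Transform", "HP Forest"]
blockers = nodes_without(flow, DATABASE)
```

For this example flow, only HP Forest is returned, which matches the table: it is the one node in the flow whose score code is a SAS Program rather than DATA step code.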
The HP Forest and HP SVM (Support Vector Machine) nodes do not always produce SAS DATA Step code and thus need further explanation about their scoring methodology.
HP Forest is an ensemble algorithm that aggregates results from potentially hundreds of decision trees. Since SAS DATA Step code becomes unmanageable when it has to encode the rules from such a large number of trees, an alternate procedure called HP4SCORE was developed to support scoring for the HP Forest node. The following tip explains the steps involved in applying score code when an HP Forest node is part of the flow.
Tip: How Can I Apply HP Forest Score Code to Distributed Data with SAS Enterprise Miner 13.2
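To see why rule-by-rule score code grows so quickly for a forest, consider the following Python sketch. It is a toy illustration of ensemble scoring in general, not a description of how HP4SCORE works internally; the nested-dictionary tree layout, the `age` variable, the one-split trees and the simple averaging are all assumptions made for the example.

```python
def score_tree(tree, row):
    """Walk one tree from the root to a leaf and return the leaf's prediction."""
    node = tree
    while "leaf" not in node:
        branch = "left" if row[node["var"]] < node["split"] else "right"
        node = node[branch]
    return node["leaf"]

def score_forest(trees, row):
    """Score every tree, then aggregate by averaging the predictions."""
    predictions = [score_tree(tree, row) for tree in trees]
    return sum(predictions) / len(predictions)

def count_leaves(tree):
    """Each leaf corresponds to one rule that generated score code would spell out."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def stump(var, split, low, high):
    """Build a one-split tree (hypothetical structure for this sketch)."""
    return {"var": var, "split": split, "left": {"leaf": low}, "right": {"leaf": high}}

# 200 tiny trees, each with a slightly different split point.
forest = [stump("age", 30 + i, 0.2, 0.8) for i in range(200)]

prediction = score_forest(forest, {"age": 130})
total_rules = sum(count_leaves(tree) for tree in forest)
```

Even with the smallest possible trees, 200 of them already imply 400 leaf rules; realistic trees with many levels multiply that count further, which is the motivation for a dedicated scoring procedure instead of generated DATA step code.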
In SAS Enterprise Miner, the HP SVM node provides two optimization methods: Interior Point and Active Set. When this node is run in Massively Parallel Processing (MPP) mode, it supports Interior Point optimization only and produces SAS DATA Step code, like most other nodes. The Active Set method, supported in Symmetric Multiprocessing (SMP) mode, does not generate SAS DATA Step code. Instead, it generates three data sets (Outclass, Outfit and Outest) that are later used by the SVMSCORE procedure to score new data.
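The rules in the paragraph above can be summarized as a small decision function. The Python sketch below only restates what the text says; the function name `hp_svm_score_code` and the shape of its return value are hypothetical, and it is not part of any SAS API.

```python
def hp_svm_score_code(processing_mode, optimization_method):
    """Return how an HP SVM model would be scored, per the rules described above."""
    mode = processing_mode.strip().upper()
    method = optimization_method.strip().lower()

    if mode not in ("MPP", "SMP"):
        raise ValueError("processing_mode must be 'MPP' or 'SMP'")

    if method == "interior point":
        # Interior Point yields ordinary DATA step score code.
        return {"scored_by": "SAS DATA step", "artifacts": []}

    if method == "active set":
        if mode == "MPP":
            # The text states that MPP mode supports Interior Point only.
            raise ValueError("Active Set is not supported in MPP mode")
        # Active Set writes three data sets that the SVMSCORE procedure consumes.
        return {
            "scored_by": "SVMSCORE procedure",
            "artifacts": ["Outclass", "Outfit", "Outest"],
        }

    raise ValueError("optimization_method must be 'Interior Point' or 'Active Set'")

smp_active = hp_svm_score_code("SMP", "Active Set")
mpp_interior = hp_svm_score_code("MPP", "Interior Point")
```

Writing it this way makes the one unsupported combination (Active Set in MPP mode) explicit instead of leaving it implied.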
The next tip will wrap up scoring by explaining the Analytic Store (or ASTORE) format, introduced in the SAS Enterprise Miner 14.1 release to support scoring of complex models like Forest and SVM.
Earlier tips in this series are available at: