SAS High-Performance Analytics tip #4: Scoring with SAS Enterprise Miner


In this tip we cover scoring, which is key for model deployment. The goal of any model-building process in which you pick a champion model using training data is to make accurate predictions on new, unseen data. To achieve this goal, we need to accumulate the score code from all the nodes in the flow (those that perform imputation, transformation, modeling, and so on) and use it to score new data.

Let us back up a little and start by defining “Scoring” – in supervised learning, it is the process of computing the predicted value of the target, on new data, using a previously built model. When working with big data and High-Performance Data Mining (HPDM) nodes, remember that model training happens in the high-performance analytics (HPA) environment, and so does scoring of new data.

When a process flow diagram in SAS Enterprise Miner includes an HPDM modeling node, applying the flow's score code to data that resides in the distributed environment is supported by SAS Scoring Accelerator, either directly or via SAS Model Manager. There are several ways to accomplish this when you attach a Score node to the end of a flow:

Create a model package from the Score node and register it to the SAS Metadata Repository to be integrated into SAS Model Manager, which then publishes models to the distributed environment with SAS Scoring Accelerator.

Connect the Score node to a Register Model node (available in releases 13.1 and later) to register the model to the SAS Metadata Repository to be integrated into SAS Model Manager, which then publishes models to the distributed environment with SAS Scoring Accelerator.

Connect the Score node to a Score Code Export node to export the score files to a user-specified directory that SAS Scoring Accelerator can directly use to score data in the distributed environment.
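
For nodes that produce SAS DATA Step score code, the exported file can also be tried out in an ordinary SAS session before it is published. The following is a minimal sketch only: the library path, the table names, and the exported file name and location are all assumptions for illustration, and the step runs in the local SAS session rather than in the distributed environment.

```sas
/* Hypothetical locations: adjust to your own environment. */
libname mydata "/path/to/data";                 /* assumed library holding the new data */
filename scorecd "/path/to/export/score.sas";   /* assumed exported DATA step score code */

data mydata.new_scored;
   set mydata.new_customers;   /* hypothetical table of unseen data */
   %include scorecd;           /* pulls in the flow's accumulated score code */
run;
```

This approach applies only to flows whose nodes all generate DATA step code (see the table below); flows that contain nodes marked as producing SAS Program code need the procedure-based scoring described later in this tip.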

The table below lists the types of score code created by each HPDM node in SAS Enterprise Miner 14.1.

Node	SAS DATA Step	SAS Program	PMML	C	Java	Database (Teradata, Greenplum, DB2, Netezza, Oracle, Hadoop)
HP BN Classifier	Y	N	N	N	N	Y
HP Cluster	Y	N	N	N	N	Y
HP Data Partition	*	*	*	*	*	*
HP Explore	*	*	*	*	*	*
HP Forest	N	Y	N	N	N	N
HP GLM	Y	N	N	N	N	Y
HP Impute	Y	N	N	N	N	Y
HP Neural	Y	N	N	N	N	Y
HP Principal Components	Y	N	N	N	N	Y
HP Regression	Y	N	N	N	N	Y
HP SVM	Y**	N**	N	N	N	Y**
HP Text Miner	N	Y	N	N	N	N
HP Transform	Y	N	N	N	N	Y
HP Tree	Y	N	N	N	N	Y
HP Variable Selection	*	*	*	*	*	*

* The node does not produce this type of score code

** The HP SVM node produces SAS DATA Step score code when the Optimization Method property is set to Interior Point. Otherwise, for Active Set optimization, it produces non-DATA step (SAS Program) score code.

The HP Forest and HP SVM (Support Vector Machine) nodes do not always produce SAS DATA Step code and thus need further explanation about their scoring methodology.

HP Forest node

HP Forest is an ensemble algorithm that aggregates results from potentially hundreds of decision trees. Because SAS DATA Step code becomes unmanageable when it has to encode rules from such a large number of trees, an alternate procedure called HP4SCORE was developed to support scoring for the HP Forest node. The following tip explains the steps involved in applying score code when an HP Forest node is part of the flow.

Tip: How Can I Apply HP Forest Score Code to Distributed Data with SAS Enterprise Miner 13.2
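
As a rough illustration of what a stand-alone HP4SCORE call might look like, here is a minimal sketch. It assumes that the forest model was saved to a binary file when it was trained; the library, table names, ID variable, and file path are hypothetical, and the linked tip remains the reference for the full procedure with distributed data.

```sas
/* Hypothetical names and paths throughout. */
libname mydata "/path/to/data";

proc hp4score data=mydata.new_customers;        /* assumed table of new data */
   id customer_id;                              /* assumed key variable to carry into the output */
   score file="/path/to/models/forest_model.bin"  /* assumed saved forest model file */
         out=mydata.forest_scored;              /* scored output table */
run;
```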

HP SVM node

In SAS Enterprise Miner, the HP SVM node provides two optimization methods: Interior Point and Active Set. When this node is run in Massively Parallel Processing (MPP) mode, it supports Interior Point optimization only and produces SAS DATA Step code, as other nodes do. The Active Set method, which is supported in Symmetric Multiprocessing (SMP) mode, does not generate SAS DATA Step code. Instead, it generates three data sets (Outclass, Outfit, and Outest) that are later used by the SVMSCORE procedure to score new data.

The next tip will wrap up scoring by explaining the Analytic Store (ASTORE) format, introduced in the SAS Enterprise Miner 14.1 release to support scoring of complex models such as Forest and SVM.

Earlier tips in this series are available at: