
SAS High-Performance Analytics tip #4: Scoring with SAS Enterprise Miner

Started ‎02-24-2016 by
Modified ‎02-01-2017 by

In this tip we cover scoring, which is key for model deployment. The goal of any model building process where you pick a champion model using training data is to make accurate predictions on unseen or new data. To achieve this goal, we need to accumulate the score code from all the nodes in the flow (that perform imputation, transformation, modeling and so on) and utilize it to score new data.

 

Let us back up a little and start by defining “Scoring” – in supervised learning, it is the process of computing the predicted value of the target, on new data, using a previously built model. When working with big data and High-Performance Data Mining (HPDM) nodes, remember that model training happens in the high-performance analytics (HPA) environment, and so does scoring of new data.
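As a rough illustration of why the score code from every node has to be accumulated, the Python sketch below chains three hypothetical steps (imputation, transformation, model) and applies them in order to a new record. The function names, the imputed value, the coefficients, and the record layout are all assumptions made for illustration; this is not the score code that SAS Enterprise Miner generates.

```python
import math

# Hypothetical stand-ins for the score code contributed by each node in a flow.

def impute(record):
    """Fill a missing income with a value assumed to be learned in training."""
    out = dict(record)
    if out.get("income") is None:
        out["income"] = 50000.0  # assumed training-set replacement value
    return out

def transform(record):
    """Add a log-transformed input, as a transformation node might."""
    out = dict(record)
    out["log_income"] = math.log(out["income"])
    return out

def model(record):
    """Logistic model with made-up coefficients; adds the predicted probability."""
    out = dict(record)
    z = -12.0 + 1.0 * out["log_income"] + 0.02 * out["age"]
    out["p_target"] = 1.0 / (1.0 + math.exp(-z))
    return out

# The accumulated "score code": every step of the flow, in flow order.
FLOW = [impute, transform, model]

def score(record, flow=FLOW):
    """Apply each step in order to a new, unseen record."""
    for step in flow:
        record = step(record)
    return record

new_row = {"age": 40, "income": None}
scored = score(new_row)
```

Applying only the last step to `new_row` would fail, because the model expects the imputed and transformed inputs that the earlier steps produce. That dependency is why the whole flow, not just the model, is scored.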

 

When a process flow diagram in SAS Enterprise Miner includes an HPDM modeling node, applying the flow's score code to the data that resides in the distributed environment is supported by SAS Scoring Accelerator, either directly or via SAS Model Manager. There are several ways to accomplish this when you attach a Score node to the end of a flow:

 

  • Create a model package from the Score node and register it to the SAS Metadata Repository to be integrated into SAS Model Manager, which then publishes models to the distributed environment with SAS Scoring Accelerator.
  • Connect the Score node to a Register Model node (available in releases 13.1 and later) to register the model to the SAS Metadata Repository to be integrated into SAS Model Manager, which then publishes models to the distributed environment with SAS Scoring Accelerator.
  • Connect the Score node to a Score Code Export node to export the score files to a user-specified directory that SAS Scoring Accelerator can directly use to score data in the distributed environment. 

 

The table below lists the types of score code created by each HPDM node in SAS Enterprise Miner 14.1.

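Node                    | SAS DATA Step | SAS Program | PMML | C | Java | Database (Teradata, Greenplum, DB2, Netezza, Oracle, Hadoop)
------------------------|---------------|-------------|------|---|------|-------------------------------------------------------------
HP BN Classifier        | Y             | N           | N    | N | N    | Y
HP Cluster              | Y             | N           | N    | N | N    | Y
HP Data Partition       | *             | *           | *    | * | *    | *
HP Explore              | *             | *           | *    | * | *    | *
HP Forest               | N             | Y           | N    | N | N    | N
HP GLM                  | Y             | N           | N    | N | N    | Y
HP Impute               | Y             | N           | N    | N | N    | Y
HP Neural               | Y             | N           | N    | N | N    | Y
HP Principal Components | Y             | N           | N    | N | N    | Y
HP Regression           | Y             | N           | N    | N | N    | Y
HP SVM                  | Y**           | N**         | N    | N | N    | Y**
HP Text Miner           | N             | Y           | N    | N | N    | N
HP Transform            | Y             | N           | N    | N | N    | Y
HP Tree                 | Y             | N           | N    | N | N    | Y
HP Variable Selection   | *             | *           | *    | * | *    | *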

* The node does not produce this type of score code                                                                                                      

** The HP SVM node produces SAS DATA Step score code when the Optimization Method property is set to Interior Point.  Otherwise, for Active Set optimization, it produces non-DATA step (SAS Program) score code.
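One way to put the table to work is a small lookup that flags which nodes in a flow need non-DATA step scoring. The Python sketch below transcribes the SAS DATA Step and SAS Program columns from the table and applies the HP SVM footnote. The dictionary and function names are hypothetical helpers, not part of any SAS product.

```python
# Score code type per HPDM node, transcribed from the table above
# (SAS Enterprise Miner 14.1). None marks nodes that produce no score code.
SCORE_CODE_TYPE = {
    "HP BN Classifier": "DATA step",
    "HP Cluster": "DATA step",
    "HP Data Partition": None,
    "HP Explore": None,
    "HP Forest": "SAS Program",
    "HP GLM": "DATA step",
    "HP Impute": "DATA step",
    "HP Neural": "DATA step",
    "HP Principal Components": "DATA step",
    "HP Regression": "DATA step",
    "HP Text Miner": "SAS Program",
    "HP Transform": "DATA step",
    "HP Tree": "DATA step",
    "HP Variable Selection": None,
}

def score_code_type(node, svm_optimization="Interior Point"):
    """Return the score code type for a node; HP SVM depends on its optimizer."""
    if node == "HP SVM":
        # Footnote **: Interior Point gives DATA step code, Active Set does not.
        if svm_optimization == "Interior Point":
            return "DATA step"
        return "SAS Program"
    return SCORE_CODE_TYPE[node]

def nodes_needing_program_scoring(flow, svm_optimization="Interior Point"):
    """List the nodes in a flow whose score code is not DATA step code."""
    return [
        node for node in flow
        if score_code_type(node, svm_optimization) == "SAS Program"
    ]

flow = ["HP Impute", "HP Transform", "HP Forest"]
special = nodes_needing_program_scoring(flow)
```

Here `special` contains only `"HP Forest"`, which is the kind of node discussed in the sections that follow.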

 

The HP Forest and HP SVM (Support Vector Machine) nodes do not always produce SAS DATA Step code and thus need further explanation about their scoring methodology.

 

HP Forest node

HP Forest is an ensemble algorithm that aggregates results from potentially hundreds of decision trees. Because SAS DATA step code becomes unmanageable when it has to encode the rules from such a large number of trees, an alternate procedure called HP4SCORE was developed to support scoring for the HP Forest node. The following tip explains the steps involved in applying score code when the HP Forest node is part of the flow.

 

Tip: How Can I Apply HP Forest Score Code to Distributed Data with SAS Enterprise Miner 13.2
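To get a feel for why flat rule-based code grows unwieldy for a forest, consider the Python sketch below. Each tree is a set of nested split rules, and the forest prediction aggregates the output of every tree. The tree representation, the averaging rule, and all names are assumptions for illustration only; this is neither the internal format of HP Forest nor a description of what HP4SCORE does.

```python
# A tree is either a leaf probability (float) or a tuple:
# (feature, threshold, left_subtree, right_subtree).

def tree_predict(tree, x):
    """Walk one tree: go left when the feature value is below the threshold."""
    while not isinstance(tree, float):
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
    return tree

def forest_predict(trees, x):
    """Aggregate by averaging the leaf probabilities (assumed aggregation)."""
    return sum(tree_predict(t, x) for t in trees) / len(trees)

def count_rules(tree):
    """Number of split rules flat code would have to spell out for one tree."""
    if isinstance(tree, float):
        return 0
    _, _, left, right = tree
    return 1 + count_rules(left) + count_rules(right)

# Three tiny made-up trees; a real forest may hold hundreds of deep ones.
toy_forest = [
    ("age", 30, 0.2, ("income", 60000, 0.4, 0.8)),
    ("income", 40000, 0.1, 0.7),
    ("age", 50, ("income", 50000, 0.3, 0.6), 0.9),
]

x = {"age": 45, "income": 70000}
p = forest_predict(toy_forest, x)
total_rules = sum(count_rules(t) for t in toy_forest)
```

Even this toy forest needs five split rules. The rule count grows with both the number of trees and their depth, which is the scaling problem that motivates a dedicated scoring procedure.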

 

HP SVM node

In SAS Enterprise Miner, the HP SVM node provides two optimization methods: Interior Point and Active Set. When this node is run in Massively Parallel Processing (MPP) mode, it supports Interior Point optimization only and produces SAS DATA step code, as other nodes do. The Active Set method, which is supported in Symmetric Multiprocessing (SMP) mode, does not generate SAS DATA step code. Instead, it generates three data sets (Outclass, Outfit and Outest) that are later used by the SVMSCORE procedure to score new data.

 

The next tip will wrap up scoring by explaining the Analytic Store (or ASTORE) format, introduced in the SAS Enterprise Miner 14.1 release to support scoring of complex models like Forest and SVM.

 

Earlier tips in this series are available at:
