The SAS Code Node in SAS Model Studio versus SAS Enterprise Miner


The SAS Code node in SAS Model Studio is extremely versatile. If you want to accomplish something that can’t easily be done with the out-of-the-box Model Studio nodes, try using the SAS Code node! It can be used to run pretty much any SAS code. This blog will illustrate using the SAS Code node for:

Pre-processing
Supervised Learning
Post-processing

SAS Code Node Interface

Some of you will remember the SAS Code Node in SAS Enterprise Miner. The SAS Code Node in SAS Model Studio is fairly similar, in that you run SAS code in a node within your pipeline. The interface, however, is quite a bit different.

In SAS Model Studio, the SAS Code Node is found under “Miscellaneous.” After you add the SAS Code Node to your pipeline and select it, you will see an Open code editor button on the right.

Within the code editor are two panes:

Training code
Scoring code

The SAS Code Node does NOT change the original training data set in the pipeline. Therefore, if you want to change values of a variable, create new variables, or delete observations, you need to do this in the Scoring Code pane.

For you Enterprise Miner users, recall that EM also had a Training Code pane and Score Code pane.

Training code pane

Score code pane

SAS Code Node processing

Processing works differently in SAS Model Studio than in SAS Enterprise Miner. Below are a few differences between how the SAS Code Node works in the two products:

SAS Code Node Examples

Let’s look at some examples using the SAS Code Node:

Pre-processing
Supervised Learning
Post-processing

Pre-Processing Example

Let’s say you want to accomplish some pre-processing in the SAS Code Node. Examples might include:

Engineering features. For example, you might want to:
- Create a new variable that is some function of a variable that already exists in your pipeline. Remember that new variables must be created in the Scoring Code pane.
- Selectively apply imputations or transformations. For example, you could write logic to:
  1. Calculate the skewness of each input variable.
  2. Log transform those input variables that have a skewness > 3.14.
Selecting only a subset of the data. For example, you might include only observations where home values are greater than 300,000.
Modifying the metadata. For example, you might want to change a variable’s role or level. You CANNOT, however, change the target role.
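The skewness logic in the list above could start from something like the following training-code sketch. It only identifies the candidate variables; the log transformation itself would still have to be written in the Scoring Code pane. The table names skew_wide, skew_long, and SkewedInputs are made up for illustration, and the sketch assumes that PROC MEANS can read the table behind &dm_data in your environment, so treat it as a starting point rather than a tested recipe.

```sas
/* Training Code (illustrative sketch, table names are hypothetical) */

/* Only run when the pipeline has at least one interval input */
%if &dm_num_interval_input %then %do;

   /* One row, one column per interval input, holding its skewness */
   proc means data=&dm_data noprint;
      var %dm_interval_input;
      output out=work.skew_wide(drop=_type_ _freq_) skewness=;
   run;

   /* Turn the single row into one row per variable */
   proc transpose data=work.skew_wide
                  out=work.skew_long(rename=(col1=skewness))
                  name=variable;
   run;

   /* Keep the inputs whose skewness exceeds the threshold from the text */
   data &dm_lib..SkewedInputs;
      set work.skew_long;
      where skewness > 3.14;
   run;

   /* Show the candidates as a table in the node results */
   %dmcas_report(dataset=SkewedInputs, reportType=Table, description=%nrbquote(Inputs with skewness above 3.14));

%end;
```

The macro variables and the %dmcas_report arguments used here are the same ones that appear in the other examples in this post; everything else is plain Base SAS.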

Remember that some of this pre-processing can be done from Model Studio’s Data Tab or with the Manage Variables node. Check first to see if the functionality you need is available there, before you reinvent the wheel.

In order to change metadata, use %dmcas_metaChange. For example, as shown below, we set VALUE (home value) to REJECTED. This variable will not be deleted from the data set, but will not be used in modeling. We set NewValue to INPUT, so that NewValue will be considered as an input variable in our models.

Remember! Modification of the data that you want to pass on to subsequent nodes and to publishing must be done within the scoring code.

Below is a simple code example.

/* Training Code */

/* Replace variable VALUE with NewValue: reject VALUE and use NewValue as an input */
%dmcas_metaChange(NAME=VALUE, ROLE=REJECTED, LEVEL=INTERVAL);
%dmcas_metaChange(NAME=NewValue, ROLE=INPUT, LEVEL=INTERVAL);

/* Scoring Code */

/* Create NewValue as a function of the existing variable VALUE */
length NewValue 8;
if VALUE < 100000 then NewValue = VALUE * 2;
else NewValue = VALUE * 2.1;

Supervised Learning Example

There may be a supervised learning algorithm or option that is not available in the existing nodes. The SAS Code Node lets you use SAS code to run any supervised learning algorithm (or option). Once you have created the SAS Code Node, you can Move it to Supervised Learning, and then it is treated like any other supervised learning node. It will be compared to the other models in your model comparison node, and you can publish, deploy, etc. the model. You can even get interpretability graphs, such as PD plots, LIME, etc!

/* Training Code */

proc gradboost data=&dm_data
   numBin=20 maxdepth=6 maxbranch=2 minleafsize=5
   minuseinsearch=1 ntrees=10 learningrate=0.1 samplingrate=0.5 lasso=0 ridge=0 seed=1234;
   %if &dm_num_interval_input %then %do;
      input %dm_interval_input / level=interval;
   %end;
   %if &dm_num_class_input %then %do;
      input %dm_class_input / level=nominal;
   %end;
   %if "&dm_dec_level"="INTERVAL" %then %do;
      target %dm_dec_target / level=interval;
   %end;
   %else %do;
      target %dm_dec_target / level=nominal;
   %end;
   &dm_partition_statement;
   ods output
      VariableImportance = &dm_lib..VarImp
      Fitstatistics = &dm_data_outfit
   ;
   savestate rstore=&dm_data_rstore;
run;

%dmcas_report(dataset=VarImp, reportType=Table, description=%nrbquote(Variable Importance));
%dmcas_report(dataset=VarImp, reportType=BarChart, category=Variable, response=RelativeImportance, description=%nrbquote(Relative Importance Plot));
run;

Post-Processing Example

During post-processing you may wish to:

summarize data
create tailored graphs or tables from modeling results using the %dmcas_report macro
generate ODS output

/* Training Code */

data &dm_lib..bethsamp;
set &dm_data(obs=500);
residual = BAD1 - P_BAD1;
run;

%dmcas_report(dataset=bethsamp,
reportType=ScatterPlot,
x=P_BAD1,
y=residual,
description=%nrbquote(Scatter Plot),
yref=0);
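For the “summarize data” case, a sketch along the following lines could sit in the same Training Code pane. It reuses P_BAD1 from the example above; the table name PredSummary and the output column names are hypothetical, and it assumes PROC MEANS can read &dm_data directly in your environment.

```sas
/* Training Code (illustrative sketch, PredSummary is a hypothetical name) */

/* Summarize the predicted probabilities into a one-row table */
proc means data=&dm_data noprint;
   var P_BAD1;
   output out=&dm_lib..PredSummary(drop=_type_ _freq_)
          n=n_obs mean=mean_pred min=min_pred max=max_pred;
run;

/* Display the summary as a table in the node results */
%dmcas_report(dataset=PredSummary, reportType=Table, description=%nrbquote(Summary of P_BAD1));
```

As in the scatter plot example, the table is written to &dm_lib so that %dmcas_report can find it by its one-level name.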

Macros and macro variables

Similar to SAS Enterprise Miner, SAS Model Studio includes a bunch of pre-built macros and macro variables out-of-the-box.

Macros and Macro Variables in General

Macros

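Macro calls and macro statements start with %
The macro variables they create can be:
- Global or
- Local (defined inside a macro and used inside that macro)
Macro statement examples (each %LET creates a macro variable):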

%LET iterations = 10;

%LET singer = Taylor Swift;
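The global versus local distinction can be seen in a short Base SAS sketch. The macro name showScope is made up for illustration; %LOCAL and %PUT are standard macro statements.

```sas
/* Global: created in open code, visible everywhere */
%let singer = Taylor Swift;

%macro showScope;
   /* Local: exists only while showScope is running */
   %local singer;
   %let singer = a local value;
   %put Inside the macro: &singer;
%mend showScope;

%showScope
%put Outside the macro: &singer;
```

The first %PUT writes the local value to the log, while the second still writes Taylor Swift, because the %LOCAL statement keeps the macro from overwriting the global variable of the same name.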

Macro Variables

Start with & when referenced
Let you substitute text into your program
Can hold a variable name, a numeral, or any text string
Macro variables example:

do i = 1 to &iterations;

Title "Performed by &singer";
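Put together, the two fragments above might look like this in a complete program. This is only a sketch; the data set name work.squares and the variable square are hypothetical.

```sas
/* Define the macro variables once, near the top of the program */
%let iterations = 10;
%let singer = Taylor Swift;

/* &iterations is replaced by 10 before the DATA step compiles */
data work.squares;
   do i = 1 to &iterations;
      square = i * i;
      output;
   end;
run;

/* Double quotes are required so that &singer resolves inside the string */
title "Performed by &singer";
proc print data=work.squares noobs;
run;
title;
```

Changing the %LET values at the top is enough to change the loop length and the title everywhere they are used.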

Creating your own macros:

MACRO EXAMPLE

%MACRO printYearlyElectric (datayear=);
   proc print data = bethdata.electric&datayear;
      title "Electricity Generation Data for &datayear";
      var generation_mwh energysource state
          oilproduction oilimports price;
      format price dollar6.;
   run;
%MEND printYearlyElectric;

TO INVOKE MACRO

%printYearlyElectric (datayear=2020)

Macros and Macro Variables in EM and VDMML

In SAS Enterprise Miner (EM) and SAS Model Studio (part of SAS Visual Data Mining and Machine Learning, or VDMML), use the macro variables and the variables macros to reference information about:

imported data sets
target and input variables
exported data sets
files that store the scoring code
et cetera

Use the utility macros to manage data and format output. Utility macros accept arguments.
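As a small illustration, the following training-code sketch writes some of this information to the log. It uses only macro variables and variables macros that appear in the gradient boosting example earlier in this post; the message text is arbitrary.

```sas
/* Training Code (illustrative sketch) */

/* Macro variables resolve to a single value */
%put NOTE: Training data is &dm_data;
%put NOTE: Target is %dm_dec_target with level &dm_dec_level;

/* Variables macros resolve to a list of variable names */
%if &dm_num_interval_input %then %do;
   %put NOTE: Interval inputs: %dm_interval_input;
%end;
%if &dm_num_class_input %then %do;
   %put NOTE: Class inputs: %dm_class_input;
%end;
```

Checking the count variables first, as the gradient boosting example does, avoids resolving an empty variable list.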

EM versus VDMML Macros and Macro Variables

More extensive list of SAS Code Node macros:

Extensive list of SAS Code Node macro variables:

Summary:

The SAS Code node lets you bring the flexibility of your own SAS code into your SAS Model Studio pipeline. It extends SAS Model Studio’s functionality by letting you include any SAS procedure or DATA step. Examples of tasks you can accomplish with the SAS Code node (and these are just a few examples) are:

Create custom score code
Munge data
Build custom predictive models
Format SAS output
Create graphs, plots, and tables to meet your exact specifications
Modify metadata of individual variables

Data exported by the SAS Code node can be used by subsequent nodes in your SAS Model Studio pipeline.

Keep in mind that there are already a ton of tasks available to you with the out-of-the-box nodes in SAS Model Studio. If you can’t find exactly what you need there, well, I’m not saying you’re high maintenance, but hey…if the expensive Italian leather shoe fits, then wear it! I’m not here to judge!

But seriously, sometimes you just need to do something that’s not already built into a node. This is where the complete versatility of the SAS Code Node will really help you out!

For More Information:

Selected examples on GitHub

Model Studio Documentation

SAS Code Node in Enterprise Miner

Macro Primers

SAS Communities Library