**Create ROC Curves & Lift Charts for Models Built i...**
SAS Visual Data Mining and Machine Learning (VDMML) on Viya includes a procedure for assessing model performance called PROC ASSESS. You can take the output data sets generated by PROC ASSESS and use PROC SGPANEL to create ROC curves or lift charts. This gives you plots similar to those generated by Enterprise Miner's Model Comparison node. I built the graph below (and all of the graphs in this blog) in SAS Studio, but you will notice that it looks very similar to an Enterprise Miner graph.

An ROC (receiver operating characteristic) curve lets you compare different model results. The true positive rate (sensitivity) is plotted on the vertical axis, and the false positive rate (1 minus specificity) is plotted on the horizontal axis. The better the model performance, the farther up and to the left the curve will be, maximizing the true positive rate and minimizing the false positive rate. Above we see that the gradient boosting model performed best (green line), followed by the random forest model (red line). The logistic model (blue line) performed the worst of the three models.
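As an aside, the (false positive rate, true positive rate) pairs behind an ROC curve can be computed directly from scored data by sweeping a threshold over the predicted probabilities. Here is a minimal sketch in Python using toy labels and scores (this is an illustration of the arithmetic, not the HMEQ data or the PROC ASSESS implementation):

```python
def roc_points(labels, scores):
    """Compute (FPR, TPR) pairs by sweeping a threshold over the scores.

    labels: 1 = event (e.g. loan default), 0 = non-event.
    scores: predicted probability of the event.
    """
    pos = sum(labels)            # number of events
    neg = len(labels) - pos      # number of non-events
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

# Toy example: a model that separates the classes perfectly
# passes through the ideal corner point (0.0, 1.0).
print(roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
# → [(0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```

A curve hugging the upper-left corner, as the gradient boosting model does above, means high sensitivity at a low false positive rate.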

**1. Start with the Supervised Learning Snippet (ships with the software)**

We will start in SAS Studio on Viya by expanding the Snippets tab. Then navigate to Snippets/Machine Learning and double-click the Supervised Learning snippet to add its code to your SAS Studio code pane, as shown below.

The Supervised Learning snippet code uses the SAS sample data HMEQ, which is a home equity loan historic data set used to train models. The target variable is BAD. If the individual defaulted on their loan, BAD = 1.

If we scroll to the bottom of this snippet, we will see that code already exists to create an ROC curve and lift chart for a single model. But we want instead an ROC curve and lift chart that compare multiple models, so we will delete the code from line 166 to the end of the snippet.

**2. Add Two Models**

In addition to the random forest model, which is already included in the snippet, we will run two more models. First, we run and score a gradient boosting model using PROC GRADBOOST as follows:

```sas
/****************************************************/
/* Build a predictive model using Gradient Boosting */
/****************************************************/
proc gradboost data=&caslibname.._prepped ntrees=50 intervalbins=20 maxdepth=5;
   input &interval_inputs. / level=interval;
   input &class_inputs. / level=nominal;
   target bad / level=nominal;
   partition rolevar=_partind_(train='1' validate='0');
   code file="&outdir./gradboost.sas";
run;

/********************************************/
/* Score the data using the generated model */
/********************************************/
data &caslibname.._scored_gradboost;
   set &caslibname.._prepped;
   %include "&outdir./gradboost.sas";
run;
```

Next, we run and score a logistic regression model using PROC LOGSELECT and a data step as follows:

```sas
/******************************************************/
/* Build a predictive model using Logistic Regression */
/******************************************************/
proc logselect data=&caslibname.._prepped;
   class bad &class_inputs.;
   model bad(event='1') = &class_inputs. &interval_inputs.;
   selection method=forward;
   partition rolevar=_partind_(train='1' validate='0');
   code file="&outdir./logselect.sas";
run;

/********************************************/
/* Score the data using the generated model */
/********************************************/
data &caslibname.._scored_logselect;
   set &caslibname.._prepped;
   %include "&outdir./logselect.sas";
   p_bad0 = 1 - p_bad;
run;
```

**3. Assess the Three Models: Random Forest, Gradient Boosting and Logistic Regression**

Now we are ready to assess our three models. We will create a macro to do this, so that we don’t have to write the PROC ASSESS code three times. Our target is the variable BAD.

```sas
/****************************/
/* Assess model performance */
/****************************/
libname BethWork "/home/sasdemo";

%macro assess_model(prefix=, var_evt=, var_nevt=);
proc assess data=&caslibname.._scored_&prefix.;
   input &var_evt.;
   target bad / level=nominal event='1';
   fitstat pvar=&var_nevt. / pevent='0';
   by _partind_;
   ods output fitstat  = BethWork.&prefix._fitstat
              rocinfo  = BethWork.&prefix._rocinfo
              liftinfo = BethWork.&prefix._liftinfo;
run;
%mend assess_model;
```

Now we can call the macro, filling in the arguments for the prefix, the variable that indicates the probability of the event, loan default (VAR_EVT), and the variable that indicates the probability of a non-event (VAR_NEVT), i.e., no default.

```sas
%assess_model(prefix=forest,    var_evt=p_bad1, var_nevt=p_bad0);
%assess_model(prefix=gradboost, var_evt=p_bad1, var_nevt=p_bad0);
%assess_model(prefix=logselect, var_evt=p_bad,  var_nevt=p_bad0);
```

**4. Combine the ROC Results into a Single Data Set**

As shown below, we use PROC FORMAT to create a format to use on the partition indicator. And we combine our data sets, adding a model variable to distinguish results from the logistic, random forest, or gradient boosting model.

```sas
/*******************************************/
/* Analyze model using ROC and Lift charts */
/*******************************************/
ods graphics on;

proc format;
   value partindlbl
      0 = 'Validation'
      1 = 'Training';
run;

data BethWork.all_rocinfo;
   set BethWork.logselect_rocinfo(keep=sensitivity fpr _partind_ in=l)
       BethWork.forest_rocinfo   (keep=sensitivity fpr _partind_ in=f)
       BethWork.gradboost_rocinfo(keep=sensitivity fpr _partind_ in=g);
   length model $ 16;
   select;
      when (l) model = 'Logistic';
      when (f) model = 'Forest';
      when (g) model = 'GradientBoosting';
   end;
run;
```

**5. Create ROC Curves**

Finally we are ready to create some charts. First, we will plot validation and training data together (group=_partind_) on a separate graph for each of the three models (panelby model), so we get three side-by-side graphs.

```sas
/* Plot Validation and Training Together on a Separate ROC Graph for Each Model */
proc sgpanel data=BethWork.all_rocinfo aspect=1;
   panelby model / layout=columnlattice spacing=5;
   title "ROC Curve Panel";
   rowaxis label="True positive rate" values=(0 to 1 by 0.25) grid offsetmin=0.05 offsetmax=0.05;
   colaxis label="False positive rate" values=(0 to 1 by 0.25) grid offsetmin=0.05 offsetmax=0.05;
   lineparm x=0 y=0 slope=1 / transparency=0.7;
   series x=fpr y=sensitivity / group=_partind_;
   format _partind_ partindlbl.;
run;
```

This gives us the following graph, with each model in its own panel: validation data results in blue and training data results in red.

But perhaps we want to see all of the models on the same graph so that we can easily compare them. We will now change panelby to _partind_ and group to model, as shown below.

```sas
/* Plot ROC Curves for All Models Together */
proc sgpanel data=BethWork.all_rocinfo;
   panelby _partind_ / layout=columnlattice spacing=5;
   title "ROC Curve Models Overlain";
   rowaxis label="True Positive Rate";
   colaxis label="False Positive Rate" grid;
   lineparm x=0 y=0 slope=1 / transparency=0.7;
   series x=fpr y=sensitivity / group=model;
   format _partind_ partindlbl.;
run;
```

This gives us the ROC curves for the random forest, gradient boosting, and logistic regression models overlain on the same graph, as shown below, with separate panels for the validation and training data.

Maybe we decide that we want to add markers, so we add markers markerattrs=(symbol=circlefilled) to the SERIES statement, as shown in the code and graph below.

```sas
/* Plot ROC Curves for All Models Together With Markers */
proc sgpanel data=BethWork.all_rocinfo;
   panelby _partind_ / layout=columnlattice spacing=5;
   title "ROC Curve Models Overlain With Markers";
   rowaxis label="True Positive Rate";
   colaxis label="False Positive Rate" grid;
   lineparm x=0 y=0 slope=1 / transparency=0.7;
   series x=fpr y=sensitivity / group=model markers markerattrs=(symbol=circlefilled);
   format _partind_ partindlbl.;
run;
```

**6. Combine the Lift Results into a Single Data Set**

Similarly, we can create lift charts. Again we start by combining the data results as shown below.

```sas
/* Create lift charts */
data BethWork.all_liftinfo;
   set BethWork.logselect_liftinfo(keep=depth lift cumlift _partind_ in=l)
       BethWork.forest_liftinfo   (keep=depth lift cumlift _partind_ in=f)
       BethWork.gradboost_liftinfo(keep=depth lift cumlift _partind_ in=g);
   length model $ 16;
   select;
      when (l) model = 'Logistic';
      when (f) model = 'Forest';
      when (g) model = 'GradientBoosting';
   end;
run;
```

**7. Create Lift Charts**

And again we use PROC SGPANEL to create the charts. In the example below, we create separate charts for the validation and training data, but overlay information from each of the three models on each chart.

```sas
proc sgpanel data=BethWork.all_liftinfo;
   panelby _partind_ / layout=columnlattice spacing=5;
   title "Lift Chart All 3 Models Overlain";
   rowaxis label="Lift";
   colaxis label="Depth" grid;
   series x=depth y=lift / group=model markers markerattrs=(symbol=circlefilled);
   format _partind_ partindlbl.;
run;

title;
ods graphics off;
```

Lift charts indicate how well the model performs compared to no model by plotting the ratio between the result predicted by the model and the result expected with no model. Lift is plotted on the vertical axis, and depth (the fraction of cases, ranked by predicted probability of the event) on the horizontal axis. Here we see that the gradient boosting model does well at low depths: for example, if we use the model to reject 10% of the loan applicants, we will appropriately reject almost 50% of the applicants who would default.
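To make the arithmetic concrete: cumulative lift at a given depth is the event rate among the top-scored fraction of cases divided by the overall event rate. A small sketch in Python with made-up labels and scores (an illustration of the formula, not the HMEQ data or the PROC ASSESS implementation):

```python
def cumulative_lift(labels, scores, depth):
    """Cumulative lift at `depth` (fraction of cases, e.g. 0.10 = top 10%).

    labels: 1 = event (loan default), 0 = non-event.
    scores: predicted probability of the event.
    """
    n = len(labels)
    k = max(1, int(n * depth))
    # Rank cases by descending score and take the top k
    top = sorted(zip(scores, labels), reverse=True)[:k]
    top_rate = sum(y for _, y in top) / k   # event rate in the top k cases
    base_rate = sum(labels) / n             # overall event rate
    return top_rate / base_rate

# 10 applicants, 2 defaulters, both ranked at the top by the model:
# lift at depth 0.2 is (2/2) / (2/10) = 5
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.0]
print(cumulative_lift(labels, scores, 0.2))  # → 5.0
```

A lift of 1 at full depth is the no-model baseline, which is why all three curves converge toward 1 as depth approaches 100%.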

In this article I showed you how to use PROC SGPANEL to create ROC curves and lift charts, which make it easy to graphically compare the performance of multiple models created in SAS Studio on Viya. I hope this is helpful!

If you want to learn more about VDMML and the procedures it includes, see my other articles: Discover SAS Visual Data Mining and Machine Learning Procedures; Unsupervised Learning in SAS Visual Data Mining and Machine Learning; Regression Methods: Supervised Learning in SAS Visual Data Mining and Machine Learning; Neural Network Models: Supervised Learning in SAS Visual Data Mining and Machine Learning; Tree-related Models: Supervised Learning in SAS Visual Data Mining and Machine Learning; Support Vector Machine Models: Supervised Learning in SAS Visual Data Mining and Machine Learning; and Model Assessment in SAS Visual Data Mining and Machine Learning.
