Hello
I am using SAS EG 8.3 Update 3 (8.3.3.181) (32-bit) to estimate three prediction models.
The three models share the same outcome and the same dataset, but they differ in model type (an ANOVA fixed-effects model, a multilevel model, and a decision tree machine learning model) and in the variables they use (see code below).
I can fit each model, and for two of them (ANOVA fixed effects and multilevel) I can get a ROC plot using the plots(only)=roc option. The decision tree output returns a ROC plot by default. For a research paper I would like the three ROC curves in the same figure so that the reader can compare them easily.
The closest thing to a solution I could find online compares a test and a validation dataset. That is not quite my problem, since I want to compare three models based on the same data, but it is fairly close: https://support.sas.com/kb/52/973.html
My SAS skills are not strong enough to see how I should modify the suggested code to plot the three models I want to compare. I hope someone can help me do this based on the code and the referenced test data.
My code is inserted below. It is based on test data I found online, because I don't have access to the data that will be used for the real analysis (a colleague will get the code).
Best,
Jacob
/***************************************************************************/
/* Import data for three models (ANOVA, multilevel and decision tree) */
/***************************************************************************/
* test data used ;
* https://pages.stern.nyu.edu/~wgreene/Econometrics/oldPanelDataSets.htm ;
* I have added some variables Inst and Org3 to prepare analysis for the actual analysis
* Inst and Org3 represent two organisational levels (class variables) for the multilevel model in which I just inserted 5 random values (1,2,3,4,5) and (a,b,c,d,e) in each;
LIBNAME OUT "/sasfolders/user/jb/sasuser.v94" ;
PROC IMPORT
DATAFILE="H:/sas_predict/farm.xlsx"
OUT=work.farm
DBMS=XLSX;
SHEET='dairy';
GETNAMES=YES;
RUN;
proc contents data=work.farm; run;
* sort data;
proc sort data=work.farm;
by farm year;
run;
* milk median;
proc means data=work.farm median;
var milk;
run;
data work.farm;
set work.farm;
if milk > 110236 then milk_dummy=1;
else milk_dummy=0;
run;
/***************************************************************************/
/* ANOVA fixed effect model */
/***************************************************************************/
Title 'ANOVA';
ODS word file="farm_predict.docx";
proc panel data=work.farm;
id farm year;
model milk_dummy =
cows land labor feed
/FIXTWO vcomp=fb ;
OUTPUT OUTOUT=out.p;
RUN;
ods word close;
* Plotting ROC curve for ANOVA fixed effect;
* https://www.youtube.com/watch?v=EKN17dhtC0E;
Title 'Model fit including ROC curve for the model';
ods graphics on;
ODS word file="farm_predict_roc.docx";
proc logistic data=work.farm desc plots(only)=roc;
class farm year;
model milk_dummy =
cows land labor feed;
run;
ods word close;
quit;
/*************************************************************************/
/* Multilevel model */
/*************************************************************************/
Title 'Multilevel';
ODS word file="farm_predict_multilevel.docx";
proc mixed data=work.farm;
class Inst org3; * 'farm' had no organisational-level data, so I inserted fictitious Inst and org3 values;
Model milk_dummy = Inst org3;
LSMeans Inst/pdiff cl adjust=Tukey;
CONTRAST "2-0" org3 0;
RUN;
ods word close;
Title 'ROC curve for multilevel model';
ods graphics on;
proc logistic data=work.farm desc plots(only)=roc;
class farm year;
model milk_dummy =
cows land;
run;
quit;
/**********************************************************************************/
/* Decision tree prediction model */
/**********************************************************************************/
/* https://www.youtube.com/watch?v=ps3ruAk-DNI */
DATA decision; set work.farm; RUN;
PROC SORT data=decision; BY farm year; RUN;
* ODS = Output Delivery System - manages displays in html;
Title 'Decision tree model';
ods graphics on;
ODS word file="farm_predict_decisiontree.docx";
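proc hpsplit seed=13289; /* hpsplit fits tree-based statistical models. SEED= sets the seed of the pseudo-random number generator */
/* milk_dummy2 is not created anywhere in this code, so milk_dummy is assumed to be the intended target */
class milk_dummy farm year cows land labor;
model milk_dummy = farm year cows land labor;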
grow entropy;
prune costcomplexity;
RUN;
ods word close;
/************************************************************************/
/* Multiple ROC curves in one plot - ROCPLOT */
/************************************************************************/
* a guide for two ROC curves based on training and validation data:
* https://support.sas.com/kb/52/973.html ;
* what I want to do is not compare different sets of data but compare different prediction models;
*Outcome = MULTILEVEL;
proc logistic data = work.farm descending;
class org3;
model milk_dummy = Inst org3 / outroc=ROC1;
run; quit;
*Outcome = ANOVA FIXED EFFECT;
proc logistic data = work.farm descending;
model milk_dummy = cows land labor feed / outroc=ROC2;
run; quit;
*Outcome = DECISION TREE;
proc logistic data = work.farm descending;
model milk_dummy = farm year cows land labor / outroc=ROC3;
run; quit;
Accepted Solutions
The basic idea is in this note. You need to create a data set that contains your actual binary responses and three variables holding the predicted event probabilities from each of the three models. Then, in PROC LOGISTIC, specify three ROC statements, each naming one of those variables in its PRED= option, and add the ROCCONTRAST statement to compare and plot them. For example:
proc logistic data=allpreds;
model milk_dummy(event='1')= / nofit;
roc 'model 1' pred=p1;
roc 'model 2' pred=p2;
roc 'model 3' pred=p3;
roccontrast;
run;
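The example above assumes that the allpreds data set already exists. A minimal sketch of one way to build it from the code in the question follows. The names rowid, farm_id, pred1, pred2 and pred3 are hypothetical and only used for illustration. The sketch also assumes that the two PROC LOGISTIC models stand in for the ANOVA and multilevel models (as in the question), that milk_dummy is the tree's target, and that PROC HPSPLIT names its predicted event probability P_milk_dummy1 in the OUTPUT data set. Check that last name with PROC CONTENTS before relying on it.

```sas
/* Add a row key so predictions from separate procedures can be merged. */
/* rowid is a hypothetical helper variable, not part of the original data. */
data work.farm_id;
   set work.farm;
   rowid = _n_;
run;

/* Model 1 (variables of the ANOVA fixed-effects model): save P(event) as p1. */
proc logistic data=work.farm_id;
   model milk_dummy(event='1') = cows land labor feed;
   output out=work.pred1(keep=rowid milk_dummy p1) p=p1;
run;

/* Model 2 (variables of the multilevel model): save P(event) as p2. */
proc logistic data=work.farm_id;
   class Inst org3;
   model milk_dummy(event='1') = Inst org3;
   output out=work.pred2(keep=rowid p2) p=p2;
run;

/* Model 3 (decision tree): the ID statement carries rowid into the output. */
proc hpsplit data=work.farm_id seed=13289;
   class milk_dummy farm year;
   model milk_dummy = farm year cows land labor;
   grow entropy;
   prune costcomplexity;
   id rowid;
   output out=work.pred3;
run;

/* Inspect the output to confirm the name of the predicted-probability variable. */
proc contents data=work.pred3; run;

proc sort data=work.pred3; by rowid; run;

/* One row per observation: the response plus p1, p2 and p3. */
/* P_milk_dummy1 is the assumed name of the tree's P(milk_dummy=1). */
data work.allpreds;
   merge work.pred1
         work.pred2
         work.pred3(keep=rowid P_milk_dummy1 rename=(P_milk_dummy1=p3));
   by rowid;
run;
```

With allpreds built this way, the PROC LOGISTIC step with NOFIT and the three ROC statements should overlay the curves in one plot. Merging on an explicit key rather than relying on row order is a deliberate choice here, since a procedure may not return rows in the input order.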