jbrau123
Obsidian | Level 7

Hello 

 

I am using SAS EG 8.3 Update 3 (8.3.3.181) (32-bit) to estimate three prediction models. 

 

All three models predict the same outcome from the same dataset, but they are of different types (an ANOVA fixed-effects model, a multilevel model, and a decision tree machine learning model) and use different variables (see code below).

 

I can successfully construct each model, and for two of the models (ANOVA fixed effect and multilevel) I can get a ROC plot using the plots(only)=roc statement. The decision tree output by default returns a ROC plot. For a more convenient comparison in a research paper I would like the three ROC plots to be in the same figure so that it is easy for the reader to compare each graph. 

 

The closest thing to a solution to this issue I could find online was this one comparing a test and a validation dataset. In other words not completely the same issue as mine in which I try to compare three models based on the same data, but fairly close: https://support.sas.com/kb/52/973.html

My SAS skills are not strong enough to see how I should modify the suggested code to plot the three models I wish to compare. I hope someone can help me do this based on the code and the referenced test data.

 

Below I have inserted my code, based on test data I found online because I don't have access to the data that I will be using for the real analysis (a colleague will get the code). 

 

Best,

Jacob

 

 

/***************************************************************************/
/*   Import data for three models (ANOVA, multilevel and decision tree)    */     
/***************************************************************************/

* test data used ;
* https://pages.stern.nyu.edu/~wgreene/Econometrics/oldPanelDataSets.htm ;

* I have added some variables Inst and Org3 to prepare analysis for the actual analysis
* Inst and Org3 represent two organisational levels (class variables) for the multilevel model in which I just inserted 5 random values (1,2,3,4,5) and (a,b,c,d,e) in each;

LIBNAME OUT  "/sasfolders/user/jb/sasuser.v94" ;


PROC IMPORT
	DATAFILE="H:/sas_predict/farm.xlsx"
	OUT=work.farm
	DBMS=XLSX;
	SHEET='dairy';
	GETNAMES=YES;
RUN;

proc contents data=work.farm; run;


* sort data; 
proc sort data=work.farm;
by farm year;
run;

* milk median;
proc means data=work.farm median;
    var milk;
run;

data work.FARM;
set work.FARM;
milk_dummy=.;
run;

data work.farm;
set work.farm;
	if milk > 110236 then milk_dummy=1;
	else milk_dummy=0;
	run;


/***************************************************************************/
/*                        ANOVA fixed effect model                         */
/***************************************************************************/

Title 'ANOVA';
ODS word file="farm_predict.docx";
	proc panel data=work.farm;
	id farm year;
	model milk_dummy = 
	cows land labor feed
 	/FIXTWO vcomp=fb ; 
	OUTPUT OUT=out.p;
RUN;
ods word close;

* Plotting ROC curve for ANOVA fixed effect;
* https://www.youtube.com/watch?v=EKN17dhtC0E;

Title 'Model fit including ROC curve for the model';
ods graphics on;
ODS word file="farm_predict_roc.docx";
proc logistic data=work.farm desc plots(only)=roc;
class farm year;
model milk_dummy = 
	cows land labor feed; 
run;
ods word close;
quit;

/*************************************************************************/
/*                         Multilevel model                              */
/*************************************************************************/

Title 'Multilevel';
ODS word file="farm_predict_multilevel.docx";
proc mixed data=work.farm;
class Inst org3; * 'farm' had no organisational level data, so I have added fictitious Inst and org3 variables;
Model milk_dummy = Inst org3;
LSMeans Inst/pdiff cl adjust=Tukey;
CONTRAST "2-0" org3 0;
RUN;
ods word close;

Title 'ROC curve for multilevel model';
ods graphics on;
proc logistic data=work.farm desc plots(only)=roc;
class farm year;
model milk_dummy = 
	cows land; 
run;
quit;


/**********************************************************************************/
/*                    Decision tree prediction model                              */
/**********************************************************************************/

/* https://www.youtube.com/watch?v=ps3ruAk-DNI */
DATA decision; set work.farm; RUN;
PROC SORT data=decision; BY farm year; RUN;

* ODS = Output Delivery System - manages displays in html;
Title 'Decision tree model';
ods graphics on; 
ODS word file="farm_predict_decisiontree.docx";
proc hpsplit seed=13289; /* hpsplit for tree-based statistical models. Seed is a pseudo-random number generator */
class milk_dummy farm year cows land labor;
model milk_dummy = farm year cows land labor;
grow entropy;

prune costcomplexity;

RUN;
ods word close;


/************************************************************************/
/*                Multiple ROC curves in one plot - ROCPLOT             */
/************************************************************************/

* a guide for two ROC curves based on training and validation data:
* https://support.sas.com/kb/52/973.html ;
* what I want to do is not compare different sets of data but compare different prediction models;

*Outcome = MULTILEVEL;
proc logistic data = work.farm descending;
 class org3;
 model milk_dummy = Inst org3 / outroc=ROC1;
 run; quit;
*Outcome = ANOVA FIXED EFFECT;
proc logistic data = work.farm descending;
 model milk_dummy = cows land labor feed / outroc=ROC2;
 run; quit;
*Outcome = DECISION TREE;
proc logistic data = work.farm descending;
 model milk_dummy = farm year cows land labor / outroc=ROC3;
run; quit;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

The basic idea is in this note. You need to create a data set that contains your actual binary responses and 3 variables which are the predicted event probabilities from each of the 3 models. Then in PROC LOGISTIC, specify the 3 variables in 3 ROC statements, each using one of the variables in the PRED= option. Then specify the ROCCONTRAST statement to compare and plot them. For example:

proc logistic data=allpreds;
model milk_dummy(event='1')= / nofit;
roc 'model 1' pred=p1;
roc 'model 2' pred=p2;
roc 'model 3' pred=p3;
roccontrast;
run;
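
The example above assumes the data set allpreds already exists. A minimal sketch of one way to build it from the question's test data follows. The names obs_id, farm_id, pred1, pred2, pred3, p1, p2 and p3 are placeholders chosen for illustration, and the predictor lists are taken from the PROC LOGISTIC and PROC HPSPLIT steps in the question. The tree's predicted-probability variable is assumed to be named P_milk_dummy1, so check the PROC CONTENTS output and adjust the RENAME= if your output uses a different name.

```sas
/* Add a row key so the predictions from each model can be matched up again.
   obs_id is a hypothetical name, not a variable in the original data. */
data work.farm_id;
    set work.farm;
    obs_id = _n_;
run;

/* Model 1: predictors from the "multilevel" logistic step in the question */
proc logistic data=work.farm_id;
    class Inst org3;
    model milk_dummy(event='1') = Inst org3;
    output out=work.pred1(keep=obs_id milk_dummy p1) p=p1;
run;

/* Model 2: predictors from the "ANOVA fixed effect" logistic step */
proc logistic data=work.farm_id;
    model milk_dummy(event='1') = cows land labor feed;
    output out=work.pred2(keep=obs_id p2) p=p2;
run;

/* Model 3: decision tree; ID carries the row key into the scored output */
proc hpsplit data=work.farm_id seed=13289;
    class milk_dummy farm year;
    model milk_dummy = farm year cows land labor;
    grow entropy;
    prune costcomplexity;
    id obs_id;
    output out=work.pred3;
run;

/* Confirm the name of the predicted-probability variable for event 1
   (assumed below to be P_milk_dummy1) */
proc contents data=work.pred3; run;

proc sort data=work.pred1; by obs_id; run;
proc sort data=work.pred2; by obs_id; run;
proc sort data=work.pred3; by obs_id; run;

/* One row per observation: actual response plus p1, p2, p3 */
data work.allpreds;
    merge work.pred1
          work.pred2
          work.pred3(keep=obs_id P_milk_dummy1 rename=(P_milk_dummy1=p3));
    by obs_id;
run;
```

With work.allpreds in place, the PROC LOGISTIC step with NOFIT and the three ROC statements overlays the three curves in one plot. Note that these are in-sample predictions, so a flexible model such as the tree may look better here than it would on new data.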
