BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jorgesribeiro
Calcite | Level 5

Hi 
I need to simulate simulate 4 ROC curves going from 0.15 0.25 0.3 0.35 are above the reference line of a random model.

Could you help me with this ? I need to create these graphs to illustrate a logistic regression model.
Thanks

Jorge Ribeiro

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I don't know what you mean by "going from 0.15 0.25 0.3 0.35", but you can simulate data for a logistic regression model by using the example at "Simulate many samples from a logistic regression model," which is taken from the book Simulating Data with SAS (Wicklin 2013).  There are lots of ways to get distinct ROC curves. One way is to misspecify the model in the simulation.  That's what the following code does: the model is constructed to depend on X1, X2, and X3, but the MODEL statement only specifies X1 and X2.

 

/* program from 
   http://blogs.sas.com/content/iml/2014/06/27/simulate-many-samples-from-a-logistic-regression-model.html */
%let N = 150;                                   /* N = sample size */
%let NumSamples = 4;                            /* number of samples */
proc iml;
call randseed(1);
X = j(&N, 4, 1);                               /* X[,1] is intercept */
/* 1. Read design matrix for X or assign X randomly.
   For this example, x1 ~ U(0,1) and x2 ~ N(0,2)  */
X[,2] = randfun(&N, "Uniform"); 
X[,3] = randfun(&N, "Normal", 0, 2);
error = randfun(&N, "Normal", 0, 1); 
/* Logistic model with parameters {2, -4, 1} */
beta = {2, -4, 1, -1};
 
/* create output data set. Output data during each loop iteration.     */
varNames = {"SampleID" "y"} || ("x1":"x3");           /* 1st col is ID */
Out = j(&N,1) || X;                /* matrix to store simulated data   */
create LogisticData from Out[c=varNames];
y = j(&N,1);                       /* allocate response vector         */
do i = 1 to &NumSamples;           /* the simulation loop              */
   X[,4] = i*error;                    /* magnitude of error */
   eta = X*beta;                       /* 2. linear model */
   mu = logistic(eta);                 /* 3. transform by inverse logit */
   Out[,1] = i;                         /* update value of SampleID    */
   call randgen(y, "Bernoulli", mu);    /* 4. simulate binary response */
   Out[,2] = y;
   append from Out;                     /* 5. output this sample       */
end;                               /* end loop                         */
close LogisticData; 
quit;

proc logistic data=LogisticData noprint outest=PE;
   by SampleId;
   model y(Event='1') = x1 x2 / outroc = outROC;
run;

proc sgplot data=outROC;
   series x=_1MSPEC_ y=_SENSIT_ / group=SampleID;
   lineparm slope=1 x=0 y=0;    /* reference line */
run;

SGPlot5.png

You could do something similar with real data: leave out variables that explain the model to get different ROC curves.

View solution in original post

2 REPLIES 2
Rick_SAS
SAS Super FREQ

I don't know what you mean by "going from 0.15 0.25 0.3 0.35", but you can simulate data for a logistic regression model by using the example at "Simulate many samples from a logistic regression model," which is taken from the book Simulating Data with SAS (Wicklin 2013).  There are lots of ways to get distinct ROC curves. One way is to misspecify the model in the simulation.  That's what the following code does: the model is constructed to depend on X1, X2, and X3, but the MODEL statement only specifies X1 and X2.

 

/* program from 
   http://blogs.sas.com/content/iml/2014/06/27/simulate-many-samples-from-a-logistic-regression-model.html */
%let N = 150;                                   /* N = sample size */
%let NumSamples = 4;                            /* number of samples */
proc iml;
call randseed(1);
X = j(&N, 4, 1);                               /* X[,1] is intercept */
/* 1. Read design matrix for X or assign X randomly.
   For this example, x1 ~ U(0,1) and x2 ~ N(0,2)  */
X[,2] = randfun(&N, "Uniform"); 
X[,3] = randfun(&N, "Normal", 0, 2);
error = randfun(&N, "Normal", 0, 1); 
/* Logistic model with parameters {2, -4, 1} */
beta = {2, -4, 1, -1};
 
/* create output data set. Output data during each loop iteration.     */
varNames = {"SampleID" "y"} || ("x1":"x3");           /* 1st col is ID */
Out = j(&N,1) || X;                /* matrix to store simulated data   */
create LogisticData from Out[c=varNames];
y = j(&N,1);                       /* allocate response vector         */
do i = 1 to &NumSamples;           /* the simulation loop              */
   X[,4] = i*error;                    /* magnitude of error */
   eta = X*beta;                       /* 2. linear model */
   mu = logistic(eta);                 /* 3. transform by inverse logit */
   Out[,1] = i;                         /* update value of SampleID    */
   call randgen(y, "Bernoulli", mu);    /* 4. simulate binary response */
   Out[,2] = y;
   append from Out;                     /* 5. output this sample       */
end;                               /* end loop                         */
close LogisticData; 
quit;

proc logistic data=LogisticData noprint outest=PE;
   by SampleId;
   model y(Event='1') = x1 x2 / outroc = outROC;
run;

proc sgplot data=outROC;
   series x=_1MSPEC_ y=_SENSIT_ / group=SampleID;
   lineparm slope=1 x=0 y=0;    /* reference line */
run;

SGPlot5.png

You could do something similar with real data: leave out variables that explain the model to get different ROC curves.

StatDave
SAS Super FREQ

Assuming you are talking abou the area under the ROC curve being those amounts above the reference line (where AUC=0.5), you can do pretty well by using the following data set from one of the examples in the PROC LOGISTIC documentation and, as Rick suggests, using various models as shown:

 

data Neuralgia;
input Treatment $ Sex $ Age Duration Pain $ @@;
datalines;
Placebo Female 68 1 No TrtA Female 69 12 No
Placebo Male 66 26 Yes TrtB Male 67 23 No
TrtA Female 71 12 No TrtB Male 77 1 Yes
TrtA Male 71 17 Yes Placebo Female 65 29 No
TrtB Female 66 12 No TrtB Male 75 21 Yes
TrtA Female 64 17 No Placebo Female 70 13 Yes
Placebo Male 70 1 Yes Placebo Female 68 27 Yes
TrtA Female 64 30 No TrtB Male 70 22 No
TrtB Female 78 1 No TrtA Male 67 10 No
TrtB Male 75 30 Yes TrtB Male 80 21 Yes
TrtA Male 70 12 No Placebo Female 67 30 No
TrtB Male 70 1 No TrtB Female 77 16 No
Placebo Male 78 12 Yes TrtB Female 76 9 Yes
Placebo Male 66 4 Yes TrtA Female 69 18 Yes
TrtA Male 78 15 Yes Placebo Female 64 1 Yes
Placebo Female 72 27 No TrtA Female 72 25 No
TrtB Female 65 7 No TrtB Male 59 29 No
Placebo Male 67 17 Yes TrtA Male 69 1 No
Placebo Female 67 1 Yes TrtB Female 69 42 No
TrtA Female 74 1 No Placebo Female 79 20 Yes
TrtB Male 74 16 No TrtB Female 65 14 No
TrtB Female 67 28 No TrtA Male 76 25 Yes
TrtB Female 72 50 No TrtB Female 69 24 No
TrtA Female 63 27 No Placebo Male 60 26 Yes
TrtA Male 62 42 No TrtA Female 67 11 No
Placebo Male 74 4 No TrtA Male 75 6 Yes
TrtB Male 66 19 No Placebo Male 68 11 Yes
TrtA Male 70 28 No TrtA Male 65 15 No
Placebo Male 83 1 Yes Placebo Female 72 11 Yes
Placebo Male 77 29 Yes TrtA Female 69 3 No
;
ods select roccurve(persist);
proc logistic data=Neuralgia plots(only)=roc;
class Treatment Sex;
model Pain(event="No") = Treatment Sex Treatment*Sex Age Duration ;
run;
proc logistic data=Neuralgia plots(only)=roc;
class Treatment Sex;
model Pain(event="No") = Treatment Sex Treatment*Sex ;
run;
proc logistic data=Neuralgia plots(only)=roc;
class Treatment Sex;
model Pain(event="No") = Treatment ;
run;
proc logistic data=Neuralgia plots(only)=roc;
class Treatment Sex;
model Pain(event="No") = Sex ;
run;
ods select all;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1592 views
  • 0 likes
  • 3 in conversation