deleted_user
Not applicable
Dear all,

I'm a little confused by the association statistics produced by the LOGISTIC procedure in SAS 9.2. If I request the ROC plot and turn on ODS GRAPHICS, I get slightly different values for the association statistics (e.g., the c statistic, Somers' D, etc.) than when I turn the graphics off.

For illustration, please find below an example data set (SAS/STAT UG Example 51.2) and code with and without the ROC curve. If you compare the output of the two PROC LOGISTIC runs, you'll see slight differences in the association statistics. In my "real life" data set the difference is much more pronounced.

Does anyone know where the difference comes from? Is it documented, e.g., in the user's guide?

Any help is highly appreciated.

Thanks,
Thorsten

Data Neuralgia;
input Treatment $ Sex $ Age Duration Pain $ @@;
datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;

proc logistic data=Neuralgia ;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;

ods graphics on;
proc logistic data=Neuralgia plots(only)=(roc ) ;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;
ods graphics off;
Doc_Duke
Rhodochrosite | Level 12
That is probably a question for tech support.
Dale
Pyrite | Level 9
Thorsten,

Strange that you should mention it today as I just sent a bug report to SI on Friday about this very issue. I have not yet heard back from them except for the automatically generated form which indicates that they received the report.

If you want to see just how bad the problem can be, take a look at the following code, which I sent to SI. Depending on whether or not an ROC curve is requested (including the associated ODS GRAPHICS statement), the AUC reported by PROC LOGISTIC is 0.770 or 0.839, a huge difference. The skewness of the predictor variable seems to play a role in the discrepancy between the AUC estimates: the more skewed the predictor variable, the larger the discrepancy in the association table statistics.


data test;
  b0 = 0;
  b1 = 0.5;
  b2 = 0.2;
  do i=1 to 1000;
      u1 = rannor(1234579);
      u2 = rannor(1234579);
      ln_x1 = 4*(0.8*u1 + 0.2*u2);
      x1 = exp(ln_x1);
      x2 = 0.2*u1 + 0.8*u2;
      eta = b0 + b1*ln_x1 + b2*x2;
      p = 1 / (1 + exp(-eta));
      y = ranbin(1234579,1,p);
      output;
  end;
  keep x1 ln_x1 x2 y;
run;


proc univariate data=test plot;
  var x1 ln_x1;
run;




title "Predictor is X1 which is highly skewed: logit(p) is linear against log(X1)";
title2 "Results generated when logistic procedure IS NOT sandwiched by ODS GRAPHICS statements";
footnote "Association Statistics table will be different when we sandwich the LOGISTIC procedure";
footnote2 "by ODS GRAPHICS statements AND request that the ROC curve be generated";
proc logistic data=test descending;
  model y = x1;
run;

ods graphics on;
title2 "Results generated when logistic procedure IS sandwiched by ODS GRAPHICS statements";
title3 "No request for producing the ROC curve specified on the LOGISTIC procedure invocation";
proc logistic data=test descending;
  model y = x1;
run;

title3 "Addition of request for ROC curve on the LOGISTIC procedure invocation";
footnote "The Association Statistics table indicates better predictiveness of X1 when the ROC curve is generated";
footnote2 "Essentially the same Association Statistics table is obtained when log(X1) is the predictor";
footnote3 "and without the ODS GRAPHICS statements and ROC plot request (see below)";
proc logistic data=test plots(only)=roc descending;
  model y = x1;
run;
ods graphics off;


title2 "No ODS GRAPHICS statement sandwiching";
title3 "Logistic procedure invocation includes request for plotting of ROC curve";
footnote "Association Statistics table does not simply respond to the ROC plot request";
footnote2 "when ODS GRAPHICS statement sandwiching is not included";
proc logistic data=test plots(only)=roc descending;
  model y = x1;
run;


title "Predictor is log(X1): logit(p) is linear against this transformation of X1";
title2 "No ODS GRAPHICS statement sandwiching";
title3 "No ROC curve plot request";
footnote "Association Statistics table results for ln_x1 are essentially the same as for X1";
footnote2 "when the regression against X1 is sandwiched by ODS GRAPHICS statements and ROC curve is requested";
proc logistic data=test descending;
  model y = ln_x1;
run;
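
To compare the two versions of the table side by side rather than scrolling through the listing, one option is to capture the table with ODS OUTPUT. This is only a sketch: it assumes the ODS table is named Association (check with ODS TRACE ON if in doubt), and the data set names assoc_noroc and assoc_roc are made up for illustration.

```sas
/* Capture the association table without graphics or ROC request */
ods output Association=assoc_noroc;   /* hypothetical data set name */
proc logistic data=test descending;
  model y = x1;
run;

/* Capture the same table with ODS GRAPHICS on and the ROC plot requested */
ods graphics on;
ods output Association=assoc_roc;     /* hypothetical data set name */
proc logistic data=test plots(only)=roc descending;
  model y = x1;
run;
ods graphics off;

/* Print both captured tables for a direct comparison */
title "Association statistics: no ROC request";
proc print data=assoc_noroc noobs;
run;

title "Association statistics: ROC plot requested";
proc print data=assoc_roc noobs;
run;
```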
Dale
Pyrite | Level 9
SI responded pointing out the BINWIDTH and ROCEPS options on the MODEL statement. If BINWIDTH and ROCEPS are both set to zero, then you should get the same association table statistics. For my simulated data, the association table statistics are almost, but not quite, identical when BINWIDTH=0 and ROCEPS=0 options are specified. Specifying these two options with the neuralgia data and logistic regression model shown above, all statistics are identical regardless of the request for the ROC plot.
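
For the neuralgia example, the two runs with those options added would look roughly like the following. This is a sketch of the suggestion as I understand it, not something taken from SI's reply; the only change from the original code is the addition of BINWIDTH=0 and ROCEPS=0 to the MODEL statement options.

```sas
/* Run 1: no graphics, no ROC plot */
proc logistic data=Neuralgia;
  class Treatment Sex;
  model Pain = Treatment Sex Treatment*Sex Age Duration
        / expb binwidth=0 roceps=0;
run;

/* Run 2: ODS GRAPHICS on and ROC plot requested */
ods graphics on;
proc logistic data=Neuralgia plots(only)=(roc);
  class Treatment Sex;
  model Pain = Treatment Sex Treatment*Sex Age Duration
        / expb binwidth=0 roceps=0;
run;
ods graphics off;
```

With both options set to zero, neither the association statistics nor the ROC computation should group the predicted probabilities, which is presumably why the two tables then agree.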
