
08-23-2010 07:13 AM

Dear all,

I'm a little confused by the association statistics produced by PROC LOGISTIC in SAS 9.2. If I request the ROC plot with ODS GRAPHICS turned on, I get slightly different values for the association statistics (e.g., c, Somers' D) than when graphics are turned off.

For illustration, please find below an example data set (SAS/STAT User's Guide, Example 51.2) and code with and without the ROC curve. If you compare the output of the two runs, you'll see slight differences in the association statistics. In my "real life" data set the difference is much more pronounced.

Does anyone know where the difference comes from? Is it documented, e.g. in the User's Guide?

Any help is highly appreciated.

Thanks,

Thorsten

Data Neuralgia;

input Treatment $ Sex $ Age Duration Pain $ @@;

datalines;

P F 68 1 No B M 74 16 No P F 67 30 No

P M 66 26 Yes B F 67 28 No B F 77 16 No

A F 71 12 No B F 72 50 No B F 76 9 Yes

A M 71 17 Yes A F 63 27 No A F 69 18 Yes

B F 66 12 No A M 62 42 No P F 64 1 Yes

A F 64 17 No P M 74 4 No A F 72 25 No

P M 70 1 Yes B M 66 19 No B M 59 29 No

A F 64 30 No A M 70 28 No A M 69 1 No

B F 78 1 No P M 83 1 Yes B F 69 42 No

B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes

A M 70 12 No A F 69 12 No B F 65 14 No

B M 70 1 No B M 67 23 No A M 76 25 Yes

P M 78 12 Yes B M 77 1 Yes B F 69 24 No

P M 66 4 Yes P F 65 29 No P M 60 26 Yes

A M 78 15 Yes B M 75 21 Yes A F 67 11 No

P F 72 27 No P F 70 13 Yes A M 75 6 Yes

B F 65 7 No P F 68 27 Yes P M 68 11 Yes

P M 67 17 Yes B M 70 22 No A M 65 15 No

P F 67 1 Yes A M 67 10 No P F 72 11 Yes

A F 74 1 No B M 80 21 Yes A F 69 3 No

;

proc logistic data=Neuralgia ;

class Treatment Sex;

model Pain= Treatment Sex Treatment*Sex Age Duration / expb;

run;

ods graphics on;

proc logistic data=Neuralgia plots(only)=(roc ) ;

class Treatment Sex;

model Pain= Treatment Sex Treatment*Sex Age Duration / expb;

run;

ods graphics off;


08-23-2010 08:42 AM

That is probably a question for tech support.


08-23-2010 01:04 PM

Thorsten,

Strange that you should mention it today as I just sent a bug report to SI on Friday about this very issue. I have not yet heard back from them except for the automatically generated form which indicates that they received the report.

If you want to see just how bad the problem can be, take a look at the following code, which I sent to SI. Depending on whether or not an ROC curve is requested (including the associated ODS GRAPHICS statement), the AUC reported by PROC LOGISTIC is 0.770 or 0.839 - a huge difference. The skewness of the predictor variable seems to play a role in the discrepancy between the AUC estimates: the more skewed the predictor variable, the more discrepant the association table statistics.

data test;

b0 = 0;

b1 = 0.5;

b2 = 0.2;

do i=1 to 1000;

u1 = rannor(1234579);

u2 = rannor(1234579);

ln_x1 = 4*(0.8*u1 + 0.2*u2);

x1 = exp(ln_x1);

x2 = 0.2*u1 + 0.8*u2;

eta = b0 + b1*ln_x1 + b2*x2;

p = 1 / (1 + exp(-eta));

y = ranbin(1234579,1,p);

output;

end;

keep x1 ln_x1 x2 y;

run;

proc univariate data=test plot;

var x1 ln_x1;

run;

title "Predictor is X1 which is highly skewed: logit(p) is linear against log(X1)";

title2 "Results generated when logistic procedure IS NOT sandwiched by ODS GRAPHICS statements";

footnote "Association Statistics table will be different when we sandwich the LOGISTIC procedure";

footnote2 "by ODS GRAPHICS statements AND request that the ROC curve be generated";

proc logistic data=test descending;

model y = x1;

run;

ods graphics on;

title2 "Results generated when logistic procedure IS sandwiched by ODS GRAPHICS statements";

title3 "No request for producing the ROC curve specified on the LOGISTIC procedure invocation";

proc logistic data=test descending;

model y = x1;

run;

title3 "Addition of request for ROC curve on the LOGISTIC procedure invocation";

footnote "The Association Statistics table indicates better predictiveness of X1 when the ROC curve is generated";

footnote2 "Essentially the same Association Statistics table is obtained when log(X1) is the predictor";

footnote3 "and without the ODS GRAPHICS statements and ROC plot request (see below)";

proc logistic data=test plots(only)=roc descending;

model y = x1;

run;

ods graphics off;

title2 "No ODS GRAPHICS statement sandwiching";

title3 "Logistic procedure invocation includes request for plotting of ROC curve";

footnote "Association Statistics table does not simply respond to the ROC plot request";

footnote2 "when ODS GRAPHICS statement sandwiching is not included";

proc logistic data=test plots(only)=roc descending;

model y = x1;

run;

title "Predictor is log(X1): logit(p) is linear against this transformation of X1";

title2 "No ODS GRAPHICS statement sandwiching";

title3 "No ROC curve plot request";

footnote "Association Statistics table results for ln_x1 are essentially the same as for X1";

footnote2 "when the regression against X1 is sandwiched by ODS GRAPHICS statements and ROC curve is requested";

proc logistic data=test descending;

model y = ln_x1;

run;
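
One way to see which of the two reported AUC values is closer to the purely rank-based value is to compute the c statistic outside the Association table, from the Wilcoxon rank sums of the predicted probabilities. This is only a rough sketch, assuming the data set TEST created above. The names pred, phat, ws, and auc_check are made up for illustration, and the ODS table name WilcoxonScores and its columns (Class, N, SumOfScores) are quoted from memory of PROC NPAR1WAY, so please check them against your own output before relying on this.

```sas
/* Sketch: rank-based AUC for the model y = x1, computed independently  */
/* of the Association table. PRED, PHAT, WS and AUC_CHECK are           */
/* hypothetical names introduced for this example.                      */
proc logistic data=test descending noprint;
   model y = x1;
   output out=pred p=phat;        /* predicted probability of y=1 */
run;

/* Wilcoxon rank sums of the predicted probabilities by outcome group.  */
/* Tied values receive midranks.                                        */
proc npar1way data=pred wilcoxon;
   class y;
   var phat;
   ods output WilcoxonScores=ws;  /* assumed ODS table name */
run;

/* AUC = (R1 - n1*(n1+1)/2) / (n1*n0), where R1 is the rank sum of the  */
/* event group (y=1), n1 its size, and n0 the size of the other group.  */
data auc_check;
   set ws end=last;
   retain n1 n0 r1;
   if strip(Class) = '1' then do;
      n1 = N;
      r1 = SumOfScores;
   end;
   else n0 = N;
   if last then do;
      auc = (r1 - n1*(n1 + 1)/2) / (n1*n0);
      output;
   end;
   keep n1 n0 auc;
run;

proc print data=auc_check;
run;
```

Because this calculation uses only the ranks of the predicted probabilities, it should give the same value for x1 and ln_x1 as the single predictor, as long as the fitted slope is positive in both models.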


08-23-2010 05:24 PM

SI responded, pointing out the BINWIDTH= and ROCEPS= options on the MODEL statement. If both are set to zero, you should get the same association table statistics whether or not the ROC plot is requested. For my simulated data, the association table statistics are **almost, but not quite,** identical when BINWIDTH=0 and ROCEPS=0 are specified. With the same two options on the neuralgia data and the logistic regression model shown above, all statistics are identical regardless of the request for the ROC plot.
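
For anyone who wants to try this, here is a minimal sketch of the suggestion applied to the Neuralgia example from the first post. It assumes the Neuralgia data set already exists. The data set names assoc_noroc and assoc_roc are made up for illustration, and Association is, as far as I know, the ODS table name of the association statistics table in PROC LOGISTIC.

```sas
/* Sketch: same model with and without the ROC plot, both runs using    */
/* BINWIDTH=0 and ROCEPS=0. ASSOC_NOROC and ASSOC_ROC are hypothetical  */
/* data set names introduced for this example.                          */
ods graphics off;

proc logistic data=Neuralgia;
   class Treatment Sex;
   model Pain = Treatment Sex Treatment*Sex Age Duration
         / expb binwidth=0 roceps=0;
   ods output Association=assoc_noroc;   /* save the association table */
run;

ods graphics on;

proc logistic data=Neuralgia plots(only)=roc;
   class Treatment Sex;
   model Pain = Treatment Sex Treatment*Sex Age Duration
         / expb binwidth=0 roceps=0;
   ods output Association=assoc_roc;     /* same table, ROC requested */
run;

ods graphics off;

/* Report any value that differs between the two saved tables. */
proc compare base=assoc_noroc compare=assoc_roc;
run;
```

Running the same comparison without the two options should reproduce the discrepancy described in the first post, which makes it easy to confirm that the options are what removes it.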