Yu_Bo
Calcite | Level 5

We could create a classification table in two ways:

1. Using proc logistic with ctable pprob=xxx

Example:

proc logistic desc data=mmse;
   model fn = lhippoc lmidtemp eicv c_age_a c_age_b ss / ctable pprob=0.32;
run;


2. Using the OUTPUT statement and then manipulating the data:

proc logistic desc data=mmse;
   model fn = lhippoc lmidtemp eicv c_age_a c_age_b ss / ctable pprob=0.32;
   output out=ci_with_outl p=rsk;
run;

data ci_with_outl;
   set ci_with_outl;
   if rsk >= 0.32 then pos = 1; else pos = 0;
run;

proc sort data=ci_with_outl;
   by descending pos descending fn;
run;

proc freq data=ci_with_outl order=data;
   table pos*fn;
run;


The problem is the following: I get two different classification tables, with different numbers of true/false positives and negatives.

The question is: what algorithm does proc logistic use to calculate the true/false positives and negatives in the ctable output?


Reeza
Super User

Pretty sure it's the same algorithm; check the documentation under classification table and details.

How close are they? One possibility is that they differ due to rounding. The second is that the probability is the opposite of what you expect, i.e. if you modeled a binary response such as 0/1, the event is considered to be 0, not 1, unless specified otherwise.

Yu_Bo
Calcite | Level 5

I checked the pos variable in Excel. It was correct: pos=1 wherever the risk was 0.32 or higher.

Yu_Bo
Calcite | Level 5

They are pretty close. Of 147 subjects (47 with the disease), 26 with the disease and 92 without it were correctly classified by proc logistic (true positives/true negatives), while 27 and 95 respectively were correctly classified with proc freq. That gives me a difference of about 3% for sensitivity and for specificity.
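To make the size of the gap concrete, here is a small sketch that only redoes the arithmetic from the counts quoted above (47 subjects with the disease, 147 - 47 = 100 without). The variable names are made up for illustration.

```sas
data _null_;
   n_pos = 47;                 /* subjects with the disease    */
   n_neg = 147 - 47;           /* subjects without the disease */

   sens_ctable = 26 / n_pos;   /* true positives from proc logistic ctable */
   sens_freq   = 27 / n_pos;   /* true positives from proc freq            */
   spec_ctable = 92 / n_neg;   /* true negatives from proc logistic ctable */
   spec_freq   = 95 / n_neg;   /* true negatives from proc freq            */

   sens_diff = sens_freq - sens_ctable;
   spec_diff = spec_freq - spec_ctable;

   /* Write the results to the log */
   put sens_ctable= percent8.1 sens_freq= percent8.1 sens_diff= percent8.1;
   put spec_ctable= percent8.1 spec_freq= percent8.1 spec_diff= percent8.1;
run;
```

With these counts the sensitivity is 26/47 versus 27/47 (roughly 55.3% versus 57.4%) and the specificity is 92% versus 95%.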

Reeza
Super User

According to the doc >= (GE) is the correct comparison.

If the predicted event probability exceeds or equals some cutpoint value $z \in [0,1]$, the observation is predicted to be an event observation; otherwise, it is predicted as a nonevent. A $2\times 2$ frequency table can be obtained by cross-classifying the observed and predicted responses. The CTABLE option produces this table, and the PPROB= option selects one or more cutpoints. Each cutpoint generates a classification table. If the PEVENT= option is also specified, a classification table is produced for each combination of PEVENT= and PPROB= values.

I can't see that you're missing anything. In fact I can reproduce this with the sample data. I would expect this to work and it doesn't, but that doesn't mean I'm not missing something or doing something wrong. 

Consider opening a track with tech support, an example to replicate the issue is below:

data Remission;
   input remiss cell smear infil li blast temp;
   label remiss='Complete Remission';
   datalines;
1   .8   .83  .66  1.9  1.1     .996
1   .9   .36  .32  1.4   .74    .992
0   .8   .88  .7    .8   .176   .982
0  1     .87  .87   .7  1.053   .986
1   .9   .75  .68  1.3   .519   .98
0  1     .65  .65   .6   .519   .982
1   .95  .97  .92  1    1.23    .992
0   .95  .87  .83  1.9  1.354  1.02
0  1     .45  .45   .8   .322   .999
0   .95  .36  .34   .5  0      1.038
0   .85  .39  .33   .7   .279   .988
0   .7   .76  .53  1.2   .146   .982
0   .8   .46  .37   .4   .38   1.006
0   .2   .39  .08   .8   .114   .99
0  1     .9   .9   1.1  1.037   .99
1  1     .84  .84  1.9  2.064  1.02
0   .65  .42  .27   .5   .114  1.014
0  1     .75  .75  1    1.322  1.004
0   .5   .44  .22   .6   .114   .99
1  1     .63  .63  1.1  1.072   .986
0  1     .33  .33   .4   .176  1.01
0   .9   .93  .84   .6  1.591  1.02
1  1     .58  .58  1     .531  1.002
0   .95  .32  .3   1.6   .886   .988
1  1     .6   .6   1.7   .964   .99
1  1     .69  .69   .9   .398   .986
0  1     .73  .73   .7   .398   .986
;

proc logistic data=Remission outest=betas covout;
   model remiss(event='1')=cell smear infil li blast temp
                /ctable pprob=0.5 ;
   output out=pred p=phat lower=lcl upper=ucl
          predprob=(individual crossvalidate);
run;

data ctable;
    set pred;
    if phat>=0.5 then test=1;
    else test=0;
run;

proc freq data=ctable;
   table remiss*test;
run;

Yu_Bo
Calcite | Level 5

It is the same thing: the two methods give different results.

With proc logistic:

Classified correctly
  Event (test 1, remiss 1) = 4
  Nonevent (test 0, remiss 0) = 15
Classified incorrectly
  Event (test 1, remiss 0) = 3
  Nonevent (test 0, remiss 1) = 5

With proc freq:

Classified correctly
  Event (test 1, remiss 1) = 5
  Nonevent (test 0, remiss 0) = 15
Classified incorrectly
  Event (test 1, remiss 0) = 3
  Nonevent (test 0, remiss 1) = 4

Yu_Bo
Calcite | Level 5

OK, I have found how SAS calculates a classification table:

SAS/STAT(R) 9.2 User's Guide, Second Edition

Now I need to know how to calculate confidence intervals for sensitivity and specificity, LR+ and LR-, using proc logistic.

lvm
Rhodochrosite | Level 12

I don't have the documentation with me, but I think the ctable option is doing a cross-validation. Each prediction is based on omitting that observation, fitting the model, and predicting the deleted value. This won't be exactly the same as the straight predictions that are in the output table.
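A minimal sketch of how one might check this on the Remission example from earlier in the thread: ask the OUTPUT statement for cross-validated probabilities and classify on those instead of on the ordinary predictions. It assumes the Remission data set has already been created, and that the cross-validated probability of remiss=1 is written to a variable named XP_1 (the XP_ prefix plus the response level) next to the ordinary probability IP_1. The names pred_xv, ctable_xv, test_fit and test_xv are made up. As far as I know these probabilities use the same one-step leave-one-out approximation as CTABLE, so the second table should agree with the CTABLE counts, but I have not verified that on every release.

```sas
proc logistic data=Remission;
   model remiss(event='1') = cell smear infil li blast temp
         / ctable pprob=0.5;
   /* IP_1 = ordinary predicted probability,
      XP_1 = cross-validated (leave-one-out) predicted probability */
   output out=pred_xv predprobs=(individual crossvalidate);
run;

data ctable_xv;
   set pred_xv;
   test_fit = (IP_1 >= 0.5);   /* classification from the full-data fit          */
   test_xv  = (XP_1 >= 0.5);   /* classification from cross-validated predictions */
run;

proc freq data=ctable_xv;
   tables remiss*test_fit remiss*test_xv / norow nocol nopercent;
run;
```

If the assumption holds, remiss*test_fit matches the proc freq table built from the straight predictions, and remiss*test_xv matches what CTABLE prints.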

Yu_Bo
Calcite | Level 5

Yes, thank you, I completely agree with you.

I cannot understand why SAS proposes calculating confidence limits with proc freq if it is clear that the results will be different.

And I cannot see how to obtain my CLs for specificity and sensitivity now. Should I, or could I, use an online calculator?

I don't know.

Reeza
Super User

You can take the output from the ctable and put that into proc freq to obtain confidence intervals. See the link at the end of this post.

You can get the classification table out with the following ODS OUTPUT statement before your proc logistic, though it will need reformatting into the layout that proc freq requires.

ods output Classification=classOut;

proc logistic data=Remission outest=betas covout;
   model remiss(event='1')=cell smear infil li blast temp
                /ctable pprob=0.5 ;
   output out=pred p=phat;
run;

24170 - Estimating sensitivity, specificity, positive and negative predictive values, and other stat...
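A rough sketch of the proc freq step, assuming the four cell counts have been typed in by hand from the proc logistic classification table reported earlier in the thread (4 correct events, 15 correct nonevents, 3 incorrect events, 5 incorrect nonevents). The data set name ctab_counts and the variable names are made up. Note that the binomial confidence limits treat each classification as an independent trial, which is itself an assumption when the predictions come from a leave-one-out fit.

```sas
/* One row per cell of the 2x2 classification table */
data ctab_counts;
   input remiss test count;
   datalines;
1 1 4
1 0 5
0 1 3
0 0 15
;
run;

/* Sensitivity: proportion predicted as events (test=1) among observed events */
proc freq data=ctab_counts;
   where remiss = 1;
   weight count;
   tables test / binomial(level='1');
run;

/* Specificity: proportion predicted as nonevents (test=0) among observed nonevents */
proc freq data=ctab_counts;
   where remiss = 0;
   weight count;
   tables test / binomial(level='0');
run;
```

The BINOMIAL option prints the proportion together with its confidence limits; here the point estimates are 4/9 for sensitivity and 15/18 for specificity.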

Yu_Bo
Calcite | Level 5

That is debatable. This gives a different sensitivity and specificity. In my case, the differences are over 3% each.

Reeza
Super User

Well,

Re #1, SAS's argument is that the prediction method that uses estimates with the data included in the model is biased. To obtain less biased results they use a different method.

You say SAS says to use proc freq, can you reference that somewhere?

From what I understand, the suggestion is to use proc freq on the ctable output to obtain estimates of the CI.

For smaller samples, I'm assuming the sensitivity and specificity will vary more.

Re #2 See post above.

H
Pyrite | Level 9

This may have been tacitly alluded to here, but I was thinking that SAS used a leave-one-out (LOO) method for calculating the SEN and SPEC in the ctable option.

lvm
Rhodochrosite | Level 12

It does, and this is what is meant by my earlier response about cross validation. Leave one out.
