We could create a classification table in two ways:
1. Using proc logistic with ctable pprob=xxx
Example:
proc logistic desc data=mmse ;
model fn= lhippoc lmidtemp eicv c_age_a c_age_b ss/ ctable pprob=0.32;
run;
2. Using output and manipulating with data:
proc logistic desc data=mmse ;
model fn= lhippoc lmidtemp eicv c_age_a c_age_b ss/ ctable pprob=0.32;
output out=ci_with_outl p=rsk;
run;
data ci_with_outl;
set ci_with_outl;
if rsk >=0.32 then pos=1; else pos=0;
run;
proc sort data=ci_with_outl;
by descending pos descending fn;
run;
proc freq data=ci_with_outl order=data;
table pos*fn;
run;
The problem is the following: the two approaches give me two different classification tables, with different numbers of true/false positives and negatives.
The question is: what algorithm does proc logistic ctable use to calculate the true/false positives and negatives?
Well,
Re #1, SAS's argument is that classifying observations with a model that was fitted to those same observations gives biased results. To obtain less biased results they use a different method.
You say SAS says to use proc freq, can you reference that somewhere?
From what I understand, the suggestion is to use proc freq on the ctable output to obtain estimates of the CI.
For smaller samples, I'm assuming the sensitivity and specificity will vary more.
Re #2 See post above.
Pretty sure it's the same algorithm; check the documentation under classification table and details.
How close are they? One possibility is that they differ due to rounding. The second is that the modeled probability is the opposite of what you expect, i.e. if you modeled a binary response such as 0/1, the event is considered to be 0, not 1, unless specified otherwise.
I checked the pos variable in Excel. It was correct: pos=1 where the risk was 0.32 or higher.
They are pretty close. Of 147 subjects (47 with disease), 26 with disease and 92 without disease were correctly classified by proc logistic (true positives/true negatives), while 27 and 95 respectively were correctly classified with proc freq. That gives me a difference of about 3% for sensitivity and for specificity.
According to the doc >= (GE) is the correct comparison.
If the predicted event probability exceeds or equals some cutpoint value $z \in [0,1]$, the observation is predicted to be an event observation; otherwise, it is predicted as a nonevent. A $2 \times 2$ frequency table can be obtained by cross-classifying the observed and predicted responses. The CTABLE option produces this table, and the PPROB= option selects one or more cutpoints. Each cutpoint generates a classification table. If the PEVENT= option is also specified, a classification table is produced for each combination of PEVENT= and PPROB= values.
I can't see that you're missing anything. In fact I can reproduce this with the sample data. I would expect this to work and it doesn't, but that doesn't mean I'm not missing something or doing something wrong.
Consider opening a track with tech support. An example to replicate the issue is below:
data Remission;
input remiss cell smear infil li blast temp;
label remiss='Complete Remission';
datalines;
1 .8 .83 .66 1.9 1.1 .996
1 .9 .36 .32 1.4 .74 .992
0 .8 .88 .7 .8 .176 .982
0 1 .87 .87 .7 1.053 .986
1 .9 .75 .68 1.3 .519 .98
0 1 .65 .65 .6 .519 .982
1 .95 .97 .92 1 1.23 .992
0 .95 .87 .83 1.9 1.354 1.02
0 1 .45 .45 .8 .322 .999
0 .95 .36 .34 .5 0 1.038
0 .85 .39 .33 .7 .279 .988
0 .7 .76 .53 1.2 .146 .982
0 .8 .46 .37 .4 .38 1.006
0 .2 .39 .08 .8 .114 .99
0 1 .9 .9 1.1 1.037 .99
1 1 .84 .84 1.9 2.064 1.02
0 .65 .42 .27 .5 .114 1.014
0 1 .75 .75 1 1.322 1.004
0 .5 .44 .22 .6 .114 .99
1 1 .63 .63 1.1 1.072 .986
0 1 .33 .33 .4 .176 1.01
0 .9 .93 .84 .6 1.591 1.02
1 1 .58 .58 1 .531 1.002
0 .95 .32 .3 1.6 .886 .988
1 1 .6 .6 1.7 .964 .99
1 1 .69 .69 .9 .398 .986
0 1 .73 .73 .7 .398 .986
;
proc logistic data=Remission outest=betas covout;
model remiss(event='1')=cell smear infil li blast temp
/ctable pprob=0.5 ;
output out=pred p=phat lower=lcl upper=ucl
predprob=(individual crossvalidate);
run;
data ctable;
set pred;
if phat>=0.5 then test=1;
else test=0;
run;
proc freq data =ctable;
table remiss*test;
run;
It is the same thing. The two methods give different results.
With proc logistic:
Classification correct
Event (test 1 remiss 1) = 4
Non event (test 0 remiss 0) = 15
Classification incorrect
Event (test 1 remiss 0) = 3
Non event (test 0 remiss 1) = 5
With proc freq:
Classification correct
Event (test 1 remiss 1) = 5
Non event (test 0 remiss 0) = 15
Classification incorrect
Event (test 1 remiss 0) = 3
Non event (test 0 remiss 1) = 4
OK, I have found how SAS calculates a classification table:
SAS/STAT(R) 9.2 User's Guide, Second Edition
Now I need to know how to calculate confidence intervals for sensitivity and specificity, LR+ and LR-, using proc logistic.
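For the likelihood ratios, the point estimates at least follow directly from the usual definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal sketch in a data step, using the counts from the proc logistic classification table for the Remission example posted earlier in the thread; the data set name lr and the variable names are only illustrative, and this does not give confidence limits:

```sas
/* Point estimates only; counts are typed in by hand from the ctable output. */
data lr;
   tp = 4;    /* correct events      (test 1, remiss 1) */
   fn = 5;    /* incorrect nonevents (test 0, remiss 1) */
   fp = 3;    /* incorrect events    (test 1, remiss 0) */
   tn = 15;   /* correct nonevents   (test 0, remiss 0) */
   sens   = tp / (tp + fn);
   spec   = tn / (tn + fp);
   lr_pos = sens / (1 - spec);       /* LR+ */
   lr_neg = (1 - sens) / spec;       /* LR- */
run;

proc print data=lr noobs;
   var sens spec lr_pos lr_neg;
run;
```

With these counts, sensitivity is 4/9 (about 0.44) and specificity is 15/18 (about 0.83), so LR+ is about 2.67 and LR- about 0.67.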
I don't have the documentation with me, but I think the ctable option is doing a cross-validation. Each prediction is based on omitting that observation, fitting the model, and predicting the deleted value. This won't be exactly the same as the straight predictions that are in the output table.
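If that is what is going on, the ctable counts should be reproducible from cross-validated probabilities rather than from the ordinary p= probabilities. A minimal sketch, assuming the Remission data set and model posted earlier in the thread: PREDPROBS=CROSSVALIDATE writes the leave-one-out probabilities to variables named XP_ plus the response level (XP_1 here, because the response levels are 0 and 1). The names pred_cv, ctable_cv, test_fit and test_cv are only illustrative, and I have not verified that this matches ctable exactly in every case.

```sas
/* Refit the Remission model and request cross-validated probabilities. */
proc logistic data=Remission;
   model remiss(event='1') = cell smear infil li blast temp
         / ctable pprob=0.5;
   /* phat = ordinary fitted probability of remiss=1            */
   /* XP_1 = cross-validated (leave-one-out) probability of 1   */
   output out=pred_cv p=phat predprobs=crossvalidate;
run;

/* Classify each observation twice: once per kind of probability. */
data ctable_cv;
   set pred_cv;
   test_fit = (phat >= 0.5);   /* what the hand-made table used    */
   test_cv  = (xp_1 >= 0.5);   /* what ctable is believed to use   */
run;

/* Compare both 2x2 tables with the ctable output. */
proc freq data=ctable_cv;
   tables remiss*test_fit remiss*test_cv / norow nocol nopercent;
run;
```

If the cross-validation explanation is right, the remiss*test_cv table should agree with the ctable counts, while remiss*test_fit should agree with the hand-made proc freq table.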
Yes, thank you, I completely agree with you.
I cannot understand why SAS proposes calculating confidence limits with proc freq if it's clear that the results will be different.
And I cannot understand how I can now obtain my CLs for specificity and sensitivity. Should I, or could I, use an online calculator?
I don't know.
You can take the output from the ctable and put that into proc freq to obtain confidence intervals. See the link at the end of this post.
You can get the classification table out with the following ODS statement before your proc logistic, though it will need reformatting into the layout that proc freq requires.
ods output Classification=classOut;
proc logistic data=Remission outest=betas covout;
model remiss(event='1')=cell smear infil li blast temp
/ctable pprob=0.5 ;
output out=pred p=phat;
run;
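One way to do the reformatting is to put the four ctable counts into a data set with one row per cell and let proc freq compute binomial confidence limits within each observed class. A minimal sketch, with the counts typed in by hand from the proc logistic classification table posted above (pprob=0.5) so that it does not depend on the column names in classOut; the data set name ctab_counts is only illustrative.

```sas
/* One row per cell of the 2x2 classification table. */
data ctab_counts;
   input remiss test count;
   datalines;
1 1 4
1 0 5
0 1 3
0 0 15
;
run;

/* Sensitivity: proportion with test=1 among true events (remiss=1). */
proc freq data=ctab_counts;
   where remiss = 1;
   weight count;
   tables test / binomial(level='1');
   exact binomial;
run;

/* Specificity: proportion with test=0 among true nonevents (remiss=0). */
proc freq data=ctab_counts;
   where remiss = 0;
   weight count;
   tables test / binomial(level='0');
   exact binomial;
run;
```

The binomial proportion reported in the first step is the sensitivity (4/9) and in the second the specificity (15/18), each with asymptotic and exact confidence limits.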
That's open to discussion. This approach gives a different sensitivity and specificity. In my case, the differences are over 3% each.
SAS suggestions for sens-spec CI:
This may have been tacitly alluded to here, but I was thinking that SAS used a leave-one-out (LOO) method for calculating the SEN and SPEC in the ctable option.
It does, and this is what is meant by my earlier response about cross validation. Leave one out.
