Hi All,
I have used the CTABLE option in Logistic Regression to specify a cutoff value. It has produced the table below, but I am struggling to interpret it.
Your help will be much appreciated. For example, if we want to target customers who are more likely to respond, do I select anyone with a prob > 0.2? With a cutoff of > 0.5, I don't get many people.
Many Thanks
Classification Table
Prob Level | Correct Event | Correct Non-Event | Incorrect Event | Incorrect Non-Event | % Correct | Sensitivity | Specificity | False POS | False NEG |
0.05 | 43,841 | 426,000 | 356,000 | 12,420 | 56.1 | 77.9 | 54.5 | 89 | 2.8 |
0.1 | 24,715 | 658,000 | 125,000 | 31,546 | 81.4 | 43.9 | 84.1 | 83.5 | 4.6 |
0.15 | 16,000 | 722,000 | 59,899 | 40,261 | 88.1 | 28.4 | 92.3 | 78.9 | 5.3 |
0.2 | 9,516 | 753,000 | 29,047 | 46,745 | 91 | 16.9 | 96.3 | 75.3 | 5.8 |
0.25 | 6,078 | 768,000 | 13,932 | 50,183 | 92.4 | 10.8 | 98.2 | 69.6 | 6.1 |
0.3 | 2,867 | 777,000 | 5,708 | 53,394 | 93 | 5.1 | 99.3 | 66.6 | 6.4 |
0.35 | 1,660 | 779,000 | 2,846 | 54,601 | 93.1 | 3 | 99.6 | 63.2 | 6.5 |
0.4 | 833 | 781,000 | 1,050 | 55,428 | 93.3 | 1.5 | 99.9 | 55.8 | 6.6 |
0.45 | 377 | 782,000 | 372 | 55,884 | 93.3 | 0.7 | 100 | 49.7 | 6.7 |
0.5 | 205 | 782,000 | 150 | 56,056 | 93.3 | 0.4 | 100 | 42.3 | 6.7 |
0.55 | 127 | 782,000 | 80 | 56,134 | 93.3 | 0.2 | 100 | 38.6 | 6.7 |
0.6 | 93 | 782,000 | 51 | 56,168 | 93.3 | 0.2 | 100 | 35.4 | 6.7 |
0.65 | 71 | 782,000 | 42 | 56,190 | 93.3 | 0.1 | 100 | 37.2 | 6.7 |
0.7 | 58 | 782,000 | 28 | 56,203 | 93.3 | 0.1 | 100 | 32.6 | 6.7 |
0.75 | 47 | 782,000 | 23 | 56,214 | 93.3 | 0.1 | 100 | 32.9 | 6.7 |
0.8 | 36 | 782,000 | 15 | 56,225 | 93.3 | 0.1 | 100 | 29.4 | 6.7 |
0.85 | 23 | 782,000 | 9 | 56,238 | 93.3 | 0 | 100 | 28.1 | 6.7 |
0.9 | 13 | 782,000 | 3 | 56,248 | 93.3 | 0 | 100 | 18.8 | 6.7 |
0.95 | 5 | 782,000 | 1 | 56,256 | 93.3 | 0 | 100 | 16.7 | 6.7 |
1 | 0 | 782,000 | 0 | 56,261 | 93.3 | 0 | 100 | . | 6.7 |
Hi.
See pages 8-10 of this document:
http://www.ats.ucla.edu/stat/sas/library/ts274.pdf
Maybe it will clear up your question.
I think a question such as "do I select anyone > 0.2" is not something anyone can help with.
Anca.
At and above a specific cutoff value, sensitivity is the percentage of those with the outcome of interest that are detected using your logistic model: the percentage of customers who respond that the model flags as likely to respond. Specificity is the percentage of those without the outcome of interest that are detected using the model: the percentage of customers who do not respond that the model flags as not likely to respond. The percentage of false positives is the percentage of customers that your model predicts as more likely to respond who in fact do not respond. The percentage of false negatives is the percentage of customers that your model predicts as not likely to respond who do in fact respond.

In your example at a cutoff of 0.20 or more, your model picks up only 16.9% [=sensitivity] of customers who are more likely to respond, and 3.7% [100% - 96.3% (=specificity)] of customers who are not likely to respond. However, 75.3% [=% of false positives] of those your model predicts as likely to respond will in fact not respond, though 94.2% [=100% - 5.8% (% of false negatives)] of those your model predicts as not likely to respond will in fact not respond.

You can visualize this better in a two-by-two table like the following:

Test prediction | Likely to respond | Not likely to respond | Total |
0.20 or more | 9,516 | 29,047 | 38,563 |
< 0.20 | 46,745 | 753,000 | 799,745 |
Total | 56,261 | 782,047 | 838,308 |

Sensitivity = 9,516 / 56,261 = 16.9%
Specificity = 753,000 / 782,047 = 96.3%
False positives = 29,047 / 38,563 = 75.3%
False negatives = 46,745 / 799,745 = 5.8%

To pick an appropriate test prediction cutoff, you have to balance the costs against the benefits. Using the % false positives as one criterion, of every four customers you tried to contact, on average only one of them would be likely to respond. If contacting customers is relatively cheap, you might not worry so much about this false positive % and prefer to increase the percentage of those likely to respond whom your model actually picks up (that is, to increase the sensitivity).

At a test prediction cutoff of 0.05 or above, only one of nine customers you tried to contact would be likely to respond [=100% - 89% false positives], but you would in fact be able to detect more than three-quarters of those customers that would likely respond [sensitivity=77.9%].
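For anyone who wants to check the arithmetic outside SAS, here is a minimal Python sketch that recomputes the four percentages for the 0.20 cutoff from the counts in the classification table. The function name `classification_metrics` and its argument names are my own, not anything SAS produces, and the non-event counts in the posted table look rounded to the nearest thousand, so treat the results as approximate.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return CTABLE-style percentages from the four cell counts.

    tp: predicted to respond, did respond      (Correct Event)
    fp: predicted to respond, did not respond  (Incorrect Event)
    fn: predicted not to respond, did respond  (Incorrect Non-Event)
    tn: predicted not to respond, did not      (Correct Non-Event)
    """
    return {
        "sensitivity": 100 * tp / (tp + fn),  # share of responders detected
        "specificity": 100 * tn / (tn + fp),  # share of non-responders detected
        "false_pos": 100 * fp / (tp + fp),    # share of predicted responders who do not respond
        "false_neg": 100 * fn / (fn + tn),    # share of predicted non-responders who do respond
    }


# Counts taken from the 0.2 row of the posted table.
metrics = classification_metrics(tp=9516, fp=29047, fn=46745, tn=753000)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Note that the false positive and false negative percentages here use the predicted totals as the denominator, matching the two-by-two table above, rather than the actual totals used for sensitivity and specificity.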
Many thanks, that's really helpful. In my case contacting customers is relatively cheap, as it is by email. So does it mean that I could pick anyone with 0.1 or more, to increase the sensitivity?
Cheers
Yes. At a cutoff of 0.10 or more, the sensitivity would rise to 43.9% from 16.9% at a cutoff of 0.20 or more, so that you would reach slightly less than one-half of the customers who would be likely to respond. Since the % of false positives at a cutoff of 0.10 or more would equal 83.5%, only one of six customers you tried to contact would be likely to respond.
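To compare cutoffs side by side, a rough Python sketch like the one below may help. It uses only the Correct Event and Incorrect Event counts from the posted table for three cutoffs. The names `rows` and `summary` are made up for illustration, and the total of 56,261 responders is taken from the two-by-two table earlier in the thread. The incorrect-event counts appear rounded in the posted output, so the figures are approximate.

```python
TOTAL_RESPONDERS = 56261  # column total from the two-by-two table

# cutoff -> (correct events, incorrect events), copied from the posted table
rows = {
    0.05: (43841, 356000),
    0.10: (24715, 125000),
    0.20: (9516, 29047),
}

summary = {}
for cutoff, (tp, fp) in rows.items():
    contacted = tp + fp  # everyone at or above the cutoff gets an email
    summary[cutoff] = {
        "contacted": contacted,
        "responders_reached": tp,
        "sensitivity_pct": 100 * tp / TOTAL_RESPONDERS,
        "contacts_per_responder": contacted / tp,
    }

for cutoff, s in summary.items():
    print(
        f"cutoff {cutoff:.2f}: contact {s['contacted']:,}, "
        f"reach {s['responders_reached']:,} responders "
        f"({s['sensitivity_pct']:.1f}% of all), "
        f"about {s['contacts_per_responder']:.1f} contacts per responder"
    )
```

Lowering the cutoff raises both the number of responders reached and the number of emails sent per responder, which is the trade-off to weigh against the cost of each contact.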