Re: Logistic Regression CTABLE options, Please Help. Many Thanks

Question · Posted 06-18-2013 06:12 AM

Hi All,

I have used the CTABLE option in Logistic Regression to specify a cutoff value. It has produced the table below, but I am struggling to interpret it.

Your help would be much appreciated. For example, if we want to target the customers who are more likely to respond, do I select anyone with a probability > 0.2? With a cutoff of > 0.5, I don't get many people.

Many Thanks

Classification Table
Prob Level	Correct Event	Correct Non-Event	Incorrect Event	Incorrect Non-Event	% Correct	Sensitivity	Specificity	False POS	False NEG
0.05	43,841	426,000	356,000	12,420	56.1	77.9	54.5	89	2.8
0.1	24,715	658,000	125,000	31,546	81.4	43.9	84.1	83.5	4.6
0.15	16,000	722,000	59,899	40,261	88.1	28.4	92.3	78.9	5.3
0.2	9,516	753,000	29,047	46,745	91	16.9	96.3	75.3	5.8
0.25	6,078	768,000	13,932	50,183	92.4	10.8	98.2	69.6	6.1
0.3	2,867	777,000	5,708	53,394	93	5.1	99.3	66.6	6.4
0.35	1,660	779,000	2,846	54,601	93.1	3	99.6	63.2	6.5
0.4	833	781,000	1,050	55,428	93.3	1.5	99.9	55.8	6.6
0.45	377	782,000	372	55,884	93.3	0.7	100	49.7	6.7
0.5	205	782,000	150	56,056	93.3	0.4	100	42.3	6.7
0.55	127	782,000	80	56,134	93.3	0.2	100	38.6	6.7
0.6	93	782,000	51	56,168	93.3	0.2	100	35.4	6.7
0.65	71	782,000	42	56,190	93.3	0.1	100	37.2	6.7
0.7	58	782,000	28	56,203	93.3	0.1	100	32.6	6.7
0.75	47	782,000	23	56,214	93.3	0.1	100	32.9	6.7
0.8	36	782,000	15	56,225	93.3	0.1	100	29.4	6.7
0.85	23	782,000	9	56,238	93.3	0	100	28.1	6.7
0.9	13	782,000	3	56,248	93.3	0	100	18.8	6.7
0.95	5	782,000	1	56,256	93.3	0	100	16.7	6.7
1	0	782,000	0	56,261	93.3	0	100	.	6.7

AncaTilea · Posted 06-18-2013 08:15 AM

Hi.

See page 8-10 of this document

http://www.ats.ucla.edu/stat/sas/library/ts274.pdf

maybe it will clear up your question.

I think a question such as "do I select anyone > 0.2" is not something anyone else can answer for you.

Anca.

1zmm · Posted 06-20-2013 08:46 AM

At and above a specific cutoff value, sensitivity is the percentage of those with the outcome of interest that are detected using your logistic model: the percentage of customers who respond that the model flags as likely to respond. Specificity is the percentage of those without the outcome of interest that are detected using the model: the percentage of customers who do not respond that the model flags as not likely to respond. The percentage of false positives is the percentage of customers that your model predicts as likely to respond who in fact do not respond. The percentage of false negatives is the percentage of customers that your model predicts as not likely to respond who do in fact respond.

In your example at a cutoff of 0.20 or more, your model picks up only 16.9% [=sensitivity] of customers who respond, and 3.7% [100% - 96.3% (=specificity)] of customers who do not respond. However, 75.3% [=% of false positives] of those your model predicts as likely to respond will in fact not respond, though 94.2% [=100% - 5.8% (% of false negatives)] of those your model predicts as not likely to respond will in fact not respond.

You can visualize this better in a two-by-two table like the following:

Test prediction	Responders	Non-responders	Total
0.20 or more	9,516	29,047	38,563
< 0.20	46,745	753,000	799,745
Total	56,261	782,047	838,308

Sensitivity = 9,516 / 56,261 = 16.9%
Specificity = 753,000 / 782,047 = 96.3%
False positives = 29,047 / 38,563 = 75.3%
False negatives = 46,745 / 799,745 = 5.8%

To pick an appropriate test prediction cutoff, you have to balance the costs vs. the benefits. Using the % of false positives as one criterion, of every four customers you tried to contact, on average only one of them would respond. If contacting customers is relatively cheap, you might not worry so much about this false positive % and prefer to reach a larger share of the customers who would respond (that is, to increase the sensitivity). At a test prediction cutoff of 0.05 or above, only one of nine customers you tried to contact would respond [=100% - 89% false positives], but you would in fact be able to detect more than three-quarters of the customers who would respond [sensitivity=77.9%].
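
If it helps to check the arithmetic, here is a minimal Python sketch that recomputes the percentage columns from the four counts in the two-by-two table above. The function name `classification_metrics` and its argument names are just illustrative choices, not anything SAS provides, and the non-event counts in the posted table look rounded, so the results should only be expected to match to about one decimal place.

```python
def classification_metrics(true_pos, true_neg, false_pos, false_neg):
    """Recompute CTABLE-style percentages for a single cutoff.

    true_pos  = Correct Event, true_neg  = Correct Non-Event,
    false_pos = Incorrect Event, false_neg = Incorrect Non-Event.
    """
    events = true_pos + false_neg          # customers who actually respond
    non_events = true_neg + false_pos      # customers who do not respond
    predicted_pos = true_pos + false_pos   # customers at or above the cutoff
    predicted_neg = true_neg + false_neg   # customers below the cutoff
    total = events + non_events
    return {
        "correct": 100 * (true_pos + true_neg) / total,
        "sensitivity": 100 * true_pos / events,
        "specificity": 100 * true_neg / non_events,
        "false_pos": 100 * false_pos / predicted_pos,
        "false_neg": 100 * false_neg / predicted_neg,
    }


# Counts from the 0.20 row of the classification table in the question.
row_020 = classification_metrics(
    true_pos=9516, true_neg=753000, false_pos=29047, false_neg=46745
)
for name, value in row_020.items():
    print(f"{name}: {value:.1f}%")
```

Note that the false positive and false negative percentages are divided by the number of predicted events and predicted non-events respectively, not by the actual totals, which is why they do not simply equal 100% minus specificity or sensitivity.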

Question · Posted 06-20-2013 08:59 AM

Many thanks, that's really helpful. In my case contacting customers is relatively cheap, as it is by email. So does that mean I could pick anyone with 0.1 or more, to increase the sensitivity?

Cheers

1zmm · Posted 06-20-2013 09:04 AM

Yes. At a cutoff of 0.10 or more, the sensitivity would rise to 43.9% from 16.9% at a cutoff of 0.20 or more, so that you would reach slightly less than one-half of the customers who would be likely to respond. Since the % of false positives at a cutoff of 0.10 or more would equal 83.5%, only one of six customers you tried to contact would be likely to respond.
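
To make the cost vs. benefit comparison concrete, something like the following Python sketch could be used to compare a few cutoffs side by side. The counts are copied from the classification table in the question (several look rounded), while `COST_PER_EMAIL` and `VALUE_PER_RESPONSE` are purely hypothetical numbers invented for illustration; you would need to substitute your own figures before drawing any conclusion.

```python
# (cutoff, correct events, incorrect events, incorrect non-events)
# copied from the classification table posted in the question.
rows = [
    (0.05, 43841, 356000, 12420),
    (0.10, 24715, 125000, 31546),
    (0.20, 9516, 29047, 46745),
]

# Hypothetical economics, NOT from the thread: replace with real values.
COST_PER_EMAIL = 0.01
VALUE_PER_RESPONSE = 5.00


def summarize(cutoff, true_pos, false_pos, false_neg):
    """Summarize what contacting everyone at or above `cutoff` would yield."""
    contacted = true_pos + false_pos
    return {
        "cutoff": cutoff,
        "contacted": contacted,
        # Share of contacted customers who respond (100% - % false positives).
        "response_rate": 100 * true_pos / contacted,
        # Share of all responders that are reached.
        "sensitivity": 100 * true_pos / (true_pos + false_neg),
        "net_value": true_pos * VALUE_PER_RESPONSE - contacted * COST_PER_EMAIL,
    }


summaries = [summarize(*row) for row in rows]
for s in summaries:
    print(
        f"cutoff {s['cutoff']:.2f}: contact {s['contacted']:,}, "
        f"response rate {s['response_rate']:.1f}%, "
        f"sensitivity {s['sensitivity']:.1f}%, "
        f"net value {s['net_value']:,.2f}"
    )

best = max(summaries, key=lambda s: s["net_value"])
print("Best cutoff under these assumed costs:", best["cutoff"])
```

With a very low assumed cost per email the lowest cutoff tends to win, because each extra responder is worth far more than the extra emails cost. A higher contact cost, or a penalty for annoying non-responders, would push the preferred cutoff upward.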
