Solved: Re: Logistic regression & classification table

AlainX · Posted 08-11-2022 06:05 AM

Dear Friends,

I have just started using SAS OnDemand for Academics, and I would like some clarification on how to interpret the "Classification Table" produced when running a logistic regression model. I apologise in advance if my question is naive.

I ran my model in both SPSS and SAS OnDemand, and I obtained exactly the same coefficients/results.

However, the classification table I get in SPSS is the following one:

And the classification results I get in SAS are:

My question:

Could you please help me with the interpretation of the two SAS tables above? I understand that these results differ somewhat from the typical SPSS classification table. Should I take into account the 0.50 Prob Level in order to compare? And when we say "Percent Conc/Disc", are we talking about something else? (These percentages are not the same as the ones in the classification table.)

Millions of thanks in advance.

Alain

FreelanceReinh · Posted 08-11-2022 03:47 PM

Hello @AlainX,

@AlainX wrote:

Should I take into account the 0.50 Prob Level in order to compare?

Yes, I think this comes closest to the SPSS table. In both cases the known "(non-)responses" (i.e., values of variable Coupon) in the analysis dataset are compared to the predicted (non-)responses, where a subject is predicted to "respond" if the response probability according to the logistic regression model is >=0.5. I suspect that SPSS uses a different definition of the response probability than SAS and that this explains the small differences in the results: SPSS counts 20 correctly predicted responders, whereas SAS shows 19 in column "Correct Event" for Prob Level 0.500.

Indeed, the definition used by SAS for the classification table, as described in the documentation section "Predicted Probability of an Event for Classification", does not simply plug the estimated coefficients from the "Analysis of Maximum Likelihood Estimates" table into the model; it uses a cross-validated (leave-one-out style) approximation instead. Compare this to the SPSS documentation https://www.ibm.com/docs/en/spss-statistics/SaaS?topic=ucslracr-classification, which mentions the "model-predicted logit" (so possibly the simple definition) and states that "[c]ases are weighted by finalweight" (I am not sure this applies to your data at all, unless you are dealing with complex survey data).

By using an OUTPUT statement like

output out=pred predprobs=(i x);

in your PROC LOGISTIC step, you can create a dataset PRED that contains both types of response probabilities for each subject (variables IP_... use the simple definition, variables XP_... the cross-validated one). Then you can count the observations where these response probabilities are >=0.5.
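To make the counting step concrete, here is a minimal sketch. It assumes that the analysis dataset is called MYDATA, that the predictors are X1 and X2 (both names are placeholders, not taken from this thread), and that Coupon is a numeric 0/1 variable with 1 as the event, so that the probability variables are named IP_1 and XP_1. With differently coded or formatted response levels the suffixes will differ.

```sas
/* Sketch only: MYDATA, X1 and X2 are hypothetical names.            */
/* Coupon is assumed to be numeric, coded 0/1, with 1 as the event.  */
proc logistic data=mydata;
   model Coupon(event='1') = x1 x2 / ctable pprob=0.5;
   /* I = individual (simple) probabilities, X = cross-validated ones */
   output out=pred predprobs=(i x);
run;

/* Flag predicted responders under each definition (cutoff 0.5). */
data pred_flags;
   set pred;
   pred_ip = (IP_1 >= 0.5);   /* simple definition          */
   pred_xp = (XP_1 >= 0.5);   /* cross-validated definition */
run;

/* Observed response against each predicted flag, counts only. */
proc freq data=pred_flags;
   tables Coupon*(pred_ip pred_xp) / norow nocol nopercent;
run;
```

If the explanation above is right, the Coupon by pred_ip table should be the one to compare with SPSS, and the Coupon by pred_xp table the one to compare with the 0.500 row of the SAS classification table.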

@AlainX wrote:

And when we say "Percent Conc/Disc", are we talking about something else?

Exactly. These percentages belong to a different consideration: Each of the possible 40*60=2400 pairs of the 40 responders and the 60 non-responders is assessed with respect to the (simply) predicted response probability: whether that probability is greater for the responder of the pair (meaning "concordance") or for the non-responder ("discordance") or whether there is no difference ("tie"). So the denominator of these percentages is the number of pairs (here: 2400), not a number of subjects as in the classification tables.
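The pairwise comparison can be reproduced by hand as a check. The sketch below makes the same assumptions as before: a hypothetical dataset MYDATA with placeholder predictors X1 and X2, and a numeric 0/1 response Coupon, so that the simple predicted probability of the event is stored in IP_1. It also assumes there are no missing values.

```sas
/* Sketch only: MYDATA, X1 and X2 are hypothetical names.            */
/* Coupon is assumed to be numeric, coded 0/1, with 1 as the event.  */
proc logistic data=mydata;
   model Coupon(event='1') = x1 x2;
   output out=pred predprobs=(i);
run;

/* Pair every responder (e) with every non-responder (n) and compare */
/* their simple predicted event probabilities.                       */
proc sql;
   select count(*)                            as n_pairs,
          sum(e.IP_1 > n.IP_1)                as concordant,
          sum(e.IP_1 < n.IP_1)                as discordant,
          sum(e.IP_1 = n.IP_1)                as tied,
          100 * sum(e.IP_1 > n.IP_1) / count(*) as pct_concordant,
          100 * sum(e.IP_1 < n.IP_1) / count(*) as pct_discordant
   from pred(where=(Coupon=1)) as e,
        pred(where=(Coupon=0)) as n;
quit;
```

With 40 responders and 60 non-responders, n_pairs should be 2400. PROC LOGISTIC may group the predicted probabilities into narrow bins before comparing them (see the BINWIDTH= option of the MODEL statement), so the hand-computed percentages could differ slightly from the printed ones.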

PaigeMiller · Posted 08-11-2022 07:46 AM

The SPSS output and the SAS output do not show the same tables; they contain different information (even though they both have the name "Classification Table"). You can get SAS to produce the same table as SPSS, as shown here: https://support.sas.com/kb/22/603.html.

I don't know whether SAS OnDemand for Academics can produce this table via menu commands, but you can certainly take the output from your logistic regression and, using the program at that link (the PREDS data set), produce the equivalent of the SPSS classification table.

--
Paige Miller

AlainX · Posted 08-11-2022 09:32 AM

Thanks so much for your reply. However, I am too new to SAS to understand this code and adapt it. If someone knows how to create (if it is actually possible) the typical confusion/classification matrix for a logit problem in SAS ODA, I would be grateful. Any information on interpreting the SAS ODA classification table and the association of predicted probabilities would also be welcome. Huge thanks.
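For readers with the same question, a minimal sketch of an actual-by-predicted table follows. It is written in the spirit of the linked note rather than copied from it. MYDATA, X1 and X2 are placeholder names that do not come from this thread, and Coupon is assumed to be a 0/1 response with 1 as the event. The code can be pasted into the program editor of SAS ODA and adapted by changing those names.

```sas
/* Sketch only: MYDATA, X1 and X2 are hypothetical names.            */
/* Coupon is assumed to be coded 0/1, with 1 as the event.           */
proc logistic data=mydata;
   model Coupon(event='1') = x1 x2;
   /* PREDPROBS= adds _FROM_ (observed level) and _INTO_ (the level   */
   /* with the largest individual predicted probability) to PREDS.    */
   output out=preds predprobs=individual;
run;

/* Rows: observed response. Columns: predicted response. */
proc freq data=preds;
   tables _FROM_*_INTO_ / norow nocol nopercent;
run;
```

For a binary response, assigning each subject to the level with the largest predicted probability amounts to a 0.5 cutoff (apart from exact ties), so this table should correspond to the SPSS-style classification table.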
