☑ This topic is solved.
AlainX
Fluorite | Level 6

Dear Friends,

 

I have just started using SAS On Demand for Academics, and I would like a small clarification regarding the interpretation of the "Classification Table" when running a logistic regression model. I apologise in advance if my question is naive.

 

I ran my model in both SPSS and SAS On Demand, and I obtained exactly the same coefficients/results.

 

However, the classification table I get in SPSS is the following one:

 

Screenshot 2022-08-11 at 12.54.03 PM.png

 

And the classification results I get in SAS are:

 

SAS a.png, SAS b.png

 

My question:

 

Could you please help me with the interpretation of the two SAS tables above? I understand that they show somewhat different results compared to the typical SPSS classification table. Should I take into account the 0.50 Prob Level in order to compare? And why we say "Percent Conc/Disc", we are talking about something else? (Those percentages are not the same as the ones in the classification table.)

 

Millions of thanks in advance.

 

Alain

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @AlainX,

 


@AlainX wrote:

Should I take into account the 0.50 Prob Level in order to compare?


Yes, I think this comes closest to the SPSS table. In both cases the known "(non-)responses" (i.e., values of variable Coupon) in the analysis dataset are compared to the predicted (non-)responses, where a subject is predicted to "respond" if the response probability according to the logistic regression model is >=0.5. I suspect that SPSS uses a different definition of the response probability than SAS and that this explains the small differences in the results: SPSS counts 20 correctly predicted responders, whereas SAS shows 19 in column "Correct Event" for Prob Level 0.500.

 

Indeed, the definition used by SAS, as described in the documentation section "Predicted Probability of an Event for Classification", does not simply use the estimated coefficients from the "Analysis of Maximum Likelihood Estimates" table. Instead, to reduce bias, it approximates the probability that would result if the model were fitted without the observation being classified. Compare this to the SPSS documentation https://www.ibm.com/docs/en/spss-statistics/SaaS?topic=ucslracr-classification, which mentions the "model-predicted logit" (possibly rather the simple definition) and that "[c]ases are weighted by finalweight" (not sure if this applies to your data at all, unless you're dealing with complex survey data).

 

By using an OUTPUT statement like

output out=pred predprobs=(i x);

in your PROC LOGISTIC step, you can create a dataset PRED that contains both types of response probabilities (variables IP_... use the simple definition, variables XP_... the other one) for each subject. Then you can count the observations where these response probabilities are >=0.5.
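To make the counting step concrete, here is a minimal sketch of how the two 2x2 classification tables could be built. It assumes that Coupon is coded 0/1 with 1 as the event, so that the probability variables are named IP_1 and XP_1. The dataset name MYDATA and the predictors x1 and x2 are hypothetical placeholders, not taken from the original post.

```sas
/* Hypothetical names: MYDATA, x1, x2. Assumes Coupon is coded 0/1. */
proc logistic data=mydata;
   /* CTABLE PPROB=0.5 prints the SAS classification table for cutoff 0.5 only */
   model Coupon(event='1') = x1 x2 / ctable pprob=0.5;
   /* I = individual probabilities (IP_...), X = cross-validated ones (XP_...) */
   output out=pred predprobs=(i x);
run;

data pred_class;
   set pred;
   /* Skip observations without a predicted probability (e.g. missing predictors) */
   if missing(IP_1) or missing(XP_1) then delete;
   pred_ip = (IP_1 >= 0.5);   /* predicted response, simple definition          */
   pred_xp = (XP_1 >= 0.5);   /* predicted response, cross-validated definition */
run;

/* Observed vs. predicted counts, one 2x2 table per definition */
proc freq data=pred_class;
   tables Coupon*pred_ip Coupon*pred_xp / norow nocol nopercent;
run;
```

If these assumptions hold, the Coupon*pred_ip table should be the one to compare with the SPSS classification table, and Coupon*pred_xp with the 0.500 row of the SAS classification table.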

 


@AlainX wrote:

And why we say "Percent Conc/Disc", we are talking about something else?


Exactly. These percentages belong to a different consideration: Each of the possible 40*60=2400 pairs of the 40 responders and the 60 non-responders is assessed with respect to the (simply) predicted response probability: whether that probability is greater for the responder of the pair (meaning "concordance") or for the non-responder ("discordance") or whether there is no difference ("tie"). So the denominator of these percentages is the number of pairs (here: 2400), not a number of subjects as in the classification tables.
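The pair counting can also be reproduced by hand as a check. This is only a sketch: it assumes the PRED dataset created by the OUTPUT statement with PREDPROBS=(i x), Coupon coded 0/1, and IP_1 as the simply predicted probability of Coupon=1 (the variable name follows from that assumed coding).

```sas
/* Assumes dataset PRED from OUTPUT OUT=pred PREDPROBS=(i x); Coupon coded 0/1 */
proc sql;
   select count(*)                          as n_pairs,
          sum(e.IP_1 > n.IP_1)              as n_concordant,
          sum(e.IP_1 < n.IP_1)              as n_discordant,
          sum(e.IP_1 = n.IP_1)              as n_tied,
          100*sum(e.IP_1 > n.IP_1)/count(*) as pct_concordant,
          100*sum(e.IP_1 < n.IP_1)/count(*) as pct_discordant
   /* Joining without a condition pairs every responder with every non-responder */
   from pred(where=(Coupon=1 and not missing(IP_1))) as e,   /* responders     */
        pred(where=(Coupon=0 and not missing(IP_1))) as n;   /* non-responders */
quit;
```

With 40 responders and 60 non-responders, n_pairs should be 2400. The log will contain a note about a Cartesian product join, which is intended here. The percentages may differ slightly from those in the "Association of Predicted Probabilities and Observed Responses" table if PROC LOGISTIC groups the predicted probabilities into bins before comparing them.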


3 REPLIES 3
PaigeMiller
Diamond | Level 26

The SPSS output and the SAS output do not show the same table; they present different information (even though both are named "Classification Table"). You can get SAS to produce the same table as SPSS, as shown here: https://support.sas.com/kb/22/603.html.

 

I don't know if SAS On Demand for Academics can produce this table via menu commands, but it certainly can take the output from your Logistic regression and via the program at that link (the PREDS data set), you can produce the equivalent of the SPSS Classification table.

--
Paige Miller
AlainX
Fluorite | Level 6

Thanks so much for your reply. However, I am too new to SAS to understand this code and adapt it to my data. If someone knows how to create (if it is actually possible) the typical confusion/classification matrix for a logit problem in SAS ODA, I would be grateful. Any info regarding the interpretation of the SAS ODA classification table and the association of predicted probabilities would also be welcome. Huge thanks.
