Hi,

Can someone help with this, please? I ran the FREQ and LOGISTIC procedures on the same data, but the Somers' D values they produce are different. My code is below:

proc freq data=develop noprint;
tables p6*new_rating / measures;
output out=somersd(keep=_SMDCR_ ) smdcr;
run;

Somers' D = -0.42526

proc logistic data=develop desc;
class p6;
model new_rating=p6;
run;

Somers' D = 0.17017


Thanks
SPR
Quartz | Level 8
Hello Mansoor,

PROC FREQ produces two Somers' D statistics. Did you compare the other one?

SPR
deleted_user
Not applicable
Yes, I did compare the SMDRC and they are the same. But since 'new_rating' is the independent variable in this example, I think the SMDCR from the FREQ procedure should match the one from the LOGISTIC procedure?
Message was edited by: mansoor
Dale
Pyrite | Level 9
The FREQ procedure documentation indicates that both row and column measures must have ordinal properties for interpretation of Somers' D computed by the FREQ procedure. Note that a binary variable can always be assumed to have ordinal properties. So, the variable NEW_RATING meets the requirements for use in computing Somers' D with the FREQ procedure. But in specifying the predictor, p6, on a CLASS statement in the LOGISTIC procedure, you are indicating that p6 is a nominal variable. Thus, there is a mismatch of assumptions here.

It should be noted that a predictor variable in the LOGISTIC procedure can have either nominal or interval measurement level, but not ordinal. Somers' D returned by the LOGISTIC procedure is not, and indeed cannot be, based on an assumption of ordinality of all variables. When you have a predictor variable which has more than two levels, you should rarely, if ever, obtain the same Somers' D from the FREQ and LOGISTIC procedures.

Which Somers' D computation is correct depends on what your assumptions are about the measurement level of the variable p6. However, my guess is that the Somers' D returned by the FREQ procedure is NOT the correct statistic if only because the value of Somers' D returned by PROC FREQ is negative. The fact that you specified p6 as a categorical variable in the logistic regression model also indicates that it would not be appropriate to assume that p6 is ordinal.
deleted_user
Not applicable
Thanks Dale. It's really helpful.

However, given the TABLES statement I used in the FREQ procedure, is it correct to compute _SMDCR_ and not _SMDRC_?

Can you also please elaborate the following statement a bit?
"However, my guess is that the Somers' D returned by the FREQ procedure is NOT the correct statistic if only because the value of Somers' D returned by PROC FREQ is negative."

Thanks
Dale
Pyrite | Level 9
Unless you can assume that BOTH variables are ordinal, it would not be appropriate to compute either version of Somers' D using the FREQ procedure. If you can assume that both variables are ordinal, then _SMDRC_ is the appropriate statistic if the row variable represents the predictor variable and the column variable represents the response. If row and column interpretations as to predictor and response are turned around, then you would want to compute and interpret _SMDCR_.

Please disregard the comment about the negative Somers' D. I was thinking in terms of Somers' D from a logistic regression model where we would always expect a positive value. But for ordinal variables employed in PROC FREQ, the value of Somers' D can be negative if the frequency table has large values in the lower left and upper right portions of the table.
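
As a small illustration of that last point, here is a hand calculation on made-up counts (not the data from this thread): a 2x2 table with 10 in the upper left, 40 in the upper right, 40 in the lower left and 10 in the lower right. The notation is introduced here for the sketch: P and Q are twice the number of concordant and discordant pairs, and w_r is the number of ordered pairs not tied on the row variable.

```latex
\begin{align*}
n     &= 10 + 40 + 40 + 10 = 100 \\
P - Q &= 2(10 \cdot 10) - 2(40 \cdot 40) = 200 - 3200 = -3000 \\
w_r   &= n^2 - \sum_i n_{i\cdot}^2 = 10000 - (50^2 + 50^2) = 5000 \\
D(C|R) &= \frac{P - Q}{w_r} = \frac{-3000}{5000} = -0.6
\end{align*}
```

The discordant pairs (upper right with lower left) outnumber the concordant ones, so the statistic comes out negative.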
deleted_user
Not applicable
Thanks a lot for the detailed answer.

One more question, please. The following statement is an extract from the SAS Help file, explaining Somers' D:

"Somers' D(C|R) and Somers' D(R|C) are asymmetric modifications of tau-b. C|R denotes that the row variable X is regarded as an independent variable, while the column variable Y is regarded as dependent. Similarly, R|C denotes that the column variable Y is regarded as an independent variable, while the row variable X is regarded as dependent."

It could be just me, but what do you think of the above statement?
Dale
Pyrite | Level 9
The SAS documentation appears to have things turned around. We can examine this by obtaining the two variants of Somers' D for an asymmetric 2x2 frequency table. We can then compute Somers' D from PROC LOGISTIC using the row variable as the response and the column variable as predictor. Then try using the column variable as the response and the row variable as the predictor. What is reported by the FREQ procedure as Somers' D C|R is the same as Somers' D returned by the LOGISTIC procedure when the row variable is employed as the response and the column variable is the predictor. Somers' D R|C is the same as Somers' D returned by the LOGISTIC procedure when the column variable is the response and the row variable is the predictor. The code below demonstrates:

data test;
  row=1; col=1; freq=120; output;
  row=1; col=2; freq=5; output;
  row=2; col=1; freq=15; output;
  row=2; col=2; freq=80; output;
run;


proc freq data=test;
  weight freq;
  tables row*col / measures;
run;

proc logistic data=test;
  freq freq;
  model row=col;
run;

proc logistic data=test;
  freq freq;
  model col=row;
run;
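
As a hand check on what these runs should report, the two FREQ variants can be worked out directly from the four cell counts. This is a sketch that assumes the usual definitions: P and Q are twice the number of concordant and discordant pairs, and w_r and w_c count the ordered pairs not tied on the row and column variable respectively. The symbols are notation chosen for this sketch.

```latex
\begin{align*}
n      &= 120 + 5 + 15 + 80 = 220 \\
P - Q  &= 2(120 \cdot 80) - 2(5 \cdot 15) = 19200 - 150 = 19050 \\
w_r    &= n^2 - \sum_i n_{i\cdot}^2 = 48400 - (125^2 + 95^2) = 23750 \\
w_c    &= n^2 - \sum_j n_{\cdot j}^2 = 48400 - (135^2 + 85^2) = 22950 \\
D(C|R) &= \frac{P - Q}{w_r} = \frac{19050}{23750} \approx 0.8021 \\
D(R|C) &= \frac{P - Q}{w_c} = \frac{19050}{22950} \approx 0.8301
\end{align*}
```

For comparison, Somers' D in PROC LOGISTIC is (concordant minus discordant) divided by the number of pairs with different response values. With row as the response there are 125 x 95 = 11875 such pairs, giving (9600 - 75) / 11875, about 0.8021. With col as the response there are 135 x 85 = 11475 such pairs, giving 9525 / 11475, about 0.8301. So under these definitions the LOGISTIC value is normalized by pairs untied on the response, which would explain why it lines up with D(C|R) when the row variable is the response.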
