I think I have found a problem with PROC LOGISTIC. I hope this can be confirmed and corrected.
For the following data, I used PROC LOGISTIC to run a regression analysis. To check the results, I used the XBETA= option of the OUTPUT statement, which returns the linear predictor, i.e. the intercept plus the sum of the applicable coefficients. Please see the problem at the bottom.
data work.a;
input y x1 $ x2 $;
datalines;
0 a a
1 a a
1 a b
0 a b
1 b a
0 b a
1 b b
0 b b
1 c a
1 c a
0 c b
1 c b
1 a b
0 a a
1 b a
1 a a
0 c b
0 b b
1 b a
0 c a
;
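proc logistic data=work.a outest=work.coeff descending; /* or use ODS: ods output ParameterEstimates=work.coeff */
  class x1 x2;
  model y = x1 x2;
  /* xbeta= is the linear predictor; p= is the predicted probability shown as pred below */
  output out=work.pred01(drop=_level_) xbeta=beta p=pred;
run;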
/* Results of parameter estimates */
                            Standard        Wald
Parameter      DF  Estimate    Error  Chi-Square  Pr > ChiSq
Intercept       1    0.1612   0.4606      0.1226      0.7263
x1        a     1    0.0806   0.6426      0.0157      0.9002
x1        b     1    0.0806   0.6426      0.0157      0.9002
x2        a     1    0.3852   0.4602      0.7006      0.4026
/*Results of predictions*/
Obs y x1 x2 beta pred
1 0 a a 0.62706 0.65182
2 1 a a 0.62706 0.65182
3 1 a b -0.14333 0.46423
4 0 a b -0.14333 0.46423
5 1 b a 0.62706 0.65182
6 0 b a 0.62706 0.65182
7 1 b b -0.14333 0.46423
8 0 b b -0.14333 0.46423
9 1 c a 0.38519 0.59512
10 1 c a 0.38519 0.59512
11 0 c b -0.38519 0.40488
12 1 c b -0.38519 0.40488
13 1 a b -0.14333 0.46423
14 0 a a 0.62706 0.65182
15 1 b a 0.62706 0.65182
16 1 a a 0.62706 0.65182
17 0 c b -0.38519 0.40488
18 0 b b -0.14333 0.46423
19 1 b a 0.62706 0.65182
20 0 c a 0.38519 0.59512
For obs=1, beta is correct: 0.1612 + 0.0806 + 0.3852 = 0.6270. For obs=3, however, beta is negative. How can this value be negative when all the coefficients in the parameter estimates are positive? The only way to get it is 0.1612 + 0.0806 - 0.3852 = -0.1434. So how should the 0.3852 be interpreted here? I assumed that for the variable x2 the default reference level is b, with a coefficient of 0.
Switch to the GLM parameterization for CLASS variables (PARAM=GLM on the CLASS statement) and you will see that all is well with your formula for XBETA. From there you should be able to figure out the correct XBETA formula for the default CLASS variable parameterization.
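A minimal sketch of that switch, assuming the same work.a data set as in the question. The output data set name work.pred_glm is only a placeholder.

```sas
proc logistic data=work.a descending;
  /* PARAM=GLM builds 0/1 dummy columns; the last level of each
     CLASS variable (x1=c, x2=b) acts as the reference level */
  class x1 x2 / param=glm;
  model y = x1 x2;
  output out=work.pred_glm(drop=_level_) xbeta=beta;
run;
```

With this coding the reference levels get an estimate of 0, so XBETA is the intercept plus the coefficients of the levels each observation actually has. The estimates will differ from the ones in the question, but the XBETA values themselves should not, since only the parameterization changes and not the fitted model.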
I just answered your post in the Statistical Procedures forum. SAS is giving correct results (you just have to be careful how you use the parameters for the EFFECT type of parameterization). Interestingly, there was another post about the parameter differences between LOGISTIC and GENMOD. The answer is basically the same (has to do with the parameterization), and Dale gave a thorough answer.
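To make the EFFECT rule concrete, here is a rough check, assuming work.pred01 from the question and the rounded estimates printed there. The names work.check, beta_hand, diff and the e_ columns are made up for this illustration. Under effect coding the last level of each CLASS variable gets -1 in every design column for that variable, so its implied coefficient is minus the sum of the others: -0.3852 for x2=b and -(0.0806 + 0.0806) = -0.1612 for x1=c.

```sas
data work.check;
  set work.pred01;
  /* effect-coded design columns: 1 for the named level,
     -1 for the last level of the variable, 0 otherwise */
  e_x1a = (x1='a') - (x1='c');
  e_x1b = (x1='b') - (x1='c');
  e_x2a = (x2='a') - (x2='b');
  /* estimates are copied by hand from the parameter table and are
     rounded to 4 decimals, so small differences are expected */
  beta_hand = 0.1612 + 0.0806*e_x1a + 0.0806*e_x1b + 0.3852*e_x2a;
  diff = beta - beta_hand;
run;

proc print data=work.check;
  var x1 x2 beta beta_hand diff;
run;
```

For obs 3 (x1=a, x2=b) this gives 0.1612 + 0.0806 - 0.3852 = -0.1434, which agrees with the beta in the listing up to rounding.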
Hi Ivm,
Many thanks for your help.
It seems that SAS has evolved a lot in the past 10 years. The classic book that we (as new starters) use for logistic regression analysis is Logistic Regression Using SAS: Theory and Application by Paul D. Allison. The book is very well written, but it was published in 1999 and has not been updated for newer versions since, so many new SAS features and changes are not reflected in it. For example, PROC LOGISTIC as described in the book has no such parameterization options and no CLASS statement.
SAS also highly recommends this book, so I hope it can be updated in the near future.
Thanks again.