Statistical Procedures

Ruth · Posted 07-07-2011 02:44 PM

I just fitted the same logistic model with PROC LOGISTIC and PROC GENMOD, but the two procedures produced different parameter estimates for the intercept and the coefficients. The TYPE3 result is also slightly different.

My understanding is that the result from PROC GENMOD is correct, but I don't understand the output from PROC LOGISTIC. The two procedures should not really produce different results.

data work.a;
input y x1 $ x2 $;
datalines;
0 a a
1 a a
1 a b
0 a b
1 b a
0 b a
1 b b
0 b b
1 c a
1 c a
0 c b
1 c b
1 a b
0 a a
1 b a
1 a a
0 c b
0 b b
1 b a
0 c a
;

proc logistic data=work.a outest=work.coeff descending;
class x1 x2;
model y=x1 x2;
run;

proc genmod data=work.a descending;
class x1 x2;
model y=x1 x2 / D=b type3;
ods output ParameterEstimates=work.coeff2(drop=lowerwaldcl upperwaldcl);
run;

proc print data=work.coeff;
run;

proc print data=work.coeff2;
run;

/*Results:*/

/*proc logistic*/

                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1      0.1612      0.4606        0.1226        0.7263
x1        a     1      0.0806      0.6426        0.0157        0.9002
x1        b     1      0.0806      0.6426        0.0157        0.9002
x2        a     1      0.3852      0.4602        0.7006        0.4026

/*proc genmod*/
Obs   Parameter   Level1     DF    Estimate      StdErr      ChiSq ProbChiSq

1     Intercept               1     -0.3852      0.9505       0.16    0.6853
2     x1             a        1      0.2419      1.1394       0.05    0.8319
3     x1             b        1      0.2419      1.1394       0.05    0.8319
4     x1             c        0      0.0000      0.0000        .       .
5     x2             a        1      0.7704      0.9204       0.70    0.4026
6     x2             b        0      0.0000      0.0000        .       .

Dale · Posted 07-07-2011 04:12 PM

Ruth,

The results are consistent with each other, and both are correct. They differ because the two procedures expand the categorical variables differently. The GENMOD procedure employs an overparameterized model in which a categorical variable with k levels is expanded into a set of k binary variables. The parameter for the last level is then set to zero, which is why your GENMOD output lists x1=c and x2=b with DF 0 and an estimate of 0. SAS refers to this as the GLM parameterization. By default, the LOGISTIC procedure employs a model with k-1 variables in the design matrix. Moreover, these k-1 variables are not binary: each can take one of three values, -1, 0, or 1. This sort of parameterization is referred to as effect coding.

For variable X1, columns of the design matrix given GLM coding and effect coding are as follows:

GLM coding

X1   X1_1   X1_2   X1_3
a       1      0      0
b       0      1      0
c       0      0      1

Effect coding

X1   X1_1   X1_2
a       1      0
b       0      1
c      -1     -1
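
To see that effect coding is nothing more than a different set of design columns, one can build those columns by hand and fit them as ordinary numeric covariates. The following is only a sketch: the data set name work.a_eff and the variables x1_1, x1_2 and x2_1 are made up for illustration, and x2 is coded by the same rule (a = 1, b = -1).

```sas
/* Build the effect-coded columns by hand.
   work.a_eff, x1_1, x1_2 and x2_1 are hypothetical names. */
data work.a_eff;
   set work.a;
   /* a comparison such as (x1='a') evaluates to 1 or 0 in SAS */
   x1_1 = (x1='a') - (x1='c');   /* a ->  1, b ->  0, c -> -1 */
   x1_2 = (x1='b') - (x1='c');   /* a ->  0, b ->  1, c -> -1 */
   x2_1 = (x2='a') - (x2='b');   /* a ->  1, b -> -1          */
run;

/* No CLASS statement: the hand-made columns enter as numeric covariates. */
proc genmod data=work.a_eff descending;
   model y = x1_1 x1_2 x2_1 / dist=binomial;
run;
```

Fitted this way, GENMOD should report the same intercept and slopes as the default PROC LOGISTIC run in the original post, since the design matrix is the same.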

It can be shown that these two parameterizations will yield the same predicted response values. But the parameters do have to be interpreted differently.
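
As a rough check using the numbers posted above, the effect-coded estimates can be converted to the GLM ones by hand. Under effect coding the parameter for the omitted last level is minus the sum of the others, and the GLM coding takes that last level (x1=c, x2=b) as the reference. The data set and variable names below are made up for illustration.

```sas
/* Convert the effect-coded PROC LOGISTIC estimates from this thread
   into GLM (reference-level) estimates. All names are hypothetical. */
data work.convert;
   /* estimates copied from the PROC LOGISTIC output above */
   int_eff = 0.1612;
   x1a_eff = 0.0806;
   x1b_eff = 0.0806;
   x2a_eff = 0.3852;

   /* implied effects of the omitted last levels */
   x1c_eff = -(x1a_eff + x1b_eff);
   x2b_eff = -x2a_eff;

   /* GLM intercept = linear predictor at the reference cell (x1=c, x2=b) */
   int_glm = int_eff + x1c_eff + x2b_eff;

   /* GLM coefficients = differences from the reference level */
   x1a_glm = x1a_eff - x1c_eff;
   x1b_glm = x1b_eff - x1c_eff;
   x2a_glm = x2a_eff - x2b_eff;
run;

proc print data=work.convert noobs;
   var int_glm x1a_glm x1b_glm x2a_glm;
run;
```

The converted values come out at about -0.3852, 0.2418, 0.2418 and 0.7704, which agree with the GENMOD estimates above up to rounding in the fourth decimal place.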

If you prefer the GLM parameterization (a lot of people do), you can request that parameterization in the LOGISTIC procedure. All you have to do is change your class statement to:

class x1 x2 / param=glm;
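
In context, the complete call might look like the sketch below, which reuses the data set work.a from the original post and changes only the CLASS statement.

```sas
/* Same model as in the original post, but with the GLM parameterization
   so that the estimates are comparable with PROC GENMOD. */
proc logistic data=work.a descending;
   class x1 x2 / param=glm;
   model y = x1 x2;
run;
```

If the zero rows for the reference levels are unwanted, param=ref with ref=last on the same CLASS statement is an alternative: it uses k-1 binary columns with the last level as the reference and should give the same nonzero estimates.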

HTH

Ruth · Posted 07-08-2011 04:42 AM

Hi Dale,

It is a thorough and clear answer. Thanks a lot!

It seems that SAS has evolved a lot in the past 10 years. The classic book that we (as new starters) use for logistic regression analysis is Logistic Regression Using SAS: Theory and Application by Paul D. Allison. The book is very well written, but it was published in 1999 and has not been updated since, so many new SAS features and changes are not reflected in it. For example, PROC LOGISTIC as described in the book has no CLASS statement and no such parameterization options.

SAS also highly recommends this book, so I hope it will be updated in the near future.

Thanks again.
