Ruth
Fluorite | Level 6

I just tried to fit a logistic model with both PROC LOGISTIC and PROC GENMOD, but the two procedures produced different parameter estimates for the intercept and the coefficients. The Type 3 results are also slightly different.

My understanding is that the PROC GENMOD result is correct, but I don't understand the PROC LOGISTIC output. The two procedures should not really produce different results.

data work.a;
input y x1 $ x2 $;
datalines;
0 a a
1 a a
1 a b
0 a b
1 b a
0 b a
1 b b
0 b b
1 c a
1 c a
0 c b
1 c b
1 a b
0 a a
1 b a
1 a a
0 c b
0 b b
1 b a
0 c a
;


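/* Fit with PROC LOGISTIC; DESCENDING models P(y=1), OUTEST= saves the estimates */
proc logistic data=work.a outest=work.coeff descending;
  class x1 x2;
  model y=x1 x2;
run;

/* Fit the same model with PROC GENMOD: D=B is the binomial distribution      */
/* (logit link by default); TYPE3 requests Type 3 likelihood ratio tests;     */
/* ODS OUTPUT saves the parameter estimates without the Wald confidence limits */
proc genmod data=work.a descending;
  class x1 x2;
  model y=x1 x2 / D=b type3;
  ods output ParameterEstimates=work.coeff2(drop=lowerwaldcl upperwaldcl);
run;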

proc print data=work.coeff;
run;

proc print data=work.coeff2;
run;


/*Results:*/

/*proc logistic*/

                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1      0.1612      0.4606        0.1226        0.7263
x1        a     1      0.0806      0.6426        0.0157        0.9002
x1        b     1      0.0806      0.6426        0.0157        0.9002
x2        a     1      0.3852      0.4602        0.7006        0.4026

/*proc genmod*/
                                                                         Prob
Obs    Parameter    Level1    DF    Estimate      StdErr      ChiSq     ChiSq

1     Intercept               1     -0.3852      0.9505       0.16    0.6853
2     x1             a        1      0.2419      1.1394       0.05    0.8319
3     x1             b        1      0.2419      1.1394       0.05    0.8319
4     x1             c        0      0.0000      0.0000        .       .
5     x2             a        1      0.7704      0.9204       0.70    0.4026
6     x2             b        0      0.0000      0.0000        .       .

Accepted Solutions
Dale
Pyrite | Level 9

Ruth,

The results are consistent with each other.  Both are correct.  However, they have been obtained using different expansions of the categorical variables.  The GENMOD procedure employs an overparameterized model in which a categorical variable with k levels is expanded into k binary (0/1) indicator variables.  Because those k columns sum to the intercept column, the parameter for the last level is set to 0 (the rows with DF = 0 in your GENMOD output).  SAS refers to this as the GLM parameterization.  By default, the LOGISTIC procedure instead uses k-1 variables per categorical variable in the design matrix.  Moreover, these k-1 variables are not binary but can take one of three values: -1, 0, or 1.  This parameterization is referred to as effect coding.

For variable X1, columns of the design matrix given GLM coding and effect coding are as follows:

             GLM coding

   X1    X1_1    X1_2    X1_3
    a     1       0       0
    b     0       1       0
    c     0       0       1

         Effect coding

   X1    X1_1    X1_2
    a     1       0
    b     0       1
    c    -1      -1
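To see the columns each coding actually produces for this data set, one option is to have PROC LOGISTIC write out the design matrix without fitting the model. This is a sketch, assuming a SAS release whose PROC LOGISTIC supports the OUTDESIGN= and OUTDESIGNONLY options (current SAS 9.4 releases do); the data set names work.design_eff and work.design_glm are hypothetical. The Class Level Information table that PROC LOGISTIC prints also lists the design variables for each level.

```sas
/* Effect coding (the PROC LOGISTIC default): k-1 columns per CLASS variable */
proc logistic data=work.a descending outdesign=work.design_eff outdesignonly;
  class x1 x2 / param=effect;
  model y=x1 x2;
run;

/* GLM coding (what PROC GENMOD uses): k indicator columns per CLASS variable */
proc logistic data=work.a descending outdesign=work.design_glm outdesignonly;
  class x1 x2 / param=glm;
  model y=x1 x2;
run;

/* Compare the columns row by row; levels a, b, and c of x1 all appear */
proc print data=work.design_eff;
run;

proc print data=work.design_glm;
run;
```

In the effect-coded data set, the rows with x1 = c should show -1 in both x1 columns, matching the table above.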

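It can be shown that these two parameterizations yield the same predicted response values, because they describe the same model.  But the parameters do have to be interpreted differently: with effect coding, a level's coefficient is its deviation from the average over all levels (the coefficient for the last level, minus the sum of the others, is not printed), whereas with GLM coding it is the difference from the last level.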
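As a check, the estimates posted in the question can be converted from one coding to the other. Under effect coding, the omitted level's coefficient is minus the sum of the printed ones, so the implied coefficient for x1 = c is -(0.0806 + 0.0806) = -0.1612, and for x2 = b it is -0.3852. GLM coding sets the last level to zero, so each GLM estimate is the effect-coding estimate minus the last level's coefficient. The symbols below (intercept, alpha for x1, gamma for x2) are illustrative notation, not SAS output names.

```latex
\begin{aligned}
\hat\beta_0^{\mathrm{GLM}} &= \hat\beta_0^{\mathrm{eff}} + \hat\alpha_c^{\mathrm{eff}} + \hat\gamma_b^{\mathrm{eff}}
  = 0.1612 - 0.1612 - 0.3852 = -0.3852 \\
\hat\alpha_a^{\mathrm{GLM}} &= \hat\alpha_a^{\mathrm{eff}} - \hat\alpha_c^{\mathrm{eff}}
  = 0.0806 + 0.1612 = 0.2418 \approx 0.2419 \\
\hat\alpha_b^{\mathrm{GLM}} &= \hat\alpha_b^{\mathrm{eff}} - \hat\alpha_c^{\mathrm{eff}}
  = 0.0806 + 0.1612 = 0.2418 \approx 0.2419 \\
\hat\gamma_a^{\mathrm{GLM}} &= \hat\gamma_a^{\mathrm{eff}} - \hat\gamma_b^{\mathrm{eff}}
  = 0.3852 + 0.3852 = 0.7704
\end{aligned}
```

These match the GENMOD estimates up to rounding of the displayed values. The standard error for x2 a doubles as well (0.4602 versus 0.9204), which is why both procedures report the same Wald chi-square and p-value (0.4026) for that parameter.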

If you prefer the GLM parameterization (a lot of people do), you can request that parameterization in the LOGISTIC procedure.  All you have to do is change your class statement to:

class x1 x2 / param=glm;
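A minimal sketch for confirming that the coding changes only the parameters, not the fitted model: fit the model once with the default effect coding and once with PARAM=GLM, save the predicted probabilities with the OUTPUT statement, and compare them. The data set and variable names (work.pred_eff, work.pred_glm, work.cmp, p_eff, p_glm, diff) are hypothetical.

```sas
/* Default effect coding (PARAM=EFFECT) */
proc logistic data=work.a descending;
  class x1 x2;
  model y=x1 x2;
  output out=work.pred_eff p=p_eff;   /* predicted event probability per row */
run;

/* GLM coding, the same coding PROC GENMOD uses */
proc logistic data=work.a descending;
  class x1 x2 / param=glm;
  model y=x1 x2;
  output out=work.pred_glm p=p_glm;
run;

/* OUTPUT data sets keep the input row order, so a one-to-one merge lines them up */
data work.cmp;
  merge work.pred_eff(keep=x1 x2 p_eff) work.pred_glm(keep=p_glm);
  diff = abs(p_eff - p_glm);
run;

/* The largest difference should be zero up to convergence tolerance */
proc means data=work.cmp max;
  var diff;
run;
```

With PARAM=GLM, the parameter estimates from the second PROC LOGISTIC step should also line up with the PROC GENMOD table, including the zero rows for x1 = c and x2 = b.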

HTH

Ruth
Fluorite | Level 6

Hi Dale,

It is a thorough and clear answer. Thanks a lot!

It seems that SAS has evolved a lot in the past 10 years. The classic book that we (as new starters) use for logistic regression analysis is Logistic Regression Using SAS: Theory and Application (author: Paul D. Allison). The book is very well written, but it was published in 1999 and has not been updated for newer versions since, so many new SAS features and changes are not reflected in it. For example, PROC LOGISTIC in the book has no CLASS statement or parameterization options.

SAS also highly recommends this book, so I hope it will be updated in the near future.

Thanks again.
