Kastchei
Pyrite | Level 9

I have a logistic model that will require a random effect, requiring GLIMMIX.  However, before adding the random effect, I wanted to make sure I was specifying my model correctly without a random effect in GLIMMIX by comparing it to the output from GENMOD and LOGISTIC.  My code is below.  All variables are dichotomous: random (ACASI vs. FTF), time (2 vs 1), discTrt (Yes vs No).  The three procedures all give the exact same parameter estimates and standard errors.  GENMOD and LOGISTIC use Wald Chi-square to give p-values and these are identical.  GLIMMIX uses t tests instead, but the p-values are extremely similar.  Great!

There are differences in the Type 3 Analyses.  The differences between GENMOD and LOGISTIC are extremely small and are due to GENMOD using LR test and LOGISTIC using Wald test.  As expected with all dichotomous variables, in LOGISTIC, the parameter estimate p-values equal the Type 3 p-values, because they are both doing the same Wald test.  The parameter estimate p-values in GENMOD equal the parameter estimate p-values from LOGISTIC and the Type 3 LOGISTIC, again, because they are all using the same Wald test.  The only odd man out is the Type 3 GENMOD due to the use of LR.  I'm comfortable with this.

However, GLIMMIX instead performs an F test.  There are several unexpected things with this F test.  Perhaps they are obvious and I'm just overthinking it.  First, with dichotomous variables, I would expect the Type 3 F test statistic to simply be the parameter t statistic squared, as the F stat has 1 num df and the same denom df as the t stat.  This is not the case for either main effect, but is the case (considering rounding) for the interaction.  Thus, the p-values for the parameter estimates and the Type 3 tests, which I expect to be the same for dichotomous predictors, are different.  If I remove the interaction from the model, then the F stats do equal the t stats squared.

Second, as a consequence of the first problem, the Type 3 p-values for the two main effects are no longer similar to the Type 3 p-values from GENMOD or LOGISTIC.  Again, if I remove the interaction, then all three procedures agree.

Something is going on with GLIMMIX when an interaction is included that is different from GENMOD and LOGISTIC.  I can conceptually understand the Wald Chi-sq and LR tests, so perhaps I'm just not understanding the F test GLIMMIX is using and how the interaction would affect it.  In my results, also below, this doesn't much matter, as the interaction is not significant and can be removed from the model.  However, I am worried that there could be a case where the interaction is significant and must be kept.  Could there be a circumstance where a main effect in LOGISTIC/GENMOD is significant but the main effect is not significant in GLIMMIX, or vice versa?  This would change the interpretation quite a bit.  Any insight would be appreciated.

/* GLIMMIX: CHISQ adds chi-square versions of the Type 3 F tests */
ods select Tests3 ParameterEstimates;
proc glimmix data = aq;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No');
  model discTrt = random|time / dist = binary solution chisq;
run;

/* GENMOD: reference coding; TYPE3 requests likelihood-ratio tests by default */
ods select Type3 ParameterEstimates;
proc genmod data = aq descending;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No') / param = ref;
  model discTrt = random|time / dist = binomial type3;
run;

/* LOGISTIC: reference coding; Type 3 tests are Wald tests */
ods select Type3 ParameterEstimates;
proc logistic data = aq;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No') / param = ref;
  model discTrt = random|time;
run;

GLIMMIX

Parameter Estimates

Effect       random  time  Estimate  Standard Error   DF  t Value  Pr > |t|
Intercept                   -1.1081          0.1949  741    -5.68    <.0001
random       ACASI           0.4360          0.2636  741     1.65    0.0985
random       FTF                  0               .    .        .         .
time                 2      -0.9257          0.2800  741    -3.31    0.0010
time                 1            0               .    .        .         .
random*time  ACASI   2      -0.4120          0.3918  741    -1.05    0.2933
random*time  ACASI   1            0               .    .        .         .
random*time  FTF     2            0               .    .        .         .
random*time  FTF     1            0               .    .        .         .

Type III Tests of Fixed Effects

Effect       Num DF  Den DF  Chi-Square  F Value  Pr > ChiSq  Pr > F
random            1     741        1.38     1.38      0.2403  0.2407
time              1     741       33.38    33.38      <.0001  <.0001
random*time       1     741        1.11     1.11      0.2930  0.2933

GENMOD

Analysis of Maximum Likelihood Parameter Estimates

Parameter              DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept               1   -1.1081          0.1949     -1.4902     -0.7260                32.31      <.0001
random       ACASI      1    0.4360          0.2636     -0.0806      0.9526                 2.74      0.0981
time         2          1   -0.9257          0.2800     -1.4744     -0.3769                10.93      0.0009
random*time  ACASI  2   1   -0.4120          0.3918     -1.1798      0.3559                 1.11      0.2930
Scale                   0    1.0000          0.0000      1.0000      1.0000

Note: The scale parameter was held fixed.

LR Statistics For Type 3 Analysis
Source       DF  Chi-Square  Pr > ChiSq
random        1        2.76      0.0966
time          1       11.00      0.0009
random*time   1        1.11      0.2924

LOGISTIC

Type 3 Analysis of Effects

Effect       DF  Wald Chi-Square  Pr > ChiSq
random        1           2.7361      0.0981
time          1          10.9306      0.0009
random*time   1           1.1059      0.2930

Analysis of Maximum Likelihood Estimates

Parameter              DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept               1   -1.1081          0.1949          32.3077      <.0001
random       ACASI      1    0.4360          0.2636           2.7361      0.0981
time         2          1   -0.9257          0.2800          10.9306      0.0009
random*time  ACASI  2   1   -0.4120          0.3918           1.1059      0.2930

SteveDenham
Jade | Level 19

Interesting, and maybe a little disturbing, as it doesn't seem to fit with all of my preconceived notions on fitting with GLIMMIX in GLM mode.  I would be curious if the results change if you switch the reference level, say for time.  Type 3 is supposed to account for imbalance, but there is something here I am clearly missing.  I am going to tag a colleague in this response in the hope that he has some insight.

Steve Denham

lvm
Rhodochrosite | Level 12

I was originally quite surprised by your results, so I generated a binary data set last night with two factors. I did not have your problem: I got identical chi-square results for GLIMMIX and GENMOD (I did not use LOGISTIC). Your problem is that you used LR based Type 3 testing in GENMOD and Wald (chi-squared or F) testing in GLIMMIX (at least that is what you showed). (Note that the F statistic is just a scaled Wald statistic). It is well known that LR and Wald tests can give different results. With very large sample sizes (depending on the complexity of the model, structure of the data, etc.), they can give the same results, but I would not assume this for most situations. Often they give reasonably close p values at moderate to large sample sizes, but there are exceptions. The Wald statistics work better at smaller n because LR is based on asymptotic properties of the test statistic. To get Wald type 3 statistics in GENMOD, add wald as an option on the model statement.

You also made the following statement: "First, with dichotomous variables, I would expect the Type 3 F test statistic to simply be the parameter t statistic squared, as the F stat has 1 num df and the same denom df as the t stat.". This is not true in general, but it is an easy misunderstanding to have. For a simple regression problem, where the binary variable is not an indicator for a factor level, the square of the t statistic (est./se) is the F Wald statistic. But it is a different situation when the binary variables are being used to give the expected values or other estimable functions for levels of a factor (especially in a factorial). With a factorial, the test statistic for the main effect of A involves more than just the A parameters. In particular, the main effect mean for A1 is based on a linear combination of A1, the average of the B terms, and the average of the interaction terms that involve A1. So, the type 3 test statistics are based on more complicated contrasts than you are supposing. To see this, add the e option to the model statement in GLIMMIX to see the coefficients used for generating the tests of A, B, and A*B. You can see these involve more than just single terms for the main effects.
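
As a sketch of that suggestion (not part of the original reply), the E3 option on the GLIMMIX MODEL statement prints the Type 3 L-matrix coefficients, and ESTIMATE statements can rebuild the main-effect contrasts by hand. The coefficient order below is an assumption based on the level ordering shown in the Parameter Estimates table of the first post (ACASI before FTF, time 2 before time 1); check it against the E3 output before relying on it.

```sas
proc glimmix data = aq;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No');
  /* E3 prints the L-matrix coefficients behind each Type 3 test */
  model discTrt = random|time / dist = binary solution chisq e3;
  /* Main effect of random (ACASI - FTF), averaged over both time levels.
     Assumed interaction order: ACASI*2, ACASI*1, FTF*2, FTF*1 */
  estimate 'random: ACASI - FTF, averaged over time'
           random 1 -1  random*time 0.5 0.5 -0.5 -0.5;
  /* Main effect of time (2 - 1), averaged over both random levels */
  estimate 'time: 2 - 1, averaged over random'
           time 1 -1  random*time 0.5 -0.5 0.5 -0.5;
run;
```

If the ordering assumption holds, the squared t values from these ESTIMATE statements should agree with the Type 3 F values for random and time, whereas the squared t value of a single parameter from the solution vector does not.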

One final note. Although this is not affecting your results at all (I don't think at this point), you are using different parameterizations for the class variables in GLIMMIX compared to the others. That is, GLIMMIX only uses the so-called GLM parameterization, where there is always a 0 for the last level. You have no choice for this procedure. The reference parameterization, which you elected to use for the other two procedures, is slightly different. Here, there is one less level for each factor. The GENMOD documentation describes the many different ways of parameterizing factors with the CLASS statement. This won't affect your global type 3 results, but it is important to be aware of the differences. The developers of linear model procedures going back to the 1970s have made it clear why they favor the "GLM" parameterization. That is why you only find this in GLM, MIXED, and GLIMMIX.
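
One way to see the two codings side by side, offered as a rough sketch rather than anything from the original reply, is to have PROC LOGISTIC write out its design matrix without fitting the model. The output data set names `design_glm` and `design_ref` are made up for this illustration.

```sas
/* Write out the design matrix only (no model fit) under GLM coding */
proc logistic data = aq outdesign = design_glm outdesignonly;
  class random (ref = 'FTF') time (ref = '1') / param = glm;
  model discTrt = random|time;
run;

/* Same model under reference coding */
proc logistic data = aq outdesign = design_ref outdesignonly;
  class random (ref = 'FTF') time (ref = '1') / param = ref;
  model discTrt = random|time;
run;

/* Compare the column lists of the two design matrices */
proc contents data = design_glm varnum; run;
proc contents data = design_ref varnum; run;
```

Under GLM coding the design data set should carry one column per factor level (and per interaction cell), while reference coding drops the columns for the reference levels.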

SteveDenham
Jade | Level 19

Thanks, lvm.  I was thinking I was on crazy pills, since the programs I had all matched as well.  Then I used the first example in the GENMOD documentation, and there the differences were.  Thanks to this, I noticed that all of my GENMOD programs used the Wald option--something from back when, and I had forgotten about it.  Today, I just feel old...

Steve Denham

Kastchei
Pyrite | Level 9

Thanks, lvm, for the long reply.  You have, unknowingly, solved my problem!

The issue did not reside primarily with LR vs. Wald, and only indirectly with the coefficients for the Type 3 F tests.  Btw, thank you very much for that theory regarding the coefficients - it makes perfect sense.

The issue actually is the parameterization!  When I change to param = glm for both GENMOD with wald and LOGISTIC and run GLIMMIX with the chisq option, all my Type 3 Chi-sq statistics and p-values match perfectly.  Also, if I use param = effect, I get Type 3 statistics that match each other and match param = glm.
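
A minimal sketch of the setup described here, reusing the data set and variable names from the first post: PARAM=GLM in the CLASS statements of GENMOD and LOGISTIC, plus the WALD option in GENMOD so that its Type 3 table uses Wald rather than LR statistics.

```sas
/* GENMOD: GLM coding and Wald (instead of the default LR) Type 3 tests */
proc genmod data = aq descending;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No') / param = glm;
  model discTrt = random|time / dist = binomial type3 wald;
run;

/* LOGISTIC: GLM coding; its Type 3 tests are already Wald tests */
proc logistic data = aq;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No') / param = glm;
  model discTrt = random|time;
run;
```

Replacing `param = glm` with `param = effect` in both CLASS statements gives the second configuration mentioned above.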


In summary, either parameterization produces identical parameter estimates and standard errors.  The p-values for either parameterization are extremely similar, only varying due to whether a chi-sq (GENMOD, LOGISTIC) or t (GLIMMIX) stat was used.  However, they produce greatly different statistics and p-values for Type 3 tests.  It appears that GLM parameterization is the preferred method (Steve, do you agree?), and so it looks like I should be including param = glm on my future PROC GENMOD and LOGISTIC procedures.


Does it make sense for the parameterization to affect the Type 3 results so much?  You seemed to indicate that it wouldn't affect them at all.  If I am thinking correctly and understanding your comments about L matrix coefficients, param = ref is basically looking to see if any of the coefficients for the non-reference groups are different from 0, i.e., different from the reference group.  Depending on the reference group, some could have OR below 1 and some above 1 but with none being significantly different from 1.  Therefore, Type 3 will not be significant.  param = glm or effect is comparing all groups to each other or to an average response, respectively, across all the data (including interactions).  Type 3 could be significant here when it wasn't with ref, because it's comparing, from ref, the ones with OR below 1 to the ones with OR above 1, which may be a large enough distance to be significant.  Thus glm or effect will be picking up any differences between groups or any difference from the average, respectively, whereas ref only picks up differences from the control group (perhaps this has some conceptual similarity to Tukey vs. Dunnett adjustments).  Is that about right?


Thanks so much.  I appreciate you taking the time to look at my rather lengthy posts. :)

Michael


Below is the output for a different dependent variable, but it shows the agreement within a parameterization, but the disagreement in main effects between parameterizations.

param = glm: These three agree.

GLIMMIX with chisq

Type III Tests of Fixed Effects

Effect       Num DF  Den DF  Chi-Square  F Value  Pr > ChiSq  Pr > F
random            1     740        0.17     0.17      0.6802  0.6803
time              1     740        8.32     8.32      0.0039  0.0040
random*time       1     740        0.75     0.75      0.3861  0.3864

GENMOD with param = glm and wald

Wald Statistics For Type 3 Analysis

Source       DF  Chi-Square  Pr > ChiSq
random        1        0.17      0.6802
time          1        8.32      0.0039
random*time   1        0.75      0.3861

LOGISTIC with param = glm

Type 3 Analysis of Effects

Effect       DF  Wald Chi-Square  Pr > ChiSq
random        1           0.1699      0.6802
time          1           8.3169      0.0039
random*time   1           0.7510      0.3861



param = effect: These two agree, and also agree with the previous three.

GENMOD with param = effect and wald

Wald Statistics For Type 3 Analysis

Source       DF  Chi-Square  Pr > ChiSq
random        1        0.17      0.6802
time          1        8.32      0.0039
random*time   1        0.75      0.3861


LOGISTIC with param = effect

Type 3 Analysis of Effects

Effect       DF  Wald Chi-Square  Pr > ChiSq
random        1           0.1699      0.6802
time          1           8.3169      0.0039
random*time   1           0.7510      0.3861



param = ref: These two agree, but are different from previous three.

GENMOD with param = ref and wald

Wald Statistics For Type 3 Analysis

Source       DF  Chi-Square  Pr > ChiSq
random        1        0.11      0.7384
time          1        2.21      0.1370
random*time   1        0.75      0.3861


LOGISTIC with param = ref

Type 3 Analysis of Effects

Effect       DF  Wald Chi-Square  Pr > ChiSq
random        1           0.1115      0.7384
time          1           2.2108      0.1370
random*time   1           0.7510      0.3861


SteveDenham
Jade | Level 19

My answer to all of this is that I just use GLIMMIX for all of my logistic regression situations.  I almost never need to calculate AUCs or cutoffs (PROC LOGISTIC), and I came to generalized models from PROC MIXED so I never really used GENMOD.  That's why I was freaking out over potentially different results.  I am still a bit at odds about the parameterization having an effect on the Type 3 tests.  I can see it affecting solution vector tests, but the quadratic forms shouldn't differ for the F or chi-squared tests (note shouldn't in italics).

Steve Denham

lvm
Rhodochrosite | Level 12

I am glad you figured some things out. Great. I want to re-emphasize a few general points. My first two points are still relevant for anyone looking at this. First, there will often be differences between LR and Wald test results. This is not a property of any computer procedure, but of the statistical methods. There is no general consensus on the best test. But Wald gives the most flexibility. With large complex modeling problems, Wald is the only practical testing method. Second, in general, one cannot equate the square of a single parameter t (or z) statistic with the Wald test statistic for a main effect (in factorials). This is because interaction terms must be accounted for in the main effect tests when testing the equality of means (more on this below). For some factor parameterizations, squaring a t or z will give the global type 3 test statistic (with two levels).

I had not thought much about alternative parameterizations before, since I have accepted the value of the "GLM" parameterization. With GLM parameterization, the test for the A main effect is testing H0: mu_1. = mu_2. (for the two-level problem, where the dot denotes the average over B). Without thinking about it, I had assumed this would carry over to other parameterizations. Now that I am thinking about it in more detail, this won't necessarily be so. I need to work through the math, but I am pretty sure that with reference parameterization (as an example), the global Wald statistic for an A main effect does not have an easy interpretation in terms of expected values; it is not testing H0: mu_1. = mu_2. Here's a hint: with the reference parameterization in GENMOD, put in a LSMEANS A; statement. You will get a warning that LSMEANS, LSMESTIMATES, TEST, and SLICE statements only apply to GLM parameterization, and no output.

With reference parameterization and a 2-level factor, the global type 3 test is just giving you another way of testing the equality of the (single) parameter to 0 (for a main effect). With 3 or more levels of the factor, the global test would be simultaneously testing the equality of all of these parameters (for the main effect) to 0 (H0: A_1 = A_2 = 0). This may be of interest, of course. However, it is not testing the equality of expected values (means), even though the means are often of primary interest. My original response was focusing on means (expected values).
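
To make the distinction concrete, here is a worked check using the rounded estimates from the first post. It is only a sketch: the symbols \beta_r, \beta_t and \beta_{rt} are introduced here for the reference-coded estimates of random, time and random*time, and the variance step relies on the model being saturated (two binary factors plus their interaction), so that the four cell logits are estimated independently and the marginal contrast has one quarter of the variance of the interaction contrast.

```latex
% Marginal (GLM-coding Type 3) contrasts written in reference-coded parameters
\begin{align*}
L_{\text{random}} &= \beta_r + \tfrac{1}{2}\beta_{rt}
                   = 0.4360 + \tfrac{1}{2}(-0.4120) = 0.2300,\\
L_{\text{time}}   &= \beta_t + \tfrac{1}{2}\beta_{rt}
                   = -0.9257 + \tfrac{1}{2}(-0.4120) = -1.1317,\\
\operatorname{Var}(\hat L) &= \tfrac{1}{4}\operatorname{Var}(\hat\beta_{rt})
                   = \tfrac{1}{4}(0.3918)^2 \approx 0.0384,\\
\chi^2_{\text{random}} &= \frac{(0.2300)^2}{0.0384} \approx 1.38, \qquad
\chi^2_{\text{time}} = \frac{(-1.1317)^2}{0.0384} \approx 33.4 .
\end{align*}
% Single-parameter Wald statistics (simple effects at the reference level)
\begin{align*}
\left(\frac{0.4360}{0.2636}\right)^2 \approx 2.74, \qquad
\left(\frac{-0.9257}{0.2800}\right)^2 \approx 10.93 .
\end{align*}
```

Up to rounding of the inputs, the first pair reproduces the GLIMMIX Type 3 chi-squares (1.38 and 33.38), and the second pair reproduces the reference-coded Wald Type 3 values from LOGISTIC (2.7361 and 10.9306).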

Now I am realizing why GENMOD does not automatically give the table of type 3 test statistics. Interpretation can be strained unless the GLM parameterization is used. I hope this helps you.

Kastchei
Pyrite | Level 9

Yes, I understand and agree with everything you said.  It makes sense.

To rephrase and make sure I have it correct, Type 3 with reference coding is testing H0a: βa1 = βa2 = βa3 = ... = βa(k-1) = 0, not H0b: μa1 = μa2 = μa3 = ... = μak.  When there are no interactions, these two hypotheses are equivalent.  When there are interactions, then support for H0a (all main effect coefficients = 0) does not imply that the interaction coefficients are also 0.  Since the means are the combinations of the appropriate main effect level with the appropriate interactive effect levels (average over all the other effects in the model), the means could still be different even if the main effect coefficients are all 0 - that is, H0a is true while H0b is false.  This makes sense from the standpoint of interpretation, as one needs to include an estimate statement or manually add up the interactive terms to correctly determine ORs, etc.

(And it could be the case where the main effect and the interaction parameter estimates are both significant, but equal in magnitude and opposite in sign, which leaves the actual means of those two groups equal, though different from other possible groups).

Regardless of parameterization, because a dichotomous variable has only one non-reference level, this simplifies to H0a: βa = 0.  Without interaction, this is equivalent to H0b: μa1 = μa2.  The F-test for H0b: μa1 = μa2 is thus equivalently an F-test for H0a: βa = 0.  H0a: βa = 0 is also the hypothesis for the t-test of the parameter estimate.  So in this case only, dichotomous variable with no interaction, the F-test of H0b: μa1 = μa2 and the t-test of H0a: βa = 0 are equivalent, and thus the F-statistic will be the square of the t-statistic.  As soon as more levels of A or interactions with A are included, this is no longer the case.

With GLM coding, Type 3 is always testing H0b: μa1 = μa2 = μa3 = ... = μak, not simply testing the main effect coefficients (H0a).  This is because, as this paper states (https://support.sas.com/documentation/onlinedoc/v82/techreport_r101.pdf), with GLM, the main effects vectors are linear functions of the interaction terms, whereas with reference parameterization, they are not.  Thus, testing H0a for Type 3 with reference coding is (somewhat) valid, but testing H0a for Type 3 with GLM coding is not.

Since Type 3 with GLM coding always must test H0b to be valid, the Type 3 F-tests of dichotomous effects will never be equivalent to the parameter t-tests when interactions are involved, unlike the Type 3 reference parameterization test of H0a.

Phew!  I think I understand.

lvm
Rhodochrosite | Level 12

I will have to study your detailed note later to see if I agree with everything. But I think you have captured many important details. I must point out that I am not an expert on the parameterizations other than the GLM. This is the main one I use because of its value in hypothesis testing of expected values (main effects and interactions) and in calculating estimable functions. There are many ways of parameterizing factor level effects, and they all have value for different applications. The individual parameters may all have direct interpretation on an individual basis, but tests of groups of them (say, for a collection of terms making up a main effect) may be hard to interpret. If you are going to be adding random effects into a model, you will have to get used to the GLM parameterization, the only choice available.

lvm
Rhodochrosite | Level 12

Just a short follow up. Be warned that hypothesis testing for the GLM (or other) parameterization is a huge topic, and there are many issues. With the GLM parameterization, the type 3 tests for main effects are tests of marginal means (means averaged over the levels of the other factors). SAS has detailed discussions in the GLM PROC chapter and in a general chapter on type 1- 4 hypothesis tests. For the other parameterizations, the documentation is less extensive.

The chapter for SURVEYLOGISTIC has some useful text: For full-rank parameterizations (such as reference, effect, and about everything except GLM), "The Type 3 test of an effect of interest is the joint test that the parameters associated with that effect are zero.". This is what we discussed already. So all is clear. But then it expands on this: "For a model that uses reference parameterization (as specified by the PARAM=REF option in the CLASS statement), the Type 3 test is [also] a test of the equality of cell means at the reference level of the other model effects". I had to think about this for a while, but I see this now for simple situations. Note that these are the cell means, not the marginal means. This is important. A test of A is for the equality of the A cell means at the chosen reference level of B. I would call these slices of interaction means (or simple effects); these are GLM-based terms. But the parameters don't allow you to test the equality of cell means of A at the non-reference level (I don't think these can be defined). The presence of an interaction makes this even harder to interpret. I think all is fine if you stick with the simpler interpretation of a "joint test that the parameters associated with the effect are zero". Use GLM parameterization if you want straightforward interpretation of main effects and interactions.
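
One way to look at these simple effects directly in GLIMMIX, sketched here on the assumption that the model from the first post is used unchanged: the SLICE= and SLICEDIFF= options of the LSMEANS statement test and estimate the effect of random separately at each level of time.

```sas
proc glimmix data = aq;
  class random (ref = 'FTF') time (ref = '1') discTrt (ref = 'No');
  model discTrt = random|time / dist = binary solution chisq;
  /* Cell means on the logit scale.
     SLICE= tests the random effect within each time level;
     SLICEDIFF= reports the difference between random levels within each time level */
  lsmeans random*time / slice = time slicediff = time;
run;
```

The slice at time = 1 (the reference level in the first post) should correspond to the PARAM=REF Type 3 test of random, while the slice at time = 2 gives the comparison that the reference-coded main-effect parameter alone does not provide.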

lvm
Rhodochrosite | Level 12

Forgot... I won't be available for further comments for a few days. 
