Henry
Calcite | Level 5

I am learning generalized linear mixed models using Proc Glimmix in SAS. I use an example from SAS website:

 

http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_glimmix_gett...

 

 

data multicenter;
   input center group$ n sideeffect;
   datalines;
 1  A  32  14
 1  B  33  18
 2  A  30   4
 2  B  28   8
 3  A  23  14
 3  B  24   9
 4  A  22   7
 4  B  22  10
 5  A  20   6
 5  B  21  12
 6  A  19   1
 6  B  20   3
 7  A  17   2
 7  B  17   6
 8  A  16   7
 8  B  15   9
 9  A  13   1
 9  B  14   5
10  A  13   3
10  B  13   1
11  A  11   1
11  B  12   2
12  A  10   1
12  B   9   0
13  A   9   2
13  B   9   6
14  A   8   1
14  B   8   1
15  A   7   1
15  B   8   0
;
run;

 

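data multicenter2;
   set multicenter;
   /* one y=1 row for each patient with a side effect */
   do i=1 to sideeffect;
      y=1; output;
   end;
   /* one y=0 row for each patient without a side effect */
   do i=1 to n-sideeffect;
      y=0; output;
   end;
run;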

 

proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;

proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
run;

 

 

The example uses the events/trials syntax to fit a logistic regression with random effects. Since I am not familiar with the events/trials syntax, I ran an experiment: I generated standard 0/1 data and fitted the same GLMM without the events/trials syntax. I expected the two programs to produce exactly the same results, because both are logistic random-effects models. However, the estimates and standard errors from the two methods are the same, but the degrees of freedom for group differ, which leads to different p-values. Both look right to me, and I don't know which p-value I should use. Thank you in advance.

 

 

 

 

1 ACCEPTED SOLUTION

JacobSimonsen
Barite | Level 11

There are two parts to this question: 1) why are the results different, and 2) which result is more correct.

Part 1 is easiest to answer. The default method for calculating the denominator degrees of freedom is the "containment method". Here the DF is calculated as df = n - rank[X Z], where X holds the columns of the fixed effects to be tested and Z holds the columns of the random effects. The rank of [X Z] is 16 in both models (1+15), but the number of observations differs greatly between the two models: 30 in the first and 503 in the second. Therefore, the df is 14 and 487, respectively.
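The arithmetic above can be checked directly from the two data sets. The sketch below is only an illustration: it counts the rows that each PROC GLIMMIX call reads and subtracts the rank of 16 quoted above, which is hard-coded here as an assumption rather than computed from the design matrices. The names df_check, layout, n_obs and df_containment are made up for this example.

```sas
/* Count the rows in each data set and subtract rank[X Z] = 16.     */
/* The value 16 is taken from the discussion, not computed here.    */
proc sql;
   create table df_check as
   select 'events/trials' as layout length=16,
          count(*)        as n_obs,
          count(*) - 16   as df_containment
   from multicenter
   union all
   select '0/1 expanded'  as layout length=16,
          count(*)        as n_obs,
          count(*) - 16   as df_containment
   from multicenter2;
quit;

proc print data=df_check noobs;
run;
```

If the data sets are built as in the question, the two rows should match the 30/14 and 503/487 figures given above.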

 

But which is more correct? I think 14 is the correct number of DF. When the binomially distributed variable is split into a number of 0/1 observations, the DF calculation does not account for the fact that there are many observations within the same center (the random effect); instead, it treats all the residual df as "between subject". I also think that the option "DDFM=BETWITHIN" should be used, which would give you df=14 in both models.
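For reference, a sketch of how DDFM=BETWITHIN would be requested: it is an option on the MODEL statement. The two calls below reuse the multicenter and multicenter2 data sets from the question unchanged and differ from the original programs only in that option. The df each run reports is best read off its "Type III Tests of Fixed Effects" table rather than assumed in advance.

```sas
/* Same two models as in the question, with the between-within     */
/* method requested instead of the default containment method.     */
proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution ddfm=betwithin;
   random intercept / subject=center;
run;

proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution ddfm=betwithin;
   random intercept / subject=center;
run;
```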

 

By the way, I tried this variation of your program and found something interesting...

data multicenter3;
   set multicenter;
   /* one y=1 row carrying the number of side effects */
   w=sideeffect;
   y=1; output;
   /* one y=0 row carrying the number without side effects */
   w=n-sideeffect;
   y=0; output;
run;

proc glimmix data=multicenter3;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
   weight w;
run;

I thought this should be a better way to split the data into 0/1 observations, because the data set will be smaller, and I expected it to give the same numbers as the program you wrote. To my surprise, it doesn't use the weights to calculate the degrees of freedom. That must be a mistake! It gives df=42, which is probably calculated as 58-16, where 58 is the number of observations when the weights are NOT counted (60 rows, presumably minus the two rows with w=0). But the weights should be counted.

 

 


Rick_SAS
SAS Super FREQ

Hi Jacob,

I think you are confusing weights with frequencies. In regression analyses, weights do not affect the degrees of freedom. For an explanation and example, see the article "The difference between frequencies and weights in regression analysis."

JacobSimonsen
Barite | Level 11

Yes, correct, I did mix up weight and frequency. The program I intended to write uses the "freq" statement instead of "weight".

proc glimmix data=multicenter3;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
   freq w;
run;

Then the degrees of freedom come out as I expected (487), although that is not the number of DF I would recommend using.
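One way to confirm that the FREQ version and the fully expanded 0/1 data agree is to capture the test table from each fit and print them side by side. This is a minimal sketch, assuming the multicenter2 and multicenter3 data sets from this thread already exist; tests_freq, tests_expanded, tests_compare and source are names invented for the example. Tests3 is the ODS table that holds the Type III tests of fixed effects.

```sas
/* Fit the FREQ version and keep its Type III test table */
ods output Tests3=tests_freq;
proc glimmix data=multicenter3;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
   freq w;
run;

/* Fit the expanded 0/1 version and keep its Type III test table */
ods output Tests3=tests_expanded;
proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
run;

/* Stack the two tables and label where each row came from */
data tests_compare;
   length source $ 12;
   set tests_freq(in=from_freq) tests_expanded;
   if from_freq then source = 'freq w';
   else source = 'expanded';
run;

proc print data=tests_compare noobs;
run;
```

The DenDF column of tests_compare is the one to compare between the two rows.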

 

Henry
Calcite | Level 5

Hi JacobSimonsen,

 

Thank you for your reply. I think you are right. When using the events/trials syntax, SAS automatically applies the containment method (DDFM=CONTAIN), which leads to df=14. But in my program, DDFM=BETWITHIN should be used, since it accounts for correlation within clusters. I found a paper discussing this question: "Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials". It suggests that "Between-Within denominator degrees of freedom approximation method for F tests should be recommended when the GLMM is used in analysing CRTs with binary outcomes and few heterogeneous clusters, due to its type I error properties and relatively higher power".

 

 

 

 
