Solved
New Contributor
Posts: 2

# DF difference in Proc glimmix when using events/trials syntax

I am learning generalized linear mixed models using PROC GLIMMIX in SAS. I am using an example from the SAS website:

http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_glimmix_gett...

```sas
/* Example data: one row per center and treatment group,
   with n trials and sideeffect events */
data multicenter;
   input center group$ n sideeffect;
   datalines;
 1  A  32  14
 1  B  33  18
 2  A  30   4
 2  B  28   8
 3  A  23  14
 3  B  24   9
 4  A  22   7
 4  B  22  10
 5  A  20   6
 5  B  21  12
 6  A  19   1
 6  B  20   3
 7  A  17   2
 7  B  17   6
 8  A  16   7
 8  B  15   9
 9  A  13   1
 9  B  14   5
10  A  13   3
10  B  13   1
11  A  11   1
11  B  12   2
12  A  10   1
12  B   9   0
13  A   9   2
13  B   9   6
14  A   8   1
14  B   8   1
15  A   7   1
15  B   8   0
;
run;

/* Expand each row into n separate 0/1 records */
data multicenter2;
   set multicenter;
   do i=1 to sideeffect;
      y=1; output;
   end;
   do i=1 to n-sideeffect;
      y=0; output;
   end;
run;

/* Model 1: events/trials syntax on the grouped data */
proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;

/* Model 2: the same model on the expanded 0/1 data */
proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
run;
```

The example uses events/trials syntax to fit a logistic regression with random effects. Since I am not familiar with events/trials syntax, I ran an experiment: I expanded the data into standard 0/1 records (multicenter2) and fit the same GLMM without events/trials syntax. I expected the two programs to produce exactly the same results, because both are logistic random-effects models. However, while the estimates and standard errors from the two methods are the same, the degrees of freedom for group differ, which leads to different p-values. Both look right to me, and I don’t know which p-value I should use. Thank you in advance.

Accepted Solutions
Solution
01-25-2016 02:52 AM
Super Contributor
Posts: 301

## Re: DF difference in Proc glimmix when using events/trials syntax

There are two parts to this question: 1) why are the results different, and 2) which result is more correct?

Part 1 is the easiest to answer. The default method for calculating the degrees of freedom is the "containment method". DF is calculated as df=n-rank[X Z], where X contains the columns of the fixed effects to be tested and Z contains the columns of the random effects. The rank of [X Z] is 16 in both models (1+15), but the number of observations differs greatly between the two models: 30 in the first and 503 in the second. Therefore, df is 14 and 487, respectively.
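The arithmetic behind those two values can be written out as follows. This only restates the numbers quoted above; the subscripts are labels introduced here for the two programs, and 503 is the sum of the `n` column in the example data.

$$
\mathrm{df}_{\text{events/trials}} = 30 - 16 = 14,
\qquad
\mathrm{df}_{0/1} = \sum_{i=1}^{30} n_i - 16 = 503 - 16 = 487
$$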

But which is more correct? I think 14 is the correct number of DF. When the binomial variable is split into a number of 0/1 observations, the DF calculation does not account for the fact that there are many observations within the same center (the random effect); instead, it treats all the residual df as "between subject". I also think that the option "DDFM=BETWITHIN" should be used, which would give you df=14 in both models.
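A minimal sketch of how that option could be added to the two programs from the question. It assumes the datasets `multicenter` and `multicenter2` created there; `DDFM=BETWITHIN` goes in the MODEL statement options. This was not run here, so the df=14 outcome is the expectation stated above rather than verified output.

```sas
/* Events/trials form with between-within denominator df */
proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution ddfm=betwithin;
   random intercept / subject=center;
run;

/* Expanded 0/1 form with the same df method */
proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution ddfm=betwithin;
   random intercept / subject=center;
run;
```

Comparing the "Type III Tests of Fixed Effects" tables from the two runs would show whether the denominator df for group now agree.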

By the way, I tried this variation of your program and found something interesting...

```sas
data multicenter3;
   set multicenter;
   w=sideeffect;
   y=1; output;
   w=n-sideeffect;
   y=0; output;
run;

proc glimmix data=multicenter3;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
   weight w;
run;
```

I thought this would be a better way to split the data into 0/1 observations, because the dataset is smaller, and that it would give the same degrees of freedom as the program you wrote. To my surprise, it does not use the weights when calculating the degrees of freedom. That must be a mistake! It gives df=42, which is probably calculated as 58-16, where 58 is the number of observations when the weights are NOT counted. But the weights should be counted.
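A quick way to see where 58 and 503 could come from is to count the rows of `multicenter3` directly. This is a sketch that assumes the dataset created above; the column aliases are names chosen here for illustration. The data step writes 60 rows, but two of them (center 12, group B and center 15, group B, both with `sideeffect=0`) get `w=0`. If observations with a nonpositive weight are excluded from the analysis, which is an assumption worth checking against the PROC GLIMMIX documentation, that leaves 58.

```sas
proc sql;
   select count(*)   as n_rows,            /* rows written by the data step */
          sum(w > 0) as n_positive_weight, /* rows with a usable weight     */
          sum(w)     as total_weight       /* total number of 0/1 trials    */
   from multicenter3;
quit;
```

For the example data this should report 60, 58 and 503: the first two are row counts, and the last is the observation count that the fully expanded 0/1 dataset has.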

All Replies
SAS Super FREQ
Posts: 3,834

## Re: DF difference in Proc glimmix when using events/trials syntax

Hi Jacob,

I think you are confusing weights with frequencies. In regression analyses, weights do not affect the degrees of freedom. For an explanation and example, see the article "The difference between frequencies and weights in regression analysis."

Super Contributor
Posts: 301

## Re: DF difference in Proc glimmix when using events/trials syntax

Yes, correct, I did mix up weights and frequencies. The program I intended to write uses the "freq" statement instead of "weight".

```sas
proc glimmix data=multicenter3;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
   freq w;
run;
```

Then the degrees of freedom are what I expected (487), although that is not the number of DF I would recommend using.

New Contributor
Posts: 2

## Re: DF difference in Proc glimmix when using events/trials syntax

Hi JacobSimonsen,

Thank you for your reply. I think you are right. When using events/trials syntax, SAS automatically applies the containment method (DDFM=CONTAIN), which leads to df=14. But in my program, DDFM=BETWITHIN should be used, since it accounts for correlation within clusters. I found a paper discussing this question: "Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials". It suggests that "Between-Within denominator degrees of freedom approximation method for F tests should be recommended when the GLMM is used in analysing CRTs with binary outcomes and few heterogeneous clusters, due to its type I error properties and relatively higher power".

☑ This topic is solved.