DF difference in Proc glimmix when using events/tr...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-24-2016 01:17 AM

I am learning generalized linear mixed models using Proc Glimmix in SAS. I am using an example from the SAS website:

```
data multicenter;
   input center group$ n sideeffect;
   datalines;
1 A 32 14
1 B 33 18
2 A 30 4
2 B 28 8
3 A 23 14
3 B 24 9
4 A 22 7
4 B 22 10
5 A 20 6
5 B 21 12
6 A 19 1
6 B 20 3
7 A 17 2
7 B 17 6
8 A 16 7
8 B 15 9
9 A 13 1
9 B 14 5
10 A 13 3
10 B 13 1
11 A 11 1
11 B 12 2
12 A 10 1
12 B 9 0
13 A 9 2
13 B 9 6
14 A 8 1
14 B 8 1
15 A 7 1
15 B 8 0
;
run;

data multicenter2;
   set multicenter;
   do i=1 to sideeffect;
      y=1; output;
   end;
   do i=1 to n-sideeffect;
      y=0; output;
   end;
run;

proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;

proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution;
   random intercept / subject=center;
run;
```

The example uses events/trials syntax to fit a logistic regression with random effects. Since I am not familiar with events/trials syntax, I ran an experiment: I expanded the data into standard 0/1 observations and fit the same GLMM without events/trials syntax. I expected the two programs to produce exactly the same results, because both are random-effects logistic models. However, while the estimates and standard errors from the two methods are the same, the degrees of freedom for group differ, which leads to different p-values. Both look right to me, and I don't know which p-value I should use. Thank you in advance.

Accepted Solutions

Solution

01-25-2016
02:52 AM


01-24-2016 11:23 AM

There are two parts to this question: 1) why are the results different, and 2) which result is more correct.

Part 1 is easiest to answer. The default method for calculating the denominator degrees of freedom is the "containment method". The DF are calculated as df = n - rank[X Z], where X holds the columns of the fixed effect being tested and Z holds the columns of the random effects. The rank of [X Z] is 16 in both models (1+15), but the number of observations differs greatly between the two models: 30 in the first and 503 in the second. Therefore the df are 14 and 487, respectively.

But which is more correct? I think 14 is the correct number of DF. When the binomially distributed variable is split into a number of 0/1 observations, the calculation of DF does not account for the fact that there are many observations within the same center (the random effect); instead it treats all the residual df as "between subject". I also think that the option "DDFM=BETWITHIN" should be used, which would give you df=14 in both models.
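As a worked check of the containment arithmetic in the paragraph above, taking the stated rank of 16 for [X Z] as given and using the fact that the n column of the 30 data rows sums to 503:

$$
\begin{aligned}
\text{df} &= n - \operatorname{rank}[X\;Z] \\
\text{events/trials data:}\quad \text{df} &= 30 - 16 = 14 \\
\text{0/1 data:}\quad \text{df} &= 503 - 16 = 487
\end{aligned}
$$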
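A minimal sketch of that suggestion, assuming the `multicenter` and `multicenter2` datasets from the question. The only change is `ddfm=betwithin` on the MODEL statement; the claim that both runs then report df=14 comes from this reply and is not shown here as verified output.

```
/* Sketch: both models from the question, with the between-within
   denominator df method requested explicitly on the MODEL statement. */
proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution ddfm=betwithin;
   random intercept / subject=center;
run;

proc glimmix data=multicenter2;
   class center group;
   model y = group / link=logit dist=binomial solution ddfm=betwithin;
   random intercept / subject=center;
run;
```

Comparing the denominator DF column for group in the two outputs shows whether the p-values now agree.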

By the way, I tried this variation of your program and found something interesting...

```
data multicenter3;
set multicenter;
w=sideeffect;
y=1; output;
w= n-sideeffect;
y=0; output;
run;
proc glimmix data=multicenter3;
class center group;
model y = group /link=logit dist=binomial solution ;
random intercept / subject=center;
weight w;
run;
```

I thought this would be a better way to split the observations into 0/1 observations, because the dataset is smaller, and that it should give the same df as the program you wrote. To my surprise, it does not use the weights when calculating the degrees of freedom. That must be a mistake! It gives df=42, which is probably calculated as 58-16, where 58 is the number of observations when the weights are NOT counted. But the weights should be counted.
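To see where 58 and 503 come from, the row count and the weighted total of `multicenter3` can be compared directly. This is a sketch, assuming that rows with w=0 are excluded from the analysis (two rows have zero side effects, which would take the 60 rows down to 58); the column aliases are made up for illustration.

```
/* Compare the number of rows with the sum of w in multicenter3.
   Rows with w = 0 are filtered out, on the assumption that the
   procedure drops observations with a nonpositive weight. */
proc sql;
   select count(*) as rows_with_positive_w,
          sum(w)   as total_w
   from multicenter3
   where w > 0;
quit;
```

With the data from the question this should report 58 rows and a total of 503, so 58-16=42 corresponds to counting rows and 503-16=487 to counting the values of w.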

All Replies


01-24-2016 12:05 PM - edited 01-24-2016 12:06 PM

Hi Jacob,

I think you are confusing weights with frequencies. In regression analyses, weights do not affect the degrees of freedom. For an explanation and example, see the article "The difference between frequencies and weights in regression analysis."


01-24-2016 12:52 PM

Yes, correct, I did mix up weight with frequency. The program I intended to write uses the "freq" statement instead of "weight".

```
proc glimmix data=multicenter3;
class center group;
model y = group /link=logit dist=binomial solution ;
random intercept / subject=center;
freq w;
run;
```

Then the degrees of freedom are what I expect (487), though that is not the number of DF I would recommend using.


01-25-2016 02:51 AM

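Hi JacobSimonsen,

Thank you for your reply. I think you are right. When using events/trials syntax, SAS automatically applies the containment method (DDFM=CONTAIN), which leads to df=14. But in my program, DDFM=BETWITHIN should be used, since it accounts for correlation within clusters. I found a paper discussing this question: "Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials". It suggests that "Between-Within denominator degrees of freedom approximation method for F tests should be recommended when the GLMM is used in analysing CRTs with binary outcomes and few heterogeneous clusters, due to its type I error properties and relatively higher power".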

Thank you for your reply. I think you are right. When using vents/trials syntax, SAS automatically apply DDFM=containment option, which leads to df=14. But in my program, DDFM=BETWITHIN should be used, since it accounts for correlation within clusters. I found a paper disccusing this question "Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials". It suggests that "Between-Within denominator degrees of freedom approximation method for F tests should be recommended when the GLMM is used in analysing CRTs with binary outcomes and few heterogeneous clusters, due to its type I error properties and relatively higher power".