BlueNose
Quartz | Level 8

Hello all,

I have data coming from 7 studies, and it looks like this (numbers are arbitrary):

Study      N    Success %
1       1500     7
2       1000     9.5
3        250    14.6
4        700    16.8
5       3500    13.9
6        670    14.5
7        900    23.8

For each study, I have the number of subjects sampled and the proportion (as a percentage) of subjects who met some criterion.

I wish to calculate the pooled proportion along with its CI.

I thought to do it using a generalized linear mixed model (PROC GLIMMIX), with a binomial distribution, a logit link function, and the study as a random effect.

My code is:

proc glimmix data = meta;
  class study;
  model success/n =  / solution;
  random intercept / subject=study;
  estimate 'intercept' intercept 1  / cl ilink;
run;
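For reference, the model that this code appears to fit can be sketched as follows. The symbols (y_i for the success count in study i, n_i for its sample size, eta for the fixed intercept, b_i for the random study effect, sigma_b for its standard deviation) are notation introduced here for illustration, not taken from the post.

```latex
\begin{aligned}
y_i \mid b_i &\sim \mathrm{Binomial}(n_i,\ \pi_i), \\
\operatorname{logit}(\pi_i) &= \eta + b_i, \qquad b_i \sim N(0,\ \sigma_b^2), \\
\text{ILINK estimate: } \hat{\pi} &= \frac{1}{1 + e^{-\hat{\eta}}}.
\end{aligned}
```

Under this reading, the ILINK value is the success probability for a study whose random effect is zero, which need not coincide with either the n-weighted or the unweighted mean of the observed proportions.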

In the output, I got a proportion equal to the mean of all the proportions in the data. My logic says that I should have got the weighted mean, since a random effect changes only the variance (and therefore the CI), not the expected value.

I also tried an older method for meta-analysis, the Freeman-Tukey arcsine transformation, which yields two results, one for fixed effects and one for random effects. The fixed-effects result was the weighted mean, as my logic says it should be. The random-effects result was not far from what GLIMMIX gave me. My two questions are:

1) Is my code correct in the first place?

2) How can I explain the fact (which seems to be correct) that, when using random effects, the pooled proportion is more or less equal to the mean of the proportions and not the weighted mean?

Thank you !

SteveDenham
Jade | Level 19

Oddly enough, I don't get a proportion that equals the mean of all proportions with that code.  Please check my numbers against yours.  Here is the code I used:

data one;
  input study n success;
  datalines;
1 1500 7
2 1000 9.5
3 250 14.6
4 700 16.8
5 3500 13.9
6 670 14.5
7 900 23.8
;

proc glimmix data=one; /* marginal mean */
  class study;
  model success/n = /solution;
  random intercept/subject=study;
  estimate 'intercept' intercept 1 / cl ilink;
run;

proc glimmix data=one method=laplace; /* conditional mean */
  class study;
  model success/n = /solution;
  random intercept/subject=study;
  estimate 'intercept' intercept 1 / cl ilink;
run;

data two;
  set one;
  ratio=success/n;
run;

proc means data=two;
  var ratio;
run;

For ratio (mean of proportions) I get 0.0212320.

For your code (marginal mean), I get 0.01489.

For my code (conditional mean), I get 0.01443.

So, I am pretty confused and not much help.  What values did you obtain?

Steve Denham


BlueNose
Quartz | Level 8

Steve, thank you, and I apologize for confusing you. I mentioned that these were arbitrary numbers, not the "real data". But we can work with them. Does your output equal the weighted mean of all the proportions? If not, why?

(Maybe in the real data it was only by chance that the estimate equaled the mean of the proportions. To be more specific, I am asking not why it equals the mean, but rather why it does NOT equal the weighted mean.)

One clarification: I did check this with a different method of meta-analysis, one I am confident in, and the results are similar to those from GLIMMIX, so my code (and yours) is probably correct.

SteveDenham
Jade | Level 19

Now the question comes down to what you are defining as the weighted mean.  I assume that you are weighting each proportion by the total N.

I also note that I was treating the success % column as the number of successes, so the numbers I reported are worthless.
So now I have:

data one;
  input study n successp;
  datalines;
1 1500 7
2 1000 9.5
3 250 14.6
4 700 16.8
5 3500 13.9
6 670 14.5
7 900 23.8
;

data two;
  set one;
  success=ceil(successp*n/100);
run;

proc glimmix data=two; /* marginal mean */
  class study;
  model success/n = /solution;
  random intercept/subject=study;
  estimate 'intercept' intercept 1 / cl ilink;
run;

proc glimmix data=two method=laplace; /* conditional mean */
  class study;
  model success/n = /solution;
  random intercept/subject=study;
  estimate 'intercept' intercept 1 / cl ilink;
run;

proc means data=two;
  var successp;
  weight n;
run;

proc means data=two;
  var successp;
run;

Which gives (as proportions): marginal mean=0.1363, conditional mean=0.1359, weighted raw mean (as I defined it)=0.1352, and raw (unweighted) mean=0.1430.

So I still do not find that the marginal mean is almost equal to the raw unweighted mean.

Each of these estimates has a different meaning.  Of all, I would choose the conditional mean, knowing that the raw mean and marginal mean suffer from "regression to the mean".  The way I defined the weighted mean though is open to a lot of debate.  Basically, it is the sum of all successes divided by the sum of all n, which is really open to influence by a single study.
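As a check on the two raw means, the arithmetic can be worked directly from the table. Here p_i is the success proportion and n_i the sample size of study i; the notation is introduced only for this illustration.

```latex
\begin{aligned}
\bar{p}_{w} &= \frac{\sum_i n_i\, p_i}{\sum_i n_i}
  = \frac{105 + 95 + 36.5 + 117.6 + 486.5 + 97.15 + 214.2}{8520}
  = \frac{1151.95}{8520} \approx 0.1352, \\[4pt]
\bar{p} &= \frac{1}{7}\sum_i p_i
  = \frac{0.07 + 0.095 + 0.146 + 0.168 + 0.139 + 0.145 + 0.238}{7}
  = \frac{1.001}{7} = 0.1430.
\end{aligned}
```

Study 5 alone carries 3500/8520, or about 41%, of the weight in the first expression, which illustrates how a single large study can dominate the weighted mean.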

Steve Denham

BlueNose
Quartz | Level 8

Your weighted mean is exactly what I meant. In my case it is equal to 17%. The "raw" mean (just the mean of the proportions, ignoring study size) is 14%. The conditional model gave an estimate closer to 14% than to 17%. If I remove the random effect, I do get 17%. But the random effect is essential here. After reading your syntax, and with the knowledge from the older method, I am sure my numbers are correct. I just don't understand why adding a random effect moves the estimate away from the weighted mean. Or maybe that is the whole point of the conditional model: to take into account the influence of each individual study?
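The observation that dropping the random effect recovers the weighted mean can be reproduced with a minimal sketch. It assumes the data set two and the variables success and n from the reply above; the label 'pooled' and the column name pooled_proportion are made up for illustration.

```sas
/* Sketch only: same intercept-only model, but with no RANDOM statement. */
/* Assumes data set two (variables success and n) from the earlier step. */
proc glimmix data=two;
  model success/n = / solution;   /* events/trials syntax: binomial, logit link */
  estimate 'pooled' intercept 1 / cl ilink;
run;

/* Cross-check: total successes divided by total sample size. */
proc sql;
  select sum(success)/sum(n) as pooled_proportion
  from two;
quit;
```

For an intercept-only binomial model without random effects, the maximum likelihood estimate of the proportion is the total number of successes divided by the total sample size, so the two results should agree.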

SteveDenham
Jade | Level 19

Get a copy of Walt Stroup's Generalized Linear Mixed Models for a really good discussion of why the conditional mean can differ so much from the marginal mean (see Chapter 3.5, pages 99-115, Chapter 10.2 and 10.3, pages 299-325 and the cover art).

Honestly, I am treating this like a randomized block design, where each study is a block.  That may help explain things as well.
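One way to build intuition for why a random study effect pulls the estimate toward the unweighted mean is the classical inverse-variance form of a random-effects meta-analysis. This is a sketch of that framework, not of the likelihood that GLIMMIX actually maximizes; the symbols (theta_i for the study estimate, v_i for its within-study variance, tau^2 for the between-study variance) are assumptions introduced here.

```latex
\hat{\theta} = \frac{\sum_i w_i\, \hat{\theta}_i}{\sum_i w_i},
\qquad
w_i = \frac{1}{v_i + \tau^2}.
```

With tau^2 = 0 (fixed effect), w_i = 1/v_i, which grows roughly in proportion to n_i, so large studies dominate. When tau^2 is large relative to every v_i, the weights are all close to 1/tau^2 and the estimate approaches the simple unweighted mean of the study estimates.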

Steve Denham

BlueNose
Quartz | Level 8

Actually, I bought the book last year. There is no question that it's the best (only?) book about GLMMs, but it is not the easiest to read, probably because the topic is not the easiest. Thank you for the chapter and page references; they are very helpful and appreciated!

I will ask a very difficult question now; hopefully there is an answer to it. Stroup's book is recommended here a lot. Are there any other books discussing advanced topics (with SAS examples, of course) that can be recommended? What are the top books for using SAS (mixed models, survival analysis, diagnostic analysis, etc.)? I am more biostatistics oriented.

SteveDenham
Jade | Level 19

Well, it's not the only book, but it is probably the only book that digs into GLIMMIX the way we need.  As Walt points out in the introduction, we need to "unlearn" a lot in order to be effective in using GLMMs.

Other SAS books I use a lot:  SAS for Mixed Models, 2nd ed., by Littell et al., and Multiple Comparisons and Multiple Tests Using SAS, 2nd ed., by Westfall et al. One that I don't have, but would like to get, is Vonesh's Generalized Linear and Nonlinear Models for Correlated Data: Theory and Applications Using SAS.

For survival analysis, Paul Allison's Survival Analysis Using SAS: A Practical Guide, 2nd ed., is unsurpassed.  For general stuff, Milliken and Johnson's series Analysis of Messy Data is also in this category.

Also, for mixed models in R, you might want to look at Zuur et al.'s Mixed Effects Models and Extensions in Ecology with R and Pinheiro and Bates's Mixed-Effects Models in S and S-PLUS.  Be prepared, as these have some fundamental differences in approach, as well as being for an object-oriented language.

Steve Denham
