Solved: Re: Comparing Binomial Success Parameters in a Stratified Approach - A...

ABiostatistician · Posted 03-23-2015 06:32 PM

I would like to contrast the effectiveness of drug treatment and surgical treatment in a study with the following data. Each row represents one trial, and each trial uses either drugs or surgeries to treat the patients.

Hospital	Treatment	Number of Patients Treated	Number of Patients Who Recovered	Percentage of Patients Who Recovered	Staffing Level	Funding Level	Hospital's Competence	Physician
1	Drug	500	250	50.0%	Low	Poor	?	Bob
1	Surgery	4	1	25.0%	Low	Poor	?	Bob
1	Drug	600	200	33.3%	Low	Poor	?	Bob
1	Surgery	6	2	33.3%	Low	Poor	?	Sue
1	Surgery	7	3	42.9%	Low	Poor	?	Sue
1	Surgery	10	5	50.0%	Low	Poor	?	Sue
2	Drug	50	43	86.0%	High	Rich	?	Scott
2	Drug	50	49	98.0%	High	Rich	?	Scott
2	Drug	60	57	95.0%	High	Rich	?	Mary
2	Surgery	60	52	86.7%	High	Rich	?	Mary
2	Drug	70	60	85.7%	High	Rich	?	Mary
3	Drug	40	35	87.5%	High	Poor	?	Bob
3	Drug	50	40	80.0%	High	Poor	?	Bob
3	Drug	50	45	90.0%	High	Poor	?	Mary

I am looking for a method that will

compare the overall recovery percentages between drugs and surgeries within each hospital. (Thus, Hospital #3 will be excluded in my analysis, because it only contains drug treatments.)
take the varying number of patients treated from trial to trial into account.
aggregate these comparisons for all hospitals.

Point #1 is very important, because I want to control for confounding variables that are due to the hospital's own attributes. I have measurements for 2 of those attributes (staffing and funding), but I can't measure the hospital's competence. Notice that Hospital 1 has low recovery percentages, but that may be due to its low staffing level, poor funding level, or some other confounding variable (like its inherently low competence). I want to ensure that the comparison of the recovery percentages focuses on only the type of treatment and controls for these confounding variables.

To see the significance for Point #2, consider Hospital 1. Its 2 drug trials treated 500 and 600 patients each, yet its 4 surgery trials had a total of 27 patients. Thus, I can't give equal weights to all recovery percentages.

I previously used logistic regression with

the binomial response as the number of patients who recovered out of all who were treated
the treatment type, the staffing level, and the funding level as the predictors
the multiple trials within each hospital as repeated measures

I used PROC GENMOD in SAS to implement the above model; I used its REPEATED statement to add HOSPITAL as a repeated measures effect. I used the odds ratio of the treatment type to compare the effectiveness of drugs vs. surgeries. However, I see this model as insufficient, because it does not control for unknown confounding variables (like the hospital's own competence).

My questions:

A. Can you suggest a method to make such a comparison with my above requirements? I seem to be comparing 2 binomial success parameters, but stratified by hospital and trying to control for confounding variables, but I can't think of a regression method to do this.

B. What procedure in SAS or R can implement your suggestion? My office uses SAS, so I prefer to use SAS, but I can use R if needed.

C. Do I need to explicitly include staffing level and funding level in my model to control for these confounders, or does the stratification of the analysis by hospital control for the confounders already?

SteveDenham · Posted 04-10-2015 03:02 PM

Hmm. My prejudices may be becoming too well known:smileyblush: regarding GLIMMIX.

OK--the GEE approach in GENMOD isn't hierarchical. It seems pretty logical that physician will be nested within hospital, and you COULD consider that as a repeated measure on the hospital.

I assume that recovery_count and patient_count are aggregates within a physician.

A GLIMMIX approach might look like (marginal, pseudolikelihood):

proc glimmix data = hospitaldata;

class treatment (ref = 'Drug')

staffing (ref = 'Low')

funding (ref = 'Low')

hospital

physician_id ;

model recovery_count / patient_count

=

treatment

staffing

funding

staffing * funding

treatment * staffing

treatment * funding

/ dist = bin

link = logit;

random hospital physician*hospital;

lsmeans treatment / ilink diff cl;

lsmeans staffing / ilink diff cl;

lsmeans funding / ilink diff cl;

lsmeans staffing * funding / ilink diff cl;

lsmeans treatment * staffing / ilink diff cl;

lsmeans treatment * funding / ilink diff cl;

run;

To shift to a conditional approach, add method=laplace to the proc glimmix statement, and change the random statement to

random intercept physician/subject=hospital;

Steve Denham

View solution in original post

SteveDenham · Posted 03-30-2015 10:49 AM

Can you share the model statement from your PROC GENMOD run? It should not be a problem to include all of these issues, see whether they add to your understanding of the response, and develop an appropriate comprehensive model.

Steve Denham

ABiostatistician · Posted 04-09-2015 01:41 PM

Sorry for the late reply, Steve. I was on vacation. Thank you very much for offering to examine my code; here it is.

proc genmod

data = hospitaldata;

class treatment (ref = 'Drug')

staffing (ref = 'Low')

funding (ref = 'Low')

hospital;

model recovery_count / patient_count

=

treatment

staffing

funding

staffing * funding

treatment * staffing

treatment * funding

/ dist = bin

link = logit;

repeated

subject = hospital;

lsmeans treatment / exp diff cl;

lsmeans staffing / exp diff cl;

lsmeans funding / exp diff cl;

lsmeans staffing * funding / exp diff cl;

lsmeans treatment * staffing / exp diff cl;

lsmeans treatment * funding / exp diff cl;

run;

SteveDenham · Posted 04-10-2015 01:19 PM

The subject variable needs to be listed in the class statement. I also wonder about report_type, because unless it is an easily understandable continuous variable, it too needs to be included in the class statement. Other than that, I think you have what you need.

Steve Denham.

ABiostatistician · Posted 04-10-2015 02:49 PM

Thanks for your prompt reply, Steve.

I mistakenly used "report_type" and "physician" when I should have used "treatment" and "hospital". My apologies - this has been fixed, and, yes, they are categorical variables and should be in the class statement.

1) Does this GENMOD code implement a multilevel (a.k.a. hierarchical) model?

2) Why didn't you recommend using PROC GLIMMIX instead?

SteveDenham · Posted 04-10-2015 03:02 PM

Hmm. My prejudices may be becoming too well known:smileyblush: regarding GLIMMIX.

OK--the GEE approach in GENMOD isn't hierarchical. It seems pretty logical that physician will be nested within hospital, and you COULD consider that as a repeated measure on the hospital.

I assume that recovery_count and patient_count are aggregates within a physician.

A GLIMMIX approach might look like (marginal, pseudolikelihood):

proc glimmix data = hospitaldata;

class treatment (ref = 'Drug')

staffing (ref = 'Low')

funding (ref = 'Low')

hospital

physician_id ;

model recovery_count / patient_count

=

treatment

staffing

funding

staffing * funding

treatment * staffing

treatment * funding

/ dist = bin

link = logit;

random hospital physician*hospital;

lsmeans treatment / ilink diff cl;

lsmeans staffing / ilink diff cl;

lsmeans funding / ilink diff cl;

lsmeans staffing * funding / ilink diff cl;

lsmeans treatment * staffing / ilink diff cl;

lsmeans treatment * funding / ilink diff cl;

run;

To shift to a conditional approach, add method=laplace to the proc glimmix statement, and change the random statement to

random intercept physician/subject=hospital;

Steve Denham

ABiostatistician · Posted 04-11-2015 01:13 AM

Thanks for your detailed reply, Steve.

1) Are you suggesting that GLIMMIX is better than GENMOD because GLIMMIX can handle multilevel data with more than 2 layers, whereas GENMOD can only handle 2 layers?

2) As I understand, in GENMOD, a repeated measures effect, H, is the second layer, while the multiple observations that are repeated under each value of H form the first layer. H is a categorical latent variable whose regression coefficient is not estimated; its function is to allow the correlation between repeated measures of the same group to be accounted for in estimating the standard errors of the regression coefficients of the other covariates in the model. Is that correct?

3) I actually did not mean to include physicians in my data set. However, now that you mention it, yes, physicians form a second layer between the treatments and the hospitals. Specifically, each hospital contains multiple physicians, and each physician administers multiple treatments. It sounds like you're saying that GENMOD cannot model this type of data, and that your above GLIMMIX code can. Is that correct?

4) A complication that I have with Question #3 is that some physicians work in multiple hospitals. Is that a problem, especially for your code?

Thanks for your help!

SteveDenham · Posted 04-13-2015 09:25 AM

In order:

1. I'm suggesting GLIMMIX is better than GENMOD because, well, I like mixed models, and use GLIMMIX a lot more than GENMOD, so I understand what it does better. I think of this analogy--GLM:MIXED::GENMOD:GLIMMIX, and since I almost always work in a broad inference space, I tend to gravitate to GLIMMIX. However, there are things that can be done in GENMOD (zero inflated distributions and easy Bayesian stuff) that don't show up in GLIMMIX.

2. Yeah, that's what it looks like.

3. Well, my code addresses both levels, and assumes aggregation at the physician level.

4. I was afraid of that being the case. If you consider that a physician at one hospital is "different" from the same physician at another hospital (say, due to staffing, equipment, etc.) then you might code them as different, and my code above should work. If you can't make that assumption then physician is NOT a different level, and thus the random statement should look more like:

random hospital physician;

Good estimates here only if you have a lot of physicians at multiple hospitals.

Hope this helps.

Steve Denham

ABiostatistician · Posted 04-13-2015 01:53 PM

Hi Steve,

5. Your answer to my Question #1 seems to suggest that GENMOD does not implement mixed models. If I run GENMOD with a REPEATED statement to include a repeated measures effect, isn't that a mixed model? For example, in the code below, treatment, staffing and funding are fixed effects, whereas hospital is a random effect. I always thought that a model with a repeated measures effect is a common special case of a mixed model. Please tell me why I am wrong to say that the code below runs a mixed model.

Thanks!

proc genmod

data = hospitaldata;

class treatment (ref = 'Drug')

staffing (ref = 'Low')

funding (ref = 'Low')

hospital;

model recovery_count / patient_count

=

treatment

staffing

funding

/ dist = bin

link = logit;

repeated

subject = hospital;

run;

Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

lvm · Posted 04-13-2015 03:19 PM

GENMOD does not fit mixed models, in the way the term is usually defined. There are only fixed-effects in models fitted by GENMOD. The repeated statement in GENMOD provides a way to adjust for overdispersion and/or correlation, but this is done in a quasi-likelihood way (there is not a real distribution). Essentially, this is done by incorporating terms that would be random into the so-called working correlation matrix (this is done behind the scenes). With this said, the quasi-likelihood approach can be quite useful, if you can live within the constraints of the procedure. I think that true GLMM approaches give much more flexibility, and ease of interpretation.

ABiostatistician · Posted 04-13-2015 06:13 PM

Thanks, lvm. I did not know this, so this is very helpful. Thanks for correcting me.

ABiostatistician · Posted 04-14-2015 01:00 AM

Hi Steve,

I have over 100 physicians, 33% of whom worked at multiple hospitals. Is that enough physicians to implement your code in Point #4 above, where you write

random hospital physician;

?

Thanks.

SteveDenham · Posted 04-14-2015 08:18 AM

That ought to be sufficient, according to the guys on the R-mixed-models list. They are better theoreticians than me, so I trust them (at least on the theory side...)

Steve Denham

ABiostatistician · Posted 04-14-2015 02:11 PM

Thanks, Steve. I tried both methods, and their results are similar.

If physician and hospital are 2 repeated-measures effects on the same level of the hierarchy, then the model's (-2 * residual pseudo-likelihood) is 94350.51
If each physician-hospital pair is considered to be unique, and physician-hospital is 1 level below hospital, then the model's (-2 * residual pseudo-likelihood) is 94318.12

In comparing the quantity (-2 * residual pseudo-likelihood), Model #1 is only 32.39 bigger than Model #2. A smaller (-2 * residual pseudo-likelihood) indicates a better fit, but I don't think that this difference is significant between these 2 models.

The parameter estimates are quite similar.

I think that the assumption of Model #2 is too strong, so I like Model #1 better.

SteveDenham · Posted 04-15-2015 03:18 PM

That's the drawback to the PL approach--it is hard to get metrics to compare models, as the pseudo-data is not the same under the two models, thus even looking at the -2*log pseudo-likelihood values is problematic. Now if it were fit using integration methods, you could look at information criteria to come up with a decision--one that incorporates both model complexity and information completeness. Have you tried METHOD=LAPLACE in the PROC GLIMMIX statement? It may run into memory problems, or execution time problems, or convergence problems...

To compare the two models, you will need to have the RANDOM statements in subject= format.

So, 1:

random intercept hospital/subject=physician;

and 2:

random intercept/subject=hospital;

random intercept/subject=physician;

These two models compute variance components due to physician and physician-hospital in the first case, and due to hospital and physician in the second case (no correlation). I guess you could throw in a third that only considers physician-hospital dyads as:

3:

random intercept/subject=physician*hospital;

See if any of this works with the Laplacian integration method.

Steve Denham

Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics

Re: Comparing Binomial Success Parameters in a Stratified Approach - An Example in Biostatistics