BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I would like to contrast the effectiveness of drug treatment and surgical treatment in a study with the following data. Each row represents one trial, and each trial uses either drugs or surgeries to treat the patients.

HospitalTreatmentNumber of Patients TreatedNumber of Patients Who RecoveredPercentage of Patients Who RecoveredStaffing LevelFunding LevelHospital's CompetencePhysician
1Drug50025050.0%LowPoor?Bob
1Surgery4125.0%LowPoor?Bob
1Drug60020033.3%LowPoor?Bob
1Surgery6233.3%LowPoor?Sue
1Surgery7342.9%LowPoor?Sue
1Surgery10550.0%LowPoor?Sue
2Drug504386.0%HighRich?Scott
2Drug504998.0%HighRich?Scott
2Drug605795.0%HighRich?Mary
2Surgery605286.7%HighRich?Mary
2Drug706085.7%HighRich?Mary
3Drug403587.5%HighPoor?Bob
3Drug504080.0%HighPoor?Bob
3Drug504590.0%HighPoor?Mary

I am looking for a method that will

  1. compare the overall recovery percentages between drugs and surgeries within each hospital. (Thus, Hospital #3 will be excluded in my analysis, because it only contains drug treatments.)
  2. take the varying number of patients treated from trial to trial into account.
  3. aggregate these comparisons for all hospitals.

Point #1 is very important, because I want to control for confounding variables that are due to the hospital's own attributes. I have measurements for 2 of those attributes (staffing and funding), but I can't measure the hospital's competence. Notice that Hospital 1 has low recovery percentages, but that may be due to its low staffing level, poor funding level, or some other confounding variable (like its inherently low competence). I want to ensure that the comparison of the recovery percentages focuses on only the type of treatment and controls for these confounding variables.


To see the significance for Point #2, consider Hospital 1. Its 2 drug trials treated 500 and 600 patients each, yet its 4 surgery trials had a total of 27 patients. Thus, I can't give equal weights to all recovery percentages.



I previously used logistic regression with

  • the binomial response as the number of patients who recovered out of all who were treated
  • the treatment type, the staffing level, and the funding level as the predictors
  • the multiple trials within each hospital as repeated measures

I used PROC GENMOD in SAS to implement the above model; I used its REPEATED statement to add HOSPITAL as a repeated measures effect. I used the odds ratio of the treatment type to compare the effectiveness of drugs vs. surgeries. However, I see this model as insufficient, because it does not control for unknown confounding variables (like the hospital's own competence).


My questions:

A. Can you suggest a method to make such a comparison with my above requirements? I seem to be comparing 2 binomial success parameters, but stratified by hospital and trying to control for confounding variables, but I can't think of a regression method to do this.

B. What procedure in SAS or R can implement your suggestion? My office uses SAS, so I prefer to use SAS, but I can use R if needed.

C. Do I need to explicitly include staffing level and funding level in my model to control for these confounders, or does the stratification of the analysis by hospital control for the confounders already?

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

Hmm.  My prejudices may be becoming too well known:smileyblush: regarding GLIMMIX.

OK--the GEE approach in GENMOD isn't hierarchical.  It seems pretty logical that physician will be nested within hospital, and you COULD consider that as a repeated measure on the hospital.

I assume that recovery_count and patient_count are aggregates within a physician.

A GLIMMIX approach might look like (marginal, pseudolikelihood):

proc glimmix     data = hospitaldata;

     class     treatment (ref = 'Drug')

                  staffing (ref = 'Low')

                  funding (ref = 'Low')

                  hospital

                  physician_id  ;

     model recovery_count / patient_count

           =

               treatment

               staffing

               funding

               staffing * funding

               treatment * staffing

               treatment * funding

                    /    dist = bin

                         link = logit;

    random hospital physician*hospital;

     lsmeans treatment / ilink diff cl;

     lsmeans staffing / ilink diff cl;

     lsmeans funding / ilink diff cl;

     lsmeans staffing * funding / ilink diff cl;

     lsmeans treatment * staffing / ilink diff cl;

     lsmeans treatment * funding / ilink diff cl;

run;

To shift to a conditional approach, add method=laplace to the proc glimmix statement, and change the random statement to

random intercept physician/subject=hospital;

Steve Denham

View solution in original post

14 REPLIES 14
SteveDenham
Jade | Level 19

Can you share the model statement from your PROC GENMOD run?  It should not be a problem to include all of these issues, see whether they add to your understanding of the response, and develop an appropriate comprehensive model.

Steve Denham

ABiostatistician
Calcite | Level 5

Sorry for the late reply, Steve.  I was on vacation.  Thank you very much for offering to examine my code; here it is.

proc genmod

     data = hospitaldata;

     class     treatment (ref = 'Drug')

                  staffing (ref = 'Low')

                  funding (ref = 'Low')

                  hospital;

     model recovery_count / patient_count

           =

               treatment

               staffing

               funding

               staffing * funding

               treatment * staffing

               treatment * funding

                    /    dist = bin

                         link = logit;

     repeated

               subject = hospital;

     lsmeans treatment / exp diff cl;

     lsmeans staffing / exp diff cl;

     lsmeans funding / exp diff cl;

     lsmeans staffing * funding / exp diff cl;

     lsmeans treatment * staffing / exp diff cl;

     lsmeans treatment * funding / exp diff cl;

run;

SteveDenham
Jade | Level 19

The subject variable needs to be listed in the class statement.  I also wonder about report_type, because unless it is an easily understandable continuous variable, it too needs to be included in the class statement.  Other than that, I think you have what you need.

Steve Denham.

ABiostatistician
Calcite | Level 5

Thanks for your prompt reply, Steve. 

I mistakenly used "report_type" and "physician" when I should have used "treatment" and "hospital".  My apologies - this has been fixed, and, yes, they are categorical variables and should be in the class statement.

1) Does this GENMOD code implement a multilevel (a.k.a. hierarchical) model?

2) Why didn't you recommend using PROC GLIMMIX instead? 

SteveDenham
Jade | Level 19

Hmm.  My prejudices may be becoming too well known:smileyblush: regarding GLIMMIX.

OK--the GEE approach in GENMOD isn't hierarchical.  It seems pretty logical that physician will be nested within hospital, and you COULD consider that as a repeated measure on the hospital.

I assume that recovery_count and patient_count are aggregates within a physician.

A GLIMMIX approach might look like (marginal, pseudolikelihood):

proc glimmix     data = hospitaldata;

     class     treatment (ref = 'Drug')

                  staffing (ref = 'Low')

                  funding (ref = 'Low')

                  hospital

                  physician_id  ;

     model recovery_count / patient_count

           =

               treatment

               staffing

               funding

               staffing * funding

               treatment * staffing

               treatment * funding

                    /    dist = bin

                         link = logit;

    random hospital physician*hospital;

     lsmeans treatment / ilink diff cl;

     lsmeans staffing / ilink diff cl;

     lsmeans funding / ilink diff cl;

     lsmeans staffing * funding / ilink diff cl;

     lsmeans treatment * staffing / ilink diff cl;

     lsmeans treatment * funding / ilink diff cl;

run;

To shift to a conditional approach, add method=laplace to the proc glimmix statement, and change the random statement to

random intercept physician/subject=hospital;

Steve Denham

ABiostatistician
Calcite | Level 5

Thanks for your detailed reply, Steve.

1) Are you suggesting that GLIMMIX is better than GENMOD because GLIMMIX can handle multilevel data with more than 2 layers, whereas GENMOD can only handle 2 layers? 

2) As I understand, in GENMOD, a repeated measures effect, H, is the second layer, while the multiple observations that are repeated under each value of H form the first layer.  H is a categorical latent variable whose regression coefficient is not estimated; its function is to allow the correlation between repeated measures of the same group to be accounted for in estimating the standard errors of the regression coefficients of the other covariates in the model.  Is that correct?

3) I actually did not mean to include physicians in my data set.  However, now that you mention it, yes, physicians form a second layer between the treatments and the hospitals.  Specifically, each hospital contains multiple physicians, and each physician administers multiple treatments.  It sounds like you're saying that GENMOD cannot model this type of data, and that your above GLIMMIX code can.  Is that correct?

4) A complication that I have with Question #3 is that some physicians work in multiple hospitals.  Is that a problem, especially for your code?

Thanks for your help!

SteveDenham
Jade | Level 19

In order:

1. I'm suggesting GLIMMIX is better than GENMOD because, well, I like mixed models, and use GLIMMIX a lot more than GENMOD, so I understand what it does better.  I think of this analogy--GLM:MIXED::GENMOD:GLIMMIX, and since I almost always work in a broad inference space, I tend to gravitate to GLIMMIX.  However, there are things that can be done in GENMOD (zero inflated distributions and easy Bayesian stuff) that don't show up in GLIMMIX.

2. Yeah, that's what it looks like.

3. Well, my code addresses both levels, and assumes aggregation at the physician level.

4. I was afraid of that being the case.  If you consider that a physician at one hospital is "different" from the same physician at another hospital (say, due to staffing, equipment, etc.) then you might code them as different, and my code above should work.  If you can't make that assumption then physician is NOT a different level, and thus the random statement should look more like:

random hospital physician;

Good estimates here only if you have a lot of physicians at multiple hospitals.

Hope this helps.

Steve Denham

ABiostatistician
Calcite | Level 5

Hi Steve,

5.  Your answer to my Question #1 seems to suggest that GENMOD does not implement mixed models.  If I run GENMOD with a REPEATED statement to include a repeated measures effect, isn't that a mixed model?  For example, in the code below, treatment, staffing and funding are fixed effects, whereas hospital is a random effect.  I always thought that a model with a repeated measures effect is a common special case of a mixed model.  Please tell me why I am wrong to say that the code below runs a mixed model.

Thanks!

proc genmod

     data = hospitaldata;

     class     treatment (ref = 'Drug')

                  staffing (ref = 'Low')

                  funding (ref = 'Low')

                  hospital;

     model recovery_count / patient_count

           =

               treatment

               staffing

               funding

                    /    dist = bin

                         link = logit;

     repeated

               subject = hospital;

run;

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

GENMOD does not fit mixed models, in the way the term is usually defined. There are only fixed-effects in models fitted by GENMOD. The repeated statement in GENMOD provides a way to adjust for overdispersion and/or correlation, but this is done in a quasi-likelihood way (there is not a real distribution). Essentially, this is done by incorporating terms that would be random into the so-called working correlation matrix (this is done behind the scenes). With this said, the quasi-likelihood approach can be quite useful, if you can live within the constraints of the procedure. I think that true GLMM approaches give much more flexibility, and ease of interpretation.

ABiostatistician
Calcite | Level 5

Thanks, lvm.  I did not know this, so this is very helpful.  Thanks for correcting me.

ABiostatistician
Calcite | Level 5

Hi Steve,

I have over 100 physicians, 33% of whom worked at multiple hospitals.  Is that enough physicians to implement your code in Point #4 above, where you write

random hospital physician;

?

Thanks.

SteveDenham
Jade | Level 19

That ought to be sufficient, according to the guys on the R-mixed-models list.  They are better theoreticians than me, so I trust them (at least on the theory side...)

Steve Denham

ABiostatistician
Calcite | Level 5

Thanks, Steve.  I tried both methods, and their results are similar. 

  1. If physician and hospital are 2 repeated-measures effects on the same level of the hierarchy, then the model's (-2 * residual pseudo-likelihood) is 94350.51
  2. If each physician-hospital pair is considered to be unique, and physician-hospital is 1 level below hospital, then the model's (-2 * residual pseudo-likelihood) is 94318.12

In comparing the quantity (-2 * residual pseudo-likelihood), Model #1 is only 32.39 bigger than Model #2.  A smaller (-2 * residual pseudo-likelihood) indicates a better fit, but I don't think that this difference is significant between these 2 models.

The parameter estimates are quite similar.

I think that the assumption of Model #2 is too strong, so I like Model #1 better.

SteveDenham
Jade | Level 19

That's the drawback to the PL approach--it is hard to get metrics to compare models, as the pseudo-data is not the same under the two models, thus even looking at the -2*log pseudo-likelihood values is problematic.  Now if it were fit using integration methods, you could look at information criteria to come up with a decision--one that incorporates both model complexity and information completeness. Have you tried METHOD=LAPLACE in the PROC GLIMMIX statement?  It may run into memory problems, or execution time problems, or convergence problems...

To compare the two models, you will need to have the RANDOM statements in subject= format.

So, 1:

random intercept hospital/subject=physician;

and 2:

random intercept/subject=hospital;

random intercept/subject=physician;

These two models compute variance components due to physician and physician-hospital in the first case, and due to hospital and physician in the second case (no correlation).  I guess you could throw in a third that only considers physician-hospital dyads as:

3:

random intercept/subject=physician*hospital;

See if any of this works with the Laplacian integration method.

Steve Denham

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2246 views
  • 6 likes
  • 3 in conversation