Hello, fellow SAS Users!  I am reaching out for help as I begin to learn how to analyze a somewhat complicated set of data.  I am trying to see whether an increase in variable X results in a statistically significant increase in my dependent variable, y, which is a count of events, while controlling for other variables, say z1 through z5.

The dataset contains measurements (y) on patients at various hospitals. Patients are measured anywhere between 1 and 10 times over the course of a year at different time points (not at standardized or regularly scheduled intervals).  I expect measurements within each hospital to be correlated, as the standards of care may differ between hospitals, and I also expect each patient's measurements to be correlated, with measurements nested within patient, within hospital.

I was hoping to use a negative binomial model (or Poisson) to model the counts because, at the end of the day, I'd like to be able to say something like: a one-unit increase in X results in a 16% increase in y, on average, holding all other variables constant (and controlling for hospital and those other variables).

I am struggling with how to properly set this model up in SAS.  Every patient in my dataset has a unique ID as well as the hospital they went to, along with their other covariates. Patients could go to various hospitals and be measured (although their patient ID would remain the same regardless of the hospital they went to).  I am looking for assistance on what might be an appropriate model to use and how to set it up in SAS.  I read a bit about setting this up using generalized estimating equations (GEE) or perhaps even proc glimmix, but I'm not sure what the best approach is here.  I tried setting this up as follows, but I'm not sure if this would be correct:

proc genmod data=mypatients;
  class hospital patientID / param=glm;
  model y = X z1 z2 z3 z4 z5 / type3 dist=nb link=log;
  store p1;
  repeated subject=patientID;
run;

Does this seem right?  It seems I'm neglecting the nesting of measurements within patients and patients within hospitals with this approach.  I'd very much appreciate any guidance or insight others might have.  Thanks so much, in advance, for your assistance.
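For reference, the "16% increase" reading described in the question follows from the log link used by both Poisson and negative binomial models. A short sketch, where the symbols \beta_X and \gamma_k are notation introduced here for illustration and do not appear in the original post:

```latex
% Log-link mean model (random effects omitted for brevity)
\log E[y \mid X, z] = \beta_0 + \beta_X X + \sum_{k=1}^{5} \gamma_k z_k

% Effect of a one-unit increase in X, holding z_1, ..., z_5 fixed
\frac{E[y \mid X + 1, z]}{E[y \mid X, z]} = e^{\beta_X},
\qquad
\text{percent change} = 100\,\bigl(e^{\beta_X} - 1\bigr)

% A 16% increase corresponds to
e^{\beta_X} = 1.16 \;\Longleftrightarrow\; \beta_X = \ln(1.16) \approx 0.148
```

In a model with random effects, this ratio is conditional on the random effects, that is, it applies within a given hospital and patient.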

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

This approach in GENMOD does ignore the hierarchical structure of your data, so a possible approach would be:

proc glimmix data=mypatients method=laplace; /* G-side methodology to get conditional estimates */
  class hospital patientID visitno;
  model y = x z1 z2 z3 z4 z5 visitno visitno*x / dist=negbin solution;
  random intercept / subject=hospital; /* Random effect of hospital = measurements of patients within hospitals */
  random visitno / type=chol subject=patientID; /* Random "repeated" effect of patient, modeling visit number as a repeated effect */
  store p1;
run;
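The code above relies on a visitno variable, which the original question does not mention. A minimal sketch of one way to derive it, assuming the data contain a visit date for each measurement; the names visit_date and mypatients_v are hypothetical and not from the original post:

```sas
/* Assumption: visit_date (hypothetical name) holds the date of each measurement */
proc sort data=mypatients out=mypatients_sorted;
  by patientID visit_date;
run;

data mypatients_v;                         /* hypothetical output dataset name */
  set mypatients_sorted;
  by patientID;
  if first.patientID then visitno = 0;     /* restart the counter for each patient */
  visitno + 1;                             /* sum statement: value is retained across rows */
run;
```

If something like this is used, the PROC GLIMMIX step would read data=mypatients_v instead. Note that this numbers visits per patient across all hospitals; whether that is the right definition depends on the study design.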

Hope this helps you get started.

Steve Denham

statistician13
Quartz | Level 8

Steve,

Thanks so much for your assistance.  I think this is exactly what I was thinking and looking for, but had trouble determining how to structure the code.  I have three questions for you on the code above: 

  1. If I want to control or adjust for hospital, do I need to include hospital as a variable in the model statement (e.g. model y = x z1 z2 z3 z4 z5 hospital visitno visitno*x/dist=negbin solution;) or will this be adjusted automatically by incorporating this into the random intercept statement?
  2. Is there a particular reasoning in selecting the CHOL covariance matrix parametrization here?
  3. Why did you decide to include an interaction term of visitno and x in the model?  Is this necessary given my objectives/research question (namely, does an increase in x result in an increase in y given hospital and all the other covariates)?

Thanks again for your assistance.

SteveDenham
Jade | Level 19
  • If I want to control or adjust for hospital, do I need to include hospital as a variable in the model statement (e.g. model y = x z1 z2 z3 z4 z5 hospital visitno visitno*x/dist=negbin solution;) or will this be adjusted automatically by incorporating this into the random intercept statement?

No.  This is one of the differences between GENMOD and GLIMMIX: random effects are not included in the MODEL statement.  Hospital enters the model through the RANDOM statement instead.

  • Is there a particular reasoning in selecting the CHOL covariance matrix parametrization here?

Because the visits are both unequally spaced in time and vary so much from subject to subject, only some type of unstructured covariance matrix seems appropriate up front.  I use the Cholesky root parameterization as it ensures that the covariance matrix is at least positive semidefinite.
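The positive semidefiniteness point can be seen in one line. A sketch, where G denotes the covariance matrix of the visit random effects and L its lower-triangular Cholesky root (notation introduced here for illustration):

```latex
% The free parameters are the entries of the lower-triangular matrix L
G = L L^{\top}

% For any vector a, the quadratic form is a squared norm, hence nonnegative
a^{\top} G\, a = a^{\top} L L^{\top} a = \lVert L^{\top} a \rVert^{2} \ge 0
```

Whatever values the optimizer tries for the entries of L, the implied G is therefore a valid covariance matrix, which is not guaranteed when the entries of G are estimated directly.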

  • Why did you decide to include an interaction term of visitno and x in the model?  Is this necessary given my objectives/research question (namely, does an increase in x result in an increase in y given hospital and all the other covariates)?

This was the most difficult part for me without knowledge of how the dataset was constructed.  I have a strong suspicion that X varies from visit to visit while the variables z1 through z5 are fixed for each ID.  Thus, the inclusion of the covariate (X) by fixed effect (visitno) term.  It tests for equality of slopes per visit.  If it is nonsignificant, I would then fit a successor model that did not include the interaction.  If it is significant, I would fit a model with only the interaction, but specify a NOINT option in the model statement.  See Littell's SAS System for Mixed Models, 2nd ed. for the parts on analysis of covariance, or Milliken and Johnson's Analysis of Messy Data III. Analysis of Covariance for references to this approach.
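A minimal sketch of what the successor model without the interaction might look like, assuming the interaction turned out nonsignificant and that the ID variable is patientID as in the question. The ESTIMATE statement and its label are additions for illustration; the EXP option exponentiates the slope so it can be read as a rate ratio:

```sas
/* Sketch only: common-slope model, dropping visitno*x */
proc glimmix data=mypatients method=laplace;
  class hospital patientID visitno;
  model y = x z1 z2 z3 z4 z5 visitno / dist=negbin solution;
  random intercept / subject=hospital;
  random visitno / type=chol subject=patientID;
  /* exp(slope of x) = multiplicative change in expected count per one-unit increase in x */
  estimate 'Rate ratio per 1-unit increase in x' x 1 / exp cl;
run;
```

An exponentiated estimate of 1.16, for example, would correspond to the "16% increase" phrasing in the original question, conditional on hospital and patient.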

Hope this answers some of your questions.

Steve Denham

statistician13
Quartz | Level 8

Steve, you are amazing. Thanks so much for your well thought out responses and your kind reply.  This was a huge help.  I understand how all these different components in glimmix are working now.  You are amazing, my friend!