Solved: Re: Longitudinal growth model using proc mixed

jgreenberg321 · Posted 07-26-2020 06:46 PM

Hello,

I am trying to model change in average hospital complication rates over time using proc mixed. I've tried reading articles online regarding the appropriate syntax for specifying this but want to make sure I am interpreting the results correctly.

proc mixed data=have;

class hospitalid;

model complicated_rate=time/solution chisq;

random intercept time/type=un subject=hospitalid_index g vc;

run;

However, my questions are as follows:

1) I tried specifying autocorrelation with "type=ar(1)" in my random statement, and my AIC was substantially larger. I assume that is indicating the model is better fit without specifying autocorrelation?

2) How can I request a significance test on my random effects (i.e. does the effect of time vary significantly across hospitals)?

Thank you in advance!

SteveDenham · Posted 07-27-2020 08:23 AM

There is a way to use an autoregressive error structure, but it requires separating time and hospitalid into a REPEATED statement and a RANDOM statement, respectively.

This would look like:

proc mixed data=have;
class hospitalid;
model complicated_rate=time/solution chisq;
random intercept /subject=hospitalid ;
repeated time/type=ar(1) subject=hospitalid;
run;

@sld points out why you can't have a single random statement with intercept and time, and an AR(1) error structure.

Now if you wish to model heterogeneity of variance by time, you could look at type=arh(1) in the repeated statement, and compare AICC values to select a potential error structure.

Last, and maybe most importantly, what sort of variable is complicated_rate? Is it a proportion, that could possibly be put into the form (# of patients with complications/ total # of patients). If so, you may want to shift to PROC GLIMMIX, where you can fit this as a binomial variable.

SteveDenham

View solution in original post

sld · Posted 07-27-2020 01:13 AM

You specify hospitalid in the CLASS statement, but subject = hospitalid_index in the RANDOM statement.

What is the difference between hospitalid and hospitalid_index? How many levels of hospitalid are in your dataset?

Do your results change if you use hospitalid as the subject in the RANDOM statement with type=UN?

The RANDOM statement models a 2 x 2 covariance structure, with the intercept variance and slope variance on the diagonal, and the covariance of intercept and slope on the off-diagonal. Using type=ar(1) on the RANDOM statement is nonsensical and does not do what you want. Logically, the off-diagonal elements in the covariance matrix are not autocorrelated (although it's a moot point when there is only one covariance), and more importantly you do not want homogeneous (i.e., equal) variances for intercept and slope. The residuals from the regression of complicated_rate on time might be autocorrelated, but that would be addressed by the REPEATED statement in Proc MIXED.

Have you plotted complicated_rate versus time by hospitalid_index to visually assess whether there is a relationship with time (and whether it appears linear), whether variance among intercepts is appreciable, and whether variance among slopes is appreciable?

You likely would find SAS for Mixed Models: Introduction and Basic Applications by Stroup et al. informative, especially Ch 10 on random coefficients models. You'll notice that in their examples, they do not model autocorrelation of residuals; some people do, some people don't.

Leaping to your questions:

Q1: No, using type=ar(1) is wrong.

Q2: I would switch to Proc GLIMMIX and use the COVTEST statement: GLIMMIX documentation for COVTEST

I hope this helps move you forward.

SteveDenham · Posted 07-27-2020 08:23 AM

There is a way to use an autoregressive error structure, but it requires separating time and hospitalid into a REPEATED statement and a RANDOM statement, respectively.

This would look like:

proc mixed data=have;
class hospitalid;
model complicated_rate=time/solution chisq;
random intercept /subject=hospitalid ;
repeated time/type=ar(1) subject=hospitalid;
run;

@sld points out why you can't have a single random statement with intercept and time, and an AR(1) error structure.

Now if you wish to model heterogeneity of variance by time, you could look at type=arh(1) in the repeated statement, and compare AICC values to select a potential error structure.

Last, and maybe most importantly, what sort of variable is complicated_rate? Is it a proportion, that could possibly be put into the form (# of patients with complications/ total # of patients). If so, you may want to shift to PROC GLIMMIX, where you can fit this as a binomial variable.

SteveDenham

jgreenberg321 · Posted 07-27-2020 08:56 AM

Thank you both for the feedback. That helps to clarify a lot regarding the appropriate syntax structure. To follow-up a few points:

1) The trend does look approximately linear when plotted across time, and visually there does seem to heterogeneity across cluster.

2) Regarding the outcome variable: the variable is essentially a binomial proportion (i.e. proportion of cases that had a complication). I had initially planned to use Glimmix, but didn't when the overall plot of the complication rates looked very much normally distributed. Would you still recommend modeling the event as a binomial outcome? Is the structure of the syntax (i.e. the repeated statement) generally the same when using Glimmix rather than proc mixed?

Thank you!

SteveDenham · Posted 07-27-2020 10:25 AM

If the distribution is relatively unskewed, assuming normal errors probably is OK, but it depends on the number of observations for each level of hospitalid.

As far as repeated measures in GLIMMIX, there is no REPEATED statement. You use a RANDOM statement with a residual option if you are fitting using pseudo-likelihood (recently recommended by Stroup and Claassen (2020), or model the "repeatedness" as a G-side random effect with a maximum likelihood method (type=laplace or type=quad).

jgreenberg321 · Posted 07-27-2020 12:39 PM

Thanks. I have about 5-7 time-point observations per level of hospitalid. The overall distribution of complication rates is pretty normally distributed. I did try to model model as a binomial in glimmix just to compare, but realized that may be problematic since the outcome is not an integer value (i.e. it is not 0/1 but rather 2.4, for example).

I noticed one other things that seemed unusual: if I am not misunderstanding, it seems that to use proc mixed and put a random coefficient on time, along with a repeated statement, the "time" variable needs to be entered in the class statement. That would seem unusual, because it would prevent testing a linear fixed effect on time. Am I mistaken? Thank you!

sld · Posted 07-27-2020 02:07 PM

A few thoughts regarding whether to use a normal distribution for proportions, or a binomial distribution for (number of complications)/(number of cases):

(1) I would think that both (number of complications) and (number of cases) would be integers, so I'm puzzled by your statement that the outcome is not an integer. But then I don't fully know your study.

(2) The binomial approach is preferable if (number of cases) is variable, or if the proportion of complications is small (or large), or if the (number of cases) is small such that proportion of cases is relatively discrete.

(3) Distributional assumptions apply not to the overall (meaning all observations combined) set of response data but rather to the response conditional on the predictors, which is why we often use residual diagnostics to assess distributional assumptions.

Good spotting on the use of time. The solution to that is to have two copies of time, one continuous and one categorical:

data have;
set have;
time_c = time;
run;

proc mixed data=have;
class hospitalid time_c;
model complicated_rate=time/solution chisq;
random intercept /subject=hospitalid ;
repeated time_c / type=ar(1) subject=hospitalid;
run;

This combination of RANDOM and REPEATED with AR(1) is referred to as "AR(1)+RE"; see Littell et al. (2000) Modelling covariance structure in the analysis of repeated measures data for a nice description of the distinction between this structure and plain AR(1) (which you would implement by keeping the REPEATED statement and omitting the RANDOM statement). In my practice, with small datasets, I usually have estimation trouble with AR(1)+RE.

jgreenberg321 · Posted 07-27-2020 02:44 PM

Thank you for the detailed explanation.

To clarify the outcome, it's a risk-adjusted complication rate on the hospital level. I'm not directly measuring/modeling the number of 0's and 1's in this context.

Regarding the rest of your explanation, that all makes sense. I have found that trying to add some complexity to these models does create some problems with convergence, as you mentioned. Practically, I am primarily interested in the fixed effect estimate of time, which is easier to model. Thanks again.

SteveDenham · Posted 07-28-2020 07:42 AM

Oh. So an individual may have more than one complication The numerator is number of complications rather than number of patients who have at least one complication, adjusted for risk prevalence (assumption on the latter) Normal seems pretty good then. What does a QQ plot of the residuals look like? It that doesn't show any signs of variance dependence on the value, you should be fine.

SteveDenham

jgreenberg321 · Posted 07-28-2020 09:16 AM

Well, it's a risk-adjusted rate per hospital per year. It's actually calculated as the total predicted number of events over the total number of expected, per year. Therefore the numbers are generally non-integer values.

jgreenberg321 · Posted 07-28-2020 09:18 AM

Sorry- the qq plot looks pretty good. There is some deviation out at the upper end of the data but quite linear over most of the data.

SteveDenham · Posted 07-28-2020 10:09 AM

@jgreenberg321 :In the immortal words of Leroy Jethro Gibbs, "Never apologize. It's a sign of weakness." Everything you have says use a normal distribution. Well done.

SteveDenham

jgreenberg321 · Posted 07-28-2020 10:11 AM

Ha! Thank you— and thank you for your help!