## How do I model repeated measures for a secondary analysis?

Occasional Contributor
Posts: 5

I know I could really use a good, detailed primer on repeated measures analysis, but for now I'll just ask a specific question.

How do I go about modeling longitudinal data that was not collected as part of any study, but rather, is data that happens to be available?  Specifically, the number of measurements per subject ranges from 1 to 12 and the time between measurements is all over the place (from a few weeks to many years.)

Here is a quick summary:  Data from a clinical center that treats Alzheimer's disease and other neurodegenerative diseases; also contains data on normal controls.  Research question of interest is:  Does Depression have an effect on change in functional outcomes (ability to perform daily activities such as cooking, bill-paying, etc.), after controlling for changes in cognitive impairment?

So: Each patient either has depression at baseline (dep = 1) or does not (dep = 0.)  The outcome is a composite measure of functionality (Function_total) that we measure each time a patient comes in for a visit.  Num_days is the number of days from baseline to a given visit.  We also want to control for another repeated-measure variable:  Cog_total, which is a measure of the person's cognition measured at each visit.  Other variables we should control for:  baseline_age and Diagnosis (Alzheimer's vs. Parkinson's vs. Control, etc.)

My specific questions include:

- How do I determine the correct covariance matrix to use?

- How do I determine if/when to use the REPEATED vs. RANDOM statements?

- How do I decide whether to include "INTERCEPT" in the RANDOM statement?

- Does the fact that Cog_total (a covariate) is a repeated measure make a difference?  What do I have to do to address that?

I am using this model:

Proc mixed;

class subjID dep Diagnosis;

model function_total = dep num_days dep*num_days cog_total baseline_age diagnosis;

random num_days intercept/subject=subjID;

run;

When I run the model above, all terms in the model are highly significant, including the depression/time interaction term of interest.  If I remove the "intercept" term from the random statement, the interaction term is no longer significant at all.  How should I interpret this?

Any help is much appreciated!!

Posts: 2,655

## Re: How do I model repeated measures for a secondary analysis?

Umm, a lot of what is happening is due to num_days being a continuous variable in the model.  With the huge range you have, I would suggest moving to radial smoothing, as presented in Example 41.6 Radial Smoothing of Repeated Measures Data in the GLIMMIX Procedure documentation.  I have tried to translate your design into the first set of code in that example.  (The example goes on to fit further models based on what the data show, and at least the last of them may be directly applicable to your data.)

proc glimmix data=your_data;

t2 = num_days / 100;

class subjid dep diagnosis;

model function_total = dep num_days dep*num_days cog_total baseline_age diagnosis;

random t2 / type=rsmooth subject=subjid

knotmethod=kdtree;

output out=gmxout pred(blup)=pred;

nloptions tech=newrap;

run;

I hope this gets you started.  For some light reading on repeated measures in SAS, get a copy of Littell et al.'s SAS for Mixed Models, 2nd ed.  For a more in-depth approach, get McCullagh and Nelder's text, Generalized Linear Models, or Vonesh and Chinchilli's Linear and Nonlinear Models for the Analysis of Repeated Measurements.

Steve Denham

Occasional Contributor
Posts: 5

## Re: How do I model repeated measures for a secondary analysis?

Thanks Steve - that is helpful.  I will check out the Radial Smoothing info re: GLIMMIX.

One more question that I forgot to include before:  About 25% of the patients in our dataset have only the baseline visit; i.e., they have no follow-up visits so they are not useful for directly answering the question re: change in functional outcome over time.  So:  Should I just remove these from the dataset before running the analysis?  Or is it beneficial to leave them in for the purposes of getting a better sense of the overall distribution (i.e., variation) of the baseline data?

Also, I would still appreciate some guidance on how to answer these questions re: longitudinal analysis, separate from the specific analysis I'm working on now:

- How do I determine if/when to use the REPEATED vs. RANDOM statements?

- How do I decide whether to include "INTERCEPT" in the RANDOM statement?

- Does the fact that a covariate (that I want to control for) is a repeated measure make a difference?  What do I have to do to address that?

Thanks!

Dina

Posts: 2,655

## Re: How do I model repeated measures for a secondary analysis?

Repeated and random--it's why GLIMMIX is a better place to be, with only the RANDOM statement, and repeated/residual being an option.  OK, here's my take--if you have subjects that are measured repeatedly (longitudinal design) then they are likely to be more correlated within subject than between subjects, so that would imply a REPEATED statement (or the residual option to the RANDOM statement in GLIMMIX).  If subjects are blocked or clustered in some way, so that subjects within a block/cluster are likely to be more correlated than subjects between blocks/clusters, then that is a RANDOM effect.  Now REPEATED (and MIXED) are best used if the mean and variance are not functionally related, as in Gaussian or lognormal distributions.  If the response variable is a count or a proportion or a ratio or about anything else, then...  See Stroup's Generalized Linear Mixed Models for a long discussion.
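To make the GLIMMIX side of that concrete, here is a minimal sketch of the two placements, using the variable names from the original post and the placeholder data set name your_data. The choice of compound symmetry is only for illustration, not a recommendation for these data.

```sas
/* G-side: subject as a random effect (random intercept per subject) */
proc glimmix data=your_data;
   class subjid dep diagnosis;
   model function_total = dep num_days dep*num_days cog_total
                          baseline_age diagnosis / solution;
   random intercept / subject=subjid;
run;

/* R-side: the GLIMMIX counterpart of a REPEATED statement in MIXED.
   _RESIDUAL_ puts the covariance structure on the residuals within subject. */
proc glimmix data=your_data;
   class subjid dep diagnosis;
   model function_total = dep num_days dep*num_days cog_total
                          baseline_age diagnosis / solution;
   random _residual_ / subject=subjid type=cs;
run;
```

With a Gaussian response these two describe much the same correlation pattern, which is why you would normally fit one or the other rather than both at once.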

How to decide whether to include INTERCEPT in the RANDOM statement?  For PROC MIXED, the following are equivalent:

random subjid;

random intercept/subject=subjid;

or

random subjid subjid*varname;

random intercept varname/subject=subjid;

For a lot of reasons that involve numerical stability of algorithms, the syntax with subject= is preferred.  To decide if you need INTERCEPT in the model, decide if the subject is going to be a random effect in your model.  This leads to other issues, if there are repeated measurements on the subject.  For some repeated error structures (say compound symmetry), no additional random error due to subject needs to be incorporated, as all of the variability will be modeled with the REPEATED statement.  For others (autoregressive and antedependent structures), there may (or may not, depending on the data at hand) be additional error attributable to variability between subjects.
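As a rough sketch of how one might compare candidate structures in PROC MIXED for data like these: fit the same fixed effects under REML with different covariance specifications and compare the AIC/BIC values in the Fit Statistics table. The two structures below are only examples. SP(POW) is picked here because it lets the correlation decay with the actual gap in days, which may suit unequally spaced visits; whether either fits well is something only the data can say. The data set name your_data is a placeholder.

```sas
/* Candidate 1: random intercept and slope per subject.
   TYPE=UN lets the intercept and slope variances covary. */
proc mixed data=your_data method=reml;
   class subjid dep diagnosis;
   model function_total = dep num_days dep*num_days cog_total
                          baseline_age diagnosis / solution ddfm=kr;
   random intercept num_days / subject=subjid type=un;
run;

/* Candidate 2: no random effects; within-subject correlation is a
   power function of the time gap (in days) between two visits. */
proc mixed data=your_data method=reml;
   class subjid dep diagnosis;
   model function_total = dep num_days dep*num_days cog_total
                          baseline_age diagnosis / solution ddfm=kr;
   repeated / subject=subjid type=sp(pow)(num_days);
run;
```

Because the fixed effects are identical, the REML information criteria from the two runs are comparable. If either run has trouble converging, rescaling time (years rather than days, say) in a DATA step beforehand is worth trying.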

If the preceding is not absolutely and perfectly clear, I apologize and say repeated measures analysis is best learned by jumping in and seeing what goes wrong first, and then turning to those that have been through the wringer and asking what went wrong.  It might be apparent, and it might not, depending on the data at hand.

If there is any way to subset your data into something with regular follow-up intervals and try various approaches on those until you feel satisfied, and then move on to the more difficult cases, you will be happier in the long run.  What I threw out as an approach using GLIMMIX is not readily transparent, and you'll need confidence in your data and models before moving to a way to handle really oddly collected repeated measurements.
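One possible way to build such a subset, purely as an illustration: snap each visit to the nearest yearly anniversary of baseline and keep the closest visit if it falls within a window. The yearly grid, the 90-day window, and the names visit_year, gap, and regular_visits are all assumptions made up for this sketch.

```sas
/* Assign each visit to the nearest whole year since baseline */
data regular;
   set your_data;
   visit_year = round(num_days / 365.25);
   gap = abs(num_days - visit_year * 365.25);  /* days off the anniversary */
run;

proc sort data=regular;
   by subjid visit_year gap;
run;

/* Keep the visit closest to each anniversary, if within 90 days */
data regular_visits;
   set regular;
   by subjid visit_year;
   if first.visit_year and gap <= 90;
run;

/* With (roughly) equally spaced visits, the usual structures apply */
proc mixed data=regular_visits;
   class subjid dep diagnosis visit_year;
   model function_total = dep num_days dep*num_days cog_total
                          baseline_age diagnosis / solution;
   repeated visit_year / subject=subjid type=ar(1);
run;
```

This throws away data, so it is a learning step rather than a final analysis.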

Oh, and the time varying covariate thing.  That is another problem.  If you are worried about age, then initial age is all I would have in the model--the age at succeeding follow-ups will be a linear function that is included in the time part of the model.  If it is something else, then this is going to get even more difficult.
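For a covariate like cog_total, one commonly used device (not something proposed earlier in this thread, so treat it as an assumption) is to split it into a subject-level mean and a visit-level deviation from that mean, so the between-subject and within-subject associations get separate coefficients. The names cog_mean, cog_dev, and your_data_c are hypothetical.

```sas
/* PROC SQL remerges the per-subject mean back onto every visit row */
proc sql;
   create table your_data_c as
   select *,
          mean(cog_total)             as cog_mean,  /* between-subject part */
          cog_total - mean(cog_total) as cog_dev    /* within-subject part  */
   from your_data
   group by subjid;
quit;

proc mixed data=your_data_c;
   class subjid dep diagnosis;
   model function_total = dep num_days dep*num_days cog_mean cog_dev
                          baseline_age diagnosis / solution;
   random intercept num_days / subject=subjid type=un;
run;
```

Here the cog_dev coefficient speaks to "controlling for changes in cognitive impairment" within a person, while cog_mean absorbs differences between people. It does not settle whether cognition and function are changing jointly, which is the harder modelling question.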

You can include the patients that only have a baseline, but be aware that inclusion means that they influence any continuous covariate and continuous-by-class covariate estimates.  Including them broadens the population you are inferring to, but if dropout is associated with the covariates, then all of the inference is biased.  Check whether the baseline measures are related to continuing on study before deciding to include or exclude.
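A simple way to look at that relationship, sketched under the assumption that each subject's baseline record has num_days = 0: flag who has any follow-up, then regress that flag on the baseline measures. The names followup_flag, baseline_check, and has_followup are made up for this example.

```sas
proc sql;
   /* 1 if the subject has more than one visit, 0 if baseline only */
   create table followup_flag as
   select subjid, (count(*) > 1) as has_followup
   from your_data
   group by subjid;

   /* One row per subject: baseline values plus the follow-up flag */
   create table baseline_check as
   select b.subjid, b.dep, b.diagnosis, b.baseline_age,
          b.function_total, b.cog_total, f.has_followup
   from your_data as b
        inner join followup_flag as f
        on b.subjid = f.subjid
   where b.num_days = 0;
quit;

/* Do baseline characteristics predict having any follow-up? */
proc logistic data=baseline_check;
   class dep diagnosis / param=ref;
   model has_followup(event='1') = dep diagnosis baseline_age
                                   function_total cog_total;
run;
```

Strong effects here would suggest the baseline-only patients differ systematically from those who return, which bears on how far the results can be generalized.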

Steve Denham
