elaineb
Calcite | Level 5

When using PROC GLM for repeated measures, how can I assess the effect of the different factors in the model, including continuous variables? Specifically, there are some significant factors in the univariate ANOVA (interaction across time), but I don't know how to tease out the way the interaction works. I tried the "solution" option in the model statement, but it does not come out.

Thank you for your help.

Accepted Solutions
SteveDenham
Jade | Level 19

I must admit I am stuck as far as getting the parameters.  You might try removing the nouni option, in which case you would get the parameters for each time point.  However...
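For reference, a minimal sketch of what dropping nouni could look like, reusing the dataset and variable names from the GLM call posted in the question (I have not run this against the actual data). Without nouni, GLM also prints a separate univariate analysis for each of the five length variables, and solution then lists the parameter estimates for each of those analyses:

```sas
/* Same model as in the question, but without NOUNI, so the        */
/* univariate analysis (and SOLUTION estimates) for each of        */
/* len6 ... len24 is printed in addition to the repeated analysis. */
proc glm data=temp;
  class sex medu;
  model len6 len9 len12 len18 len24 = sex birthwei medu durbf / solution;
  repeated lenage 5 (6 9 12 18 24) polynomial / summary printe;
run;
quit;
```

These are estimates at each time point taken separately, so they describe the level at each age rather than the shape of the curve.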

I would really recommend changing the analysis to PROC MIXED.  It does a much better job of handling repeated measures than does GLM.  First I would transform the data into long form:

data long;
  set temp;
  length=len6;  time=6;  output;
  length=len9;  time=9;  output;
  length=len12; time=12; output;
  length=len18; time=18; output;
  length=len24; time=24; output;
  drop len6 len9 len12 len18 len24;
run;

data long;
  set long;
  timer=time; /* Sets an identical value that will be used as a continuous variable in PROC MIXED */
run;
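The same reshaping could also be done in a single step with arrays. This is only a sketch under a few assumptions: the wide dataset is called temp as in the question, it already carries a subject identifier, and it has no existing variable named i (the loop counter here is my own choice of name):

```sas
data long;
  set temp;
  /* Wide-format length variables, in measurement order */
  array lens{5} len6 len9 len12 len18 len24;
  /* Ages in months matching each variable above */
  array ages{5} _temporary_ (6 9 12 18 24);
  do i = 1 to 5;
    length = lens{i};   /* response for this visit */
    time   = ages{i};   /* used as a CLASS variable in PROC MIXED */
    timer  = time;      /* continuous copy for the sp(pow) structure */
    output;             /* one row per child per visit */
  end;
  drop i len6 len9 len12 len18 len24;
run;

/* Quick look at the first two children's worth of rows */
proc print data=long(obs=10);
run;
```

With about 200 children and 5 visits, long should end up with 5 rows per child.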

The PROC MIXED code would then look like:

proc mixed data=long;
  class sex medu time subjid; /* This assumes that each child has a unique ID already on the original temp dataset */
  model length=sex|medu|time birthwei durbf/solution ddfm=kr(firstorder); /* See below for comments on this model */
  repeated time/type=sp(pow)(timer) subject=subjid;
  random intercept/subject=subjid;
run;

The model statement looks at separate trajectories in time for all sex by medu combinations.  Later on, you can construct LSMESTIMATE statements to test hypotheses of interest.  Birthweight and duration of breastfeeding are fit as continuous covariates.  Marginal means would be at the mean values of each of these covariates (LSMEANS statement not included).  The ddfm=kr(firstorder) option applies the Kenward-Roger correction to the standard errors and to the degrees of freedom.  It should be standard for small to moderate sized datasets (fewer than 10,000 subjects).
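As a rough sketch of the LSMEANS statements left out above, something like the following could be added to the same PROC MIXED call. Which effects and slices are worth requesting is my assumption, not a recommendation tied to this study:

```sas
proc mixed data=long;
  class sex medu time subjid;
  model length = sex|medu|time birthwei durbf / solution ddfm=kr(firstorder);
  repeated time / type=sp(pow)(timer) subject=subjid;
  random intercept / subject=subjid;
  /* Marginal means for every sex*medu*time cell; by default the */
  /* covariates birthwei and durbf are held at their means.      */
  lsmeans sex*medu*time / slice=time;
  /* Sex-by-time means, with pairwise differences and a test of  */
  /* the sex effect within each time point.                      */
  lsmeans sex*time / slice=time diff;
run;
```

The slice=time option gives an F test of the other factors at each age, which is often a convenient first way to see where in time an interaction is coming from.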

The F tests will address differences between marginal means and whether the slopes for the continuous covariates differ from zero.
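Note that with durbf entered only as a main effect, its slope is assumed to be the same at every age. If the question is whether duration of breastfeeding changes the shape of the curve, one possible extension (not part of the model above, so treat it as an assumption on my part) is to add a durbf*time term:

```sas
proc mixed data=long;
  class sex medu time subjid;
  /* durbf*time lets the slope for breastfeeding duration differ */
  /* from one measurement age to the next.                       */
  model length = sex|medu|time birthwei durbf durbf*time
        / solution ddfm=kr(firstorder);
  repeated time / type=sp(pow)(timer) subject=subjid;
  random intercept / subject=subjid;
run;
```

The Type 3 F test for durbf*time then asks whether the effect of durbf varies over time, and the solution table shows how the durbf slope at each age differs from the slope at the reference (last) time level.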

As far as the repeated and random statements: I chose the spatial power structure because of the uneven spacing in time of the measurements.  This models the correlation between measures as a power function dependent on the length of time between measurements.  This correlation is the key difference in approaches between GLM and MIXED.  The GLM approach assumes sphericity/independence of repeated measures, and that is definitely not the case for growth curves.  For example, look at Example 59.2 Repeated Measures in the MIXED documentation, where Potthoff and Roy's classic growth measurements dataset is examined.  There are very extensive examples there, almost all of which could pertain to your dataset.
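Under sp(pow), the covariance between two residuals from the same child measured at ages t1 and t2 is sigma^2 * rho^|t1 - t2|. One way to check whether that structure is reasonable is to fit an alternative and compare the information criteria. The sketch below compares against an unstructured matrix; the dataset names fit_sppow, fit_un and fit_compare are made up for illustration, and the random intercept is dropped from the unstructured fit because type=un already absorbs that variance:

```sas
/* Fit 1: spatial power plus random intercept, as above */
ods output FitStatistics=fit_sppow;
proc mixed data=long;
  class sex medu time subjid;
  model length = sex|medu|time birthwei durbf / solution ddfm=kr(firstorder);
  repeated time / type=sp(pow)(timer) subject=subjid;
  random intercept / subject=subjid;
run;

/* Fit 2: unstructured covariance among the five time points */
ods output FitStatistics=fit_un;
proc mixed data=long;
  class sex medu time subjid;
  model length = sex|medu|time birthwei durbf / solution ddfm=kr(firstorder);
  repeated time / type=un subject=subjid;
run;

/* Stack the two fit-statistics tables side by side for comparison */
data fit_compare;
  length structure $8;
  set fit_sppow(in=a) fit_un;
  if a then structure = 'sp(pow)';
  else structure = 'un';
run;

proc print data=fit_compare noobs;
run;
```

Because both fits use the same fixed effects, the REML-based AIC, AICC and BIC values are comparable, with smaller values indicating the better-fitting structure.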

As an aside, I include the random intercept statement, as this removes subject-to-subject variability as a separate source of variation, leaving the residual variability to be modeled as a correlated growth curve.  This may be omitted, but it has been shown to be of value in modeling repeated measurements with autoregressive-type errors (see Littell, Henry and Ammerman, J. Anim. Sci. 1998, 76:1216-1231).
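To see how much of the variation the random intercept is picking up, the covtest option on the PROC MIXED statement could be added. A sketch, using the same model as above:

```sas
/* COVTEST adds standard errors and Wald tests to the */
/* Covariance Parameter Estimates table.              */
proc mixed data=long covtest;
  class sex medu time subjid;
  model length = sex|medu|time birthwei durbf / solution ddfm=kr(firstorder);
  repeated time / type=sp(pow)(timer) subject=subjid;
  random intercept / subject=subjid;
run;
```

The table should then list the intercept variance, the sp(pow) correlation parameter and the residual variance. The Wald tests for variance components are only approximate, so I would read them as a rough guide rather than a formal decision rule.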

Steve Denham


SteveDenham
Jade | Level 19

When you say, "it does not come out" with the solution option, what do you mean?  Are the estimates not printing, or do they print, but seem unreasonable?  If it's the first, then there is probably something we can address in the syntax, but if it is the second, we will need some context.  Sharing your GLM code and the study design would be a good starting point.

Steve Denham

elaineb
Calcite | Level 5

Thank you, Steve. Only the ANOVA tables are printed, not the coefficient estimates for each variable in the model. I have a longitudinal study with children's length measurements at 6, 9, 12, 18 and 24 months (about 200 children). I want to assess the association between growth and some variables like parental height, duration of breastfeeding (in months), mother's education (categorical), and others. My code:

PROC GLM data=temp;
  class sex medu;
  model len6 len9 len12 len18 len24 = sex birthwei medu durbf / solution nouni;
  repeated lenage 5 (6 9 12 18 24) polynomial / summary printe;
run;

I have the between- and within-subject SS's, but I would like to see how, and in which direction, the significant variables in the model (between and/or within) are associated with growth. For example, I have that duration of breastfeeding is associated with the shape of the growth curve (within-subjects ANOVA), but I can't tell how... Is there a way of getting this information, please?

Thank you.

elaineb
Calcite | Level 5

Thanks a lot, Steve. I have used PROC MIXED with another kind of data. I tried with this present dataset but it did not give me the answers either.

But I will try now to use for this data the options and covariance structure you are suggesting.

Best regards,

Elaine

