BookmarkSubscribeRSS Feed
nsns
Obsidian | Level 7

Hi,

I have a very large dataset that includes daily measurements over 52 weeks (365 days - a year) for each participant. We are interested in evaluating the effect of treatment (active and control) on the measurement for each participant. 

(There are multiple sites and within each site there are participants and each participant has daily measurements.)  I want to account for the repeated measurements of each individual within the site.

 

The model will include  site as a random effect.  In addition to the overall effect of treatment we want to include a weekly effect (measurements taken during the week are likely to be correlated). 

1)  What is the appropriate random statement for site? The statement below does not seem to work. 

2) I am trying to decide whether including the daily measurements with week as an effect (i.e. there are 7 measurements in each week) by using 2 repeated terms i.e. random _residual (see option 1), but this does not work.  Alternatively, should I summarize the measurements by week (i.e. mean of each 7 measurements for that week using proc means) and then use the mean weekly measurement as the dependent.(option 2).

Note that in both cases I've commented out the random effect for site -since this syntax is also not working.

Any help with this would be greatly appreciated.

 

The code I am running looks like this:

 

Option 1:

*using daily measurements;

proc glimmix data=<raw data> order=data;  method=reml;

class site id trt week;

model meanweekly=site trt week trt*week / ddfm=kr;

*random int / subject=site;

random _residual_ / subject=id(site) type=ar(1);

random _residual_ / subject=week(id) type=ar(1);

run;

 

Option 2:

*using the mean weekly measurements;

proc glimmix data=<weekly_summarized data> order=data;  method=reml;

class site id trt week;

model meanweekly=site trt week trt*week / ddfm=kr;

*random int / subject=site;

random _residual_ / subject=id(site) type=ar(1);

run;

8 REPLIES 8
nsns
Obsidian | Level 7
Hi,,
Since I did not receive any replies, I am wondering if my questions are not clear. I will try again:

There are two topics in question.
1)Regarding the input data: What type of input data is best to use -
a)raw data which includes 365 measurements (DAY) per participant with a variable called WEEK (7 days per week).
OR
b)summarized data that include mean weekly data (WEEK) - one observation per week.

2) Regarding the model - There is a random effect that is the site and the repeated measurements that are taken for each participants.
The statement RANDOM int / subject=site; is not working. Any ideas why not?
The code is as follows:
proc glimmix data=< raw or weekly summarized > order=data; method=reml;
class site id trt week;
model mean=site trt week trt*week / ddfm=kr;
*RANDOM int / subject=site; **G matrix;
RANDOM _residual_ / subject=id(site) type=ar(1); **repeated measures for participants within sites; *R;
RANDOM _residual_ / subject=week(id) type=ar(1); **Include this if using raw data (not weekly summarized data); *R;
run;

Are the RANDOM statements correct? The G matrix Random statement i.e. Site - is not working.

Any input is appreciated. Thank you.
StatsMan
SAS Super FREQ

Remove SITE from the MODEL statement. In GLIMMIX (and MIXED) random effects do not go on the MODEL statement. While GLIMMIX can handle an AR(1) structure for a small number of time points (as in a designed experiment), you may have a tough time fitting this model to your observational data. Try the model with all your data, but you may need to summarize to weekly (or even monthly) data to get a mixed model to work here.

nsns
Obsidian | Level 7

Thank you so much for your reply. 

I 1) removed site from the model statement 2) ran the G matrix random together with the R matrix random with  residual option statements on weekly summarized data with TOEP (instead of AR(1)) as follows

RANDOM int / subject=site; 

RANDOM week / subject=id(site) residual type=TOEP;

After over an hour, SAS stopped processing because of insufficient memory.

 

When I run the model including site in the model statement without the RANDOM int/subject=site then the model runs successfully in around 15 seconds run time.

 

Using monthly summarized data with the RANDOM int/subject=site statement, the model converged but the G matrix was not positive definite - so something is wrong there as well.  

I can't figure out what is the problem with using site as a random effect.  Any ideas?

And how is the model affected if I use site as a fixed effect instead of a random effect? 

Thanks for your insights.

SteveDenham
Jade | Level 19

I would try the following code:

 

proc glimmix data=<weekly_summarized data> order=data;  method=reml;
class site id trt week;
model meanweekly=site trt week trt*week / ddfm=kr2;
random int / subject=site;
random week/ residual subject=id(site) type=ar(1);

run;

this is close to your Option 2, but rearranged a bit. @StatsMan makes a good point about fitting weekly averages, but even that may require more memory than is available to you.  The V matrix including the random and residual effects may be too large, which might lead to using monthly averages, or using month as a by variable and doing 12 separate analyses on a monthly basis and combining what you get with the monthly average analysis.

 

I also switched the ddfm=kr to ddfm=kr2, as the latter is better "behaved".

 

SteveDenham

 

nsns
Obsidian | Level 7

Thanks Steve.  

I ran your code which took a very long time to run- but again got a message about insufficient memory.

Curious to what your thoughts are (or anyone else) about not using site as a random variable, but rather include as a fixed effect.   Often times site in included in the model statement and not as a random effect.  

Actually the data with which I am analyzing now runs much better when I include site as a fixed effect and drop the random effect.  I do of course include the residual  effect (week).

 

Thanks.

SteveDenham
Jade | Level 19

One thing to try before you change site to a fixed effect would be to drop the ddfm=kr2 option.  That worked for us just the other day.  You could then adjust for at least some of the bias by using the EMPIRICAL option in the PROC GLIMMIX statement.  If the model converges, be sure to check the denominator degrees of freedom associated with F tests and tests of differences in LSMEANS.  If it looks like the "whole-plot" effects are being tested with the residual degrees of freedom, you may have to specify the denominator degrees of freedom using the ddf= option.  Additionally, don't include 'site' as both a fixed and a random effect, especially if you have a large number of sites (as I did in the code I posted, and for which I apologize).

 

Another possibility, depending on how many treatments are being examined, is to use an infinite number for the denominator degrees of freedom by specifying the CHISQ option rather than the ddfm= option in the MODEL statement.  This will give both F tests and chi-squared tests for the effects in the model. With 52 weekly timepoints, the whole-plot df is large enough to consider this option.  If you go to monthly averages, it still may be a very good approximation.

 

SteveDenham

nsns
Obsidian | Level 7

Thanks for your response Steve.

My model has actually has more terms in it than in my example (baseline parameters - some crossed with treatment) together with multiple treatment groups (with one control group).

I ran the model as you suggested with site as a random effect (removing from model statement) and compared the ddf and the least square estimates and differences. 

The models converged but with these notes in the log

NOTE: Estimated G matrix is not positive definite.

NOTE: A linear combination of covariance parameters is confounded with the residual variance.

The second note did not appear in the model with emphirical option.

 

The ddfm was the same and high in the 2 models you suggested (removing ddfm=kr and removing ddfm=kr + emphirical option) for all the effects.  Type 3 test for fixed effects was different (understandably) for the cross effects with a higher "natural" df (i.e the trt*site and the trt*week). 

Checking the lsmeans and differences, they were the same for the two models but pvalues were different due to the standard errors being smaller in the emphirical model (I'm not so clear exactly why - I need to read up more about that). However, the estimates themselves were very similar to the model with site as a fixed effect.

 

If I want to force the ddf would you suggest I use the degrees of freedom from the model without site as a random effect?

Thanks again for your help and input.  I really appreciate it.

 

SteveDenham
Jade | Level 19

If you are going to use the ddf option on a split-plot design, use the df from the Type3 analysis.  I would then apply the empirical option.

 

SteveDenham

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1618 views
  • 0 likes
  • 3 in conversation