Solved: Cluster-randomized design

RLarl · Posted 10-25-2024 02:09 PM

Hello.

I am seeking help with specifying the right model to analyze data from a cluster-randomized controlled trial.

I apologize for the long message, but I am trying to be as clear as possible.

I expect an overall improvement over time (T1 vs. T2), but a stronger improvement in the treatment condition than in the control condition. In other words, I expect a condition-by-time interaction.

The dependent variable is measured in three different ways, that is, with three different measures, all on the same scale. I expect the same interaction effect for all three measures. In addition, I have two individual-difference variables, personality1 and personality2, which I want to include as fully crossed terms in the model.

The complication is that random assignment to the conditions was done not at the subject level but at the site level. The variable SITE has 23 sites with roughly, though not exactly, equal numbers of subjects (about 30 per site). Thirteen sites were assigned to the treatment condition and 10 to the control condition.
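Why site-level assignment matters can be seen from the standard design-effect approximation for cluster-randomized trials with clusters of roughly equal size m and intraclass correlation ρ (the correlation of the outcome between subjects in the same site). This is a generic textbook approximation, not something estimated from this study; the value ρ = 0.05 below is purely hypothetical.

```latex
\[
\mathrm{DEFF} = 1 + (m - 1)\,\rho,
\qquad
n_{\text{eff}} = \frac{n}{\mathrm{DEFF}}
\]
% Hypothetical illustration: m = 30 subjects per site, rho = 0.05
\[
\mathrm{DEFF} = 1 + 29 \times 0.05 = 2.45
\]
```

Even a small ICC can therefore shrink the effective sample size well below the number of subjects, which is why the site (and area) variation should be represented in the model.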

Finally, the sites were in different areas, so I have an additional factor, AREA, which I intend to treat as random.

To recap:

Cond: Treat vs Ctrl

Time: T1 vs T2 (repeated)

Measure: a, b, c, d (repeated)

subj = unique identifier for each participant

P1 and P2 = continuous predictors

DV = continuous variable

This is the basic model that I am thinking of using:

proc mixed data=dataset1;

class subj cond time measure site area;

model DV = cond|time|measure|P1|P2;

random intercept /subject=area;

random intercept /subject=site(area); /* since sites are nested within area */

repeated / subject=subj type=cs;

run;
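One way to read this basic model is as the linear mixed model below, written as a sketch with hypothetical index letters: a for area, s for site within area, i for subject, t for time, and m for measure. The term x'β stands for the full factorial fixed part, and the CS terms follow SAS's parameterization (σ1 is the common covariance, σ² the residual variance).

```latex
\[
y_{asitm} = \mathbf{x}_{asitm}^{\top}\boldsymbol\beta + u_a + v_{s(a)} + \varepsilon_{asitm},
\qquad u_a \sim N(0,\sigma^2_{\text{area}}),\quad v_{s(a)} \sim N(0,\sigma^2_{\text{site}})
\]
\[
\operatorname{Var}(\varepsilon_{asitm}) = \sigma_1 + \sigma^2,
\qquad
\operatorname{Cov}(\varepsilon_{asitm}, \varepsilon_{asit'm'}) = \sigma_1 \quad \text{for } (t,m) \neq (t',m')
\]
```

Under this reading, the two RANDOM statements make all observations from the same site correlated (and, more weakly, those from the same area), while the REPEATED statement adds one common covariance among all time-by-measure observations from the same subject.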

My questions:

1. Are the two random statements enough to account for the fact that the random assignment is done at the site level, instead of being at the subj level?

2. Is the REPEATED statement correct, or do I have to add the two repeated factors (time and measure) to it?

If I have to add the repeated factors to the REPEATED statement, I gather from the documentation and from several articles that used similar designs that I have a few options.

If I want to use compound symmetry as the covariance structure, I cannot simply specify two repeated factors: it does not work.

But I seem to have two options:

Option 1. Put one repeated factor in the GROUP= option of the REPEATED statement.

In my case I would thus use this:

proc mixed data=dataset1;

class subj cond time measure site area;

model DV = cond|time|measure|P1|P2;

random intercept /subject=area;

random intercept /subject=site(area); /* since sites are nested within area */

repeated Time / Subject = subj*measure Group = measure Type = CS R Rcorr;

run;

Using the above syntax, however, I get different results depending on which of the two factors (time or measure) I specify in GROUP=. I do not understand why.
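A sketch of what Option 1 implies may help explain the differing results. With SUBJECT=subj*measure, each subject-by-measure combination is its own block: the REPEATED effect (time) is correlated only within that block, and GROUP=measure gives each measure its own CS parameters. Observations on different measures from the same subject get zero residual covariance. For subject i and measure m:

```latex
\[
\mathbf{R}_{i,m} =
\begin{pmatrix}
\sigma^2_m + \sigma_{1,m} & \sigma_{1,m} \\
\sigma_{1,m} & \sigma^2_m + \sigma_{1,m}
\end{pmatrix},
\qquad
\operatorname{Cov}(\varepsilon_{itm}, \varepsilon_{it'm'}) = 0 \quad \text{for } m \neq m'
\]
```

If the roles are swapped (for example, REPEATED measure / SUBJECT=subj*time GROUP=time), the measures are correlated within each time point, with separate CS parameters for T1 and T2, and residual covariances across time are set to zero. The two specifications are different covariance models, so different estimates are to be expected.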

Option 2. Create a new class variable in my data set that combines time and measure:

data new_dataset1;
set dataset1;
time_measure = cats(time, '_', measure); /* Combine time and measure into a single factor */
run;

proc mixed data=new_dataset1;

class subj cond time measure site area time_measure;

model DV = cond|time|measure|P1|P2;

random intercept /subject=area;

random intercept /subject=site(area); /* since sites are nested within area */

repeated time_measure / subject=subj type=cs;

run;

Option 3. A third option entails changing the covariance structure:

proc mixed data=dataset1;

class subj cond time measure site area ;

model DV = cond|time|measure|P1|P2;

random intercept /subject=area;

random intercept /subject=site(area); /* since sites are nested within area */

repeated time measure/ subject=subj type = UN@UN ;

run;
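The UN@UN structure in Option 3 models each subject's residual covariance as a Kronecker (direct) product of an unstructured time matrix and an unstructured measure matrix. With 2 time points and M measures (M is left general here, since the thread lists both three and four measures), a sketch of the structure and its parameter count:

```latex
\[
\mathbf{R}_i = \boldsymbol\Sigma_{\text{time}} \otimes \boldsymbol\Sigma_{\text{measure}},
\qquad
\boldsymbol\Sigma_{\text{time}} \in \mathbb{R}^{2\times 2},\quad
\boldsymbol\Sigma_{\text{measure}} \in \mathbb{R}^{M\times M}
\]
\[
\#\text{parameters} = 3 + \frac{M(M+1)}{2} - 1
\]
```

The -1 reflects that SAS fixes the upper-left element of the second matrix to 1 so the parameters are identifiable. For comparison, an unstructured matrix over all 2M time-by-measure cells would need M(2M+1) parameters, while CS needs only 2.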

Option 2 (repeated time_measure / subject=subj type=cs) and the original model (repeated / subject=subj type=cs) give exactly the same results, but Option 1 (repeated Time / Subject = subj*measure Group = measure Type = CS R Rcorr) and Option 3 (repeated time measure / subject=subj type = UN@UN) give results that differ from each other and from both Option 2 and the original model.

For reasons related to previous research, I would prefer to use compound symmetry as the covariance structure, and thus the original model if it is valid, or Option 1 or 2. Are they equally valid?

jiltao · Posted 10-29-2024 03:10 PM

Those are different ways of modeling correlations in your data, and all look reasonable to me. You might examine the Fit Statistics table. The model with the smaller AICC or BIC indicates a better fit to your data.
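A minimal SAS sketch of that comparison, assuming the variable names used in this thread: each candidate model is fitted with the same fixed effects, its Fit Statistics table is captured with ODS OUTPUT (FitStatistics is the ODS table name PROC MIXED uses for it), and the tables are then stacked. The data set names fit_cs, fit_un, and fit_compare and the model labels are illustrative only.

```sas
/* Candidate 1: compound symmetry across all time-by-measure cells */
proc mixed data=dataset1;
  class subj cond time measure site area;
  model DV = cond|time|measure|P1|P2;
  random intercept / subject=area;
  random intercept / subject=site(area); /* sites nested within area */
  repeated / subject=subj type=cs;
  ods output FitStatistics=fit_cs;       /* capture AIC, AICC, BIC */
run;

/* Candidate 2: Kronecker product of unstructured time and measure matrices */
proc mixed data=dataset1;
  class subj cond time measure site area;
  model DV = cond|time|measure|P1|P2;
  random intercept / subject=area;
  random intercept / subject=site(area);
  repeated time measure / subject=subj type=un@un;
  ods output FitStatistics=fit_un;
run;

/* Stack both tables with a label identifying the covariance structure */
data fit_compare;
  length model $8;
  set fit_cs(in=from_cs) fit_un;
  if from_cs then model = 'CS';
  else model = 'UN@UN';
run;

proc print data=fit_compare noobs;
run;
```

Because both models use REML and identical fixed effects, their information criteria are comparable; if a model fails to converge, its FitStatistics data set will not be created and the stacking step will need adjusting.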

Thanks,

Jill

RLarl · Posted 10-31-2024 05:23 AM

@jiltao , thanks for the comment

SteveDenham · Posted 10-30-2024 08:53 AM

Preferring a particular structure in advance may not always be the best approach to covariance structure selection. A compound symmetry structure enforces equal variances across a subject's repeated measurements, and equal covariances between any two of them. If the study is cluster randomized, this may not be a reasonable assumption. @jiltao suggested looking at the various information criteria as a way of selecting between structures, and I agree, with one caveat: be sure that the data used for the structures being compared are identical. In your Option 3 with the Kronecker product, the input data might differ from those in your original model or the other options. I don't know what happens when there are missing fixed-effect data points for the Kronecker family of structures.
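One way to make sure every candidate structure sees the same records is to build a complete-case data set once and fit all models to it. A minimal sketch, assuming the variable names in this thread; the data set name dataset1_complete is hypothetical.

```sas
/* Keep only records with no missing value in any analysis variable, so
   that every candidate covariance structure is fitted to the same rows.
   Adjust the variable list to match the actual data set. */
data dataset1_complete;
  set dataset1;
  /* CMISS counts missing values across numeric and character arguments */
  if cmiss(subj, cond, time, measure, site, area, P1, P2, DV) = 0;
run;
```

Each candidate model would then use data=dataset1_complete, and the "Number of Observations Used" line in the Number of Observations table should match across all PROC MIXED runs.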

SteveDenham

RLarl · Posted 10-31-2024 01:21 PM

Hello Steve,

Of course you are right that covariance structures should be selected using a criterion, and that CS may not be appropriate. I have read that unstructured (type=UN) is best in these cases, but I cannot get it to work in SAS Studio: it runs indefinitely and I have to terminate the procedure. Would variance components (VC) be a reasonable choice for my design, instead of CS?
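For reference, a sketch of how the two R-side structures differ for a single subject with n repeated observations, in SAS's parameterization (I_n is the identity matrix, J_n the n-by-n matrix of ones). On a REPEATED statement without GROUP=, VC reduces to a single residual variance with no within-subject correlation, whereas CS adds a common covariance σ1:

```latex
\[
\mathbf{R}_i^{\text{VC}} = \sigma^2 \mathbf{I}_n,
\qquad
\mathbf{R}_i^{\text{CS}} = \sigma^2 \mathbf{I}_n + \sigma_1 \mathbf{J}_n
\]
```

With VC on the REPEATED statement, any correlation among a subject's repeated observations would have to come from other terms in the model (here, only the site and area intercepts).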

SteveDenham · Posted 11-04-2024 02:46 PM

I spent a bit more time going over this, but I am still not sure about the design and the number of levels within some of the factors. So let's start back at the first model you proposed, so that we can work through something that might help:

proc mixed data=dataset1; 
class subj cond time measure site area;
model DV = cond|time|measure|P1|P2;
*random intercept /subject=area;
*random intercept /subject=site(area); /* since sites are nested within area */
/* These two statements are equivalent to random area; and random site*area;, so they can be reduced to a single statement: */
random intercept site/subject=area type=cs;

*repeated / subject=subj type=cs;
/* For the REPEATED statement, make sure that subj is completely unique, so that no two subjects at different sites share the same identifier. Also, since the various measures are taken on all subjects, you should consider the following REPEATED statement: */
repeated time/subject=subj*measure type=cs;
run;

This REPEATED statement is a somewhat kludgy way of avoiding the Kronecker product approach. It depends on the effect of measure on the dependent variable being relatively homogeneous (i.e., only a level effect, and one that is independent of the other terms in the model).
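Written out, this REPEATED statement gives each subject a block-diagonal residual matrix: one 2-by-2 CS block per measure, with the same parameters for every measure and no residual covariance between measures. A sketch, with M measures and observations ordered by measure and then time:

```latex
\[
\mathbf{R}_i = \mathbf{I}_M \otimes
\begin{pmatrix}
\sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1
\end{pmatrix}
\]
```

A single pair (σ², σ1) is shared by all measures, which is why the approach relies on the measures differing only in level.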

There are ways around the seemingly unending run times, but I am not sure whether they are all available in SAS Studio. You might want to consider moving to GLIMMIX, where you can use the NLOPTIONS statement to tune the iterative algorithm.
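A possible GLIMMIX translation of the suggested model, as a sketch under the assumption that the default normal distribution with identity link is wanted (so it is a linear mixed model like the PROC MIXED version). The R-side structure is requested with the RESIDUAL option on a RANDOM statement; the NLOPTIONS values are illustrative, not tuned recommendations.

```sas
proc glimmix data=dataset1;
  class subj cond time measure site area;
  model DV = cond|time|measure|P1|P2;
  /* G-side random intercepts for area and for site within area */
  random intercept / subject=area;
  random intercept / subject=site(area);
  /* R-side (residual) covariance: CS across time within each subj*measure */
  random time / subject=subj*measure type=cs residual;
  /* Optimizer settings; technique and iteration limit are example values */
  nloptions tech=nrridg maxiter=500;
run;
```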

SteveDenham

RLarl · Posted 11-11-2024 05:19 AM

Hello Steve,

thank you very much for getting back to me about this. I appreciate it.

I ran several models and compared the results, and I believe I have a clearer picture now.
