hellorc
Obsidian | Level 7

Hello SAS community, my team has recently been working on a serology study that involves some longitudinal data. We came up with PROC MIXED code for the analysis and would like to hear your feedback; any comment is welcome!

 

Here is a sample data:

data serology;
input id gender group type il2 il5 il6 @@;
datalines;
1 1 1 1 6232 3193 5233
1 1 1 2 4689 7653 1231
2 1 2 1 8863 3659 1122
2 1 2 2 7789 1123 5302
3 2 1 1 6653 8842 9082
3 2 1 2 6321 3134 1234
4 1 3 1 5312 1251 7861
4 1 3 2 8764 1244 5721
;
run;

Variable descriptions:

ID= subject's identification no.

gender= subject's gender

group= 1 (control), 2 (treatment A), 3 (treatment B)

type= 1 (blood sample collected from site 1), 2 (blood sample collected from site 2)

il2, il5, il6 = serology variables of interest

 

For each subject, a blood sample was collected from site 1 first and, after some time, again from site 2. We also have sample collection dates in the data (not included in the sample data above).

 

The main research question is whether the variables of interest (il2, il5, and il6) differ among "groups" and "types".

 

At first we thought about performing ANOVA or GLM, but we decided that we should treat "type" as the time point in a longitudinal analysis. So far we have the following code:

proc mixed data=serology;
class id gender group type;
model (il2 il5 il6) = gender group type/ solution;
repeated /type=un subject=id r corr;
run;

We are considering each variable of interest one at a time to see whether there is a difference among groups or types, adjusted for gender, using an unstructured covariance matrix. Might someone be willing to confirm or comment on whether the PROC MIXED code here does what I stated, whether it is appropriate in this case, or suggest any improvement?

 

My other question is about the dates. Is it possible to use the dates of sample collection in the above PROC MIXED code for a more accurate longitudinal analysis? If we denote the date of sample collection by DOSC, would changing the REPEATED statement to "repeated DOSC" do the job?

 

Thank you!

RC

5 REPLIES
StatsMan
SAS Super FREQ

That code should do what you want. I am not sure how you will apply the DOSC effect, though. TYPE is your repeated effect in the design you describe. If you have missing data, meaning that you might only collect one sample for some subjects, then include TYPE on the REPEATED statement before the / to make sure you identify the TYPE observations correctly for each subject. 
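
A minimal sketch of that suggestion, assuming only the `serology` data set posted in the question. It fits one response at a time and names TYPE as the repeated effect before the slash. The covariance display options are written as `r rcorr`, the documented REPEATED options for printing the R matrix and its correlation form.

```sas
/* Sketch only: one response per PROC MIXED call; rerun for il5 and il6. */
proc mixed data=serology;
  class id gender group type;
  model il2 = gender group type / solution;
  /* Naming TYPE before the slash lets MIXED match each row to site 1 or
     site 2, even when a subject is missing one of the two samples. */
  repeated type / type=un subject=id r rcorr;
run;
```

The same step would be repeated with il5 and il6 in place of il2.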

hellorc
Obsidian | Level 7

Hi StatsMan, thank you so much for your reply. For the DOSC effect, we came across this post (https://communities.sas.com/t5/Statistical-Procedures/Longitudinal-Data-Analysis-with-time-as-a-cont...), which defined a separate time variable equal to DOSC and included it in the CLASS and REPEATED statements.

 

data serology;
set serology;
time=dosc;
run;

proc mixed data=serology;
class id gender group type time;
model (il2 il5 il6) = gender group type dosc/ solution;
repeated time/type=un subject=id r corr;
run;

My concern here is that we might be double-counting the same effect with type and DOSC. In the above code, is type no longer a repeated effect? Instead, DOSC seems to be a continuous variable representing the blood sample collection date. I am confused.

 

 

 

 

 

djmangen
Obsidian | Level 7

In theory, time is of course continuous, and using it in a longitudinal analysis would be similar to assessing length of exposure. But as a practical matter, is it continuous in your data? Is there variability in the length of exposure?

 

Also, I believe that you would have some serious confounding if you included both measures of time in the model. I'd want to think about it some more, but my suspicion is that including both Type and time is not appropriate.
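
One way to check the variability question is to compute, for each subject, the number of days between the site 1 and site 2 collections and summarize it. The following is a rough sketch, assuming DOSC is stored as a numeric SAS date on every row of `serology`; the data set names `serology_sorted` and `gap` and the variables `first_date` and `gap_days` are hypothetical names introduced here.

```sas
/* Sketch only: assumes DOSC is a numeric SAS date on every row of SEROLOGY. */
proc sort data=serology out=serology_sorted;
  by id type;
run;

data gap;
  set serology_sorted;
  by id;
  retain first_date;
  if first.id then first_date = .;
  if type = 1 then first_date = dosc;
  /* Days between the site 1 and site 2 collections for this subject */
  if type = 2 and not missing(first_date) then do;
    gap_days = dosc - first_date;
    output;
  end;
  keep id group gap_days;
run;

proc means data=gap n mean std min max;
  var gap_days;
run;
```

If `gap_days` barely varies across subjects, TYPE probably already carries most of the time information; if it varies a lot, an elapsed-time covariate may be worth considering in place of TYPE.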

SteveDenham
Jade | Level 19

If you add in DOSC, you have two repeated measures, most likely with correlations. See below for my first cut at code for this.

One thing I noticed is that you have multiple dependent variables specified in your MODEL statement. SAS/STAT version 15.x documentation explicitly says: "The MODEL statement names a single dependent variable..."  (emphasis added).

 

So here is what I would try for a doubly repeated analysis:

 

proc mixed data=serology;
class id gender group type dosc;
model il2 = gender group type dosc/ solution;
/* UN@UN: Kronecker product of two unstructured matrices, one per repeated effect */
repeated dosc type  /type=un@un subject=id r rcorr;
run;

What I would worry about here is the number of levels for dosc, how many unique days a single id is collected, and why you might think that dosc would have a different effect at different times of the year. If you have many levels of dosc, consider changing to PROC GLIMMIX where you can look at spline or polynomial effects for dosc, rather than soaking up a lot of degrees of freedom with a categorical approach.
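
A rough sketch of the continuous-time alternative, under stated assumptions: DOSC is a numeric SAS date on every row, and elapsed time is measured from each subject's earliest sample. The names `serology_days`, `start_date`, and `days` are hypothetical, and the random intercept is one simple choice for the within-subject correlation, not the only one.

```sas
/* Sketch only: assumes DOSC is a numeric SAS date on every row.
   SEROLOGY_DAYS, START_DATE, and DAYS are hypothetical names. */
proc sort data=serology out=serology_days;
  by id dosc;
run;

data serology_days;
  set serology_days;
  by id;
  retain start_date;
  if first.id then start_date = dosc;
  days = dosc - start_date;   /* 0 for each subject's earliest sample */
run;

proc glimmix data=serology_days;
  class id gender group;
  /* Quadratic trend in elapsed time instead of one parameter per date */
  model il2 = gender group days days*days / solution;
  /* Random intercept stands in for the within-subject correlation */
  random intercept / subject=id;
run;
```

A spline in `days` could be built with the EFFECT statement in PROC GLIMMIX instead of the quadratic term; with only two samples per subject, a low-order polynomial may be all the data can support.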

 

SteveDenham

 
