ehdezsanabria
Obsidian | Level 7

Hello! This is the first time I have encountered a data set this complicated, and I need help fitting a model. The data set contains variables measured in 80 subjects. Each subject received either treatment A or treatment B, with 40 volunteers per treatment. In treatment A, 28 volunteers had previous severe symptoms, whereas 12 did not present symptoms. In treatment B, 27 had severe symptoms and 13 did not present symptoms. All variables were recorded at 3 time points (visits). In addition, volunteers were classified by age into categories (20s, 30s, 40s, 50s and 60s).

 

In this case, I think it is a multilevel model because I have symptoms nested within age, which are nested within treatment and within time point. I am confused about what to do but I tried the following syntax:

 

proc mixed data = indices;
class name group age_cat symptoms;
model Total_species = group age_cat symptoms group*age_cat age_cat*symptoms group*symptoms age_cat*group*symptoms/ s;
random intercept /sub=name group = age_cat;
random intercept /sub=group(symptoms) type = vc g gcorr v=2 vcorr=2 group = age_cat;
repeated;
run;

 

I also tried these others:

 

%macro metabolites(varsel);
PROC mixed data=indices covtest;
CLASS name visit treatment age_cat symptoms;
/* Means model: the four-way interaction is the only fixed effect */
MODEL &varsel = visit*treatment*age_cat*symptoms/ddfm=kr;
repeated visit/type=cs subject=name;
/* The LSMEANS effect has to be an effect that appears in the MODEL statement */
lsmeans visit*treatment*age_cat*symptoms/pdiff;
RUN;
%mend;
%metabolites(Pielou);%metabolites(Shannon);%metabolites(Simpson);%metabolites(Fisher);
%metabolites(Inverse_Simpson);%metabolites(Total_Species);
quit;

%macro metabolites(varsel);
PROC mixed data=indices covtest;
where visit="V3";
CLASS name treatment age_cat symptoms;
MODEL &varsel = treatment*age_cat*symptoms/ddfm=kr;
lsmeans treatment*age_cat*symptoms/pdiff;
RUN;
%mend;
%metabolites(Pielou);%metabolites(Shannon);%metabolites(Simpson);%metabolites(Fisher);
%metabolites(Inverse_Simpson);%metabolites(Total_Species);
quit;

 

I was wondering if anyone could give me some advice on how to write the correct syntax and how to interpret the output, or even on whether I am using the correct procedure. Thank you in advance!

 

Emma

Accepted Solution
SteveDenham
Jade | Level 19

I like this approach the best of those you present:

PROC mixed data=indices covtest;
CLASS name visit treatment age_cat symptoms;
MODEL &varsel = visit*treatment*age_cat*symptoms/ddfm=kr;
repeated visit/type=cs subject=name;
/* The LSMEANS effect has to be an effect that appears in the MODEL statement */
lsmeans visit*treatment*age_cat*symptoms/pdiff;
RUN;

I like this "means model" approach because I strongly suspect that your data are nowhere near balanced. With missing cells, you should expect nonestimable lsmeans for the lower-order interactions and main effects, and fitting only the highest-order interaction sidesteps that problem.

 

Some things to consider. Categorical age is probably OK, but if you believe that the dependence on age is strictly linear, you may wish to change to a continuous covariate (by removing age_cat from the CLASS statement and coding the actual age). The argument for this is that the response of a 29-year-old is likely more similar to that of a 31-year-old than to that of a 21-year-old, and the same holds near any of the cutpoints. The argument against is that it assumes a linear response across all ages, which may not be anywhere near a valid assumption.
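As a rough sketch of the continuous-age alternative described above: it assumes the data set also contains the actual age in a numeric variable, here given the hypothetical name age (no such variable is shown in the thread), and it uses Total_Species as an example response.

```sas
/* Sketch only: AGE is a hypothetical numeric variable holding age in years */
proc mixed data=indices covtest;
  /* age is left out of CLASS, so it enters the model as a continuous covariate */
  class name visit treatment symptoms;
  /* cell means for visit x treatment x symptoms plus one common linear slope for age */
  model Total_Species = visit*treatment*symptoms age / ddfm=kr solution;
  repeated visit / type=cs subject=name;
  /* LS-means are computed at the mean of the covariate by default */
  lsmeans visit*treatment*symptoms / pdiff;
run;
```

This version fits a single slope for age across all cells. Adding an interaction such as age*treatment would allow the slope to differ by treatment, at the cost of more parameters.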

 

Also, I am not a big fan of the compound symmetry covariance structure for measurements that are repeated in time. There is very little reason to believe that the covariance between the first and third visits is exactly equal to that between the first and second visits or the second and third visits. Time-dependent structures or a completely unstructured type do not place such strict assumptions on the data.
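A minimal sketch of what that change could look like, keeping the same means model and only swapping the TYPE= option. Total_Species is used as an example response, and the choice between the two structures is left open here rather than taken from the thread.

```sas
/* Sketch only: same means model, less restrictive covariance structure */
proc mixed data=indices covtest;
  class name visit treatment age_cat symptoms;
  model Total_Species = visit*treatment*age_cat*symptoms / ddfm=kr;
  /* type=un estimates all 6 variances and covariances among the 3 visits.      */
  /* type=ar(1) is a time-dependent alternative that assumes equally spaced     */
  /* visits; replace un with ar(1) below to try it.                             */
  repeated visit / type=un subject=name r rcorr;
run;
```

Fitting the model once per candidate structure and comparing the information criteria (AIC, AICC, BIC) in the Fit Statistics table is one common way to choose among them.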

 

Steve Denham

 

 

 

 

ehdezsanabria
Obsidian | Level 7

Dear Steve,

 

Thank you for your kind advice, and my apologies for my delayed answer. I agree that the response may differ at the extremes of an age category (as you explained, comparing a 29-year-old with a 31-year-old versus a 21-year-old). I previously tried fitting the model with age as a continuous variable, but there were too many distinct ages and the design was not balanced. For example, I had only one 21-year-old without severe symptoms, whereas I had four 24-year-olds, three of them without severe symptoms and one with severe symptoms. So I decided to code age as a categorical variable.

 

I agree that ar(1) or un covariance structures are more accurate, so I will definitely change that. My final objective is to find out whether the variables (total_species, for example) are significantly different between treatments and whether these differences are influenced by the age of the subjects and the symptoms that they reported before the start of the treatment.

 

I think I can safely draw some conclusions once I apply your suggestions to the syntax I presented above. Just out of curiosity, is this really a multilevel model?

 

Once again, I appreciate your time and consideration.

 

Best regards,

 

Emma

SteveDenham
Jade | Level 19

I don't know whether it is truly a multilevel model in the hierarchical sense, but it is certainly multilevel in the sense that subjects are measured repeatedly, which represents another level of detail.

 

Steve Denham

ehdezsanabria
Obsidian | Level 7

OK, thank you for the clarification. Just one more small question about presenting the results: the comparison of means produces many comparisons that are not relevant because they have no biological meaning. An example is comparing a variable measured at the first visit in a volunteer from group A, in his 20s, without symptoms, with the same variable measured at the third visit in a volunteer from group B, in his 60s, with severe symptoms. I know that you can use the CONTRAST statement, but I was wondering what the correct way to do it would be. Thank you again for your time!

SteveDenham
Jade | Level 19

The LSMESTIMATE statement provides precise control over which least squares means are compared, and over any adjustments for multiplicity. I would strongly recommend its use. One step back from that is the SLICE statement. By using its SLICEBY= and DIFF options, you can get almost any simple effect comparison you might wish to have.
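As an illustration of the SLICE route, here is a hedged sketch that restricts the comparisons to treatment differences within each combination of visit, age category and symptoms. The response Total_Species and the type=un structure are example choices, not something prescribed in the thread.

```sas
/* Sketch only: simple-effect comparisons of treatment within each cell */
proc mixed data=indices covtest;
  class name visit treatment age_cat symptoms;
  model Total_Species = visit*treatment*age_cat*symptoms / ddfm=kr;
  repeated visit / type=un subject=name;
  /* Partition the four-way LS-means by visit x age_cat x symptoms and */
  /* request differences between the treatment levels in each slice    */
  slice visit*treatment*age_cat*symptoms
        / sliceby=visit*age_cat*symptoms diff;
run;
```

With missing cells, some slices may be reported as nonestimable. LSMESTIMATE gives finer control still, but its coefficients have to follow the order of the LS-means for the effect, so it is worth checking that order in the LSMEANS output before writing any.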

 

Steve Denham

ehdezsanabria
Obsidian | Level 7
Dear Steve,

Thank you for the suggestion, it has been extremely useful!
Rick_SAS
SAS Super FREQ

I believe you can use the STORE statement to save your model to an item store, then use PROC PLM and the EFFECTPLOT statement to visualize the slices of the model at various levels of the parameters.  I haven't done this for a mixed model, but the following articles might be useful:

How to use PROC PLM

How to use the EFFECTPLOT statement
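A rough sketch of that workflow, under several assumptions: the item store name work.mixmod is hypothetical, Total_Species and type=un are example choices, and, as noted above, this has not been confirmed for a mixed model in this thread, so treat it as a starting point rather than a tested recipe.

```sas
/* Sketch only: save the fitted model, then plot it with PROC PLM */
ods graphics on;

proc mixed data=indices covtest;
  class name visit treatment age_cat symptoms;
  model Total_Species = visit*treatment*age_cat*symptoms / ddfm=kr;
  repeated visit / type=un subject=name;
  store work.mixmod;   /* hypothetical item store name */
run;

proc plm restore=work.mixmod;
  /* Predicted means across visits, one line per treatment.          */
  /* Classification variables not named here are held at a fixed     */
  /* level chosen by the procedure unless set with the AT option.    */
  effectplot interaction(x=visit sliceby=treatment) / clm;
run;
```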

ehdezsanabria
Obsidian | Level 7

Dear Rick,

 

Thank you for your advice. I will definitely try EFFECTPLOT to generate a figure like the one you showed.

 

Thank you for the kind suggestion!

 

Emma
