gabmv
Calcite | Level 5

Hello all,

First of all, I am pretty new to mixed models, and I need help understanding what I am doing in order to analyze an experiment. Thanks in advance.

My data set is the following:

130 groups (suspc.), in which 4 random samples within the group are analyzed in each of 2 consecutive years (the 4 samples for the following year are also chosen at random and are not the same as the previous year's), giving 8 samples per group in total. These samples are analyzed to get the concentrations of 2 components, A and B. The data set is unbalanced: data on the concentration of A and/or B may be missing for some samples.

Thus: random var = (sample within the group); fixed var = (group), (component), (year); y = (concentration)

Then, I want to analyze which groups have a greater concentration of A and B and whether the year has a significant effect. I also want to see if there is a significant correlation between the concentrations of A and B in general (not for every single group).

So I don't really know where to start other than building my data set in long form and trying to run a PROC MIXED.

I am relatively new to SAS and mixed models.

Thank you very much guys,

Gabriel,

Undergrad Student in ChemE

SteveDenham
Jade | Level 19

Hi Gabriel,

It looks like a good start.  I would recommend getting a copy of SAS for Mixed Models, 2nd ed. by Littell et al., and looking through the examples there.  Unbalanced data of the kind you are talking about should not be a problem.  Two questions: are the components A and B correlated in some manner, or are they independent?  And what kind of distribution do the concentrations follow?  For many biological analytes, the data are lognormally distributed rather than normally distributed.
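One way to get a first look at the distribution question is to compare the raw and log-transformed concentrations. This is only a sketch: yourdata, concentration, component, check, and logconc are placeholder names, not names from the original data set. Strictly, the normality assumption concerns the model residuals, so this is a preliminary check rather than a final one.

```sas
/* Sketch only: yourdata, concentration and component are assumed names */
data check;
    set yourdata;
    /* log is undefined for zero or negative values, so guard the transform */
    if concentration > 0 then logconc = log(concentration);
run;

/* BY-group processing below expects the data sorted by component */
proc sort data=check;
    by component;
run;

/* Normality tests, histograms and Q-Q plots on both scales, per component */
proc univariate data=check normal;
    by component;
    var concentration logconc;
    histogram concentration logconc / normal;
    qqplot concentration logconc / normal(mu=est sigma=est);
run;
```

If the log scale looks clearly closer to normal, the log-transformed variable could be used as the response instead.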

I have a tentative model in mind, but answers to those questions would help a lot.

Steve Denham

gabmv
Calcite | Level 5

Hello Steve,

Thank you very much for your answer,

The concentrations of the 2 components are not correlated; they are independent. The concentration data are normally distributed.

Gabriel,

SteveDenham
Jade | Level 19

This will be simpler than I feared.

Try:

proc mixed data=yourdata;

by component;

class year group;

model concentration=class year class*year;

lsmeans class year class*year/diff adjdfe=row adjust=simulate(seed=1);

run;

In this case the residual error is due to the samples within a group-year combination, and does not need to be specified in a random statement.
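Written out, the model fitted separately for each component can be sketched as follows. The symbols are introduced here for illustration only and do not appear in the original post: i indexes the group, j the year, and k = 1, ..., 4 the sample within a group-year cell.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

The only random term is the residual, which PROC MIXED estimates by default, so no RANDOM statement is needed for it.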

Steve Denham

gabmv
Calcite | Level 5

Thank you Steve,

Your program definitely is helpful.

I am still a little bit confused about why the sample within the group should not be specified in a random statement. I guess my initial statement was not exactly clear. For each of the 130 groups, I have 20 to 50 subjects, and I randomly chose only 4 per year. I guess calling the chosen subjects "samples" was confusing.

Thank you for your very helpful input!

Gabriel

SteveDenham
Jade | Level 19

I'll try to answer the question by saying that, as I see it, you have 4 measurements of component A for each class by year cell.  There is no other design factor involved.  If I am missing something, then I will incorporate it as I get filled in.

What does concern me a little is the use of only 4, rather than the entire dataset at each time point.  PROC MIXED can easily handle more data.

Steve Denham

gabmv
Calcite | Level 5

To clarify my statement:

I have 130 different accessions (each can be seen as a subspecies or something similar) that I want to evaluate to find which ones maximize the concentrations of components A and B. To do so, I randomly selected 4 subjects per year per accession and obtained the concentrations of components A and B for every sample.

There are only 2 time points: 2 different years. It was too time-intensive to have more than 8 samples total per accession.

That gives 2080 concentration values: 1040 for A and 1040 for B, i.e. 520 per component per year (130 accessions * 4 subjects).

My database looks like this 2080x5

year / accession / subject / component / concentration

Hopefully that clarifies things a bit. I realize that my initial post was really confusing.
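A quick way to check that the long-format file matches this description, and to see where concentrations are missing, is to count observations per component and year. This is a sketch that assumes a data set named yourdata with variables named year, component, and concentration, following the layout above; the real names may differ.

```sas
/* Sketch: data set and variable names are assumed from the layout above */
proc means data=yourdata n nmiss mean std;
    class component year;
    var concentration;
run;
```

With complete data, each component-by-year cell would show N = 520 (130 accessions * 4 subjects); the NMISS column shows how unbalanced the data actually are.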

SteveDenham
Jade | Level 19

So accession is a factor.  Now comes the question--is it random (the 130 represent some sort of sample from an entire universe of accessions, and you wish to make inferences about that universe) or fixed (you wish to make inferences about those 130 specific accessions)?  For the first case, try:

proc mixed data=yourdata;

by component;

class year accession;

model concentration=year;

random accession year*accession;

lsmeans year/diff adjdfe=row adjust=simulate(seed=1);

run;

For the second case:

proc mixed data=yourdata;

by component;

class year accession;

model concentration=year accession year*accession;

lsmeans  year accession year*accession/diff adjdfe=row adjust=simulate(seed=1);

run;
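Both versions rely on BY-group processing, which expects the input data to be sorted by the BY variable. A minimal sketch of that preparatory step, run before either PROC MIXED call, with yourdata again standing in for the real data set name:

```sas
/* Sort once so that "by component;" in PROC MIXED sees contiguous groups */
proc sort data=yourdata;
    by component;
run;
```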

Steve Denham

Message was edited by: Steve Denham

Message was edited AGAIN by: Steve Denham

gabmv
Calcite | Level 5

Thank you SO much,

The 130 accessions represent the entire collection I want to evaluate!

Gabriel

gabmv
Calcite | Level 5

Just for clarification,

In the second code, what does "class" refer to in the lsmeans statement?

And also, I still don't see why "subject" would not be a random statement. (The 8 subjects evaluated per accession)

You are very helpful, thank you

Gabriel

SteveDenham
Jade | Level 19

Cut and paste error explains the 'class' in the lsmeans statement.  I've gone back and edited the post.

You can put subject in as a random statement, but the results will be exactly the same as not including it.  The reason is that it only indexes the lowest level of observation--it is just another name for the residual error in this design, and you don't have to specify it.
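The reasoning can be sketched by adding a subject term to the fixed-accession model for a single component. The notation is introduced here for illustration and is not from the original posts: s is the effect of subject k within accession i and year j.

```latex
\begin{align*}
y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + s_{k(ij)} + \varepsilon_{ijk},
\qquad s_{k(ij)} \sim N(0, \sigma_s^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2) \\
\operatorname{Var}(y_{ijk}) &= \sigma_s^2 + \sigma^2
\end{align*}
```

Because each subject contributes exactly one observation per component, the two terms always appear together, and only their sum can be estimated from the data. That sum is what the residual variance already captures.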

Steve Denham

gabmv
Calcite | Level 5

Okay, I see, neglecting the random statement makes a lot of sense.

So, I end up having:

data FeZn;

    infile '/folders/myfolders/FeZn.csv' dlm=',' firstobs=2;

    input Sample Accession Year comp $ conc;

run;

proc mixed data=WORK.FEZN ;

by comp;

class year accession;

model conc=class year accession class*year class*accession year*accession class*year*accession;

lsmeans year accession class*year class*accession year*accession class*year*accession/diff adjdfe=row adjust=simulate(seed=1);

run;

I do not understand why class is present in either the model or the lsmeans statement.

So I did:

proc mixed data=WORK.FEZN ;

by comp;

class year accession;

model conc=year accession year*accession;

lsmeans year accession year*accession/diff adjdfe=row adjust=simulate(seed=1);

run;

What is the major difference here?

SteveDenham
Jade | Level 19

I think the major difference is that I invented a variable that doesn't exist anywhere in the design, and I apologize.  This last code that you present is now in agreement with the final edited code I have upstream.

Now suppose the components ARE correlated in some way.  You could model that situation as follows:

proc mixed data=WORK.FEZN ;

class year accession component subject;

model conc=year|accession|component;

repeated component/subject=subject type=unr;

lsmeans year|accession|component/diff adjdfe=row adjust=simulate(seed=1);

run;

The type=unr output will give the correlation between the two components, which I assume are iron and zinc concentrations.  Under this model, you do not assume a priori that the two are uncorrelated, which may or may not be the case given hydrological stressors in agronomic or ecological experiments, for example.  This does require, however, that the concentrations are obtained from the same subject.
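As a sketch of what the repeated statement estimates, the two measurements on one subject are given an unstructured 2 x 2 covariance matrix. The symbols below are illustrative and not from the original post: A and B label the two components, and rho is their within-subject correlation.

```latex
\operatorname{Cov}\begin{pmatrix} y_{A} \\ y_{B} \end{pmatrix}
= \begin{pmatrix}
\sigma_A^2 & \rho\,\sigma_A\sigma_B \\
\rho\,\sigma_A\sigma_B & \sigma_B^2
\end{pmatrix}
```

With type=unr, the covariance parameter estimates are reported as the two variances and the correlation, so rho can be read directly from the output. An estimate near zero would be consistent with the separate by-component analyses above.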

Steve Denham

gabmv
Calcite | Level 5

Thank you Steve for all your help, everything works flawlessly now!
