BookmarkSubscribeRSS Feed
celdelmas
Calcite | Level 5

Hi all,

 

am analyzing a data set of quantitative traits measured on 15 species (3 individuals per species).

I would like to test the predictive relationship between one particular variable (Y) and the other traits (X1 to X14).

I thought that I should either realize a regression analyses (PROC REG) between pairs of traits on trait means per species OR realize a mixed model with the species entered as random for examples:

 

Solution1:

PROC REG data=mean.data;

model Ym=X1m;

run;

 

Ym and X1m are species means for these traits.

 

Solution2:

PROC MIXED data=mydata;
class species code_ind;
model Y=X1 / solution;
random int/sub=species type=un ;
random int/subject=species(code_ind) type=un;
run;

 

The solution 2 did not converge maybe because I don't have enough individuals per species?

 

Can I use a PROC MIXED on this dataset? (Y follows a normal distribution, X1 to X14 are either proportion or length measurements)

 

If someone could advise me on this analysis, it will be of great help,

 

Thanks a lot,

 

Chloe

 

 

5 REPLIES 5
SteveDenham
Jade | Level 19

One question and one observation.  The question first--what does the variable code_ind represent?  That is not clear from your design.  Accommodating it correctly will enable the model to more accurately represent your data.

 

The observation--you are currently fitting the covariance matrix between species twice, and the second random statement fits the covariance on a more "granular" basis.  With 15 species, there are 105 parameters to estimate with only one unstructured statement.  Here you are at least doubling that (if not even more).  And on that note, you almost certainly lack sufficient data to fit such a model, if in fact there are only 45 observations in the dataset.  Try starting with a variance component due only to species, like:

 

random intercept/subject=species type=vc;

 

See how that behaves.  Then, based on what code_ind represents, you might be able to fit something that also incorporates that variable.

 

Steve Denham

celdelmas
Calcite | Level 5

Hello Steve,

 

Thank you very much for your reply.

 

- code_ind is the identification of the individual for each species. Basically it is the name of the species + a number or letter juxtaposed at the end

I ahev three individuals per species, that's why I think I should as a "species" random effect in my model.

 

- The PROC MIXED did not converge either when I use only one random statement as the one you suggested.

 

- It is converging of course withour random statement.

 

I appreciate your help, thank you very much

SteveDenham
Jade | Level 19

Is the model "approaching" convergence, or just jumping around?  Look at the objective function in the iteration history.  If you are stopping at iteration 50, you will need to specify MAXITER= <some number larger than 50> in the PROC MIXED statement.  Try setting it at 500, and then looking at the objective function values.  You may have to work with the CONVF, CONVG or CONVH options to get convergence.

 

But that may be answering the wrong question.  You have 15 species represented.  By including species as a random effect, you are going to get estimates and errors for a broad inference space that assumes that the 15 species are a random sample of all possible species in the area.  Is that a reasonable assumption?  It may not be--your interest may be in only those 15 species, in which case a fixed effect  might be a better approach.  In that case, your model statement would look something like:

 

model Y=X1 species species*X1 / solution;

 

and there would be no random statement.  Think on this a while.

 

And (naturally) there is yet another approach, that of getting a BLUP estimate of the slope for each species.  But I don't think that is quite where you need to be (yet).

 

Steve Denham

celdelmas
Calcite | Level 5

Thank you for your answer!

The model stops at N=43 iterations even with the MAXITER statement.

If I use a PROC GLM with species as a fixed effect (model Y=X1 species species*X1 / solution) I cannot obtain a statistic value for the interaction and I have to look at Type I SS to get a result for X1.

 

 Proc GLM data=mydata  ;
   class species ;
   model Y=X1 species  X1*species/ solution;
   run;

Results:

 

Source DDL Type I SS Carré moyen Valeur F Pr > F
X1 1 93.3435020 93.3435020 Infin <.0001
species 19 107.4136910 5.6533522 Infin <.0001
X1*species 19 0.0000000 0.0000000 . .

 

Source DDL Type III SS Carré moyen Valeur F Pr > F
X1 1 0.00000000 0.00000000 . .
species 19 1.05943956 0.05575998 Infin <.0001
prop_lignifi*species 19 0.00000000 0.00000000 . .

 

If I ommit the interaction and look at Type 1 SS, I get results (same as above)

 

 

I thought of using BLUP but I was not sure this is appropriated.

 

Finally maybe doing a regression on the averaged values per species could be the better (simplier but correct) way to test my hypothesis (X1 affects Y)

 

Thanks again for your comments!

SteveDenham
Jade | Level 19

Consider fitting the fixed effect model as:

Proc GLM data=mydata  ;
   class species ;
   model Y=species  X1*species/ noint solution;
   run;

This should give a separate intercept and slope for each species, and should result in Type III F tests that don't go to infinity.

 

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2183 views
  • 0 likes
  • 2 in conversation