Source | DDL | Type I SS | Carré moyen | Valeur F | Pr > F |
---|---|---|---|---|---|

X1 | 1 | 93.3435020 | 93.3435020 | Infin | <.0001 |

species | 19 | 107.4136910 | 5.6533522 | Infin | <.0001 |

X1*species | 19 | 0.0000000 | 0.0000000 | . | . |

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- testing predictive relationship with random effect...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-27-2015 08:32 AM

Hi all,

I am analyzing a data set of quantitative traits measured on 15 species (3 individuals per species).

I would like to test the predictive relationship between one particular variable (Y) and the other traits (X1 to X14).

I thought that I should either realize a regression analyses (PROC REG) between pairs of traits on trait means per species OR realize a mixed model with the species entered as random for examples:

Solution1:

PROC REG data=mean.data;

model Ym=X1m;

run;

Ym and X1m are species means for these traits.

Solution2:

PROC MIXED data=mydata;

class species code_ind;

model Y=X1 / solution;

random int/sub=species type=un ;

random int/subject=species(code_ind) type=un;

run;

The solution 2 did not converge maybe because I don't have enough individuals per species?

Can I use a PROC MIXED on this dataset? (Y follows a normal distribution, X1 to X14 are either proportion or length measurements)

If someone could advise me on this analysis, it will be of great help,

Thanks a lot,

Chloe

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2015 08:12 AM

One question and one observation. The question first--what does the variable code_ind represent? That is not clear from your design. Accommodating it correctly will enable the model to more accurately represent your data.

The observation--you are currently fitting the covariance matrix between species twice, and the second random statement fits the covariance on a more "granular" basis. With 15 species, there are 105 parameters to estimate with only one unstructured statement. Here you are at least doubling that (if not even more). And on that note, you almost certainly lack sufficient data to fit such a model, if in fact there are only 45 observations in the dataset. Try starting with a variance component due only to species, like:

random intercept/subject=species type=vc;

See how that behaves. Then, based on what code_ind represents, you might be able to fit something that also incorporates that variable.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2015 08:30 AM

Hello Steve,

Thank you very much for your reply.

- code_ind is the identification of the individual for each species. Basically it is the name of the species + a number or letter juxtaposed at the end

I ahev three individuals per species, that's why I think I should as a "species" random effect in my model.

- The PROC MIXED did not converge either when I use only one random statement as the one you suggested.

- It is converging of course withour random statement.

I appreciate your help, thank you very much

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2015 08:48 AM - edited 10-29-2015 09:03 AM

Is the model "approaching" convergence, or just jumping around? Look at the objective function in the iteration history. If you are stopping at iteration 50, you will need to specify MAXITER= <some number larger than 50> in the PROC MIXED statement. Try setting it at 500, and then looking at the objective function values. You may have to work with the CONVF, CONVG or CONVH options to get convergence.

But that may be answering the wrong question. You have 15 species represented. By including species as a random effect, you are going to get estimates and errors for a broad inference space that assumes that the 15 species are a random sample of all possible species in the area. Is that a reasonable assumption? It may not be--your interest may be in only those 15 species, in which case a fixed effect might be a better approach. In that case, your model statement would look something like:

model Y=X1 species species*X1 / solution;

and there would be no random statement. Think on this a while.

And (naturally) there is yet another approach, that of getting a BLUP estimate of the slope for each species. But I don't think that is quite where you need to be (yet).

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2015 05:59 PM

Thank you for your answer!

The model stops at N=43 iterations even with the MAXITER statement.

If I use a PROC GLM with species as a fixed effect (model Y=X1 species species*X1 / solution) I cannot obtain a statistic value for the interaction and I have to look at Type I SS to get a result for X1.

```
Proc GLM data=mydata ;
class species ;
model Y=X1 species X1*species/ solution;
run;
```

Results:

Source | DDL | Type I SS | Carré moyen | Valeur F | Pr > F |
---|---|---|---|---|---|

X1 | 1 | 93.3435020 | 93.3435020 | Infin | <.0001 |

species | 19 | 107.4136910 | 5.6533522 | Infin | <.0001 |

X1*species | 19 | 0.0000000 | 0.0000000 | . | . |

Source | DDL | Type III SS | Carré moyen | Valeur F | Pr > F |
---|---|---|---|---|---|

X1 | 1 | 0.00000000 | 0.00000000 | . | . |

species | 19 | 1.05943956 | 0.05575998 | Infin | <.0001 |

prop_lignifi*species | 19 | 0.00000000 | 0.00000000 | . | . |

If I ommit the interaction and look at Type 1 SS, I get results (same as above)

I thought of using BLUP but I was not sure this is appropriated.

Finally maybe doing a regression on the averaged values per species could be the better (simplier but correct) way to test my hypothesis (X1 affects Y)

Thanks again for your comments!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-04-2015 01:38 PM

Consider fitting the fixed effect model as:

```
Proc GLM data=mydata ;
class species ;
model Y=species X1*species/ noint solution;
run;
```

This should give a separate intercept and slope for each species, and should result in Type III F tests that don't go to infinity.

Steve Denham