deleted_user
Not applicable
Hello,
I have a dataset and a set of a priori models, and I am going to use model selection and AIC to rank the models. My models have fixed and random effects. I have two random class variables, year and unit, and a suite of continuous variables. Below is a simplified sample dataset. One thing I have to consider is that some, but not all, experimental units were sampled each year.
From my research with SAS so far, I have found that the default estimation method in PROC MIXED is REML, and that the REML likelihood accounts only for the random-effect (covariance) parameters. Since the formula for AIC includes a bias-correction term based on the number of parameters, REML seems inappropriate for comparing models that include fixed effects; to have the fixed effects counted, I need to specify the ML method. I have also found that ML counts each level of a class variable as a separate parameter: for example, each year is counted as a separate parameter in the model. This seems to inflate the bias-correction term in AIC, since that term is based on the parameter count. I would welcome any suggestions on the best way to proceed with this analysis. I am also wondering whether SAS is the best environment for model selection, and I plan to calculate the AIC values manually as a check. Any recommendations or insight on how best to proceed are welcome.

Thanks
y year unit x3 x4 x5 x6
43 2005 A 23 37 19 7
34 2005 B 14 48 28 31
50 2005 C 19 24 48 48
4 2005 D 47 9 46 20
28 2005 E 37 36 6 12
7 2005 F 9 27 22 19
40 2005 G 31 9 15 32
45 2006 A 17 4 29 6
24 2006 C 29 23 7 38
37 2006 D 9 26 34 32
18 2006 F 11 45 50 18
18 2006 G 27 10 16 42
17 2007 B 6 34 7 29
49 2007 C 14 2 17 26
27 2007 D 12 13 31 46
18 2007 E 4 22 46 44
28 2007 F 50 45 5 16
5 2007 G 47 23 16 16
22 2007 H 29 5 29 36
40 2007 I 9 45 15 32
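
For data arranged like the sample above, one candidate model could be fit with METHOD=ML so that the AIC in the Fit Statistics table reflects the fixed-effect parameters as well. This is only a sketch: the dataset name (mydata) and the particular choice of fixed effects are placeholders, not a modeling recommendation.

```
/* Sketch only: fit one candidate model with METHOD=ML.
   Dataset name and fixed effects are illustrative. */
proc mixed data=mydata method=ml;
   class year unit;
   model y = x3 x4 x5 x6 / solution;
   random intercept / subject=unit;  /* random unit effect */
   random intercept / subject=year;  /* random year effect */
run;
```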
7 REPLIES
sfleming
Calcite | Level 5
I recommend the book

Linear Mixed Models: A Practical Guide Using Statistical Software
by Brady T. West, Kathleen B. Welch, and Andrzej T. Galecki
Chapman & Hall/CRC
2007

The approach they take is to use REML when comparing models that differ in terms of the random effects and ML when comparing models that differ in terms of the fixed effects.
deleted_user
Not applicable
Thanks...just purchased the book. I'm still unclear, though: how can you use REML for the random effects and ML for the fixed effects when my models include both fixed and random effects?
sfleming
Calcite | Level 5
Let's say you are comparing Model A and Model B. If the only difference between the two models is the random effects that are included (the fixed effects in both models are the same), then fit both models using REML. Vice versa: if the two models differ only in the fixed effects, fit both using ML.
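
In PROC MIXED terms, a REML comparison of two models that share the same fixed effects might look like the following sketch (dataset and effect names are assumed from the sample data in the question):

```
/* Model A: random unit intercept only. METHOD=REML is
   the default; shown explicitly here. */
proc mixed data=mydata method=reml;
   class year unit;
   model y = x3 x4;
   random intercept / subject=unit;
run;

/* Model B: identical fixed effects, plus a random year
   intercept. Compare the REML information criteria. */
proc mixed data=mydata method=reml;
   class year unit;
   model y = x3 x4;
   random intercept / subject=unit;
   random intercept / subject=year;
run;
```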
deleted_user
Not applicable
What if you have models A and B, and both the fixed and random effects differ between models?
Dale
Pyrite | Level 9
Nested models which differ only in the parameterization of the random effects can be tested using either REML or ML. Generally, REML is preferred for tests which only involve a modification of the random effect parameterization because REML results in estimates which don't (usually) have the same bias as is observed for ML estimates of the covariance structure.

Nested models which differ only in the parameterization of the fixed effects must be tested using ML. Thus, whenever your comparison is of nested models in which both the fixed and random effects are parameterized differently, you must use ML.
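
Following that rule, a fixed-effects comparison would refit each candidate with METHOD=ML and capture the Fit Statistics table (which includes AIC) for ranking. A sketch, with the dataset name and candidate models assumed for illustration:

```
/* Sketch: save ML fit statistics (including AIC) for each
   candidate, then rank the candidates by AIC. */
proc mixed data=mydata method=ml;
   class year unit;
   model y = x3 x4 x5;              /* candidate 1 */
   random intercept / subject=unit;
   ods output FitStatistics=fit1;
run;

proc mixed data=mydata method=ml;
   class year unit;
   model y = x3 x4;                 /* candidate 2: fewer fixed effects */
   random intercept / subject=unit;
   ods output FitStatistics=fit2;
run;
```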
deleted_user
Not applicable
Thanks Dale. That is what I've gathered from what I've been reading recently. My only question now is whether it is appropriate to include each level of a categorical variable as a separate parameter. This would seem to inflate the bias correction term used in calculating AIC, especially when you have many levels.

Bryan
Dale
Pyrite | Level 9
The number of parameters for a categorical variable would be the number of levels minus 1. It would not be appropriate to compute AIC with a penalty of 1 for the categorical variable as a whole, if that is what you are suggesting.
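
As a worked manual check (all numbers hypothetical): a three-level YEAR effect contributes 3 - 1 = 2 fixed-effect parameters, so a model with an intercept, those 2 dummies, 1 continuous slope, and 2 covariance parameters has k = 6, and AIC = -2 log L + 2k:

```
/* Hypothetical manual AIC check: AIC = -2*logL + 2*k.
   The -2 log L value of 120.4 is made up for illustration. */
data aic_check;
   neg2ll = 120.4;
   k      = 6;              /* 1 intercept + 2 year dummies
                               + 1 slope + 2 covariance parms */
   aic    = neg2ll + 2*k;   /* = 132.4 */
   put aic=;
run;
```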


Discussion stats
  • 7 replies
  • 1667 views
  • 0 likes
  • 3 in conversation