Re: Model selection in Proc Mixed

deleted_user · Posted 03-22-2010 01:47 PM

Hello,
I have a dataset and a set of a priori models, and I am going to use model selection and AIC to rank the models. My models have fixed and random effects. I have two random class variables, year and unit, and a suite of continuous variables. Below is a simplified sample dataset. One thing I have to consider is that some, but not all, experimental units were sampled each year.
From what I have read about SAS so far, the default estimation method in PROC MIXED is REML, and REML only considers the random effects. Since the formula for AIC includes a bias correction term based on the number of parameters, REML would seem inappropriate for models that include fixed effects. To take the fixed effects into account, I need to specify the ML method.

I have found, however, that the ML method counts each unique level of a class variable as a separate parameter. For example, each year is counted as a separate parameter in the model. This would seem to inflate the bias correction term for AIC, which is calculated from the number of parameters.

I am wondering whether SAS is the best environment for model selection, and I plan on calculating AIC values manually as a check. Any recommendations or insight on how best to proceed with this analysis are welcome.

Thanks
y year unit x3 x4 x5 x6
43 2005 A 23 37 19 7
34 2005 B 14 48 28 31
50 2005 C 19 24 48 48
4 2005 D 47 9 46 20
28 2005 E 37 36 6 12
7 2005 F 9 27 22 19
40 2005 G 31 9 15 32
45 2006 A 17 4 29 6
24 2006 C 29 23 7 38
37 2006 D 9 26 34 32
18 2006 F 11 45 50 18
18 2006 G 27 10 16 42
17 2007 B 6 34 7 29
49 2007 C 14 2 17 26
27 2007 D 12 13 31 46
18 2007 E 4 22 46 44
28 2007 F 50 45 5 16
5 2007 G 47 23 16 16
22 2007 H 29 5 29 36
40 2007 I 9 45 15 32
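
For the manual check mentioned above, here is a minimal Python sketch of the AIC arithmetic (Python rather than SAS, purely as an independent cross-check). The function names, the model names, and the -2 log-likelihood values are made up for illustration; only the 20-observation count comes from the sample data. The parameter count k has to be supplied by hand, and how to count it is exactly the question under discussion, so treat this as a calculator and not as a statement of what PROC MIXED does internally.

```python
def aic(neg2_loglik, n_params):
    """AIC = -2 log L + 2k, where k is the number of estimated parameters."""
    return neg2_loglik + 2 * n_params


def aicc(neg2_loglik, n_params, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic(neg2_loglik, n_params) + correction


def rank_models(models, n_obs):
    """Return (name, AICc, delta AICc) tuples sorted from best to worst."""
    scored = [(name, aicc(m2ll, k, n_obs)) for name, (m2ll, k) in models.items()]
    best = min(score for _, score in scored)
    return sorted(
        ((name, score, score - best) for name, score in scored),
        key=lambda row: row[1],
    )


# Hypothetical fits: name -> (-2 log-likelihood, number of parameters).
# These numbers are invented; substitute the values from your own output.
candidate_models = {"model_A": (150.0, 4), "model_B": (148.0, 6)}
N_OBS = 20  # number of rows in the sample dataset above

ranking = rank_models(candidate_models, N_OBS)
for name, score, delta in ranking:
    print(f"{name}: AICc = {score:.3f}, delta = {delta:.3f}")
```

With these invented inputs, model_B has the lower -2 log-likelihood but model_A ranks first, because the penalty for two extra parameters outweighs the improvement in fit.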

sfleming · Posted 03-24-2010 04:25 PM

I recommend the book

Linear Mixed Models: A Practical Guide Using Statistical Software
by Brady T. West, Kathleen B. Welch, and Andrzej T. Galecki
Chapman & Hall/CRC
2007

The approach they take is to use REML when comparing models that differ in terms of the random effects and ML when comparing models that differ in terms of the fixed effects.

deleted_user · Posted 03-25-2010 03:01 PM

Thanks, I just purchased the book. I am unclear on how you can use REML for random effects and ML for fixed effects when my models include both fixed and random effects.

sfleming · Posted 03-25-2010 03:49 PM

Let's say you are comparing Model A and Model B. If the only difference between the two models is the random effects that are included (the fixed effects in both models are the same), then fit both models using REML. Conversely, if the two models differ only in the fixed effects that are included (the random effects in both models are the same), then fit both models using ML.

deleted_user · Posted 03-25-2010 04:23 PM

What if you have models A and B, and both the fixed and random effects differ between models?

Dale · Posted 03-26-2010 03:56 PM

Nested models which differ only in the parameterization of the random effects can be tested using either REML or ML. Generally, REML is preferred for tests which only involve a modification of the random effect parameterization, because REML estimates of the covariance structure (usually) do not have the bias that is observed for ML estimates.

Nested models which differ only in the parameterization of the fixed effects must be tested using ML. Thus, whenever your comparison is of nested models in which both fixed and random effects are parameterized differently, you must use ML.
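
The rule in the two paragraphs above can be written down as a small decision helper. This is only a sketch in Python: the function name and the effect lists are hypothetical, and it does not check that the two models are actually nested, which the rule assumes.

```python
def estimation_method(fixed_a, random_a, fixed_b, random_b):
    """Pick the estimation method for comparing two nested mixed models.

    Any difference in the fixed effects calls for ML. If only the random
    effects differ, either method works and REML is generally preferred.
    """
    fixed_differ = set(fixed_a) != set(fixed_b)
    if fixed_differ:
        return "ML"
    return "REML"


# Hypothetical comparisons using variable names from the sample dataset.
print(estimation_method(["x3"], ["year"], ["x3"], ["year", "unit"]))        # random effects differ
print(estimation_method(["x3"], ["year"], ["x3", "x4"], ["year"]))          # fixed effects differ
print(estimation_method(["x3"], ["year"], ["x3", "x4"], ["year", "unit"]))  # both differ
```

The third call covers the earlier question about models A and B differing in both parts: the fixed-effect difference alone is enough to require ML.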

deleted_user · Posted 03-26-2010 04:45 PM

Thanks Dale. That is what I've gathered from what I've been reading recently. My only question now is whether it is appropriate to include each level of a categorical variable as a separate parameter. This would seem to inflate the bias correction term used in calculating AIC, especially when you have many levels.

Bryan

Dale · Posted 03-29-2010 03:15 PM

The number of parameters for a categorical variable would be the number of levels minus 1. It would not be appropriate to compute AIC with a penalty of 1 for the categorical variable as a whole, if that is what you are suggesting.
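
As a worked example of the levels-minus-1 count, here is a short Python sketch applied to the sample dataset posted at the top of the thread. The dictionary simply restates which units were sampled in which year; the function name is hypothetical. The count applies when the variable enters the model as a fixed effect. As far as I know, a class variable entered as a random effect with a simple variance-components structure contributes one variance parameter regardless of its number of levels, but treat that as an assumption to verify against your own output.

```python
# Units sampled in each year, transcribed from the sample dataset.
sampled_units = {
    2005: "ABCDEFG",
    2006: "ACDFG",
    2007: "BCDEFGHI",
}


def fixed_effect_params(n_levels):
    """Parameters for a categorical fixed effect: number of levels minus 1."""
    return n_levels - 1


year_levels = len(sampled_units)
unit_levels = len(set("".join(sampled_units.values())))

year_params = fixed_effect_params(year_levels)
unit_params = fixed_effect_params(unit_levels)

print(f"year: {year_levels} levels, {year_params} parameters, AIC penalty {2 * year_params}")
print(f"unit: {unit_levels} levels, {unit_params} parameters, AIC penalty {2 * unit_params}")
```

For this dataset, year has 3 levels (2 parameters) and unit has 9 levels (8 parameters), so as fixed effects they would add 4 and 16 to the AIC penalty term respectively.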
