I'm calling out to @SteveDenham because I've often gleaned very useful guidance from your posts on analytical questions, and your area of expertise seems relevant to my issue. (For reference, I'm using SAS 9.4 TS Level 1M5 on an X64_10Pro platform.)

For some time, I have been using BIC for model selection in Proc Mixed (e.g. to compare heterogeneous and homogeneous variance models, and to compare covariance structures in repeated measures analysis). From my understanding, BIC should impose a greater penalty for an increasing number of parameters than AIC or AICC. In some recent analyses, I realized that BIC was actually more likely than the other information criteria to lead to selection of more complex models. This behavior appears to be related to a couple of issues with the reported BIC values (in Proc Mixed when using REML, at least).

The SAS/STAT 14.3 user's guide (here) indicates that, in agreement with the listed reference (Schwarz, The Annals of Statistics 1978 Vol. 6 Issue 2 Pages 461-464), BIC is calculated as:

BIC = -2L + d log n

where d represents the number of estimated covariance parameters.

However, this is apparently not the formula that is actually used to compute the reported BIC values. From multiple simulations, it appears that what Proc Mixed is actually reporting is:

BIC = -2L + d log n - d

which greatly discounts the influence of additional covariance parameters, precisely the opposite of the expected behavior. Is this a bug in the code, or am I missing something here?

As a side note, the user's ability to understand BIC, as compared with the other information criteria, is greatly compromised by the differing (and very complicated) definitions of n that are used. For example, the simulated data set below represents a randomized complete block design experiment with repeated measures on individual subjects.
In this situation, n* (as used to determine AICC) is calculated as n - rank(X) and equals 48, whereas n (as used to determine BIC) represents the number of levels of the blocking factor (the first, and only, specified random effect), which equals 4. It is difficult to reconcile the use of BIC given the confusing and changing definitions of n.

As above, I'm wondering if this is the intended behavior, as it does not seem to agree with the definition of BIC in other sources that I have read. As things now stand, I am discontinuing the use of BIC in favor of AICC, and am now questioning a lot of published data (mine and others'). Any insight would be much appreciated.

Simulated data to exemplify the issues described:

data;
input ID Block Trt$ time Resp;
cards;
1 1 A 1 1.726894073
2 1 B 1 2.092813457
3 1 C 1 2.72720316
4 1 D 1 3.414096871
5 2 A 1 1.616575827
6 2 B 1 2.448749843
7 2 C 1 3.197066773
8 2 D 1 4.218032532
9 3 A 1 2.250938172
10 3 B 1 3.167495069
11 3 C 1 3.463748709
12 3 D 1 4.300078354
13 4 A 1 2.800758394
14 4 B 1 3.408282671
15 4 C 1 3.932933885
16 4 D 1 5.291471566
1 1 A 2 1.68963511
2 1 B 2 2.1472738
3 1 C 2 2.573448703
4 1 D 2 3.853775305
5 2 A 2 2.234276687
6 2 B 2 2.554918183
7 2 C 2 2.852388857
8 2 D 2 3.941084149
9 3 A 2 2.728948071
10 3 B 2 3.203471147
11 3 C 2 3.620032518
12 3 D 2 4.17829973
13 4 A 2 2.442563804
14 4 B 2 3.440094189
15 4 C 2 4.049735686
16 4 D 2 4.785880324
1 1 A 3 1.73878971
2 1 B 3 1.61274462
3 1 C 3 2.809355931
4 1 D 3 3.553253027
5 2 A 3 1.549664697
6 2 B 3 2.464931929
7 2 C 3 3.219585446
8 2 D 3 3.958149394
9 3 A 3 2.621761763
10 3 B 3 3.129957787
11 3 C 3 3.79504407
12 3 D 3 4.649424914
13 4 A 3 3.000206689
14 4 B 3 3.469261887
15 4 C 3 3.987422634
16 4 D 3 5.245902679
1 1 A 4 1.491970466
2 1 B 4 1.982509609
3 1 C 4 2.964069871
4 1 D 4 3.164638885
5 2 A 4 1.693782424
6 2 B 4 2.553224482
7 2 C 4 3.172781663
8 2 D 4 4.075483509
9 3 A 4 2.336685246
10 3 B 4 3.002418224
11 3 C 4 3.767810137
12 3 D 4 4.881787515
13 4 A 4 2.880757109
14 4 B 4 3.865221255
15 4 C 4 4.01673442
16 4 D 4 4.823744484
;
*Model 1 using unstructured covariance;
proc mixed ranks;
class id block trt time;
model resp = trt|time /ddfm=kr2;
random block;
repeated time/ sub=id type=un;
run;
*Model 2 using variance components covariance structure;
proc mixed ranks;
class id block trt time;
model resp = trt|time /ddfm=kr2;
random block;
repeated time/ sub=id type=vc;
run;
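
To make the comparison of penalties concrete, here is a rough per-parameter sketch for this example. It takes n = 4 and n* = 48 from the description above, treats log as the natural logarithm, and assumes the usual forms AIC = -2L + 2d and AICC = -2L + 2d n*/(n* - d - 1); these AIC and AICC forms are my assumption and are not quoted from the post. The "apparent" line is only the formula inferred from the simulations, not a confirmed statement of what Proc Mixed computes.

```latex
\begin{align*}
\text{AIC penalty}               &= 2d \\
\text{AICC penalty}              &= \frac{2d\,n^{*}}{n^{*} - d - 1}, \qquad n^{*} = 48 \\
\text{BIC penalty (documented)}  &= d \log n = d \log 4 \approx 1.386\,d \\
\text{BIC penalty (apparent)}    &= d \log n - d = d(\log 4 - 1) \approx 0.386\,d
\end{align*}
```

Under these assumptions, even the documented BIC penalizes each additional covariance parameter less than AIC does when n = 4, because log 4 < 2. The documented penalty only exceeds the AIC penalty once n > e^2 (about 7.4), and the apparent one only once n > e^3 (about 20.1). So the small n used for BIC here could by itself favor more complex models, whichever of the two formulas is actually in use.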