02-23-2014 03:51 PM
I've been using proc reg to generate all possible models sorted by AIC. I've run into a problem though. I have three categorical variables, and proc reg does not accept them as-is. I changed their values from text to numbers (e.g. "urban" became 1 and "suburban" became 0 for my "level of urbanization" category). I put these back into the model statement, but instead of increasing, the number of possible models decreased from over 2000 to around 300. Does this make sense? The number of possible models should increase with added variables, right?
Here is my code--the categorical variables are X4, X5, X6:
proc reg data=chill outest=est;
model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13/ selection=adjrsq sse aic ;
output out=out p=p r=r; run; quit;
proc reg data=chill outest=est0;
model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / noint selection=adjrsq sse aic ;
output out=out0 p=p r=r; run; quit;
data estout; set est est0; run;
proc sort data=estout; by _aic_; run;
proc print data=estout(obs=8); run;
Did I do something wrong? Or does it make sense for the number of models to decrease?
02-23-2014 04:24 PM
There is something I am not getting. Both model inputs are the same. The only difference is the absence of an intercept term in the second procedure call. That doesn't correspond to your description. Besides, how could you get a list of (2000?) models from proc reg with character regressors?
Note that when a categorical regressor has N categories, you need N-1 dummy variables to replace it in a regression setting.
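As a rough illustration of the N-1 rule, here is a minimal sketch of coding a three-level category as two 0/1 dummies in a data step. The variable name urban_level, the output data set chill_dummies, and the third level "rural" are hypothetical; only "urban" and "suburban" are mentioned in the question.

```sas
/* Hypothetical names: urban_level is assumed to be a character variable
   with levels "urban", "suburban", and "rural" (the reference level). */
data chill_dummies;
    set chill;
    /* A comparison in SAS evaluates to 1 (true) or 0 (false). */
    d_urban    = (urban_level = "urban");
    d_suburban = (urban_level = "suburban");
    /* "rural" rows get 0 for both dummies. Rows with a missing
       urban_level would also get 0/0, so check for those first. */
run;
```

The two dummies d_urban and d_suburban would then go into the model statement together in place of the original variable.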
02-23-2014 04:53 PM
That's just to get all possible models with and without intercepts. Most of my variables are continuous variables--when I leave out the character variables I end up with far more models. Mathematically I wasn't sure if that made sense. Sorry if I didn't explain myself correctly.
02-23-2014 07:15 PM
Well, I just read that the all-possible-models selection only works when you have 10 or fewer independent variables. Answered my own question, I suppose. Thanks!
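For reference, the arithmetic behind the expected model counts can be checked with a short data step. This is only a sketch and assumes each run enumerates every non-empty subset of k regressors, which gives 2**k - 1 models per run; the variable names are made up for illustration.

```sas
data _null_;
    do k = 10, 13;
        n_subsets = 2**k - 1;        /* non-empty subsets of k regressors */
        n_both    = 2 * n_subsets;   /* one run with an intercept, one with noint */
        put k= n_subsets= n_both=;
    end;
run;
/* Log output:
   k=10 n_subsets=1023 n_both=2046
   k=13 n_subsets=8191 n_both=16382 */
```

The 2046 for 10 regressors is consistent with the "over 2000" models seen when the three categorical variables were left out, while a count near 300 for 13 regressors is far below 16382, which fits the idea that not all subsets were evaluated.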
02-24-2014 08:34 AM
Following up on PG's response--How do you plan to deal with all possible models when some of the independent variables are exclusive categories? Does it make any sense at all to include (for instance) 'urban', and exclude 'suburban', especially when this will exclude a large part of your database?
I have some major doubts about any analysis produced in this manner.