Hi there--
I've been using proc reg to generate all possible models sorted by AIC. I've run into a problem though. I have three categorical variables, and proc reg does not accept them as-is. I changed their values from text to numbers (e.g. "urban" became 1 and "suburban" became 0 for my "level of urbanization" category). I put these back into the model statement, but instead of increasing, the number of possible models decreased from over 2000 to around 300. Does this make sense? The number of possible models should increase with added variables, right?
Here is my code--the categorical variables are X4, X5, X6:
/* All-subsets selection with an intercept */
proc reg data=chill outest=est;
  model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / selection=adjrsq sse aic;
  output out=out p=p r=r;
run; quit;
/* Same selection without an intercept */
proc reg data=chill outest=est0;
  model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / noint selection=adjrsq sse aic;
  output out=out0 p=p r=r;
run; quit;
/* Stack both sets of estimates and print the 8 models with the lowest AIC */
data estout;
  set est est0;
run;
proc sort data=estout; by _aic_; run;
proc print data=estout(obs=8); run;
Did I do something wrong? Or does it make sense for the number of models to decrease?
There is something I am not getting. Both model inputs are the same. The only difference is the absence of an intercept term in the second procedure call. That doesn't correspond to your description. Besides, how could you get a list of (2000?) models from proc reg with character regressors?
Note, when your categorical regressors have N categories, you need N-1 dummy variables to replace them in a regression setting.
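As a minimal sketch of that N-1 coding, assuming a hypothetical character variable named urban_level with three levels ('urban', 'suburban', and an assumed 'rural') in the chill data set, a DATA step could build the indicators and leave 'rural' as the reference level:

```sas
/* Sketch only: urban_level, its 'rural' level, and chill_dummies are
   hypothetical names, not taken from the original data. */
data chill_dummies;
  set chill;
  /* A comparison evaluates to 1 (true) or 0 (false) in SAS */
  d_urban    = (urban_level = 'urban');
  d_suburban = (urban_level = 'suburban');
  /* 'rural' is the reference level: both indicators are 0 */
run;
```

Note that a missing urban_level would also produce 0 for both indicators, so missing values should be handled before relying on this coding.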
PG
That's just to get all possible models with and without intercepts. Most of my variables are continuous variables--when I leave out the character variables I end up with far more models. Mathematically I wasn't sure if that made sense. Sorry if I didn't explain myself correctly.
Well, I just read that the all possible models function only works when you have 10 or fewer independent variables. Answered my own question, I suppose. Thanks!
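The counts are consistent with that limit: k regressors give 2**k - 1 non-empty subsets, so 10 regressors yield 1023 models per run, or 2046 across the intercept and no-intercept runs. The sketch below does that arithmetic and then shows one possible way to request more subsets with 13 regressors. It assumes the BEST= option of the MODEL statement raises the cap on the number of subset models reported for selection=adjrsq; the exact default behavior should be checked in the PROC REG documentation for your release. The output data set name est_all is hypothetical.

```sas
/* Number of non-empty subsets of k regressors: 2**k - 1 */
data _null_;
  do k = 10, 13;
    n_models = 2**k - 1;
    put k= n_models=;
  end;
run;

/* Assumption: BEST= raises the cap on reported subsets.
   8191 = 2**13 - 1, i.e. every non-empty subset of 13 regressors. */
proc reg data=chill outest=est_all;
  model y1 = x1-x13 / selection=adjrsq sse aic best=8191;
run; quit;
```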
Following up on PG's response--How do you plan to deal with all possible models when some of the independent variables are exclusive categories? Does it make any sense at all to include (for instance) 'urban', and exclude 'suburban', especially when this will exclude a large part of your database?
I have some major doubts about any analysis produced in this manner.
Steve Denham