econdon
Calcite | Level 5

Hi there--

I've been using proc reg to generate all possible models sorted by AIC, but I've run into a problem. I have three categorical variables, and proc reg does not accept them as-is. I changed their values from text to numbers (e.g. "urban" became 1 and "suburban" became 0 for my "level of urbanization" category). I put these recoded variables back into the model statement, but instead of increasing, the number of possible models decreased from over 2000 to around 300. Does this make sense? The number of possible models should increase with added variables, right?

Here is my code--the categorical variables are X4, X5, X6:

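/* All-subsets search with an intercept; SSE and AIC are saved to OUTEST= */
proc reg data=chill outest=est;
  model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / selection=adjrsq sse aic;
  output out=out p=p r=r;
run; quit;

/* Same search without an intercept (NOINT) */
proc reg data=chill outest=est0;
  model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / noint selection=adjrsq sse aic;
  output out=out0 p=p r=r;
run; quit;

/* Stack both sets of candidate models and list the 8 with the lowest AIC */
data estout;
  set est est0;
run;

proc sort data=estout; by _aic_; run;

proc print data=estout(obs=8); run;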

Did I do something wrong? Or does it make sense for the number of models to decrease?
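
For reference, the arithmetic behind the question can be written out. This is a sketch that assumes every nonempty subset of the p regressors counts as one candidate model, and that the two runs (with and without an intercept) are counted separately:

```latex
\[
  N(p) = 2^{p} - 1, \qquad
  N(10) = 1023, \qquad
  N(13) = 8191,
\]
\[
  2\,N(10) = 2046, \qquad 2\,N(13) = 16382 .
\]
```

Under that assumption, ten regressors across two runs would give 2046 models, which is consistent with the "over 2000" reported, and thirteen regressors should give more, not fewer. A drop to around 300 would therefore point to a limit in what the procedure reports rather than to the mathematics.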

4 REPLIES
PGStats
Opal | Level 21

There is something I am not getting. Both model inputs are the same. The only difference is the absence of an intercept term in the second procedure call. That doesn't correspond to your description. Besides, how could you get a list of (2000?) models from proc reg with character regressors?

Note, when your categorical regressors have N categories, you need N-1 dummy variables to replace them in a regression setting.
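
As a minimal sketch of the N-1 rule, the DATA step below codes a three-level category with two dummy variables. The variable name urbanization, the third level 'rural', and the output names chill_dummies, urban_d and suburban_d are all hypothetical; the thread only mentions "urban" and "suburban".

```sas
/* Hypothetical sketch: assumes chill contains a character variable
   named urbanization with levels 'urban', 'suburban' and 'rural'.
   None of these names (other than chill) come from the original post. */
data chill_dummies;
  set chill;
  /* 3 levels -> 2 dummies; 'rural' is the reference level (both are 0) */
  urban_d    = (urbanization = 'urban');
  suburban_d = (urbanization = 'suburban');
  /* keep missing categories missing instead of coding them as 0 */
  if missing(urbanization) then call missing(urban_d, suburban_d);
run;
```

With only two levels, as in the urban/suburban example, this reduces to the single 0/1 variable already described in the question.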

PG

econdon
Calcite | Level 5

That's just to get all possible models with and without intercepts. Most of my variables are continuous variables--when I leave out the character variables I end up with far more models. Mathematically I wasn't sure if that made sense. Sorry if I didn't explain myself correctly.

econdon
Calcite | Level 5

Well, I just read that the all-possible-models selection only evaluates every subset when you have 10 or fewer independent variables. Answered my own question, I suppose. Thanks!
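
If the limit comes from the default number of subsets that PROC REG reports once more than ten regressors are listed, the BEST= option on the MODEL statement may be a way around it. The following is only a sketch under that assumption: the value 8191 is simply 2**13 - 1, chosen here to cover every nonempty subset of the 13 regressors, and it has not been checked against this data or for run time.

```sas
/* Sketch only: best=8191 is an assumed value (2**13 - 1), intended to
   request all nonempty subsets of x1-x13 instead of the default number. */
proc reg data=chill outest=est;
  model y1 = x1-x13 / selection=adjrsq sse aic best=8191;
run;
quit;
```

The same option would need to be added to the NOINT run for the two OUTEST= data sets to be comparable.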

SteveDenham
Jade | Level 19

Following up on PG's response--How do you plan to deal with all possible models when some of the independent variables are exclusive categories?  Does it make any sense at all to include (for instance) 'urban', and exclude 'suburban', especially when this will exclude a large part of your database?

I have some major doubts about any analysis produced in this manner.

Steve Denham

