PROC REG with categorical variables and all possible subsets of models

Occasional Contributor
Posts: 13

Hi there--

I've been using proc reg to generate all possible models sorted by AIC, but I've run into a problem. I have three categorical variables, and proc reg does not accept them as-is. I changed their values from text to numbers (e.g. "urban" became 1 and "suburban" became 0 for my "level of urbanization" category). I put these back into the model statement, but instead of increasing, the number of possible models decreased from over 2000 to around 300. Does this make sense? The number of possible models should increase with added variables, right?

Here is my code--the categorical variables are X4, X5, X6:

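/* Subset selection by adjusted R-square, with SSE and AIC reported (intercept included) */
proc reg data=chill outest=est;
    model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / selection=adjrsq sse aic;
    output out=out p=p r=r;
run; quit;

/* Same search without an intercept */
proc reg data=chill outest=est0;
    model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 / noint selection=adjrsq sse aic;
    output out=out0 p=p r=r;
run; quit;

/* Stack both sets of estimates and print the 8 models with the lowest AIC */
data estout;
    set est est0;
run;

proc sort data=estout;
    by _aic_;
run;

proc print data=estout(obs=8);
run;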

Did I do something wrong? Or does it make sense for the number of models to decrease?

Respected Advisor
Posts: 4,654

Re: PROC REG with categorical variables and all possible subsets of models

There is something I am not getting. Both model inputs are the same. The only difference is the absence of an intercept term in the second procedure call. That doesn't correspond to your description. Besides, how could you get a list of (2000?) models from proc reg with character regressors?

Note that when a categorical regressor has N categories, you need N-1 dummy variables to replace it in a regression setting.
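As a rough illustration of the N-1 rule, here is a minimal data step sketch. It assumes a character variable with three levels; the name `urban_level`, the level 'rural', the dummy names, and the output data set `chill_dummies` are all hypothetical and not taken from the original data. 'rural' serves as the reference level, so it gets no dummy of its own.

```sas
/* Sketch only: urban_level, 'rural', d_urban, d_suburban and
   chill_dummies are hypothetical names used for illustration. */
data chill_dummies;
    set chill;
    /* A SAS comparison evaluates to 1 when true and 0 when false */
    d_urban    = (urban_level = 'urban');
    d_suburban = (urban_level = 'suburban');
    /* Without this, a missing category would silently look like the
       reference level (both dummies 0) */
    if missing(urban_level) then call missing(d_urban, d_suburban);
run;
```

Both dummies would then go into the model statement in place of the single recoded variable. A two-level category needs only one 0/1 variable, which is what the recoding described in the question already produces.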

PG

Occasional Contributor
Posts: 13

Re: PROC REG with categorical variables and all possible subsets of models

That's just to get all possible models with and without intercepts. Most of my variables are continuous variables--when I leave out the character variables I end up with far more models. Mathematically I wasn't sure if that made sense. Sorry if I didn't explain myself correctly.

Occasional Contributor
Posts: 13

Re: PROC REG with categorical variables and all possible subsets of models

Well, I just read that the all possible models function only works when you have 10 or fewer independent variables. Answered my own question, I suppose. Thanks!
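For anyone who finds this later, a sketch of how the search might be widened. As far as I can tell from the PROC REG documentation, the cutoff comes from the default behavior when the BEST= option is omitted from the MODEL statement, so setting BEST= explicitly should raise the number of subsets reported. With 13 regressors there are 2**13 - 1 = 8191 non-empty subsets, which is where the value below comes from. Whether this is practical in terms of run time is an assumption that would need checking on the actual data.

```sas
/* Sketch only: requests up to 8191 subset models (2**13 - 1)
   instead of relying on the default when BEST= is omitted. */
proc reg data=chill outest=est;
    model y1=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13
          / selection=adjrsq sse aic best=8191;
run; quit;
```

The same BEST= value would apply to the NOINT run. The OUTEST= data set can then be sorted by _aic_ as in the original code.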

Respected Advisor
Posts: 2,655

Re: PROC REG with categorical variables and all possible subsets of models

Following up on PG's response--How do you plan to deal with all possible models when some of the independent variables are exclusive categories?  Does it make any sense at all to include (for instance) 'urban', and exclude 'suburban', especially when this will exclude a large part of your database?

I have some major doubts about any analysis produced in this manner.
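One way to keep the levels of a category together is to let the procedure treat each categorical variable as a single effect. The sketch below swaps in a different technique from the one in the question: PROC GLMSELECT with a CLASS statement and stepwise selection on AIC, not an all-possible-subsets search. It assumes x4, x5 and x6 are the categorical variables in the chill data set, as stated in the original post; by default each CLASS effect enters or leaves the model as a whole.

```sas
/* Sketch only: stepwise selection by AIC, not all possible subsets.
   Each CLASS variable enters or leaves the model as one effect. */
proc glmselect data=chill;
    class x4 x5 x6;
    model y1 = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13
          / selection=stepwise(select=aic choose=aic);
run;
```

With a CLASS statement the categorical variables can stay as text, so the manual recoding to numbers is not needed for this approach.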

Steve Denham
