10-02-2015 03:55 PM

I have 20 variables in the same data file. I need to run regressions with var0 as the dependent variable and any five of the remaining 19 variables as the independent variables. Is there a quick way to run all the possible regressions in this case?

Accepted Solutions

Solution

10-13-2015 12:04 PM


Posted in reply to liziwu

10-12-2015 04:04 PM - edited 10-12-2015 04:23 PM

Hi. Consider a new, smaller situation: six variables, with VAR1 as the dependent variable. How many regressions are there if you take VAR2-VAR6 three variables at a time? 10 combinations:

- var2 var3 var4
- var2 var3 var5
- var2 var3 var6
- var2 var4 var5
- var2 var4 var6
- var2 var5 var6
- var3 var4 var5
- var3 var4 var6
- var3 var5 var6
- var4 var5 var6

Given your 19 variables taken 5 at a time, that's 11,628 regressions. You can write a macro (no problem, see below), but do you really want to "*... run all the possible regressions ...*"?
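For reference, both counts are binomial coefficients (the number of ways to choose k of n candidate variables), which is what the COMB function used in the code computes:

$$\binom{5}{3} = \frac{5!}{3!\,2!} = 10, \qquad \binom{19}{5} = \frac{19 \cdot 18 \cdot 17 \cdot 16 \cdot 15}{5!} = \frac{1{,}395{,}360}{120} = 11{,}628.$$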

```
* generate all combinations of 5 vars taken 3 at a time;
data vars (keep=v1-v3);
   array v(2:6) $4 ('var2' 'var3' 'var4' 'var5' 'var6');
   ncomb=comb(5,3);
   do j=1 to ncomb;
      * rearrange the array so its first 3 elements hold the j-th combination;
      call lexcomb(j, 3, of v(*));
      put (v1-v3) ($5.);
      output;
   end;
   * macro variable with total number of combinations, 5 vars taken 3 at a time;
   call symputx('regs',ncomb);
run;

* read each combination, run the regression;
%macro reg;
%do j=1 %to &regs;
   /* store the j-th combination in macro variable INDPT */
   data _null_;
      rec=&j;
      set vars point=rec;
      call symput('indpt',catx(' ',of v1-v3));
      stop;
   run;

   proc reg data=x;
      model var1 = &indpt;
   run;
   quit;
%end;
%mend;

* run all the regressions;
%reg
```

All Replies


Posted in reply to liziwu

10-02-2015 04:07 PM - edited 10-02-2015 04:10 PM

I suppose you could write a macro to do this, but that doesn't sound like a trivial thing to do. And anyway, Partial Least Squares regression (PROC PLS) ought to produce better results (lower mean squared error for coefficients and predicted values) than PROC REG used the way you are trying to use it.


Posted in reply to PaigeMiller

10-12-2015 12:27 PM

I intend to write a macro to do this because I ultimately have to run vector autoregression (PROC VARMAX), but I don't really know how to start.

Thank you!

Lizi


Posted in reply to liziwu

10-02-2015 05:17 PM

Sounds like you want to select 5 variables for your model. If you want the selection to be based on R-square, adjusted R-square, or Mallows' Cp, then **proc reg** can do the search for you. Something like

```
proc reg data=myData;
model x0 = x1-x19 / selection=CP start=5 stop=5 best=100;
run;
```

will find the 100 best 5-variable linear models according to the Cp criterion.
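For reference, Mallows' Cp for a candidate model with p parameters (including the intercept) is usually defined as below, where SSE_p is that model's error sum of squares, \(\hat{\sigma}^2\) is the mean squared error of the full model containing all candidate regressors, and n is the number of observations:

$$C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p$$

Models with Cp close to p are generally taken to have little bias; for the 5-variable models here, p = 6 when an intercept is included.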

PG


Posted in reply to PGStats

10-03-2015 11:20 AM

Okay, that works, but I still think this is a poor choice of analysis.


Posted in reply to PaigeMiller

10-03-2015 01:02 PM

Agreed, statistical procedures for variable selection should never be used blindly. Simulations and sensitivity analyses have shown their instability over and over.

On model selection issues, read http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_glmselect_de...

PG


Posted in reply to PGStats

10-03-2015 02:43 PM

I'm sorry if I sound like a cantankerous old model builder. I agree that the problems with variable selection techniques are well documented, and I agree that you shouldn't use these procedures blindly, but the original problem as stated uses these problematic model-building procedures blindly, with silly and meaningless restrictions (the model can have only five variables).


Posted in reply to PaigeMiller

10-13-2015 09:18 AM

I want to nominate @PaigeMiller's response for the Hall of Fame, if we had one. To quote John Tukey twice:

There is no point in being precise when you don't know what you're talking about.

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

Steve Denham


Posted in reply to MikeZdeb

10-13-2015 12:04 PM

Thanks a lot, Mike!

I know it's unusual to "run all possible regressions", but this will allow me to exclude those with higher forecast errors.

Thanks again,

Lizi


Posted in reply to liziwu

10-13-2015 12:56 PM - edited 10-13-2015 12:56 PM

Just because something can be programmed doesn't mean that it produces a good result.

In particular, higher forecast errors in some models may arise for many reasons, including multicollinearity among the X-variables. In any event, @PGStats has provided a link that explains why this type of model selection is misleading at best, and should be avoided "because it violates every principle of statistical estimation and hypothesis testing".

So I'll say it again: your restriction of having exactly 5 independent variables in the model is silly, meaningless, and most likely misleading; and again, I recommend PROC PLS on all 19 independent variables.