Multiple Regression Using Proc Reg: Data Reduction for Variable Selec...


Posted 04-07-2017 01:13 PM (1738 views)

Hi All,

I want to build some multiple linear regression models using PROC REG to predict estimated customer product spend over a 12-month period. I have about 900 variables that I need to reduce before I run PROC REG.

When I build a logistic model, I use PROC VARCLUS for data reduction.

**Question**: Any suggestions on a good way to reduce the input variables before using PROC REG? Can I use PROC VARCLUS for data reduction when building a linear model?

Any suggestions are greatly appreciated.

Thanks!

7 replies


@RobertNYC wrote:

Hi All,

Question: Any suggestions on a good way to reduce the input variables before using PROC REG? Can I use PROC VARCLUS for data reduction when building a linear model?

I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.
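A small simulation makes the concern about stepwise methods concrete. The sketch below is in Python/NumPy rather than SAS, and the data, sizes, and function names (`forward_select`, `ols_rss`, `r_squared`) are hypothetical. Greedy forward selection is run on a response that is pure noise, so no column has any real relationship with y. It still returns 10 "predictors", and the in-sample R² of that selected model will typically be well above the R² of 10 arbitrary columns, which is the optimism that critics of stepwise selection point to.

```python
import numpy as np


def ols_rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def r_squared(X, y, cols):
    """In-sample R^2 of the same OLS fit."""
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ols_rss(X, y, cols) / tss


def forward_select(X, y, max_vars):
    """Greedy forward selection: at each step add the column that most reduces RSS."""
    selected = []
    for _ in range(max_vars):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: ols_rss(X, y, selected + [j]))
        selected.append(best)
    return selected


# Pure noise: by construction, none of the 200 columns is related to y.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

chosen = forward_select(X, y, max_vars=10)
print("selected columns:", chosen)
print("in-sample R^2 of selected model:", round(r_squared(X, y, chosen), 3))
print("in-sample R^2 of 10 arbitrary columns:", round(r_squared(X, y, list(range(10))), 3))
```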

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign high (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and it will assign weights/loadings close to zero to the variables that are not important.
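To illustrate what "weights close to zero for unimportant variables" means, here is a minimal sketch of single-response PLS using the NIPALS algorithm, in Python/NumPy. It is not PROC PLS's implementation, and the simulated data and the function names `pls1` and `pls1_predict` are hypothetical. On centered data the first weight vector is proportional to X'y, so columns that do not covary with y get small weights.

```python
import numpy as np


def pls1(X, y, n_components):
    """Minimal single-response PLS (PLS1) via NIPALS on mean-centered data.

    Returns the weight vectors W (one column per component), the regression
    coefficients B on the original X columns, and the centering constants.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for k in range(n_components):
        w = Xk.T @ yk                    # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                       # component scores
        tt = t @ t
        P[:, k] = Xk.T @ t / tt          # X loadings
        q[k] = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, P[:, k])   # deflate X and y before the next component
        yk = yk - q[k] * t
        W[:, k] = w
    B = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^{-1} q
    return W, B, x_mean, y_mean


def pls1_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean


rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(n)  # only columns 0 and 1 matter

W, B, x_mean, y_mean = pls1(X, y, n_components=2)
print("first weight vector:", np.round(W[:, 0], 2))
```

In PROC PLS itself, the number of components is usually chosen by cross-validation through the CV= option rather than fixed in advance as it is here.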

--

Paige Miller



Thanks so much for your feedback. Would you happen to have an example of PROC PLS code?


The online help has several examples.

--

Paige Miller



Like Paige, my answer is no.

Use PROC HPGENSELECT to reduce the number of variables, but first you need to know which distribution the Y variable follows.
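Because that choice matters, a quick summary of the response can help. The sketch below (Python/NumPy; the simulated spend data and the function name `describe_response` are hypothetical) reports the share of exact zeros, the share of negative values, and the sample skewness. A large share of zeros or strong right skew suggests a normal distribution may fit poorly; the summary alone does not pick the distribution.

```python
import numpy as np


def describe_response(y):
    """Summaries of a response that can inform the choice of distribution."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    sd = centered.std()
    skewness = float(np.mean(centered ** 3) / sd ** 3) if sd > 0 else 0.0
    return {
        "n": y.size,
        "mean": float(y.mean()),
        "variance": float(y.var()),
        "share_zero": float(np.mean(y == 0)),
        "share_negative": float(np.mean(y < 0)),
        "skewness": skewness,
    }


# Hypothetical 12-month spend: about 40% of customers spend nothing and the rest
# follow a right-skewed gamma distribution (made-up parameters, illustration only).
rng = np.random.default_rng(4)
buys = rng.random(1000) > 0.4
spend = np.where(buys, rng.gamma(shape=2.0, scale=150.0, size=1000), 0.0)

for name, value in describe_response(spend).items():
    print(f"{name:>15}: {value:.3f}")
```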

After that, I would suggest PROC ADAPTIVEREG, which can take nonlinear effects into account.
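PROC ADAPTIVEREG fits multivariate adaptive regression splines (MARS), whose building blocks are reflected pairs of hinge functions max(0, x - t) and max(0, t - x). The NumPy sketch below shows the idea for a single predictor and a single knot chosen by exhaustive search; the real procedure adds basis functions across many predictors and knots and then prunes them backward. The data and the names `hinge_pair` and `fit_one_knot` are hypothetical.

```python
import numpy as np


def hinge_pair(x, knot):
    """Reflected pair of hinge functions used as basis functions in MARS."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def fit_one_knot(x, y, candidate_knots):
    """Fit y ~ b0 + b1*max(0, x - t) + b2*max(0, t - x) for each candidate knot t
    by least squares and keep the knot with the smallest residual sum of squares."""
    best_rss, best_knot, best_coef = np.inf, None, None
    for t in candidate_knots:
        right, left = hinge_pair(x, t)
        A = np.column_stack([np.ones_like(x), right, left])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_knot, best_coef = rss, t, coef
    return best_knot, best_coef


# Noise-free illustration: flat until x = 4, then slope 2.
x = np.linspace(0.0, 10.0, 201)
y = 1.0 + 2.0 * np.maximum(0.0, x - 4.0)
knot, coef = fit_one_knot(x, y, candidate_knots=np.linspace(1.0, 9.0, 81))
print("chosen knot:", round(float(knot), 2), "coefficients:", np.round(coef, 3))
```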


@Ksharp wrote:

Like Paige, my answer is no.

Use PROC HPGENSELECT to reduce the number of variables, but first you need to know which distribution the Y variable follows.

My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backward, or forward selection) and turns it into an HP (high-performance) procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.
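Since the LASSO comes up here, a minimal sketch of what it computes may help readers weigh it (this is not a comment from the thread). The L1 penalty shrinks coefficients and sets some of them exactly to zero, so variable selection comes from solving a single optimization problem instead of a sequence of add/drop steps. The code is Python/NumPy using cyclic coordinate descent, one common way to fit the LASSO; the data, the penalty value, and the names `lasso_cd` and `soft_threshold` are hypothetical. In SAS, PROC GLMSELECT supports SELECTION=LASSO.

```python
import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||yc - Xs b||^2 + lam * ||b||_1, where Xs is X with each
    column centered and scaled to unit (population) variance and yc is centered y.
    Coefficients are returned on the standardized scale.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    b = np.zeros(p)
    r = yc.copy()                            # residual for the current b (all zeros)
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * b[j]             # add back column j's contribution
            rho = Xs[:, j] @ r / n           # correlation with the partial residual
            b[j] = soft_threshold(rho, lam)  # closed form because Xs[:, j] @ Xs[:, j] / n == 1
            r -= Xs[:, j] * b[j]
    return b


rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)  # only columns 0 and 1 matter

b = lasso_cd(X, y, lam=0.5)
print("nonzero coefficients:", np.flatnonzero(b).tolist())
print("their values:", np.round(b[b != 0], 3))
```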

--

Paige Miller



Without dropping variables, you can reduce the dimensionality by using principal component analysis (PCA).

From there, you can build a model on the principal components.
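A minimal sketch of this route, principal component regression, in Python/NumPy (hypothetical data and function names `pcr_fit` and `pcr_predict`): the components come from X alone, and y is then regressed on the first few component scores. In SAS the same idea could be carried out with PROC PRINCOMP followed by PROC REG on the scores, or with METHOD=PCR in PROC PLS.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the first k principal
    component scores of centered X, then express the fit as coefficients on X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: principal directions
    V_k = Vt[:n_components].T                           # p x k loadings
    T = Xc @ V_k                                        # n x k component scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    B = V_k @ gamma                                     # coefficients on centered X
    explained = s[:n_components] ** 2 / np.sum(s ** 2)  # share of X variance kept
    return B, x_mean, y_mean, explained


def pcr_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean


# Hypothetical data: 30 correlated inputs driven by 3 latent factors.
rng = np.random.default_rng(5)
n, p = 300, 30
latent = rng.standard_normal((n, 3))
X = latent @ rng.standard_normal((3, p)) + 0.1 * rng.standard_normal((n, p))
y = latent[:, 0] + 0.5 * rng.standard_normal(n)

B, x_mean, y_mean, explained = pcr_fit(X, y, n_components=3)
print("share of X variance in 3 components:", round(float(explained.sum()), 3))
```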



@lakshmi_74 wrote:

Without dropping variables, you can reduce the dimensionality by using principal component analysis (PCA).

From there, you can build a model on the principal components.

Which is why I keep recommending Partial Least Squares (PLS) analysis. It also reduces the dimensionality, but it does so in a way that is superior to PCA in this situation: PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.
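A two-variable simulation shows the difference (Python/NumPy, hypothetical data). One input has high variance but is unrelated to y; the other has low variance and drives y. The first principal direction is computed from X alone and lines up with the high-variance noise input, while the first PLS weight, proportional to X'y on centered data, lines up with the input that actually predicts y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x_noise = 3.0 * rng.standard_normal(n)   # high variance, unrelated to y
x_signal = rng.standard_normal(n)        # low variance, drives y
y = x_signal + 0.1 * rng.standard_normal(n)
X = np.column_stack([x_noise, x_signal])

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First principal direction: top right-singular vector of centered X (ignores y).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# First PLS weight: direction of maximum covariance between X and y.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)

print("PC1 direction:", np.round(pc1, 3))
print("PLS weight 1: ", np.round(w1, 3))
```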

--

Paige Miller

