Multiple Regression Using Proc Reg: Data Reductio...


04-07-2017 01:13 PM

Hi All,

I want to build some multiple linear regression models using PROC REG to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run PROC REG.

When I build a logistic model I use PROC VARCLUS for data reduction.
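For context, a minimal sketch of that kind of reduction (the procedure is PROC VARCLUS; the dataset and variable names here are hypothetical):

```sas
/* Hypothetical sketch: cluster 900 candidate variables, then keep one    */
/* representative per cluster (typically the one with the lowest          */
/* 1-R**2 ratio in the cluster summary).                                  */
proc varclus data=spend_data maxclusters=50 short;
   var x1-x900;
run;
```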

**Question**: Any suggestions on a good way to reduce the input variables before using PROC REG? Can I use PROC VARCLUS for data reduction when building a linear model?

Any suggestions are greatly appreciated

Thanks!


04-07-2017 01:48 PM - edited 04-07-2017 01:48 PM

RobertNYC wrote:

Hi All,

Question: Any suggestions on a good way to reduce the input variables before using PROC REG? Can I use PROC VARCLUS for data reduction when building a linear model?

I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign high (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and weights/loadings close to zero to the variables that are not important.


04-07-2017 01:56 PM

Thanks so much for your feedback. Would you happen to have an example of PROC PLS code?


04-07-2017 02:04 PM

The online help has several examples.
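For what it's worth, a minimal sketch along the lines of those examples (the dataset and variable names are hypothetical):

```sas
/* Hypothetical sketch: PLS with split-sample cross-validation and the    */
/* van der Voet test (CVTEST) to choose the number of extracted factors.  */
proc pls data=spend_data method=pls cv=split cvtest;
   model spend_12mo = x1-x900;
run;
```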


04-07-2017 11:19 PM

My answer is no, like Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know what kind of distribution the Y variable conforms to.

After that I would suggest PROC ADAPTIVEREG, which can take non-linear effects into account.
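For illustration, a minimal PROC HPGENSELECT sketch; the dataset name, variable names, and the GAMMA distribution are assumptions (check your response variable first), and LASSO is shown as one of the available selection methods:

```sas
/* Hypothetical sketch: penalized selection over 900 candidate predictors. */
/* DISTRIBUTION=GAMMA is an assumption about the spend variable.           */
proc hpgenselect data=spend_data;
   model spend_12mo = x1-x900 / distribution=gamma;
   selection method=lasso;   /* shrinks many coefficients to exactly zero */
run;
```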


04-11-2017 02:49 PM

Ksharp wrote:

My answer is no, like Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know what kind of distribution the Y variable conforms to.

My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backward, or forward selection) and makes it into a high-performance procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.


04-10-2017 03:06 PM

Without dropping variables, you can reduce the dimensionality by using PCA (principal component analysis).

From there you can build a model for the data.
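A minimal sketch of that two-step approach (dataset and variable names are hypothetical, as is keeping 20 components):

```sas
/* Hypothetical sketch: PCA on the 900 inputs, then regress on the scores. */
proc princomp data=spend_data out=pca_scores n=20 standard;
   var x1-x900;
run;

proc reg data=pca_scores;
   model spend_12mo = prin1-prin20;   /* principal component scores */
run;
```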


04-10-2017 03:48 PM

lakshmi_74 wrote:

Without dropping variables, you can reduce the dimensionality by using PCA (principal component analysis).

From there you can build a model for the data.

This is why I keep recommending Partial Least Squares: it also reduces the dimensionality, but in a way that is superior to PCA in this situation. PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.