## Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Frequent Contributor
Posts: 101

Hi All,

I want to build some multiple linear regression models using proc reg to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.

When I build a logistic model I use proc varclust for data reduction.

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model?

Any suggestions are greatly appreciated

Thanks!

Posts: 2,065

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

RobertNYC wrote:

Hi All,

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model?

I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign high (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and it will assign weights/loadings close to zero to the variables that are not important.
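
A minimal sketch of what such a PROC PLS call might look like. The data set name `work.customers`, the response `spend_12m`, and the predictor names `x1-x900` are placeholders assumed for illustration; none of them come from this thread.

```sas
/* Sketch only: work.customers, spend_12m and x1-x900 are hypothetical names. */
proc pls data=work.customers
         method=pls            /* partial least squares factor extraction  */
         cv=split(10)          /* split-sample cross validation            */
         cvtest(seed=12345)    /* test for the smallest adequate # factors */
         details;              /* print weights/loadings for each factor   */
   model spend_12m = x1-x900;
   /* Save predicted values for later inspection */
   output out=work.pls_out predicted=pred_spend;
run;
```

With `cv=` and `cvtest` the procedure chooses the number of factors by cross validation rather than using a fixed count; the `details` output is where the weights and loadings mentioned above can be inspected.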

--
Paige Miller
Frequent Contributor
Posts: 101

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Thanks so much for your feedback. Would you happen to have an example of proc pls code?

Super User
Posts: 10,214

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

My answer is no, same as Paige.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which kind of distribution the y variable follows.

After that I would suggest PROC ADAPTIVEREG, which can take non-linear effects into account.
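
A rough sketch of that two-step idea, assuming a roughly normal response. The data set `work.customers`, the response `spend_12m`, the predictors `x1-x900`, and the three variables kept in the second step are all hypothetical placeholders.

```sas
/* Step 1 (sketch): variable selection. DISTRIBUTION= must match the    */
/* response; NORMAL is only an assumption here.                         */
proc hpgenselect data=work.customers;
   model spend_12m = x1-x900 / distribution=normal;
   selection method=stepwise(choose=sbc);
run;

/* Step 2 (sketch): fit adaptive regression splines on whichever        */
/* variables survived step 1. x12, x87 and x301 are made-up examples.   */
proc adaptivereg data=work.customers;
   model spend_12m = x12 x87 x301;
run;
```

The selected effects have to be read off the step 1 output and typed into the second MODEL statement by hand in this sketch.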

Posts: 2,065

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Ksharp wrote:

My answer is no, same as Paige.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which kind of distribution the y variable follows.

My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backwards or forward selection) and makes it into a HP procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.

--
Paige Miller
Contributor
Posts: 57

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Without dropping any variables, you can reduce the dimensionality by using PCA (principal component analysis).
From there you can build a model on the components.
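
A minimal sketch of that approach (principal component regression). The names `work.customers`, `spend_12m`, `x1-x900`, and the choice of 20 components are assumptions for illustration only.

```sas
/* Sketch only: compute the first 20 principal components of the inputs. */
/* The response is not used at this stage.                               */
proc princomp data=work.customers out=work.pc_scores n=20 noprint;
   var x1-x900;
run;

/* Regress the response on the component scores (Prin1-Prin20 are the    */
/* default score names written to the OUT= data set).                    */
proc reg data=work.pc_scores;
   model spend_12m = Prin1-Prin20;
run;
quit;
```

How many components to keep is a judgment call; 20 is arbitrary here.
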
Posts: 2,065

## Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

lakshmi_74 wrote:
Without dropping any variables, you can reduce the dimensionality by using PCA (principal component analysis).
From there you can build a model on the components.

Which is why I keep recommending Partial Least Squares analysis: it also reduces the dimensionality, but does so in a way that is superior to PCA in this situation. PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.
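
One way to state the difference, as a sketch for a single centered response $y$ and a centered predictor matrix $X$ (the symbols $X$, $y$, and $w$ are notation introduced here, not taken from the posts above):

$$
w_{\mathrm{PCA}} = \arg\max_{\|w\|=1} \operatorname{Var}(Xw),
\qquad
w_{\mathrm{PLS}} = \arg\max_{\|w\|=1} \operatorname{Cov}(Xw,\, y)^2
$$

The first PCA direction is chosen without looking at $y$ at all, while the first PLS direction is the one whose scores covary most strongly with $y$.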

--
Paige Miller