Programming the statistical procedures from SAS

Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Frequent Contributor
Posts: 98

Hi All,

I want to build some multiple linear regression models using proc reg to predict customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.

When I build a logistic model, I use proc varclus for data reduction.

Question: Any suggestions on a good way to reduce the number of input variables before using proc reg? Can I use proc varclus for data reduction when building a linear model?

Any suggestions are greatly appreciated.

Thanks!

Trusted Advisor
Posts: 1,431

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

RobertNYC wrote:

Hi All,

Question: Any suggestions on a good way to reduce the number of input variables before using proc reg? Can I use proc varclus for data reduction when building a linear model?


I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign large weights/loadings (either positive or negative) to the variables that are predictive of the response variable, and weights/loadings close to zero to the variables that are not important.
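For anyone wondering why PLS ends up with near-zero weights for unimportant variables: with a single response and centered data, the first PLS weight vector is proportional to X'y, so each predictor's weight is driven by its cross-product with the response. Here is a minimal Python sketch of only that first step, not of PROC PLS itself (which goes on to extract further factors); the toy data and the function name `first_pls_weights` are made up for illustration.

```python
import math

def first_pls_weights(columns, y):
    """First PLS1 weight vector for centered data: w = X'y / ||X'y||."""
    raw = [sum(x * v for x, v in zip(col, y)) for col in columns]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]

# Hypothetical toy data, already centered (each column sums to 0) with equal spread.
# y depends positively on x_a, negatively on x_c, and not at all on x_b.
x_a = [1, 1, -1, -1]
x_b = [1, -1, 1, -1]
x_c = [1, -1, -1, 1]
y = [3 * a - 2 * c for a, c in zip(x_a, x_c)]  # [1, 5, -1, -5]

w = first_pls_weights([x_a, x_b, x_c], y)
print([round(v, 3) for v in w])  # [0.832, 0.0, -0.555]
```

x_a gets a large positive weight, x_c a large negative one, and x_b, which has nothing to do with y, gets zero. With real data an irrelevant variable would typically get a weight near zero rather than exactly zero.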

Frequent Contributor
Posts: 98

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Thanks so much for your feedback. Would you happen to have an example of PROC PLS code?

Trusted Advisor
Posts: 1,431

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Grand Advisor
Posts: 9,458

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but first you need to know which distribution the Y variable follows.

After that, I would suggest PROC ADAPTIVEREG, which can account for nonlinear effects.

Trusted Advisor
Posts: 1,431

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection


Ksharp wrote:

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but first you need to know which distribution the Y variable follows.


My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backward, or forward selection) and turns it into a high-performance (HP) procedure, meaning you can run it on huge amounts of data. The results of the method are still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.

Contributor
Posts: 56

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Without dropping variables, you can reduce the dimensionality by using principal component analysis (PCA). From there you can build a model for the data.

Trusted Advisor
Posts: 1,431

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection


lakshmi_74 wrote:
Without dropping variables, you can reduce the dimensionality by using principal component analysis (PCA).
From there you can build a model for the data.

This is why I keep recommending partial least squares (PLS) analysis. It also reduces the dimensionality, but in this situation it does so in a way that is superior to PCA: PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.
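A small sketch of that difference, in Python rather than SAS: on standardized data, PCA's first direction is the leading eigenvector of X'X and never looks at Y, while the first PLS weight vector is proportional to X'Y. In the hypothetical toy data below, x1 and x2 share variance (correlation 0.5) but are unrelated to y, and x3 alone drives y. Power iteration, the 200-iteration count, the all-ones start vector, and all names are assumptions for illustration, not how any SAS procedure is implemented.

```python
import math

def center_scale(col):
    """Center a column to mean 0 and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def unit(vec):
    """Rescale a vector to length 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def first_pca_direction(xs, iters=200):
    """Leading eigenvector of X'X via power iteration; y is never used."""
    p = len(xs)
    xtx = [[sum(a * b for a, b in zip(xs[i], xs[j])) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # assumed start vector: must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        v = unit([sum(xtx[i][j] * v[j] for j in range(p)) for i in range(p)])
    return v

def first_pls_direction(xs, y):
    """First PLS1 weight vector: proportional to X'y, so it depends on the response."""
    return unit([sum(a * b for a, b in zip(col, y)) for col in xs])

# Hypothetical toy data: x1 and x2 are correlated but unrelated to y;
# x3 is uncorrelated with both and is the only predictor of y.
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, 1, -1, 1, -1, -1, -1]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
y = [2 * v for v in x3]

xs = [center_scale(c) for c in (x1, x2, x3)]
ys = center_scale(y)
pca_dir = first_pca_direction(xs)
pls_dir = first_pls_direction(xs, ys)
print("PCA:", [round(v, 3) for v in pca_dir])  # PCA: [0.707, 0.707, 0.0]
print("PLS:", [round(v, 3) for v in pls_dir])  # PLS: [0.0, 0.0, 1.0]
```

PCA's first component is dominated by the correlated but irrelevant pair, while the first PLS direction points straight at x3, the variable that actually predicts y.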
