Programming the statistical procedures from SAS

Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Frequent Contributor
Posts: 101

Hi All,

 

I want to build some multiple linear regression models using proc reg to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.

 

When I build a logistic model I use proc varclust for data reduction. 

 

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model? 

 

Any suggestions are greatly appreciated.

 

Thanks!

Respected Advisor
Posts: 2,065

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to RobertNYC

RobertNYC wrote:

Hi All,

 

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model? 


I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

 

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign large (positive or negative) weights/loadings to the variables that are predictive of the response variable, and weights/loadings close to zero to the variables that are not important.
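To make the weights idea concrete, here is a minimal sketch in plain Python (not SAS, and not what PROC PLS does internally in full). It assumes a single response, simulated data, and only the first PLS component, whose weight vector is proportional to the covariance of each centered predictor with the centered response. The names `make_data` and `first_pls_weights` are made up for this illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Simulate n rows with 5 unit-variance predictors; only x0 and x1 drive y."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(5)]
        X.append(row)
        # Assumed toy relationship: x2, x3 and x4 are pure noise variables.
        y.append(2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5))
    return X, y


def first_pls_weights(X, y):
    """Unit-length weights of the first PLS component (proportional to Xc'yc)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Cross-product of each centered predictor with the centered response.
    raw = [
        sum((X[i][j] - x_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


X, y = make_data(2000, seed=1)
weights = first_pls_weights(X, y)
for j, w in enumerate(weights):
    print(f"x{j}: {w:+.3f}")
```

In this toy setup x0 should get a large positive weight, x1 a large negative weight, and the three noise variables weights near zero. With real data you would normally scale the predictors first and look at more than one component.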

--
Paige Miller
Frequent Contributor
Posts: 101

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to PaigeMiller

Thanks so much for your feedback. Would you happen to have an example of proc pls code?

Respected Advisor
Posts: 2,065

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to RobertNYC
Super User
Posts: 10,214

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to RobertNYC

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.

 

After that I would suggest PROC ADAPTIVEREG, which can take nonlinear effects into account.

Respected Advisor
Posts: 2,065

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection


Ksharp wrote:

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.


My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backwards or forward selection) and makes it into a HP procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.

--
Paige Miller
Contributor
Posts: 57

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to RobertNYC
Without dropping any variables, you can reduce the dimensionality by using principal component analysis (PCA).
From there you can build a model for the data.
Respected Advisor
Posts: 2,065

Re: Multiple Regression Using Proc Reg: Data Reduction for Variable Selection

Posted in reply to lakshmi_74

lakshmi_74 wrote:
Without dropping any variables, you can reduce the dimensionality by using principal component analysis (PCA).
From there you can build a model for the data.

Which is why I keep recommending Partial Least Squares (PLS) analysis. It also reduces the dimensionality, but it does so in a way that is superior to PCA in this situation: PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.
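A rough way to see the difference, sketched in plain Python on simulated data (an assumed toy setup, not SAS output): three predictors are noisy copies of one latent factor that has nothing to do with Y, and a fourth, independent predictor drives Y. The first principal component is computed by power iteration on the correlation matrix, and the first PLS direction is taken as the normalized cross-product of the standardized predictors with Y. All function names here are hypothetical helpers.

```python
import math
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def standardize(cols):
    """Center each column and scale it to unit sample variance."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        s = math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        out.append([(v - m) / s for v in c])
    return out


def first_pc(cols, iters=200):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    p, n = len(cols), len(cols[0])
    R = [[dot(cols[i], cols[j]) / (n - 1) for j in range(p)] for i in range(p)]
    v = unit([1.0] * p)
    for _ in range(iters):
        v = unit([dot(R[i], v) for i in range(p)])
    return v


def first_pls(cols, y):
    """First PLS weight vector; the columns are centered, so y needs no centering."""
    return unit([dot(c, y) for c in cols])


rng = random.Random(7)
n = 1000
f = [rng.gauss(0, 1) for _ in range(n)]                # latent factor unrelated to y
cols = [[fi + 0.3 * rng.gauss(0, 1) for fi in f] for _ in range(3)]  # x0, x1, x2
x3 = [rng.gauss(0, 1) for _ in range(n)]               # the only predictor driving y
cols.append(x3)
y = [v + 0.3 * rng.gauss(0, 1) for v in x3]

Z = standardize(cols)
pc = first_pc(Z)
pls = first_pls(Z, y)
print("PCA first direction:", [round(v, 3) for v in pc])
print("PLS first direction:", [round(v, 3) for v in pls])
```

Here the first PCA direction should load almost entirely on the correlated block x0 to x2, because that is where the predictor variance is, while the first PLS direction should load almost entirely on x3, the variable that actually predicts Y.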

--
Paige Miller