RobertNYC
Obsidian | Level 7

Hi All,

I want to build multiple linear regression models with proc reg to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.

When I build a logistic model I use proc varclust for data reduction.

Question: Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model?

Any suggestions are greatly appreciated.

Thanks!

7 REPLIES
PaigeMiller
Diamond | Level 26

@RobertNYC wrote:

Hi All,

 

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model? 


I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

 

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign high (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and weights/loadings close to zero to the variables that are not important.

--
Paige Miller
RobertNYC
Obsidian | Level 7

Thanks so much for your feedback. Would you happen to have an example of proc pls code?
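
For reference, a minimal PROC PLS sketch along the lines suggested above might look like the following. The data set name work.customers, the response spend_12m, and the predictor list x1-x900 are placeholders that do not come from this thread, and the cross-validation settings are only one reasonable choice, not a recommendation from the posters.

```sas
/* Hypothetical names: work.customers, spend_12m and x1-x900 are placeholders. */
/* METHOD=PLS requests partial least squares (the default method).             */
/* CV=SPLIT(10) cross-validates by holding out every 10th observation in turn. */
/* CVTEST runs van der Voet's test to choose the number of factors.            */
proc pls data=work.customers method=pls cv=split(10) cvtest(seed=12345);
   /* SOLUTION prints the final regression coefficients for each predictor */
   model spend_12m = x1-x900 / solution;
   /* Save predicted 12-month spend for each customer */
   output out=work.pls_scored predicted=spend_hat;
run;
```

The coefficient table printed by SOLUTION is one place to look for the near-zero versus large weights described earlier; adding the DETAILS option to the PROC PLS statement also prints the per-factor weights and loadings, at the cost of a lot of output with 900 predictors.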

Ksharp
Super User

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.

After that I would suggest PROC ADAPTIVEREG, which can take non-linear effects into account.
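
As a rough sketch of that two-step idea, the code below runs a selection in PROC HPGENSELECT and then fits PROC ADAPTIVEREG on the surviving variables. Everything specific here is an assumption for illustration: the data set and variable names are placeholders, DISTRIBUTION=NORMAL stands in for whatever distribution the response actually follows (skewed spend data might call for something else, such as GAMMA), and METHOD=LASSO is just one of the available selection methods.

```sas
/* Step 1: variable selection. Names and distribution are placeholders. */
proc hpgenselect data=work.customers;
   model spend_12m = x1-x900 / distribution=normal;
   /* LASSO is an illustrative choice; FORWARD, BACKWARD and STEPWISE */
   /* are also accepted by the SELECTION statement.                   */
   selection method=lasso;
run;

/* Step 2: fit adaptive regression splines on the variables kept in step 1. */
/* x12, x47 and x301 are hypothetical survivors, listed by hand here.       */
proc adaptivereg data=work.customers;
   model spend_12m = x12 x47 x301;
run;
```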

PaigeMiller
Diamond | Level 26

@Ksharp wrote:

My answer is no, the same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.


My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backward or forward selection) and makes it into an HP procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.

--
Paige Miller
lakshmi_74
Quartz | Level 8
Without dropping any variables, you can reduce the dimensionality by using PCA.
From there you can build a model on the resulting components.
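
A minimal sketch of that PCA-then-regression route, assuming placeholder names (work.customers, spend_12m, x1-x900) and an arbitrary choice of 20 components:

```sas
/* Compute the first 20 principal components of the predictors.     */
/* N=20 is an arbitrary illustrative choice, not a recommendation.  */
/* OUT= keeps the original variables and adds scores Prin1-Prin20.  */
proc princomp data=work.customers out=work.pc_scores n=20 noprint;
   var x1-x900;
run;

/* Regress the response on the component scores instead of the raw inputs */
proc reg data=work.pc_scores;
   model spend_12m = prin1-prin20;
run;
quit;
```

Note that the components are built from the predictors alone, so nothing guarantees that the leading components are the ones related to the response.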
PaigeMiller
Diamond | Level 26

@lakshmi_74 wrote:
Without dropping any variables, you can reduce the dimensionality by using PCA.
From there you can build a model on the resulting components.

Which is why I keep recommending Partial Least Squares analysis: it also reduces the dimensionality, but in a way that is superior to PCA in this situation. PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.
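
One way to see this difference on your own data is to fit both methods in PROC PLS with the same number of factors and compare the "percent variation accounted for" tables, which report variation explained in the predictors and in the response. This is only a sketch: the data set and variable names are placeholders, and NFAC=5 is an arbitrary choice.

```sas
/* Principal components regression: factors chosen to explain X only */
proc pls data=work.customers method=pcr nfac=5;
   model spend_12m = x1-x900;
run;

/* Partial least squares: factors chosen to explain both X and Y */
proc pls data=work.customers method=pls nfac=5;
   model spend_12m = x1-x900;
run;
```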

--
Paige Miller

