RobertNYC
Obsidian | Level 7

Hi All,

 

I want to build some multiple linear regression models using proc reg to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.

 

When I build a logistic model I use proc varclust for data reduction. 

 

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model? 

 

Any suggestions are greatly appreciated

 

Thanks!

7 REPLIES
PaigeMiller
Diamond | Level 26

@RobertNYC wrote:

Hi All,

 

Question:  Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model? 


I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.

 

I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign high (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and it will assign weights/loadings close to zero to the variables that are not important.
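
For reference, a minimal sketch of what a PROC PLS call along these lines might look like. The data set name work.customers, the response spend_12m, and the predictor names x1-x900 are placeholders, not names from the original post, and the cross-validation settings are only illustrative assumptions.

```sas
/* Hypothetical names: work.customers, spend_12m, x1-x900 */
proc pls data=work.customers method=pls
         cv=split(10)           /* split-sample cross-validation with 10 groups */
         cvtest(seed=12345);    /* randomization test to help choose the number of factors */
   /* SOLUTION prints the final regression coefficients for all predictors */
   model spend_12m = x1-x900 / solution;
   /* Save predicted spend for each observation */
   output out=work.pls_scored predicted=pred_spend;
run;
```

With cross-validation requested, the number of extracted factors is chosen by the procedure rather than fixed in advance; NFAC= could be used instead if a specific number of factors is wanted.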

--
Paige Miller
RobertNYC
Obsidian | Level 7

Thanks so much for your feedback. Would you happen to have an example of proc pls code?

Ksharp
Super User

My answer is no, same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.

 

After that I would suggest PROC ADAPTIVEREG, which can take non-linear effects into account.
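
A rough sketch of that two-step idea, under several assumptions: the data set work.customers, the response spend_12m, and the predictors x1-x900 are placeholder names; the gamma distribution with a log link is only one possible choice and requires a strictly positive response; METHOD=LASSO depends on the SAS/STAT release installed; and x3, x17, x42 stand in for whatever variables the first step actually keeps.

```sas
/* Step 1: variable selection (hypothetical names throughout) */
proc hpgenselect data=work.customers;
   /* Distribution and link are assumptions; pick them to match the response */
   model spend_12m = x1-x900 / distribution=gamma link=log;
   selection method=lasso;
run;

/* Step 2: fit adaptive regression splines on the retained variables.
   x3 x17 x42 are placeholders for the variables selected in step 1. */
proc adaptivereg data=work.customers;
   model spend_12m = x3 x17 x42;
run;
```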

PaigeMiller
Diamond | Level 26

@Ksharp wrote:

My answer is no, same as Paige's.

Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.


My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backward or forward selection) and makes it into an HP procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.

--
Paige Miller
lakshmi_74
Quartz | Level 8
Without reducing variables you can reduce the dimensionality by using PCA analysis.
From there you can build a model for the data.
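
One way this could look in SAS, as a sketch only: work.customers, spend_12m, and x1-x900 are placeholder names, and keeping 20 components is an arbitrary assumption that would need to be checked against the variance explained.

```sas
/* Compute principal component scores (hypothetical names; N=20 is arbitrary) */
proc princomp data=work.customers out=work.pcs n=20 noprint;
   var x1-x900;
run;

/* Regress the response on the component scores Prin1-Prin20 */
proc reg data=work.pcs;
   model spend_12m = prin1-prin20;
run;
quit;
```

Note that the components are built from the predictors alone, without reference to the response.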
PaigeMiller
Diamond | Level 26

@lakshmi_74 wrote:
Without reducing variables you can reduce the dimensionality by using PCA analysis.
From there you can build a model for the data.

Which is why I keep recommending Partial Least Squares analysis. It also reduces the dimensionality, but does so in a way that is superior to PCA in this situation: PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.

--
Paige Miller

