Hi All,
I want to build some multiple linear regression models using proc reg to predict estimated customer product spend over a 12-month period. I have about 900 variables, which I need to reduce before I run proc reg.
When I build a logistic model I use proc varclust for data reduction.
Question: Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model?
Any suggestions are greatly appreciated
Thanks!
@RobertNYC wrote:
Hi All,
Question: Any suggestions on a good way to reduce input variables before using proc reg? Can I use proc varclust for data reduction when building a linear model?
I'm going to answer with "No". There are no good ways to do this in PROC REG (or elsewhere). Stepwise regression has been discredited by many statisticians.
I would suggest leaving all 900 variables in the model and using PROC PLS. It will assign large (either positive or negative) weights/loadings to the variables that are predictive of the response variable, and weights/loadings close to zero to the variables that are not important.
Thanks so much for your feedback. Would you happen to have an example of proc pls code?
The online help has several examples.
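For instance, a minimal sketch along those lines might look like the following. The data set name work.customers, the response spend_12m, and the predictor names x1-x900 are hypothetical placeholders, not names from this thread, and the cross-validation settings are only one reasonable choice.

```sas
/* Sketch only: data set and variable names below are assumed placeholders. */
proc pls data=work.customers      /* hypothetical input data set */
         method=pls               /* partial least squares factor extraction */
         cv=split(10)             /* split-sample cross validation, every 10th obs per group */
         cvtest(seed=12345);      /* randomization test to pick the number of factors */
   model spend_12m = x1-x900;     /* keep all 900 candidate predictors in the model */
   output out=work.pls_scored     /* hypothetical output data set */
          predicted=pred_spend;   /* predicted 12-month spend */
run;
```

With CV= and CVTEST specified, the procedure chooses the number of extracted factors by cross validation rather than requiring you to fix it in advance with NFAC=.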
My answer is no, the same as Paige's.
Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.
After that, I would suggest PROC ADAPTIVEREG, which can take non-linear effects into account.
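A rough sketch of that two-step idea is shown below. Everything named here is an assumption for illustration: the data set work.customers, the response spend_12m, the predictors x1-x900, the gamma distribution with a log link (which only makes sense if spend is strictly positive), and the short list of "selected" variables passed to the second step.

```sas
/* Step 1 (sketch): variable selection with an assumed response distribution. */
proc hpgenselect data=work.customers;            /* hypothetical data set */
   model spend_12m = x1-x900
         / distribution=gamma link=log;          /* assumed; check the distribution of y first */
   selection method=lasso;                       /* forward, backward, and stepwise are also available */
run;

/* Step 2 (sketch): refit the retained variables allowing non-linear effects. */
/* x3 x17 x42 stand in for whatever step 1 actually keeps.                   */
proc adaptivereg data=work.customers;
   model spend_12m = x3 x17 x42;
run;
```

The variable list in the second step has to be filled in by hand from the selection summary of the first step; the names used here are placeholders only.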
@Ksharp wrote:
My answer is no, the same as Paige's.
Use PROC HPGENSELECT to reduce the number of variables, but you first need to know which distribution the y variable follows.
My problem with PROC HPGENSELECT is that it takes a widely discredited idea (stepwise, backwards or forward selection) and makes it into a HP procedure, meaning you can run it on huge amounts of data. The result of the method is still suspect, according to many statisticians. Note: I have no experience with the LASSO technique, so I am not going to comment on that.
@lakshmi_74 wrote:
Without dropping any variables, you can reduce the dimensionality by using PCA.
From there you can build a model for the data.
Which is why I keep recommending Partial Least Squares analysis. It also reduces the dimensionality, but does so in a way that is superior to PCA in this situation -- PLS finds dimensions that are predictive of Y, which is not something PCA tries to do.