True! And it can incorporate 's work regarding validation, using the CROSSVAL options.
I was still afraid that 11K right-hand-side variables could overwhelm even PLS, but given your endorsement at that size, I will defer. It's the method to use.
And if the OP is interested in variable reduction, they can use the output as an exploratory tool in identifying strongly determining factors and strongly redundant variables.
Steve Denham
Well, I didn't say the OP has enough hardware resources, but that's a different issue. PLS doesn't require inverting a matrix if you use the NIPALS algorithm (the default in PROC PLS), so it doesn't need huge amounts of memory, and it's pretty fast. If her machine can handle it, then that's the way to go.
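To make the suggestion concrete, here is a minimal sketch of a PROC PLS call. The dataset name MYDATA, response Y, and predictor list X1-X11000 are placeholders for the OP's actual variables; NIPALS is the default algorithm, and CV=SPLIT requests split-sample cross-validation of the number of factors.

```sas
/* Hedged sketch: MYDATA, Y, and X1-X11000 are assumed placeholder names. */
proc pls data=mydata cv=split cvtest(seed=12345);
   /* NIPALS is the default estimation method, so no memory-hungry
      matrix inversion is needed even with thousands of predictors. */
   model y = x1-x11000;
run;
```

The CVTEST option then tests whether models with fewer factors are significantly worse than the cross-validated optimum.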
Hi Steve,
I am also considering another approach: Cronbach's coefficient alpha.
What do you say? Does it make sense?
Xia Keshan
I don't think so. Cronbach's alpha is a measure of internal consistency. I would start with a factor analysis but, rather than try to simply eliminate variables, would attempt to group like measures together (e.g., sum or mean of the items in a group) in order to reduce the number of variables.
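A minimal sketch of that grouping idea, assuming the same placeholder dataset and variable names as above: PROC FACTOR extracts a reduced set of factors, and OUT= writes the factor scores, which can then stand in for the grouped original variables.

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders,
   and NFACTORS=20 is an arbitrary illustrative choice. */
proc factor data=mydata method=principal rotate=varimax
            nfactors=20 out=fscores;
   var x1-x11000;
run;
```

Whether the resulting factors make substantive sense still requires business knowledge, as noted above.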
How about PROC GLMSELECT to select a subset of variables?
GLMSELECT doesn't support methods such as LASSO, only stepwise-like methods, so it runs into the same problems I outlined in my earlier post.
Steve Denham
GLMSELECT has LASSO!
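For reference, a minimal sketch of LASSO selection in PROC GLMSELECT, again using the placeholder dataset and variable names assumed earlier. CHOOSE=CV picks the step of the LASSO path that minimizes cross-validated prediction error.

```sas
/* Hedged sketch: MYDATA, Y, and X1-X11000 are assumed placeholders. */
proc glmselect data=mydata;
   model y = x1-x11000 / selection=lasso(choose=cv stop=none);
run;
```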
That is good news.
Steve Denham
Thanks so much for your reply. You just gave me so much information, and I need some time to understand it since I am a totally new user. At least I know stepwise is not the solution. :)
BTW, what is OP short for?
OP is short for Original Poster.
I'm not a statistician, but I have taken numerous statistics courses. I think that factor analysis, or PCA, would be a good first step to reduce your number of variables. That is, you could create combined scores that take into account collections of grouped measures. However, that would require business knowledge to determine whether the results (i.e., what to combine) seem to make sense.
PG's proposal was simply to create a model on a sample of your data and see if it held up on another sample of your data.
Yes, that's what I am thinking now: do the variable reduction first. I tried PCA today, and there are 160 components in the model, where the MSE is 0.0106. However, I didn't get the coefficients of these components, so I don't know how to interpret them.
I guess what PG means is N-fold cross-validation.
Thanks,
Elaine
If you are looking for interpretation based on the original variables as key drivers, then along the same lines as Art suggested, there is a procedure, PROC VARCLUS, that produces groups of variables that are internally highly correlated but have very little association with variables belonging to other groups. You can use your experience along with the information provided by PROC VARCLUS to select a small number of variables for further analysis.
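A minimal sketch of that approach, using the same placeholder dataset and variable names assumed earlier in the thread. MAXEIGEN= controls how finely clusters are split; one representative variable (e.g., the one with the lowest 1-R**2 ratio in the printed output) can then be kept from each cluster.

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders,
   and MAXEIGEN=0.7 is an arbitrary illustrative threshold. */
proc varclus data=mydata maxeigen=0.7 short;
   var x1-x11000;
run;
```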
I did try PCA and used the first 160 components. The MSE is 0.0106. But I didn't get the coefficients of these components, or what these components are. I got the results but just don't know how to interpret them.
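On getting the component coefficients: in PROC PRINCOMP, the eigenvectors (the coefficients that define each component as a linear combination of the original variables) go to the OUTSTAT= dataset, and the component scores go to OUT=. A minimal sketch, with the same placeholder names assumed as elsewhere in the thread:

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders. */
proc princomp data=mydata n=160 out=scores outstat=loadings;
   var x1-x11000;
run;
```

The _TYPE_='SCORE' rows of the LOADINGS dataset hold the eigenvectors, which show how heavily each original variable weighs in each component.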