True! And it can incorporate 's work regarding validation, using the CROSSVAL options.
I was still afraid that 11K right-hand-side variables could overwhelm even PLS, but given your endorsement at that size, I will defer. It's the method to use.
And if the OP is interested in variable reduction, they can use the output as an exploratory tool in identifying strongly determining factors and strongly redundant variables.
Steve Denham
Well, I didn't say the OP has enough hardware resources, but that's a different issue. PLS doesn't require inverting a matrix if you use the NIPALS algorithm (the default in PROC PLS), so it doesn't need huge amounts of memory, and it's pretty fast. If her machine can handle it, then that's the way to go.
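To make the suggestion concrete, here is a minimal sketch of a PROC PLS call. The dataset name MYDATA, response Y, and predictor list X1-X11000 are placeholders for the OP's actual variables; NIPALS is the default algorithm, and CV=SPLIT requests split-sample cross-validation of the number of factors.

```sas
/* Hedged sketch: MYDATA, Y, and X1-X11000 are assumed placeholder names. */
proc pls data=mydata cv=split cvtest(seed=12345);
   /* NIPALS is the default estimation method, so no memory-hungry
      matrix inversion is needed even with thousands of predictors. */
   model y = x1-x11000;
run;
```

The CVTEST option then tests whether models with fewer factors are significantly worse than the cross-validated optimum.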
Hi Steve,
I am also considering another approach: Cronbach's coefficient alpha.
What do you say? Does it make sense?
Xia Keshan
I don't think so. Cronbach's alpha is a measure of internal consistency. I would start with a factor analysis but, rather than try to simply eliminate variables, would attempt to group like measures together (e.g., sum or mean of the items in a group) in order to reduce the number of variables.
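A minimal sketch of that grouping idea, assuming the same placeholder dataset and variable names as above: PROC FACTOR extracts a reduced set of factors, and OUT= writes the factor scores, which can then stand in for the grouped original variables.

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders,
   and NFACTORS=20 is an arbitrary illustrative choice. */
proc factor data=mydata method=principal rotate=varimax
            nfactors=20 out=fscores;
   var x1-x11000;
run;
```

Whether the resulting factors make substantive sense still requires business knowledge, as noted above.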
How about PROC GLMSELECT to select a subset of variables?
GLMSELECT doesn't support methods such as LASSO, only stepwise-like methods, so it runs into the same problems I outlined in my earlier post.
Steve Denham
GLMSELECT has LASSO!
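For reference, a minimal sketch of LASSO selection in PROC GLMSELECT, again using the placeholder dataset and variable names assumed earlier. CHOOSE=CV picks the step of the LASSO path that minimizes cross-validated prediction error.

```sas
/* Hedged sketch: MYDATA, Y, and X1-X11000 are assumed placeholders. */
proc glmselect data=mydata;
   model y = x1-x11000 / selection=lasso(choose=cv stop=none);
run;
```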
That is good news.
Steve Denham
Thanks so much for your reply. You just gave me so much information, and I need some time to understand it since I am a totally new user. At least I know stepwise is not the solution. :)
BTW, what is OP short for?
OP is short for Original Poster.
I'm not a statistician, but I have taken numerous statistics courses. I think that factor analysis, or PCA, would be a good first step to reduce your number of variables. That is, you could create combined scores that take into account collections of grouped measures. However, that would require business knowledge to determine whether the results (i.e., what to combine) seem to make sense.
PG's proposal was simply to create a model on a sample of your data and see if it held up on another sample of your data.
Yes, that's what I am thinking now: do the variable reduction first. I tried PCA today, and there are 160 components in the model, where the MSE is 0.0106. However, I didn't get the coefficients of these components, so I don't know how to interpret them.
I guess what PG means is N-fold cross-validation.
Thanks,
Elaine
If you are looking for interpretation based on the original variables as key drivers, then along the same lines as Art suggested, there is a procedure, PROC VARCLUS, that produces groups of variables that are internally highly correlated but have very little association with variables belonging to other groups. You can use your experience along with the information provided by PROC VARCLUS to select a small number of variables for further analysis.
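A minimal sketch of that approach, using the same placeholder dataset and variable names assumed earlier in the thread. MAXEIGEN= controls how finely clusters are split; one representative variable (e.g., the one with the lowest 1-R**2 ratio in the printed output) can then be kept from each cluster.

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders,
   and MAXEIGEN=0.7 is an arbitrary illustrative threshold. */
proc varclus data=mydata maxeigen=0.7 short;
   var x1-x11000;
run;
```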
I did try PCA and used the first 160 components. The MSE is 0.0106. But I didn't get the coefficients of these components, or what these components are. I got the results but just don't know how to interpret them.
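On getting the component coefficients: in PROC PRINCOMP, the eigenvectors (the coefficients that define each component as a linear combination of the original variables) go to the OUTSTAT= dataset, and the component scores go to OUT=. A minimal sketch, with the same placeholder names assumed as elsewhere in the thread:

```sas
/* Hedged sketch: MYDATA and X1-X11000 are assumed placeholders. */
proc princomp data=mydata n=160 out=scores outstat=loadings;
   var x1-x11000;
run;
```

The _TYPE_='SCORE' rows of the LOADINGS dataset hold the eigenvectors, which show how heavily each original variable weighs in each component.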