Using PCA for modeling and turning back the coefficients


Posted 08-08-2014 07:37 AM (2428 views)

Hi,

How would I proceed if, due to multicollinearity between the variables, I used PCA to derive some components and then ran the model on those components? How would I be able to reverse the coefficients in order to recover the true effect of the original variables in the model?

Thank you in advance
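The back-transformation being asked about has a closed form: if the retained loadings are the columns of V_k, the component scores are Z = X·V_k, and regressing y on Z gives coefficients γ, then the implied coefficients on the original (centered) variables are β = V_k·γ. The thread is about SAS procedures, but here is a language-neutral sketch in numpy on synthetic data (all variable names are illustrative, not from any SAS output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Collinear predictors: three noisy copies of one latent driver (synthetic data)
latent = rng.normal(size=200)
X = np.column_stack([latent + 0.1 * rng.normal(size=200) for _ in range(3)])
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=200)

# Center the data (PCA assumes centered columns)
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Principal components via SVD: rows of Vt are the loading vectors
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                          # keep the first k components
Vk = Vt[:k].T                  # p x k loading matrix
Z = Xc @ Vk                    # component scores

# Regress y on the components
gamma = np.linalg.lstsq(Z, yc, rcond=None)[0]

# Back-transform: since Z = Xc @ Vk, the fitted model
#   yhat = Z @ gamma = Xc @ (Vk @ gamma)
# implies these coefficients on the ORIGINAL variables:
beta_original = Vk @ gamma
print(beta_original)
```

To predict on the raw (uncentered) scale you would also carry the intercept, y.mean() - X.mean(axis=0) @ beta_original. Note these back-transformed coefficients describe the fitted PCR model, not causal "true effects", which is the caveat the replies below dwell on.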

25 Replies


I think you are in an area where the research is still up in the air. Time series on principal components strikes me as a very difficult, but interesting, area. What does PROC PANEL give you (or not give you)? It deals with multivariate time series.

My fear about the PCA approach is that the loadings on the components will differ at each time point, meaning that you aren't really looking at the same "variables". I don't know enough about copulas (and PROC COPULA) to suggest them as an approach, but the theory there is much more developed.

Steve Denham


If you want to derive components for a regression model, then PROC PLS will do a better job than Principal Components. The DETAILS option in PROC PLS computes the regression coefficients for you.

You could also use the METHOD=PCR option in PROC PLS to force the procedure to provide Principal Components Regression and related components and model coefficients, but I would not recommend this, as the PLS model (and components) will fit better.

--

Paige Miller


Hi,

I would suggest using PROC VARCLUS to retain original variables in the model instead of components. This will help to reduce multicollinearity as well as to measure the true effect of the original variables.


The problem with PCA, and the problem with VARCLUS, in this situation, is that they find combinations of predictor variables that may or may not be predictive of the response variable(s). PLS specifically tries to find components that are predictive of the response variable(s) and hence will produce better fits.

--

Paige Miller


The whole idea of finding the significance of original variables in a situation where the predictors are multicollinear seems to me to be the wrong question to ask.

You will always be misled in this multicollinear situation by asking which are the "real" or "significant" predictors. It is impossible to tell, empirically.

So PLS doesn't answer the question. It gives you a (hopefully) good predictive model, and in many situations, it also gives interpretable loadings to help you understand what *combinations of variables* are predictive.

--

Paige Miller


> ... but cannot measure significance of a predictor in explaining a response variable

Correct, it cannot do this, because logically, in the multicollinearity situation, the idea of "significance of a predictor" makes no sense.

> I am not sure what type of rotation is involved in PLS; that will make the interpretation even more complicated.

No rotation is used.

--

Paige Miller


Thanks again!

This is going to be an interesting discussion. Without rotation, if one factor is evenly loaded on two or more variables, how can we decide which of these variables is more predictive?


stat@sas wrote:

> This is going to be an interesting discussion. Without rotation, if one factor is evenly loaded on two or more variables, how can we decide which of these variables is more predictive?

I keep saying, you can't do this. PLS reports which combinations of variables are predictive. PLS does not attempt to single out an individual variable, and neither should you.

Furthermore, in the case of multicollinearity, you cannot logically single out a variable as "more predictive". For example, suppose you have X1, X2, and X3, all with correlations of about 0.8 with each other, and Y is predicted by X1, X2, and X3. Can you say, using any logical method, that X1 is the variable that is "more predictive" if X2 and X3 are moving together with X1? Of course not. You may run a statistical procedure that reports slopes and statistical significances, and one of those will be the "winner", but that doesn't take into account the logical impossibility of separating the three correlated effects into a single "winner". Thus, PLS reports that the combination of X1, X2, and X3 is predictive, and does not single one out. Ordinary least squares regression fails miserably in this situation (although the algorithm will certainly produce results).

--

Paige Miller
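The X1/X2/X3 point above can be made numerically: with one latent driver behind all three predictors, the predictive *combination* of coefficients is pinned down far more precisely than any single slope. A small numpy sketch with synthetic data (the setup and thresholds are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(42)

# X1, X2, X3 share one latent driver, so pairwise correlations are ~0.8
n = 100
common = rng.normal(size=n)
X = np.column_stack([common + 0.5 * rng.normal(size=n) for _ in range(3)])
y = X @ np.ones(3) + 0.5 * rng.normal(size=n)   # only the combination matters

beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Coefficient covariance is proportional to inv(X'X); compare the
# uncertainty of one slope with that of the average of the three slopes.
XtX_inv = np.linalg.inv(X.T @ X)
se_individual = np.sqrt(XtX_inv[0, 0])          # (relative) SE of beta_1 alone
e = np.ones(3)
se_combination = np.sqrt(e @ XtX_inv @ e) / 3   # (relative) SE of the average

# The ratio is well above 1: individual slopes are poorly determined,
# the predictive combination is not.
print(se_individual / se_combination)
```

This is why asking "which single variable is predictive?" is unanswerable here, while "is this combination predictive?" is answered quite precisely.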


Thanks **PaigeMiller**.


I'm not sure exactly what you mean by "how to extract the factors"; can you explain further what these "factors" are (since "factors" is not really a term used in PLS)?

If by "factors" you mean "loadings", then it's easy to obtain those from PROC PLS. If you mean something else, then you need to explain what you mean.

--

Paige Miller


The "factors" I am interested in are the "Number of Extracted Factors" that is displayed as part of the "Percent Variation Accounted for by Partial Least Squares Factors".

With PROC FACTOR, you can OUTSTAT a dataset which can be used by PROC SCORE to generate "factors" to use in a regression analysis. I want to do the same in terms of extracting these "factors" from PROC PLS.
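PROC PLS can write scored data out directly (check the OUTPUT statement options in your SAS release). As a language-neutral illustration of what those extracted-factor scores are, here is a minimal NIPALS PLS1 sketch in numpy; the function name and data are hypothetical:

```python
import numpy as np

def pls_factor_scores(X, y, n_factors):
    """Minimal NIPALS PLS1: return the extracted factor scores T,
    analogous to a scored dataset ready for a follow-up regression."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    T = np.empty((X.shape[0], n_factors))
    for a in range(n_factors):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)          # weight vector for factor a
        t = Xc @ w                         # factor scores
        p = Xc.T @ t / (t @ t)             # X loadings
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - t * (t @ yc) / (t @ t)   # deflate y
        T[:, a] = t
    return T

# Example: the score columns of T can be merged back onto the data
# and used as regressors, much like PROC SCORE output from PROC FACTOR
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
T = pls_factor_scores(X, y, 2)
```

A useful property of NIPALS deflation is that successive score columns are orthogonal, so the follow-up regression on T is well conditioned.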
