Season
Pyrite | Level 9

Thank you very much, Koen, for your more detailed explanation!


@sbxkoenk wrote:

@Season wrote:

I would like to raise a brief question for the sake of finding a possible "shortcut": do you think that, in the situation I encounter, only "predictive Partial Least Squares regression", and not other kinds of PLS, is suitable for reaching the goal I mentioned earlier (tackling collinearity and conducting variable selection at the same time in a multivariate linear regression)? If so, perhaps I do not need to know every kind of PLS to reach my goal.

If I am right, your output block has only one response variable, so you are doing multiple regression analysis and NOT multivariate regression!


I am not sure what the noun "output block" means, but the model I am attempting to build contains only one dependent variable. I know that Bayesian neural networks and structural equation models can be used in situations where more than one dependent variable is modeled; those situations are far more complicated than the one I am encountering. I am building a regression model with only one dependent variable.

I wonder whether involving a composite (more than one) dependent variable in the modeling, rather than a single dependent variable, is what "output block has more than one response variable" means. In any case, I appreciate your pointing out the subtle yet possibly significant issue with my nomenclature for the analysis I am attempting to perform.

sbxkoenk
SAS Super FREQ

@Season wrote:

I am not sure what the noun "output block" means, but the model I am attempting to build contains only one dependent variable. I know that Bayesian neural networks and structural equation models can be used in situations where more than one dependent variable is modeled; those situations are far more complicated than the one I am encountering. I am building a regression model with only one dependent variable.

I wonder whether involving a composite (more than one) dependent variable in the modeling, rather than a single dependent variable, is what "output block has only one response variable" means. In any case, I appreciate your pointing out the subtle yet possibly significant issue with my nomenclature for the analysis I am attempting to perform.


PROC PLS, HPPLS, and PLSMOD can also deal with more than one dependent variable.

 

Partial Least Squares (PLS) is very popular in process manufacturing as an efficient approach for multivariate statistical process monitoring.
Multivariate Analysis (MVA) can mean two things:

  • there are only input variables (correspondence analysis, PCA, Factor analysis, Multidimensional scaling (MDS) ...)
  • if there is an output block, then Multivariate Analysis (MVA) means "solving problems where more than one dependent variable is analyzed" (all dependent variables are simultaneously explained / predicted in just one model).

But PROC PLS works perfectly fine as well if your output block has only one variable!
Partial Least Squares was, for example, available in Enterprise Miner 15.2 (but it is no longer available in VIYA Visual Data Mining and Machine Learning with Model Studio).
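For illustration, a minimal PROC PLS sketch (the data set work.mydata and the variable names are placeholders, not from this thread):

   proc pls data=work.mydata method=pls cv=split;
      model y = x1-x20;          /* a single response variable works fine        */
      *model y1 y2 = x1-x20;     /* ...and so does an output block with several  */
   run;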

 

Good luck,
Koen

Season
Pyrite | Level 9

Thank you for your reply!


@sbxkoenk wrote:

@Season wrote:

I wonder whether involving a composite (more than one) dependent variable in the modeling, rather than a single dependent variable, is what "output block has only one response variable" means.


Sorry for the mistake I made here. I did mean "output block has more than one response variable". 


@sbxkoenk wrote:
Multivariate Analysis (MVA) can mean two things:
  • there are only input variables (correspondence analysis, PCA, Factor analysis, Multidimensional scaling (MDS) ...)
  • if there is an output block, then Multivariate Analysis (MVA) means "solving problems where more than one dependent variable is analyzed" (all dependent variables are simultaneously explained / predicted in just one model).

Thank you for your introduction to the difference between the two concepts! The statistics I learnt in school deal with at most one dependent variable, so I did not know this before.

Season
Pyrite | Level 9

Thank you, Rick, for giving a direct answer to the question I raised in the first place! I have come to acknowledge that, rather than being a variable selection method, principal component analysis is a dimension-reduction method. Yet I would like to point out why I planned to choose the method in the first place: I wish to tackle collinearity in multivariate linear regression. I set up a higher goal, which is to tackle collinearity while conducting variable selection simultaneously. I want my model to be concise. But to the best of my knowledge, principal component analysis does not seem able to reach that goal, so I came here to see whether I was wrong. Now that it turns out my notion was correct, I am going to abandon principal component analysis and try other methods to reach my goal.

I would also like to ask about other methods for dealing with collinearity. I am interested in the features (e.g., advantages and disadvantages) of partial least squares, ridge regression, and the LASSO. I wonder if you could lend me a helping hand with that.

PaigeMiller
Diamond | Level 26

I set up a higher goal, which is to tackle collinearity while conducting variable selection simultaneously.

 

If you still insist on having a step for variable selection (as I mentioned, it is not a necessary step if you use PLS), then you can interpret PLS as handling collinearity while conducting variable selection simultaneously (the variables with loadings close to zero are not selected, and the variables with loadings not close to zero are selected).
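As a rough sketch of that idea (the data set and variable names are placeholders; the DETAILS and SOLUTION options print the loadings and the coefficients for the original predictors that you would scan for values near zero):

   proc pls data=work.mydata cv=one cvtest details;   /* number of factors chosen by cross validation          */
      model y = x1-x10 / solution;                    /* SOLUTION prints coefficients for the original X's     */
   run;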

--
Paige Miller
Season
Pyrite | Level 9

Thank you for your reply. Well, actually, I am not "insisting" on variable selection... I just want to try different modeling strategies and eventually pick the model that works "best". I am not that concerned about whether variable selection is performed in the process. My top priority is the "goodness" of the model; conciseness of the model is one of the priorities that follow. On that score, a model with fewer variables and "goodness" similar to the "full" model may be preferable, and that is where variable selection could play a larger role.

Season
Pyrite | Level 9

Hello, everyone. I read some material about PCA today that describes a method of using PCA to perform variable selection for linear regression:

(1) Standardize all of the independent variables;

(2) Calculate the eigenvalues of the correlation matrix of the original variables and the corresponding eigenvectors (since the original variables have been standardized);

(3) Find the principal component with the smallest eigenvalue, namely the last one;

(4) Find the variable whose coefficient has the largest absolute value in that (the last) principal component;

(5) Delete that variable;

(6) Repeat steps (2) to (5) until the last eigenvalue does not drastically decrease (the cut-off value for "drastic" was not mentioned);

(7) Use the remaining variables to do linear regression.
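For concreteness, here is a minimal sketch of one pass of that procedure as I understand it (the data set work.mydata and the predictors x1-x10 are placeholders, and the ODS table and column names are my assumption of the usual PROC PRINCOMP output):

   proc princomp data=work.mydata;               /* correlation matrix = standardized predictors (steps 1-2) */
      var x1-x10;
      ods output Eigenvalues=eigvals Eigenvectors=eigvecs;
   run;

   /* Step (4): in the last principal component (Prin10 for ten predictors), */
   /* find the variable with the largest absolute coefficient.               */
   data last_pc;
      set eigvecs;
      abs_coef = abs(Prin10);
   run;
   proc sort data=last_pc; by descending abs_coef; run;
   proc print data=last_pc(obs=1); var Variable abs_coef; run;   /* candidate to delete (step 5) */

   /* Steps (6)-(7): drop that variable, rerun, stop when the smallest eigenvalue */
   /* no longer decreases drastically, and regress y on the remaining variables.  */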

I wonder if you could give some comments on that method.

DanObermiller
SAS Employee

I have seen this method used before, and even tried it myself. The biggest problem is that PCA is performed on the independent variables. There is nothing in PCA that says the lowest eigenvalues correspond to the worst predictors. So removing predictors based on the smallest eigenvalue does not mean you are keeping the best variables.

 

If you want to try this approach, I would amend it slightly. Perform PCA. Use all PCs to predict Y and then only keep the significant PCs. This will only work if you have more observations than variables. If you have more variables, perhaps consider a forward stepwise regression to choose the PCs that are significant.

 

Once you have the set of significant PCs, there are two common options: 1) from the significant PCs, choose the original variables that have the highest loadings to build your final model, or 2) from the insignificant PCs, choose the original variables with the highest loadings to remove from your final model.

 

Note that both of these approaches have some issues, but they are still viable approaches.
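A minimal sketch of that amended approach (again with placeholder names: data set work.mydata, response y, predictors x1-x10):

   proc princomp data=work.mydata out=scores prefix=PC;   /* adds PC scores PC1-PC10 to the data */
      var x1-x10;
   run;

   /* Regress y on all the PC scores and keep only the significant ones;  */
   /* stepwise (or forward) selection is one simple way to do that.       */
   proc reg data=scores;
      model y = PC1-PC10 / selection=stepwise slentry=0.05 slstay=0.05;
   run; quit;

   /* Then map the retained (or discarded) PCs back to the original       */
   /* variables through their loadings, as in the two options above.      */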

PaigeMiller
Diamond | Level 26

@DanObermiller wrote:

I have seen this method used before, and even tried it myself. The biggest problem is that PCA is performed on the independent variables. There is nothing in PCA that says the lowest eigenvalues correspond to the worst predictors. So removing predictors based on the smallest eigenvalue does not mean you are keeping the best variables.

 

If you want to try this approach, I would amend it slightly. Perform PCA. Use all PCs to predict Y and then only keep the significant PCs. This will only work if you have more observations than variables. If you have more variables, perhaps consider a forward stepwise regression to choose the PCs that are significant.

 

Once you have the set of significant PCs, there are two common options: 1) from the significant PCs, choose the original variables that have the highest loadings to build your final model, or 2) from the insignificant PCs, choose the original variables with the highest loadings to remove from your final model.

 

Note that both of these approaches have some issues, but they are still viable approaches.


I still don't see how this can be better, from a logical point of view, than Partial Least Squares, which directly finds vectors that are predictive, rather than starting from vectors that were not chosen for their predictiveness and then searching among them for ones that are predictive.

 

But both the method mentioned by @Season and the modified method from @DanObermiller still assume that variable selection is an important step, and yet the paper from Tobias (and papers from hundreds or thousands of other authors) using PLS simply skip the variable selection step and still obtain useful models.

--
Paige Miller
DanObermiller
SAS Employee

I agree with @PaigeMiller in that PLS seems like a better approach. The critical question is: do you absolutely NEED variable selection? PCA is not designed for doing variable selection. There are some approaches where it can be used for that, but the tool was not designed for that purpose.

Season
Pyrite | Level 9

@PaigeMiller wrote:


I still don't see how this can be better, from a logical point of view, than Partial Least Squares, which directly finds vectors that are predictive, rather than starting from vectors that were not chosen for their predictiveness and then searching among them for ones that are predictive.


I agree with your opinion on the disadvantages of that method. I am not an advocate of that method either. I posted the method online and invited other people to comment on it because I think that, during the discussion and exchange of opinions, one can deepen one's understanding of principal component analysis, variable selection, and possibly other statistical issues as well. That is good for everyone participating in the discussion.

Aside from the disadvantages I mentioned in my reply to @DanObermiller, a major limitation of the method is the arbitrariness of the definition of a "drastic" decrease in the eigenvalues. In statistics, not every situation offers an objective criterion for distinguishing "good" from "bad". Choosing the method for selecting the smoothing parameters and the degrees of freedom of the spline effects in generalized additive models is one such situation. Is there a number that can tell you which method is better, generalized cross-validation (GCV) or the unbiased risk estimator (UBRE)? To the best of my knowledge, in many cases the answer is "no". I have learnt from experience in building generalized additive models that the two methods often produce similar yet different results, both of which are useful and "valid" (no overfitting observed). If that is the case, both models are acceptable.

However, in the case of variable selection, there are already statistical methods other than the one I mentioned for dealing with collinearity, and those methods are based on objective criteria. So why should we abandon the objective ones and embrace a subjective and defective one?


@PaigeMiller wrote:

But both the method mentioned by @Season and the modified method from @DanObermiller still assume that variable selection is an important step, and yet the paper from Tobias (and papers from hundreds or thousands of other authors) using PLS simply skip the variable selection step and still obtain useful models.


Yes, I fully acknowledge that variable selection may not be a necessary step in regression. Yet, as I have mentioned, I am not an advocate of the variable selection method I described, and I am not now an advocate of the "variable selection is a must in regression" view either.

Season
Pyrite | Level 9

Thank you for your comments and the amended method you offer!


@DanObermiller wrote:

The biggest problem is that PCA is performed on the independent variables. There is nothing in PCA that says the lowest eigenvalues correspond to the worst predictors. So removing predictors based on the smallest eigenvalue does not mean you are keeping the best variables.


I totally agree with your opinion. That is a major limitation of principal component analysis when it comes to regression. I have read articles discussing the arbitrariness of variable selection based on P-values, since they are computed from the data of a single sample; for that reason, resampling strategies such as the bootstrap, the jackknife, and cross-validation have been introduced into the variable selection process in some cases (e.g., when building a prediction model). Even so, I think the method I posted is worse than variable selection based on P-values: under ordinary least squares, the regression coefficients and the t-statistics from which P-values are computed use information from both the matrix of independent variables (the X-matrix) and the vector of the dependent variable. The method I posted, by contrast, uses only the information in the X-matrix and therefore suffers from the deficiency you mentioned: the variable with the largest loading in the principal component with the smallest eigenvalue is not necessarily unimportant for determining or predicting the dependent variable. By deductive reasoning, I think this method is only better than simply deleting variables judged professionally insignificant or unimportant when dealing with collinearity, since the latter is based on nothing but the researcher's subjective perception of how significant and/or important the independent variables are for determining or predicting the dependent variable. The method I mentioned takes the data into account to some degree, but that is not enough.
The only advantage of the method I mentioned may be that it yields unbiased estimates of the regression coefficients. Linear regression on principal components (also known as principal component regression) produces biased estimates of the regression coefficients, and to the best of my knowledge the same is true (i.e., biased estimates of the regression coefficients) for ridge regression and the LASSO; I do not know whether that is also the case for partial least squares. But the sacrifice made for unbiased estimates of the regression coefficients may be huge: the researcher may have deleted statistically and professionally significant variables in the variable selection process without noticing it. The only circumstance I can think of in which this method is a good one is when the researcher only wants to know the relation between the dependent variable and the independent variables that remain after the selection, in other words, when the researcher does not care about the effects of the deleted variables. If that is the researcher's goal, this method may be a good one, but I do not think that situation is common.

PaigeMiller
Diamond | Level 26

This method ignores the fact that variables are deleted without regard to whether or not they are good predictors. The Y variable is simply not used here. You could delete some very good predictor variables and not realize it.

--
Paige Miller
Season
Pyrite | Level 9

Thank you for your comments! As you have mentioned, the fact that principal component analysis takes only the X-matrix into account is one of its major inherent limitations.

