@PaigeMiller wrote:
I still don't see how this can be better from a logical point of view than Partial Least Squares, which goes ahead and finds vectors that are predictive, rather than starting with vectors that are not chosen because they are predictive, and then finding some that are predictive.
I agree with your opinion on the disadvantages of that method. I am not an advocate of that method either. I posted the method online and invited others to comment because I think that, through discussion and the exchange of opinions, one can deepen one's understanding of principal component analysis, variable selection, and possibly other statistical issues as well. That benefits everyone participating in the discussion.
Aside from the disadvantages I mentioned in my reply to @DanObermiller, a major limitation of the method is the arbitrariness of what counts as a "drastic" decrease in the eigenvalues. In statistics, not every situation offers an objective criterion for distinguishing "good" from "bad". Choosing the method for selecting the smoothing parameters and degrees of freedom of the spline effects in generalized additive models is one such situation. Is there a number that tells you which method is better: generalized cross-validation (GCV) or the unbiased risk estimator (UBRE)? To the best of my knowledge, in many cases the answer is "no". I have learned from experience building generalized additive models that the two methods often produce similar yet different results, both of which are useful and "valid" (no overfitting observed). When that is the case, both models are acceptable.
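To make the GCV/UBRE comparison concrete: both are just criteria minimized over the smoothing parameter, and they can land on similar but not identical amounts of smoothing. Below is a minimal numpy sketch on simulated data (the truncated-power basis, second-difference penalty, and data here are illustrative assumptions, not anything from this thread). GCV is n·RSS/(n − tr(H))², while UBRE is a Mallows-Cp-style score that requires the error variance σ², which is known here only because we simulate the data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0, 1, n)
sigma = 0.3
y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n)

# Penalized regression spline: truncated cubic basis at interior knots,
# plus polynomial terms, with a second-difference penalty on coefficients.
knots = np.linspace(0, 1, 12)[1:-1]
B = np.maximum(x[:, None] - knots[None, :], 0.0) ** 3
B = np.column_stack([np.ones(n), x, x ** 2, B])
k = B.shape[1]
D = np.diff(np.eye(k), n=2, axis=0)  # second-difference operator
S = D.T @ D                          # penalty matrix

def scores(lam):
    # Hat matrix of the penalized fit: H = B (B'B + lam S)^{-1} B'
    H = B @ np.linalg.solve(B.T @ B + lam * S, B.T)
    resid = y - H @ y
    rss = resid @ resid
    edf = np.trace(H)                # effective degrees of freedom
    gcv = n * rss / (n - edf) ** 2
    ubre = rss / n + 2 * sigma ** 2 * edf / n - sigma ** 2  # needs sigma^2
    return gcv, ubre

lams = np.logspace(-6, 2, 50)
g = [scores(l)[0] for l in lams]
u = [scores(l)[1] for l in lams]
lam_gcv = lams[int(np.argmin(g))]
lam_ubre = lams[int(np.argmin(u))]
print("lambda chosen by GCV: ", lam_gcv)
print("lambda chosen by UBRE:", lam_ubre)
```

The two selected smoothing parameters are typically close but need not coincide, which is exactly the "similar yet different, both valid" situation described above.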
In the case of variable selection, however, there are already statistical methods other than the one I mentioned for dealing with collinearity, and those methods are based on objective criteria. So why should we abandon the objective ones and embrace a subjective, flawed one?
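One widely used objective collinearity diagnostic is the variance inflation factor (VIF); the post above does not name a specific method, so take this as an illustrative sketch rather than the method the thread has in mind. VIF for predictor j is 1/(1 − R²_j), where R²_j comes from regressing x_j on the other predictors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)              # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress column j on the remaining columns; VIF_j = 1 / (1 - R^2_j).
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    r2 = 1 - (resid @ resid) / tss
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"VIF x{j + 1}: {vif(X, j):.1f}")
```

The collinear pair gets a very large VIF while the independent predictor stays near 1, so a conventional cutoff (e.g. VIF > 10) flags the problem without any eyeballing of eigenvalue drops.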
@PaigeMiller wrote:
But both the method mentioned by @Season and the modified method from @DanObermiller still assume that variable selection is an important step, and yet the paper from Tobias (and from hundreds or thousands of other authors) using PLS simply skips the variable selection step and they get useful models.
Yes, I fully acknowledge that variable selection may not be a necessary step in regression. As I have said, I am not an advocate of the variable selection method I mentioned, nor do I now advocate the view that variable selection is a must in regression.