@David_M wrote:
My original goal (which I hope I clearly stated and apologies if I didn't) was to reduce the 35 highly correlated mixed-type variables to a smaller number that are not, or are much less, correlated with each other. Won't the PROC CORR function with the /corrb option show correlation coefficients that could be used to target high correlation variables for elimination?
Maybe, maybe not. You could, for example, remove one of two variables whose coefficients are highly correlated with each other, but you might end up removing the better predictor of the two, which would not be good. Multicollinearity also takes other forms, such as a linear combination of three or more variables that produces an almost constant result. In that case, the correlations of the coefficients may not reveal it, because they are pairwise, and this kind of multicollinearity shows up only when you look at the three or more variables together. It will still impact the quality of the model fit, even though no pairwise correlation flags it.
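To make that concrete, here is a small simulated sketch (in Python rather than SAS, using only made-up data, so treat it as an illustration and not as anything from your data set). It assumes eight independent predictors plus a ninth that is almost exactly their sum. No pair of variables is strongly correlated, yet one linear combination of all nine is nearly constant.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)   # fixed seed so the run is repeatable
n, k = 2000, 8           # hypothetical sample size and number of independent predictors

# k independent standard-normal predictors
xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]

# One more predictor that is almost exactly the sum of the others
x_last = [sum(col[i] for col in xs) + rng.gauss(0, 0.01) for i in range(n)]
cols = xs + [x_last]

# Largest absolute pairwise correlation among all k + 1 variables
max_abs_r = max(abs(pearson(a, b)) for a, b in combinations(cols, 2))

# The linear combination x1 + ... + xk - x_last is nearly constant
combo = [sum(col[i] for col in xs) - x_last[i] for i in range(n)]
combo_sd = pstdev(combo)

print("largest |pairwise correlation|:", round(max_abs_r, 3))
print("std. dev. of the linear combination:", round(combo_sd, 4))
```

The pairwise correlations stay moderate, so a rule like "drop anything correlated above 0.8" would flag nothing, while the near-zero standard deviation of the combination shows the predictors are almost perfectly linearly dependent.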
Won't the PLS procedure you suggested do something similar?
No, that's not what PLS does. PLS leaves all the variables in the model, and fits the model in such a way that it is robust to the effects of multicollinearity. In the paper by Tobias, there are 1,000 predictor variables, the PLS model uses all 1,000 of them, and Tobias concludes that the model is useful.
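As a rough illustration of "all the variables stay in", here is a minimal one-component PLS fit for a single response, written out by hand in Python on simulated data. The data, the seed, and the choice of exactly one component are my assumptions for the sketch. This is not PROC PLS and not the analysis in the Tobias paper.

```python
import random
from statistics import mean

rng = random.Random(1)   # fixed seed so the run is repeatable
n, p = 500, 3            # hypothetical sample size and number of predictors

# Three predictors that are all noisy copies of one hidden variable z,
# so they are almost perfectly collinear with each other
z = [rng.gauss(0, 1) for _ in range(n)]
X = [[zi + rng.gauss(0, 0.05) for _ in range(p)] for zi in z]
y = [2.0 * zi + rng.gauss(0, 0.1) for zi in z]

def center(col):
    """Subtract the mean from every value in a column."""
    m = mean(col)
    return [v - m for v in col]

cols = [center([row[j] for row in X]) for j in range(p)]
yc = center(y)

# Weight vector: proportional to X'y, scaled to unit length
w = [sum(c[i] * yc[i] for i in range(n)) for c in cols]
norm = sum(v * v for v in w) ** 0.5
w = [v / norm for v in w]

# Scores t = Xw, then regress y on the single score
t = [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]
q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

# With one component, the coefficients on the centered predictors are w * q
beta = [wj * q for wj in w]
print("PLS coefficients:", [round(b, 3) for b in beta])
```

Every predictor gets a nonzero coefficient and the effect is shared fairly evenly among the three collinear variables, whereas ordinary least squares on predictors this collinear tends to give unstable individual coefficients.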