11-03-2014 04:27 AM
I have a dilemma, and although I have searched the internet for answers, I'm still confused.
I have a dataset with about 400 variables. I would like to reduce their number by applying PROC VARCLUS with the centroid method and then keeping one variable per cluster.
Now, if I understand correctly, this procedure is based on R-squared, which implies linearity. That is a strong assumption, and I cannot test it on all 400 variables.
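For reference, here is a rough Python sketch (not SAS code, and not the actual PROC VARCLUS implementation) of where R-squared enters the "one variable per cluster" step. It assumes the clusters are already formed, takes each cluster's centroid component to be the unweighted mean of its standardized variables, and keeps the variable with the lowest 1-R**2 ratio, (1 - R-squared with own cluster) / (1 - R-squared with nearest other cluster), which is a common rule of thumb. Every R-squared here is a squared Pearson correlation, i.e. a linear measure. The function names and the simulated data are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R-squared of a one-predictor linear fit."""
    return np.corrcoef(x, y)[0, 1] ** 2


def pick_representatives(X, clusters):
    """Return, for each cluster, the column index with the lowest 1-R**2 ratio.

    X        -- (n_obs, n_vars) data matrix
    clusters -- list of lists of column indices; assumed to be formed already
    """
    Z = standardize(X)
    # Centroid component (assumption): unweighted mean of the cluster's standardized variables.
    centroids = [Z[:, idx].mean(axis=1) for idx in clusters]
    reps = []
    for k, idx in enumerate(clusters):
        best_var, best_ratio = None, np.inf
        for j in idx:
            own = r_squared(Z[:, j], centroids[k])
            # Highest R-squared with the centroid of any *other* cluster.
            nearest = max(
                (r_squared(Z[:, j], centroids[m])
                 for m in range(len(clusters)) if m != k),
                default=0.0,
            )
            ratio = (1 - own) / (1 - nearest)
            if ratio < best_ratio:
                best_var, best_ratio = j, ratio
        reps.append(best_var)
    return reps


# Hypothetical data: two independent latent factors, three noisy copies of each.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    f1 + 0.5 * rng.normal(size=n),
    f1 + 0.5 * rng.normal(size=n),
    f1 + 0.1 * rng.normal(size=n),  # least noisy member of cluster 0
    f2 + 0.5 * rng.normal(size=n),
    f2 + 0.5 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),  # least noisy member of cluster 1
])
reps = pick_representatives(X, [[0, 1, 2], [3, 4, 5]])
print(reps)
```

With these simulated clusters the least noisy copy of each factor should come out as the representative, since it correlates most strongly with its own centroid.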
My question is, does the procedure work for nonlinear relationships or not?
Do you know of any papers on this subject (PROC VARCLUS and nonlinearity) that I could read?
11-03-2014 05:23 AM
Yes, PROC VARCLUS (like PCA and PROC FACTOR) assumes linearity (and also normality).
On the other hand, in a data-mining context it usually works well enough (unless you have extreme nonlinearities).
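A small Python illustration of the difference between mild and extreme nonlinearity, using two example relationships I picked (they are not from the original question): a monotone, mildly curved relationship still gives a high linear R-squared, while a symmetric U-shape gives a linear R-squared near zero even though y is completely determined by x.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=2000)


def r2(a, b):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


# Monotone, mildly curved relationship: linear R-squared stays high.
mild = r2(x, np.exp(x))
# Symmetric U-shape: strong dependence, yet linear R-squared is close to zero.
u_shape = r2(x, x ** 2)

print(f"y = exp(x): R-squared = {mild:.3f}")
print(f"y = x**2  : R-squared = {u_shape:.3f}")
```

An R-squared-based clustering would treat the second pair as unrelated, which is the kind of case where the linear approach breaks down.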
If you are worried about nonlinearity, search for "nonlinear PCA", "PROC PRINQUAL", and "PROC NEURAL".
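PROC PRINQUAL and nonlinear PCA work by transforming the variables before the linear analysis. As a much cruder stand-in (a substitution, not what PRINQUAL actually does), you could rank-transform every variable and compare the rank-based R-squared with the ordinary one. Pairs where the rank version is much larger hint at a monotone nonlinear relationship that the linear R-squared underrates. A Python sketch, where the function names, the 0.1 threshold, and the simulated data are all assumptions:

```python
import numpy as np


def ranks(X):
    """Rank-transform each column (0..n-1); assumes no ties, as with continuous data."""
    return X.argsort(axis=0).argsort(axis=0)


def r2_matrix(X):
    """Matrix of squared Pearson correlations between the columns of X."""
    return np.corrcoef(X, rowvar=False) ** 2


def flag_nonlinear_pairs(X, threshold=0.1):
    """Return (i, j, gap) for column pairs whose rank-based R-squared
    exceeds the linear R-squared by more than `threshold` (hypothetical cutoff)."""
    gap = r2_matrix(ranks(X)) - r2_matrix(X)
    n_vars = X.shape[1]
    return [(i, j, round(float(gap[i, j]), 3))
            for i in range(n_vars) for j in range(i + 1, n_vars)
            if gap[i, j] > threshold]


# Hypothetical data: column 1 is a strongly convex function of column 0,
# column 2 is a noisy linear copy of column 0.
rng = np.random.default_rng(2)
x = rng.normal(size=1000)
X = np.column_stack([x, np.exp(3 * x), x + 0.5 * rng.normal(size=1000)])
flags = flag_nonlinear_pairs(X)
print(flags)
```

With 400 variables the two 400 x 400 correlation matrices are cheap to compute, so this gives a quick first screen. It only catches monotone curvature, though; a U-shaped relationship is invisible to both versions.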
nonlinear PCA here: