05-07-2015 04:23 PM
This is probably a reflection of my naivete, but after running the Principal Components node, how does one identify which dimensions (which to me means fields or variables) the PCA selected to keep? I have fiddled with this node for a while and noticed that when the max selector is taken off, in my model at least, the PCA node selects the same number of PCs as I have input (independent) variables. Additionally, the results list the inputs alphabetically, and another table lists exactly the same number of eigenvalues sorted by value, but the naming convention there is PC-1, PC-2, etc. rather than the name of the field chosen. So I am guessing there is some kind of one-to-one correspondence, but I cannot figure out how it works. I have a basic understanding of how PCA works and what eigenvalues and eigenvectors are.
05-07-2015 09:39 PM
The method uses all variables, but transforms them via linear combinations. You can then choose to use only some of the resulting components (eigenvectors) in your regression, but you still need all of your original variables to compute them.
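To make the point about linear combinations concrete, here is a minimal sketch in Python with NumPy (not the node's own score code). The data, the variable names x1, x2, x3, and the choice to run PCA on the correlation matrix are all assumptions made for illustration. It shows that there are as many PCs as inputs, and that each PC draws on every input.

```python
import numpy as np

# Hypothetical data: 200 rows, 3 made-up interval inputs (names are illustrative only).
rng = np.random.default_rng(0)
names = ["x1", "x2", "x3"]
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]  # make two of the inputs correlated

# Assumption: PCA on the correlation matrix, i.e. on standardized inputs.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(X, rowvar=False)

# Eigen-decomposition; eigh returns ascending eigenvalues, so reverse the order.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]  # column k holds the coefficients of PC-(k+1)

# Each PC score is a weighted sum of ALL standardized inputs.
scores = Z @ eigvecs

for k in range(len(names)):
    terms = " ".join(f"{eigvecs[j, k]:+.3f}*{names[j]}" for j in range(len(names)))
    print(f"PC-{k + 1} (eigenvalue {eigvals[k]:.3f}) = {terms}")

# Keeping only the first two PCs still requires every original input to compute them.
kept = scores[:, :2]
```

Each printed line is one PC written out as an equation over all three inputs, so no PC maps back to a single field. Dropping PCs reduces the number of derived columns, not the number of inputs you need.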
05-08-2015 09:11 AM
A PC is a linear combination of the input variables, not a single variable. You can see from the Principal Components Coefficient plot or table how each variable contributes to each PC. Or you can view the Score Code to see the linear equation for creating each PC variable, e.g. (JOB and REASON are nominal inputs):