05-07-2015 04:23 PM

This is probably a reflection of my naivete, but after running the Principal Components node, how does one identify which dimensions (which to me means fields or variables) the PCA selected to keep? I have fiddled with this node for a while and noticed that when the max selector is taken off, in my model at least, the PCA node selects the same number of PCs as I have input (independent) variables. Additionally, the results list the inputs alphabetically, and another table lists exactly the same number of eigenvalues ordered by size, but they are named PC-1, PC-2, etc. rather than by the name of the field chosen. So I am guessing there is some kind of one-to-one correspondence, but I cannot figure out how it works. I have a basic understanding of how PCA works and what eigenvalues and eigenvectors are.

Posted in reply to cophbulls

05-07-2015 09:39 PM

The method uses all variables, but transforms them via linear combinations. You can then choose to use only some of the resulting components (eigenvectors) in your regression, but you still need all of your original variables to compute them.
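To illustrate the point above, here is a minimal sketch in plain Python of PCA on two inputs, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data values and the function name `pca_2d` are made up for illustration, and the sketch assumes the two inputs have nonzero covariance. It is not what the Principal Components node runs internally; it only shows that two inputs yield two eigenvalues sorted by size, and that every component has a weight on every input.

```python
import math

def pca_2d(xs, ys):
    """PCA on two variables via the 2x2 sample covariance matrix.

    Assumes the covariance between xs and ys is nonzero.
    Returns (eigenvalues sorted largest first, matching unit eigenvectors).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + disc, half_trace - disc]

    components = []
    for lam in eigenvalues:
        # (sxy, lam - sxx) solves (A - lam*I) v = 0 when sxy != 0.
        vx, vy = sxy, lam - sxx
        norm = math.hypot(vx, vy)
        components.append((vx / norm, vy / norm))
    return eigenvalues, components

# Hypothetical data: two input variables, four observations.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 5.0]

eigenvalues, components = pca_2d(xs, ys)
for i, (lam, (wx, wy)) in enumerate(zip(eigenvalues, components), start=1):
    print(f"PC_{i}: eigenvalue={lam:.4f}, weight on x={wx:.4f}, weight on y={wy:.4f}")
```

Note that neither PC_1 nor PC_2 "is" x or y: each one mixes both inputs, which is why the eigenvalue table has as many rows as there are inputs but no variable names attached.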

Posted in reply to Reeza

05-08-2015 07:57 AM

OK, but that still doesn't answer the question: how do I know which PC is which variable when the PCs are named PC-1, PC-2, etc.?

Posted in reply to cophbulls

05-08-2015 09:11 AM

A PC is a linear combination of the input variables, not a single variable. You can see from the Principal Components Coefficient plot or table how each variable contributes to each PC. Or you can view the Score Code to see the linear equation for creating each PC variable, e.g. (JOB and REASON are nominal inputs):

PC_1 =
  2.4684632E-7*JOB_1_ +
  8.5513858E-7*JOB_2_ +
  9.0167328E-7*JOB_3_ +
  2.0593746E-6*JOB_4_ +
  1.6843513E-6*JOB_5_ +
  1.2073992E-7*JOB_6_ +
  2.8210926E-7*JOB_7_ +
  2.4579164E-7*REASON_1_ +
  4.0859942E-6*REASON_2_ +
  1.8184475E-6*REASON_3_ + ...
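
As a rough illustration of how to read such an equation, the Python sketch below copies only the coefficients visible above, evaluates them for one made-up row, and ranks the terms by absolute coefficient. The function names and the example row are hypothetical, and treating JOB_4_ and REASON_2_ as 0/1 indicators is an assumption. Because the score code is truncated, the result is a partial sum, not the full PC_1 value.

```python
# Coefficients copied from the visible part of the PC_1 score code.
# The equation is truncated ("..."), so this dictionary is incomplete.
coefficients = {
    "JOB_1_": 2.4684632e-7,
    "JOB_2_": 8.5513858e-7,
    "JOB_3_": 9.0167328e-7,
    "JOB_4_": 2.0593746e-6,
    "JOB_5_": 1.6843513e-6,
    "JOB_6_": 1.2073992e-7,
    "JOB_7_": 2.8210926e-7,
    "REASON_1_": 2.4579164e-7,
    "REASON_2_": 4.0859942e-6,
    "REASON_3_": 1.8184475e-6,
}

def partial_score(row, coefs):
    """Sum coefficient * value over the listed terms (missing values count as 0)."""
    return sum(coef * row.get(name, 0.0) for name, coef in coefs.items())

def rank_by_magnitude(coefs):
    """Term names ordered from largest to smallest absolute coefficient."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)

# Hypothetical observation: assumes these inputs are 0/1 indicators.
row = {"JOB_4_": 1.0, "REASON_2_": 1.0}

score = partial_score(row, coefficients)
ranking = rank_by_magnitude(coefficients)

print("Partial PC_1 contribution:", score)
print("Largest visible coefficients:", ranking[:3])
```

Ranking by absolute coefficient is one simple way to see which inputs weigh most on a component, but it is only meaningful when the inputs are on comparable scales.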