potiu
Fluorite | Level 6

Dear Sir,

 

I have a few questions regarding principal component analysis in Enterprise Miner. Below is my data process flow:

process flow.PNG

The transformation node converts the categorical variables to dummy variables, since principal component analysis only accepts numeric values. I have tested two types of principal component nodes. The classification algorithms that I plan to use are Decision Tree and Logistic Regression.

The settings for the principal component nodes are below:

principle component setting.PNG

Principal component node setting

HP principal component setting.PNG

HP principal component node setting

 

The results for the nodes:

result pronciple component.PNG

principal component number.PNG

We select the components whose eigenvalue is greater than 1. In this case there are 42 components, but the selected number of components is 20. My first question: does setting Apply Maximum Number to Yes under the Max Number Cutoff section of the properties limit the number of components to 20, even though the actual number is 42?
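
To make the first question concrete, here is a minimal Python sketch of how an eigenvalue cutoff and a maximum-number cutoff could combine. It is not Enterprise Miner's implementation: the function `select_components`, its parameters, and the eigenvalues are all hypothetical, and it assumes the 42 is the count of components passing the eigenvalue rule and that the cap is applied after that rule.

```python
def select_components(eigenvalues, eigen_cutoff=1.0, max_components=None):
    """Keep eigenvalues above the cutoff, then optionally cap how many are kept.

    Hypothetical helper for illustration only; the names do not come from SAS.
    """
    ordered = sorted(eigenvalues, reverse=True)
    passing = [ev for ev in ordered if ev > eigen_cutoff]
    if max_components is not None:
        # The cap keeps only the largest eigenvalues among those that passed.
        return passing[:max_components]
    return passing


# Made-up eigenvalues: 42 values above 1 and 10 values below 1.
eigenvalues = [1.0 + 0.1 * k for k in range(42, 0, -1)] + [0.5] * 10

without_cap = select_components(eigenvalues)
with_cap = select_components(eigenvalues, max_components=20)

print(len(without_cap), len(with_cap))
```

If the node works this way, a maximum of 20 would explain why only 20 of the 42 components are exported; the node's documentation or its score code should confirm the actual order of operations.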

 

My second question is when the principal component node should be used for dimension reduction, and when the HP principal component node should be used instead.

My last question is whether the Variable Selection node can be used in place of the principal component node for dimension reduction.

 

Can anyone explain more on this issue? 

 

Thank you in advance.

 

 

Regards,

Potiu

1 REPLY
PaigeMiller
Diamond | Level 26

I can't really explain the difference ... but you are doing a lot of work to FORCE your data into the form needed for principal (not "principle") components, specifically continuous variables, and my first thought is to not do this. The results of principal components could be highly dependent on how you perform this transformation from categorical variables to continuous variables. There may be some better way of handling the non-continuous variables. But since you didn't really say much about your data, it's hard to say.

 

Next, since you have a regression node, and I'm assuming that the output of principal components will be fed into the regression node ... DON'T DO THIS. Principal components is not looking to see whether or not the variables it selects are actually good predictors in the regression. Principal components could miss the variables that are good predictors in the regression. What should you do? Partial least squares (or PLS) regression! This picks combinations of variables that are good predictors, and as an extra added bonus, it has no trouble at all handling categorical variables as categorical. And so it's a lot simpler to do, there's no transformation of variables and there's no prior selecting of variables needed, PLS handles all of this.
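
The point that principal components ignore the target can be illustrated with a small Python sketch. The data are made up, and the code is not what the SAS nodes run: it assumes standardized inputs, finds the first principal component by power iteration on the correlation matrix, and uses the standard result that the first PLS weight vector for a single response is proportional to the covariance of each input with the target.

```python
import math


def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Hypothetical data: x1 and x2 are redundant and unrelated to y; x3 drives y.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [2.0, -2.0, 2.0, -2.0]
x3 = [1.0, 1.0, -1.0, -1.0]
y = [1.0, 1.0, 0.0, 0.0]

cols = [standardize(c) for c in (x1, x2, x3)]
yc = standardize(y)
n = len(y)

# Correlation matrix of the inputs.
corr = [[dot(a, b) / (n - 1) for b in cols] for a in cols]

# First principal component: power iteration converges to the leading eigenvector.
pc1 = unit([1.0, 1.0, 1.0])
for _ in range(100):
    pc1 = unit([dot(row, pc1) for row in corr])
eigenvalue1 = dot(pc1, [dot(row, pc1) for row in corr])

# First PLS weight vector: proportional to the covariance of each input with y.
pls_w = unit([dot(c, yc) / (n - 1) for c in cols])

print("PC1 loadings:", [round(v, 3) for v in pc1], "eigenvalue:", round(eigenvalue1, 3))
print("PLS weights: ", [round(v, 3) for v in pls_w])
```

In this toy case the first principal component loads only on the two redundant inputs and gives essentially no weight to x3, the one variable that predicts y, while the first PLS weight vector puts all of its weight on x3. Real data will rarely be this clean, so treat it as an illustration of the mechanism rather than a prediction of what the nodes will return.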

--
Paige Miller
