
## Principal Component Analysis - Optimal number of retained factors

I am trying to do dimension reduction using Principal Component Analysis. The dataset has 25 variables and 300K observations. The data will be used for segmentation with two-stage clustering (k-means clustering, then linkage clustering).
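As background, PCA on standardized variables is an eigendecomposition of the correlation matrix. Each eigenvalue is the variance of one component, and the eigenvalues sum to the number of variables. The NumPy sketch below illustrates this under that assumption. The function `pca_correlation` and the random data are hypothetical and do not come from the post. The returned `scores` are what the retained components would pass on to the k-means stage.

```python
import numpy as np


def pca_correlation(X, n_components):
    """PCA on standardized columns: eigenvalues, proportions, cumulative proportions, scores."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so its covariance matrix equals the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion)
    # Scores on the first n_components components (the variance of each equals its eigenvalue)
    scores = Z @ eigvecs[:, :n_components]
    return eigvals, proportion, cumulative, scores


# Hypothetical example: 1,000 observations of 6 variables, two of them correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
X[:, 1] += X[:, 0]
eigvals, proportion, cumulative, scores = pca_correlation(X, n_components=2)
print(np.round(eigvals, 3))
print(round(float(eigvals.sum()), 6))  # 6.0: the eigenvalues sum to the number of variables
print(scores.shape)  # (1000, 2)
```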

What are good practices for deciding the number of retained factors? Is criterion #1 good enough?

Criterion #1: eigenvalue > 1

-> 5 factors, explaining 54% of the variation. Is that too low? Should I use a looser rule instead, such as:

Criterion #2: eigenvalue > 0.7 and cumulative variation explained > 0.7

-> 10 factors, explaining 78% of the variation

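```
ods graphics on;

/* SCREE plots the eigenvalues; SCORE displays the scoring coefficients.   */
/* Without DATA=, PROC FACTOR analyzes the most recently created data set. */
proc factor method=principal priors=one scree score;
   var _numeric_;
run;

ods graphics off;
```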

| # | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 4.59175884 | 3.04985582 | 0.2551 | 0.2551 |
| 2 | 1.54190302 | 0.12646714 | 0.0857 | 0.3408 |
| 3 | 1.41543588 | 0.23647927 | 0.0786 | 0.4194 |
| 4 | 1.17895661 | 0.09521203 | 0.0655 | 0.4849 |
| 5 | 1.08374458 | 0.16097769 | 0.0602 | 0.5451 |
| 6 | 0.92276689 | 0.04595209 | 0.0513 | 0.5964 |
| 7 | 0.8768148 | 0.00522994 | 0.0487 | 0.6451 |
| 8 | 0.87158485 | 0.06006623 | 0.0484 | 0.6935 |
| 9 | 0.81151862 | 0.05330799 | 0.0451 | 0.7386 |
| 10 | 0.75821063 | 0.06929076 | 0.0421 | 0.7807 |
| 11 | 0.68891987 | 0.05741897 | 0.0383 | 0.819 |
| 12 | 0.6315009 | 0.02247774 | 0.0351 | 0.8541 |
| 13 | 0.60902316 | 0.01814238 | 0.0338 | 0.8879 |
| 14 | 0.59088079 | 0.03822865 | 0.0328 | 0.9207 |
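Both criteria can be checked directly against the posted table. The small Python sketch below does this, and the function name `retained` is only illustrative. It assumes a total variance of 18, which is what the Proportion column implies (4.59175884 / 0.2551 ≈ 18.0). That total suggests the correlation matrix covered 18 numeric variables rather than all 25.

```python
# Eigenvalues from the first 14 rows of the posted table
EIGENVALUES = [
    4.59175884, 1.54190302, 1.41543588, 1.17895661, 1.08374458,
    0.92276689, 0.8768148, 0.87158485, 0.81151862, 0.75821063,
    0.68891987, 0.6315009, 0.60902316, 0.59088079,
]
# Assumption: total variance of 18, inferred from the Proportion column
TOTAL_VARIANCE = 18.0


def retained(eigenvalues, min_eigenvalue, total=TOTAL_VARIANCE):
    """Number of components with eigenvalue > min_eigenvalue, and the share of variance they explain."""
    k = sum(1 for ev in eigenvalues if ev > min_eigenvalue)
    # Eigenvalues are sorted in descending order, so the retained ones are the first k
    return k, sum(eigenvalues[:k]) / total


k1, share1 = retained(EIGENVALUES, 1.0)
print(f"Criterion #1 (eigenvalue > 1):   {k1} components, {share1:.1%} explained")
# Criterion #1 (eigenvalue > 1):   5 components, 54.5% explained

k2, share2 = retained(EIGENVALUES, 0.7)
print(f"Criterion #2 (eigenvalue > 0.7): {k2} components, {share2:.1%} explained, "
      f"above 70%: {share2 > 0.7}")
# Criterion #2 (eigenvalue > 0.7): 10 components, 78.1% explained, above 70%: True
```

Lowering the cutoff from 1 to 0.7 doubles the number of components in exchange for about 24 extra percentage points of variance. Components 6 through 10 each add only about 4 to 5 percent.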
Accepted Solution

## Re: Principal Component Analysis - Optimal number of retained factors

Honestly, I think the answer is totally subjective here. I don't believe that there is a universally accepted answer. The scree plot might indicate 7 factors.

However, I would say that if you (for example) choose the 5-factor solution but find that factor 6 has a clear interpretation that makes sense in your application, that is a reason (again, subjective) to include factor 6.
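One way to judge whether a component such as factor 6 "has a clear interpretation" is to look at which variables load strongly on it. The minimal NumPy sketch below computes unrotated loadings, which are roughly what PROC FACTOR's factor pattern shows for a principal-component solution. The 0.5 cutoff, the function names, and the synthetic two-block data are assumptions made for illustration only.

```python
import numpy as np


def component_loadings(X):
    """Unrotated PCA loadings (eigenvector * sqrt(eigenvalue)): the correlation
    of each variable with each component, components ordered by eigenvalue."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    # clip guards against tiny negative eigenvalues from rounding
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def salient_variables(loadings, names, component, threshold=0.5):
    """(name, loading) pairs with |loading| > threshold on a 0-based component,
    strongest first."""
    col = loadings[:, component]
    order = np.argsort(-np.abs(col))
    return [(names[i], round(float(col[i]), 2)) for i in order if abs(col[i]) > threshold]


# Hypothetical data: variables a, b, c share one latent factor; d, e share another
rng = np.random.default_rng(1)
n = 2000
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f + 0.5 * rng.normal(size=n) for f in (f1, f1, f1, f2, f2)])
names = ["a", "b", "c", "d", "e"]
L = component_loadings(X)
print(salient_variables(L, names, component=0))  # a, b and c
print(salient_variables(L, names, component=1))  # d and e
```

The signs of loadings are arbitrary, so only their magnitudes and relative signs within a component matter for interpretation.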

As for whether 54% of the variability explained is enough, again there is no universal answer, especially since every situation is different. For some data in some fields of application, 54% might be fantastic, while in other fields 54% might be poor.

--
Paige Miller

## Re: Principal Component Analysis - Optimal number of retained factors

Hey,

An additional criterion would be parallel analysis by Horn (1965; https://link.springer.com/article/10.1007%2FBF02289447).

Bye,

Daniel
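Horn's parallel analysis compares each observed eigenvalue with the eigenvalues obtained from random, uncorrelated data of the same size, and keeps the components that beat chance. The minimal NumPy sketch below assumes normally distributed noise for the random data sets. The function name and the 50 iterations are illustrative choices. So is the 95th-percentile threshold, a common variant; Horn's original comparison used the mean random eigenvalue, which the sketch uses when `percentile=None`.

```python
import numpy as np


def parallel_analysis(X, n_iter=50, percentile=95, seed=0):
    """Horn's parallel analysis: retain the leading components whose observed
    eigenvalue exceeds the corresponding random-data eigenvalue threshold."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        # Uncorrelated normal data with the same number of rows and columns
        noise = rng.normal(size=(n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    if percentile is None:
        threshold = random_eigs.mean(axis=0)  # Horn's original rule
    else:
        threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Count the leading run of components that beat their random counterparts
    k = p if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold


# Hypothetical example: 10 variables driven by two latent factors
rng = np.random.default_rng(42)
n = 1000
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f + 0.5 * rng.normal(size=n) for f in [f1] * 5 + [f2] * 5])
k, observed, threshold = parallel_analysis(X)
print(k)  # 2 for this two-factor example
```

With about 300K observations, random-data eigenvalues sit very close to 1. By the Marchenko–Pastur approximation they stay roughly within (1 ± √(p/n))², which is under 1.02 for p = 25. In this thread's setting, parallel analysis would therefore likely agree with the eigenvalue > 1 rule and point to 5 components (the 5th eigenvalue is 1.084, the 6th is 0.923).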
