Fae
Obsidian | Level 7

I am trying to do dimension reduction using Principal Component Analysis. The dataset has 25 variables and 300K observations. The reduced data will be used for segmentation with two-stage clustering (k-means clustering, then linkage clustering).

 

 

What are good practices for deciding the number of retained factors? Is criterion #1 good enough?

 

Criterion #1: eigenvalue > 1

-> 5 factors with 54% of variation explained. Is the variation explained too low? Should I instead use eigenvalue > 0.7 and variation explained > 0.7 (criterion #2)?

Criterion #2: eigenvalue > 0.7 and cumulative variation explained > 0.7

-> 10 factors with 78% of variation explained

  

ods graphics on;

/* Principal component extraction with varimax rotation; scree and loading plots requested */
proc factor data=myData preplot plots=(scree initloadings preloadings loadings) method=principal rotate=varimax
scree score;
var _numeric_;
run;

ods graphics off;

  

[Attached image: PC.png]

 

 
 #   Eigenvalue   Difference   Proportion   Cumulative
 1   4.59175884   3.04985582       0.2551       0.2551
 2   1.54190302   0.12646714       0.0857       0.3408
 3   1.41543588   0.23647927       0.0786       0.4194
 4   1.17895661   0.09521203       0.0655       0.4849
 5   1.08374458   0.16097769       0.0602       0.5451
 6   0.92276689   0.04595209       0.0513       0.5964
 7   0.87681480   0.00522994       0.0487       0.6451
 8   0.87158485   0.06006623       0.0484       0.6935
 9   0.81151862   0.05330799       0.0451       0.7386
10   0.75821063   0.06929076       0.0421       0.7807
11   0.68891987   0.05741897       0.0383       0.8190
12   0.63150090   0.02247774       0.0351       0.8541
13   0.60902316   0.01814238       0.0338       0.8879
14   0.59088079   0.03822865       0.0328       0.9207
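
For reference, the two criteria in the question can be checked directly against the posted eigenvalues. The following is a minimal sketch in Python rather than SAS, using only the 14 components shown in the table. The helper names `count_above` and `first_reaching` are made up for this illustration, and criterion #2 is read here as "keep eigenvalues above 0.7, then check that the cumulative proportion exceeds 0.7", which is an assumption about what was intended.

```python
# Eigenvalues and cumulative proportions copied from the PROC FACTOR output above
# (only the first 14 components were posted).
eigenvalues = [4.59175884, 1.54190302, 1.41543588, 1.17895661, 1.08374458,
               0.92276689, 0.87681480, 0.87158485, 0.81151862, 0.75821063,
               0.68891987, 0.63150090, 0.60902316, 0.59088079]
cumulative = [0.2551, 0.3408, 0.4194, 0.4849, 0.5451, 0.5964, 0.6451,
              0.6935, 0.7386, 0.7807, 0.8190, 0.8541, 0.8879, 0.9207]


def count_above(values, cutoff):
    """Count leading eigenvalues strictly above cutoff (values sorted descending)."""
    count = 0
    for v in values:
        if v <= cutoff:
            break
        count += 1
    return count


def first_reaching(cum, target):
    """Smallest number of components whose cumulative proportion exceeds target."""
    for k, c in enumerate(cum, start=1):
        if c > target:
            return k
    return None  # target not reached within the posted components


k1 = count_above(eigenvalues, 1.0)       # criterion #1
k2 = count_above(eigenvalues, 0.7)       # eigenvalue part of criterion #2
print(k1, cumulative[k1 - 1])            # 5 0.5451
print(k2, cumulative[k2 - 1])            # 10 0.7807
print(first_reaching(cumulative, 0.7))   # 9
```

Note that the cumulative proportion alone passes 0.7 at 9 components, so under this reading it is the eigenvalue cutoff of 0.7, not the variance target, that brings the count to 10.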
1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

Honestly, I think the answer is totally subjective here. I don't believe that there is a universally accepted answer. The scree plot might indicate 7 factors.

 

However, I would say that if you (for example) choose the 5-factor solution, but find that factor 6 has a clear interpretation that makes sense in your application, that's an (again subjective) reason to include factor 6.

 

As for whether 54% of the variability explained is enough, again there is no universal answer, especially since every situation is different. For some data in some fields of application, 54% might be fantastic, while in other fields 54% might be poor.

 

--
Paige Miller


3 REPLIES
Daniel_Paul
Obsidian | Level 7

Hey,

 

An additional criterion would be parallel analysis by Horn (1965; https://link.springer.com/article/10.1007%2FBF02289447).

 

Bye, 

Daniel 
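
To make the idea behind parallel analysis concrete: the eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from uncorrelated random data of the same size, and components are kept only while the observed eigenvalue exceeds the random-data reference. Below is a rough sketch under stated assumptions, not a reference implementation. It is written in Python rather than SAS and uses only the standard library. The function names, the synthetic `demo_data`, and the use of an upper quantile of the simulated eigenvalues as the reference (the mean is another common choice) are all choices made for this illustration.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of observations (one list per row)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(p)]
    return [[sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
        if off < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle in [-pi/4, pi/4] that zeroes a[i][j].
                y, x = 2.0 * a[i][j], a[i][i] - a[j][j]
                if x < 0:
                    y, x = -y, -x
                theta = 0.5 * math.atan2(y, x)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # update columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # update rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(rows, n_sims=50, quantile=0.95, seed=0):
    """Return (number retained, observed eigenvalues, random-data thresholds)."""
    n, p = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        sims.append(symmetric_eigenvalues(correlation_matrix(noise)))
    # Simple empirical quantile of the k-th simulated eigenvalue.
    idx = min(int(quantile * n_sims), n_sims - 1)
    thresholds = [sorted(s[k] for s in sims)[idx] for k in range(p)]
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:  # stop at the first component that fails
            break
        retained += 1
    return retained, observed, thresholds


def demo_data(n=300, seed=1):
    """Synthetic data: six variables driven by two hypothetical latent factors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([f1 + rng.gauss(0, 0.5) for _ in range(3)] +
                    [f2 + rng.gauss(0, 0.5) for _ in range(3)])
    return rows


if __name__ == "__main__":
    kept, observed, thresholds = parallel_analysis(demo_data())
    print("components retained:", kept)
```

For a dataset with 300K observations a pure-Python loop like this would be slow, so in practice the same comparison would be run in SAS or with a numerical library, possibly on a random subsample. The logic stays the same.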
