fengyuwuzu
Pyrite | Level 9

 

I tried to run a cluster analysis using the following code, but in the work.tree data set some of the ID values (DUPI) were replaced with blanks.

 

proc aceclus data=cluster.data_cluster_with_trajactory out=Ace p=.03 noprint;
	var  betting_days count_game_types mean_wager sites_wagered sum_wager total_bet_times total_times_over_days ;
run;

ods graphics on;

proc cluster data=Ace method=ward ccc pseudo print=15 out=tree
   plots=den(height=rsq);
   id DUPI;
   var can1-can7;
run;

ods graphics off;

[Attached screenshot: Capture.PNG]

I also got the warnings as below:

 

WARNING: Ties for minimum distance between clusters have been detected at 37734 level(s) in the cluster history.
WARNING: The MAXPOINTS option value 200 is less than the number of clusters (44887). This may result in a dendrogram that
         is difficult to read. The dendrogram will not be displayed. You can use the PLOTS(MAXPOINTS=) option in the PROC
         CLUSTER statement to change this maximum.
NOTE: The data set WORK.TREE has 89773 observations and 21 variables.

 

Any hints or suggestions? Thank you.
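One possible reading of the blank IDs, offered as a sketch rather than a confirmed diagnosis: the OUT= data set from PROC CLUSTER holds one row per node of the tree, not one row per input observation. With 44887 observations there are 44886 joined clusters, and 44887 + 44886 = 89773, which matches the NOTE in the log. Rows for the joined clusters (named CLn in _NAME_) have no original observation behind them, so DUPI would be blank there. The data set names tree_leaves, tree_nodes, and clus below, and the value NCLUSTERS=4, are placeholders and not part of the original post.

```sas
/* Split the tree data set into original observations and joined clusters. */
/* Assumption: every input observation has a nonmissing DUPI.              */
data tree_leaves tree_nodes;
   set tree;
   if missing(DUPI) then output tree_nodes;   /* internal cluster nodes */
   else output tree_leaves;                   /* original observations  */
run;

/* Cut the tree at a chosen number of clusters and keep DUPI for each leaf. */
/* NCLUSTERS=4 is only a placeholder; choose it from the CCC and pseudo     */
/* statistics printed by PROC CLUSTER.                                      */
proc tree data=tree nclusters=4 out=clus noprint;
   copy DUPI can1-can7;
run;
```

If tree_leaves ends up with 44887 rows, the blanks are just the cluster nodes and no IDs were lost.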

4 REPLIES
stat_sas
Ammonite | Level 13

Most of the distances appear to be equal to 0, which results in about 37K tied levels in the cluster history table. It seems hard to accommodate that many levels in a tree.
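A quick way to see whether zero distances come from identical points is to count how many observations share exactly the same canonical coordinates. This is a minimal sketch; the output data set name ace_unique is a placeholder.

```sas
/* Keep one row per distinct combination of can1-can7.               */
/* The log reports how many observations with duplicate key values   */
/* were deleted, i.e. how many points coincide with an earlier one.  */
proc sort data=Ace out=ace_unique nodupkey;
   by can1-can7;
run;
```

A large number of deleted duplicates would be consistent with the tie warning, although ties can also occur between distinct points that happen to be equally far apart.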

fengyuwuzu
Pyrite | Level 9

All the variables are heavily positively skewed. I will log-transform them and try again. Maybe this will make some difference.
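A log transform along these lines could be sketched as follows. It assumes all seven variables are nonnegative, and it adds 1 before taking the log so that zeros stay defined; the data set name data_log and the variables log_var1-log_var7 are placeholders.

```sas
/* Log-transform the seven clustering variables.                    */
/* Assumption: values are >= 0; log(x + 1) keeps zeros defined.     */
data data_log;
   set cluster.data_cluster_with_trajactory;
   array raw{7} betting_days count_game_types mean_wager sites_wagered
                sum_wager total_bet_times total_times_over_days;
   array lg{7} log_var1-log_var7;
   do i = 1 to 7;
      lg{i} = log(raw{i} + 1);
   end;
   drop i;
run;
```

The transformed variables log_var1-log_var7 would then replace the raw ones in the VAR statement of PROC ACECLUS.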

 

Maybe I should use PROC FASTCLUS instead. As I understand it, FASTCLUS does k-means clustering and CLUSTER does hierarchical clustering. Am I correct?
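That distinction is right: PROC FASTCLUS performs a k-means-type disjoint clustering and is generally better suited to large data sets, while PROC CLUSTER builds a full hierarchy. A minimal sketch on the same canonical variables follows; MAXCLUSTERS=4 and the data set names kclus and kstat are placeholders, not values taken from this thread.

```sas
/* k-means-type clustering on the ACECLUS canonical variables.       */
/* MAXCLUSTERS=4 is a placeholder; MAXITER= allows the seeds to be   */
/* refined over several passes instead of the default single pass.   */
proc fastclus data=Ace maxclusters=4 maxiter=100
              out=kclus outstat=kstat;
   var can1-can7;
   id DUPI;
run;
```

The OUT= data set has one row per input observation with a CLUSTER assignment, so every row keeps its DUPI.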

stat_sas
Ammonite | Level 13

Transformation may not solve the problem. A quick check for collinearity may help you avoid including highly correlated variables in the analysis.
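One way to run such a check is a correlation matrix of the seven input variables. The sketch below requests Spearman rank correlations in addition to Pearson, on the assumption that the heavy skew mentioned earlier makes rank correlations more informative.

```sas
/* Pairwise correlations among the clustering variables.          */
/* NOSIMPLE suppresses the table of simple descriptive statistics. */
proc corr data=cluster.data_cluster_with_trajactory
          pearson spearman nosimple;
   var betting_days count_game_types mean_wager sites_wagered
       sum_wager total_bet_times total_times_over_days;
run;
```

Pairs with correlations close to 1 are candidates for dropping one of the two variables before clustering.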

fengyuwuzu
Pyrite | Level 9
Thank you. Indeed they are highly correlated.
