fengyuwuzu

I tried to run a cluster analysis using the following code, but in the WORK.TREE data set, some of the ID values (DUPI) were replaced with blanks.

proc aceclus data=cluster.data_cluster_with_trajactory out=Ace p=.03 noprint;
	var  betting_days count_game_types mean_wager sites_wagered sum_wager total_bet_times total_times_over_days ;
run;

ods graphics on;

proc cluster data=Ace method=ward ccc pseudo print=15 out=tree
   plots=den(height=rsq);
   id DUPI;
   var can1-can7;
run;

ods graphics off;

I also got the following warnings:

WARNING: Ties for minimum distance between clusters have been detected at 37734 level(s) in the cluster history.
WARNING: The MAXPOINTS option value 200 is less than the number of clusters (44887). This may result in a dendrogram that
         is difficult to read. The dendrogram will not be displayed. You can use the PLOTS(MAXPOINTS=) option in the PROC
         CLUSTER statement to change this maximum.
NOTE: The data set WORK.TREE has 89773 observations and 21 variables.

Any hints or suggestions? Thank you.
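A possible explanation for the blank DUPI values, offered as a hedged reading of the log rather than something confirmed in this thread: the NOTE reports 89,773 rows in WORK.TREE, and 2 × 44,887 − 1 = 89,773. That is exactly the size of a binary tree with 44,887 leaves (one per observation) plus 44,886 internal nodes (one per merge). If the internal cluster nodes in the PROC CLUSTER OUT= data set carry no ID value, as I understand them, those are the blank rows. The sketch below counts both kinds of rows and keeps only the leaves; TREE_LEAVES is a hypothetical data set name.

```sas
/* Sketch: assumes WORK.TREE is the OUT= data set from the
   PROC CLUSTER step above and that rows with a blank DUPI are
   internal tree nodes (merged clusters), not lost IDs. */
proc sql;
   /* COUNT(DUPI) counts only nonmissing (nonblank) values */
   select count(*)                as total_rows,
          count(DUPI)             as leaf_rows,
          count(*) - count(DUPI)  as node_rows
   from tree;
quit;

/* Keep only the rows that correspond to original observations.
   TREE_LEAVES is a hypothetical name. */
data tree_leaves;
   set tree;
   where not missing(DUPI);
run;
```

If that reading is right, leaf_rows should come out as 44,887 and node_rows as 44,886.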

stat_sas

Most of the distances are equal to 0, which is what produces ties at about 37K levels (37,734) of the cluster history table. It seems hard to accommodate that many levels in a tree.
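One way to check the zero-distance point is to count how many observations share exactly the same values on the seven clustering variables. ACECLUS produces linear combinations of the inputs, so identical input rows stay identical in CAN1-CAN7, sit at distance 0 from one another, and merge in tied steps. A minimal PROC SQL sketch; PROFILE_COUNTS is a hypothetical data set name.

```sas
/* Sketch: one row per distinct profile on the clustering
   variables, with the number of observations sharing it. */
proc sql;
   create table profile_counts as
   select betting_days, count_game_types, mean_wager, sites_wagered,
          sum_wager, total_bet_times, total_times_over_days,
          count(*) as n_obs
   from cluster.data_cluster_with_trajactory
   group by betting_days, count_game_types, mean_wager, sites_wagered,
            sum_wager, total_bet_times, total_times_over_days;

   /* Summary: how many observations fall in duplicated profiles */
   select count(*)   as distinct_profiles,
          sum(n_obs) as total_obs,
          sum(case when n_obs > 1 then n_obs else 0 end)
                     as obs_in_duplicated_profiles
   from profile_counts;
quit;
```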

fengyuwuzu

All the variables are heavily positively (right) skewed. I will apply a log transform and try again; maybe this will make some difference.
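A minimal sketch of that log transform, assuming all seven variables are nonnegative (counts and wager amounts). LOG(x + 1) is used instead of LOG(x) so that zero values stay defined; the CLUSTER_LOG data set and the log_ variable names are hypothetical.

```sas
/* Sketch: assumes all seven variables are >= 0.
   LOG(x + 1) keeps zero values defined (LOG(0) is not). */
data cluster_log;
   set cluster.data_cluster_with_trajactory;
   array raw  {7} betting_days count_game_types mean_wager sites_wagered
                  sum_wager total_bet_times total_times_over_days;
   /* hypothetical names for the transformed copies */
   array logv {7} log_betting_days log_count_game_types log_mean_wager
                  log_sites_wagered log_sum_wager log_total_bet_times
                  log_total_times_over_days;
   do i = 1 to dim(raw);
      if not missing(raw{i}) then logv{i} = log(raw{i} + 1);
   end;
   drop i;
run;

/* Rerun the ACECLUS step on the transformed variables */
proc aceclus data=cluster_log out=Ace p=.03 noprint;
   var log_:;
run;
```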

Maybe I should use PROC FASTCLUS, which is for k-means clustering, whereas PROC CLUSTER is for hierarchical clustering. Am I correct?
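A sketch of what the FASTCLUS route might look like, reusing the ACE data set and CAN1-CAN7 from the original post. FASTCLUS assigns each observation to one of a fixed number of clusters and does not build a tree, so the dendrogram and MAXPOINTS limits do not come into play. MAXCLUSTERS=5 is a hypothetical choice, and MAXITER= is set explicitly because the default (a single iteration, as far as I know) stops short of a converged k-means solution. FC_OUT and FC_SEEDS are hypothetical data set names.

```sas
/* Sketch: k-means-style clustering of the ACECLUS canonical
   variables. MAXCLUSTERS=5 is a hypothetical choice; try
   several values and compare the CCC and pseudo-F output. */
proc fastclus data=Ace maxclusters=5 maxiter=100
              out=fc_out outseed=fc_seeds;
   id DUPI;
   var can1-can7;
run;

/* Cluster sizes: FASTCLUS adds CLUSTER and DISTANCE to OUT= */
proc freq data=fc_out;
   tables cluster;
run;
```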

stat_sas

Transformation may not solve the problem. A quick check for collinearity may be helpful to avoid including correlated variables in the analysis.
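A quick collinearity check could be a correlation matrix of the seven input variables. Spearman rank correlations are requested alongside Pearson because the variables are heavily skewed, which can distort Pearson correlations. CORR_PEARSON is a hypothetical output data set name.

```sas
/* Sketch: pairwise correlations among the clustering variables.
   OUTP= saves the Pearson matrix; SPEARMAN adds rank correlations. */
proc corr data=cluster.data_cluster_with_trajactory
          pearson spearman nosimple outp=corr_pearson;
   var betting_days count_game_types mean_wager sites_wagered
       sum_wager total_bet_times total_times_over_days;
run;
```

Variables that are very strongly correlated with one another are candidates for dropping or combining before the ACECLUS step.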

fengyuwuzu
Thank you. Indeed they are highly correlated.
