05-05-2016 10:16 AM - edited 05-05-2016 10:43 AM
I tried to run a cluster analysis using the following code, but in the WORK.TREE data set some of the ID (DUPI) values were replaced with blanks.
proc aceclus data=cluster.data_cluster_with_trajactory out=Ace p=.03 noprint;
   var betting_days count_game_types mean_wager sites_wagered
       sum_wager total_bet_times total_times_over_days;
run;

ods graphics on;

proc cluster data=Ace method=ward ccc pseudo print=15 out=tree
             plots=den(height=rsq);
   id DUPI;
   var can1-can7;
run;

ods graphics off;
I also got the following warnings:
WARNING: Ties for minimum distance between clusters have been detected at 37734 level(s) in the cluster history.
WARNING: The MAXPOINTS option value 200 is less than the number of clusters (44887). This may result in a dendrogram that
is difficult to read. The dendrogram will not be displayed. You can use the PLOTS(MAXPOINTS=) option in the PROC
CLUSTER statement to change this maximum.
NOTE: The data set WORK.TREE has 89773 observations and 21 variables.
Any hints or suggestions? Thank you.
05-05-2016 10:46 AM
All the variables are heavily positively skewed. I will apply a log transformation and try again. Maybe this will make some difference.
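For the log transformation, I am thinking of something like the rough sketch below. It assumes all seven variables are non-negative, which is why it uses log(x + 1) rather than log(x). The output data set name work.data_log and the variable names log_x1-log_x7 are placeholders I made up, not anything from my actual data.

```sas
/* Sketch only: log-transform the seven skewed input variables.      */
/* Assumes every value is >= 0; log(x + 1) keeps zeros defined.      */
/* work.data_log and log_x1-log_x7 are hypothetical names.           */
data work.data_log;
   set cluster.data_cluster_with_trajactory;
   array raw{7} betting_days count_game_types mean_wager sites_wagered
                sum_wager total_bet_times total_times_over_days;
   array lg{7} log_x1-log_x7;
   do i = 1 to dim(raw);
      /* Leave missing inputs missing instead of logging them */
      if not missing(raw{i}) then lg{i} = log(raw{i} + 1);
   end;
   drop i;
run;
```

The transformed variables log_x1-log_x7 would then replace the original seven in the VAR statement of PROC ACECLUS.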
Maybe I should use PROC FASTCLUS, which does k-means clustering, whereas PROC CLUSTER does hierarchical clustering. Am I correct?
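If I go the k-means route, I guess the call would look roughly like the sketch below. The number of clusters (maxclusters=5), the iteration limit, and the output data set name work.clus_out are arbitrary guesses on my part, not values I have any justification for yet.

```sas
/* Sketch only: k-means clustering on the canonical variables from   */
/* PROC ACECLUS. maxclusters=5 and maxiter=100 are arbitrary guesses; */
/* work.clus_out is a hypothetical output data set name.             */
proc fastclus data=Ace maxclusters=5 maxiter=100 out=work.clus_out;
   id DUPI;
   var can1-can7;
run;
```

The OUT= data set should have one row per input observation with its assigned CLUSTER number, so it would not contain the extra rows for intermediate cluster nodes that the PROC CLUSTER tree data set has.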