I tried to run a cluster analysis using the following code, but in the WORK.TREE data set some of the IDs (DUPI) were replaced with blanks.
proc aceclus data=cluster.data_cluster_with_trajactory out=Ace p=.03 noprint;
var betting_days count_game_types mean_wager sites_wagered sum_wager total_bet_times total_times_over_days ;
run;
ods graphics on;
proc cluster data=Ace method=ward ccc pseudo print=15 out=tree
plots=den(height=rsq);
id DUPI;
var can1-can7;
run;
ods graphics off;
I also got the following warnings:
WARNING: Ties for minimum distance between clusters have been detected at 37734 level(s) in the cluster history.
WARNING: The MAXPOINTS option value 200 is less than the number of clusters (44887). This may result in a dendrogram that
is difficult to read. The dendrogram will not be displayed. You can use the PLOTS(MAXPOINTS=) option in the PROC
CLUSTER statement to change this maximum.
NOTE: The data set WORK.TREE has 89773 observations and 21 variables.
Any hints or suggestions? Thank you.
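One likely explanation for the blank IDs, offered as a sketch rather than a confirmed diagnosis: the output data set from PROC CLUSTER holds one row per node of the tree, not one row per input observation. With 44887 observations there are 44887 leaves plus 44886 merged clusters, which matches the 89773 rows reported in the NOTE. The cluster rows have no original observation behind them, so DUPI would be blank there, while _NAME_ holds a generated name of the form CLn. The check below assumes that layout; tree_check, node_type, and cluster_ids are illustrative names, and nclusters=5 is an arbitrary placeholder.

```sas
/* Label each row of WORK.TREE as a leaf (original observation)  */
/* or a cluster node, assuming cluster nodes have a blank DUPI.  */
data tree_check;
   set tree;
   length node_type $7;
   if missing(DUPI) then node_type = 'cluster';
   else node_type = 'leaf';
run;

/* If the assumption holds, expect 44887 leaves and 44886 cluster nodes. */
proc freq data=tree_check;
   tables node_type;
run;

/* To get one row per original observation with a cluster number, */
/* cut the tree with PROC TREE. nclusters=5 is only a placeholder. */
/* For leaves, _NAME_ should carry the formatted DUPI value.       */
proc tree data=tree nclusters=5 out=cluster_ids noprint;
run;
```

If the counts come out as expected, the blanks are not lost IDs; they are just the rows that describe merged clusters.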
Most of the distances appear to be equal to 0, which results in ties at about 37K levels in the cluster history table. It seems hard to accommodate that many levels in the tree.
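A quick way to see whether zero distances come from exact duplicate rows is to count how many observations share identical values on the clustering variables. This is a minimal sketch that assumes the Ace data set from the original code; Ace_unique and Ace_dups are illustrative names.

```sas
/* Keep one row per distinct combination of can1-can7.       */
/* Rows that repeat an earlier combination go to Ace_dups.   */
proc sort data=Ace out=Ace_unique nodupkey dupout=Ace_dups;
   by can1-can7;
run;
```

The log reports how many observations were written to each data set. A large Ace_dups would be consistent with many zero distances, although ties can also occur between distinct points that happen to be equally far apart.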
All the variables are heavily positively skewed. I will log-transform them and try again. Maybe this will make some difference.
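A log transform of the seven input variables could be sketched as follows. It assumes all values are nonnegative, adds 1 before taking the log so that zeros stay defined, and uses hypothetical names (data_log and the log_ prefixed variables) that are not in the original post.

```sas
data data_log;
   set cluster.data_cluster_with_trajactory;

   /* Original, right-skewed variables */
   array raw{*} betting_days count_game_types mean_wager sites_wagered
                sum_wager total_bet_times total_times_over_days;

   /* Transformed copies (names are illustrative) */
   array lg{*}  log_betting_days log_count_game_types log_mean_wager
                log_sites_wagered log_sum_wager log_total_bet_times
                log_total_times_over_days;

   do i = 1 to dim(raw);
      /* log(x + 1) keeps zero values at zero; assumes x >= 0 */
      lg{i} = log(raw{i} + 1);
   end;
   drop i;
run;
```

The transformed variables would then replace the originals in the VAR statement of PROC ACECLUS.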
Maybe I should use PROC FASTCLUS, which does k-means clustering, whereas PROC CLUSTER does hierarchical clustering. Am I correct?
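That distinction is right: FASTCLUS produces disjoint k-means style clusters, and CLUSTER builds a hierarchy. With tens of thousands of observations, one common pattern is to combine them: reduce the data to a manageable number of preliminary clusters with FASTCLUS, then run CLUSTER on the cluster means. The sketch below assumes the Ace data set from the original code; maxclusters=50 and maxiter=100 are arbitrary starting values, and fc_out, fc_means, and tree_pre are illustrative names.

```sas
/* Stage 1: k-means style preliminary clustering.                   */
/* MEAN= writes one row per preliminary cluster, with _FREQ_ and    */
/* _RMSSTD_.                                                        */
proc fastclus data=Ace maxclusters=50 maxiter=100
              out=fc_out mean=fc_means noprint;
   var can1-can7;
run;

/* Stage 2: hierarchical clustering of the preliminary cluster means. */
/* FREQ and RMSSTD tell PROC CLUSTER how many observations each mean  */
/* represents and how spread out they are.                            */
proc cluster data=fc_means method=ward ccc pseudo print=15
             outtree=tree_pre;
   var can1-can7;
   freq _freq_;
   rmsstd _rmsstd_;
run;
```

With only 50 nodes at the bottom of the tree, the dendrogram stays well under the MAXPOINTS limit of 200 mentioned in the warning.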
Transformation may not solve the problem. A quick check for collinearity may help you avoid including highly correlated variables in the analysis.
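A simple starting point for that check is a correlation matrix of the seven input variables. This sketch requests Spearman correlations alongside Pearson because the variables are described as heavily skewed; which pairs count as "too correlated" is a judgment call not settled in this thread.

```sas
/* Pairwise correlations among the clustering inputs. */
proc corr data=cluster.data_cluster_with_trajactory
          pearson spearman nosimple;
   var betting_days count_game_types mean_wager sites_wagered
       sum_wager total_bet_times total_times_over_days;
run;
```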