Hi,
I have no experience in clustering, so I would be grateful if anyone could help me choose the best method.
I am going to group about 1.5 million customers by one variable (I have more, but all of them are highly correlated). About 50% of the observations have a value of 0 in the clustering variable.
I am using the FASTCLUS procedure:
proc fastclus data=wyn2 least=1 maxc=4;
var zasilenia_za_ost_3m;
run;
What I received is 4 clusters with:
1499995 in the first cluster
2 in the second cluster
1 in the third cluster
2 in the fourth cluster.
It behaves the same way even if I remove the observations with 0 in the clustering variable.
So my question is: which method/procedure would be best in this case?
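For reference, the run without the zero observations can be sketched as below. This is only a minimal sketch: it assumes the zeros are filtered with a WHERE= data set option on the same data set and variable, and all other options are unchanged from the call above.

```sas
/* Sketch: same FASTCLUS call, excluding observations where the      */
/* clustering variable equals 0. The WHERE= data set option is an    */
/* assumption about how the filtering is done.                       */
proc fastclus data=wyn2(where=(zasilenia_za_ost_3m ne 0)) least=1 maxc=4;
  var zasilenia_za_ost_3m;
run;
```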
Thank you.