mjbstats
Hello,

I am trying to find an optimal clustering of a relatively small data set (n = 900 observations) using only one dependent variable. Each observation corresponds to a zip code, which is a categorical variable.

My interpretation of the SAS documentation is as follows:

1) I want disjoint clusters, in that I want groupings of the zip codes based on their similarity with respect to the dependent variable, and I do not want any zip code assigned to more than one cluster.

2) A number of procedures might work, but the simplicity of my application seems to indicate that FASTCLUS or CLUSTER are good starting procs.

3) My criterion for choosing an "optimal" clustering is good differentiation between the clusters on the dependent variable, which means I want to minimize the within-cluster variance and maximize the between-cluster variance.

4) I find a lot of the advanced Clustering Analysis discussion confusing (e.g., the role of nonparametric probability density estimates in various methods).
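
To make the criterion in item 3 concrete, here is a minimal Python sketch (an illustration only, not SAS code). It assumes a single numeric variable and uses a plain k-means (Lloyd) iteration; the function names `within_between_ss` and `kmeans_1d` and the sample values are hypothetical. It shows how the total sum of squares splits into a within-cluster part (to be minimized) and a between-cluster part (to be maximized).

```python
from statistics import mean


def within_between_ss(values, labels):
    """Return (within-cluster SS, between-cluster SS) for one variable."""
    grand = mean(values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    within = 0.0
    between = 0.0
    for members in groups.values():
        m = mean(members)
        within += sum((v - m) ** 2 for v in members)
        between += len(members) * (m - grand) ** 2
    return within, between


def _assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: (v - centers[j]) ** 2)
            for v in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one variable, seeded at evenly spaced order statistics."""
    xs = sorted(values)
    n = len(xs)
    centers = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = _assign(values, centers)
        new_centers = []
        for j in range(k):
            members = [v for v, g in zip(values, labels) if g == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return _assign(values, centers), centers


if __name__ == "__main__":
    # Made-up values standing in for the per-zip-code variable.
    sample = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5, 20.0, 21.0, 19.0]
    labels, centers = kmeans_1d(sample, k=3)
    w, b = within_between_ss(sample, labels)
    print(labels, centers, w, b)
```

Because the within and between parts always add up to the total sum of squares, minimizing one is the same as maximizing the other for a fixed number of clusters. As far as I understand the documentation, PROC FASTCLUS follows a k-means style approach with the number of clusters set through MAXCLUSTERS=, so this sketch is only meant to mirror that idea, not to reproduce its exact algorithm.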

My clusters tend to be poorly separated. Some observations are clearly apart from the others (and can be clustered as such), but the rest of the data are roughly uniformly distributed across the range of values. Even for the uniformly distributed data, we would like to break the observations into reasonable groups based on where they fall within that range. We are aiming for roughly 15 to 20 clusters.

Can anyone provide some guidance as to appropriate procedures and smoothing parameters for my application?

Many thanks.
