Hi,

My boss has asked me to show the distribution of a variable (square footage of 17,000 public library outlets/branches), breaking it into 5 categories using natural breaks. Putting aside arguments about the "best" number of categories or the problems with categorizing a continuous variable...

I've used proc rank for creating deciles, quantiles, etc., but the only place I've seen "natural breaks" used (outside of distance sporting events) is choropleth maps. I've Googled for a solution. The closest I got, other than a bunch of articles on analyses using the Jenks optimization in ArcGIS, was a post to the UGA SAS-Listserv in 2000. I've seen a few possible links to using proc cluster, but that always seems to be in the context of a multivariate analysis.

My question: Does anyone know whether SAS has a procedure that does this? I could probably write code to do the iterations, but if there is a ready-made proc, there's no reason to do that. It also looks like the head/tail breaks approach would be more appropriate for this data, since it has a long tail, and it would be more straightforward if I had to write something myself.

My current strategy is to use a decile approach and look at the distributions within deciles to determine reasonable break points. If anyone has any suggestions, macros, or knowledge about how to do this in SAS without reinventing the wheel, I'd appreciate it.

I haven't attached data since this is a simple univariate descriptive analysis; the only "difficult" part is determining the break points.

Thanks,
Deanne
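
P.S. To make the question more concrete, here is a minimal sketch of the head/tail breaks logic as I understand it. It is written in Python purely to illustrate the iterations, not as a SAS solution. The function names, the 40% head limit used as a stopping rule, and the sample values are all my own assumptions for illustration; the sample is made up and is not the library data.

```python
from bisect import bisect_left
from statistics import mean


def head_tail_breaks(values, max_classes=5, head_limit=0.4):
    """Return ascending break points; each break is the mean of the current head.

    Assumption: stop once the head (values above the mean) is no longer a
    minority, i.e. exceeds head_limit of the values just split, or once
    max_classes - 1 breaks have been found.
    """
    breaks = []
    head = list(values)
    while len(breaks) < max_classes - 1 and len(head) > 1:
        m = mean(head)
        new_head = [v for v in head if v > m]
        if not new_head:  # all remaining values are equal; nothing to split
            break
        breaks.append(m)
        if len(new_head) / len(head) > head_limit:
            break  # head is too large to count as a "tail"; stop splitting
        head = new_head
    return breaks


def classify(value, breaks):
    """Return a 0-based class index; a value equal to a break stays in the lower class."""
    return bisect_left(breaks, value)


# Made-up, long-tailed sample (hypothetical square footages in thousands)
sample = [1] * 12 + [10] * 3 + [100]
breaks = head_tail_breaks(sample)
classes = [classify(v, breaks) for v in sample]
print(breaks)
print(classes)
```

With a long-tailed variable the loop usually stops on its own before reaching 5 classes, so the number of categories is driven by the data rather than fixed in advance. That is the main difference from a fixed-k method such as Jenks.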