Hi, I am trying to cluster a monotonically increasing variable into buckets. As an example, take logsalary from the sashelp.baseball dataset:

DATA baseball;
SET sashelp.baseball;
WHERE NOT MISSING(logsalary);
KEEP logSalary;
RUN;

Sort the data from smallest to largest:

proc sort
data = baseball
out = FULL;
by logsalary;
RUN;

Now, use proc cluster to create optimal bins for logsalary:

PROC CLUSTER
DATA = FULL
OUTTREE = cluster_logsalary
METHOD = ward
CCC PSEUDO PRINT = 25;
VAR logsalary;
RUN;

My issue is that the output does create bins; however, within these bins logsalary is no longer monotonically increasing, which I am treating as an essential criterion for my binning. Is there a way to bin logsalary with minimised variance while keeping the monotonic characteristics?
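
To make the criterion concrete, here is a minimal sketch of how the bins could be checked for overlap. It assumes the tree is cut at 5 clusters (an arbitrary number picked only for illustration); the dataset names bins, bin_ranges and bin_check and the variables min_ls, max_ls, prev_max and overlap are hypothetical names of my own, not part of the code above.

```sas
/* Cut the tree into a fixed number of bins; NCLUSTERS = 5 is an arbitrary choice */
PROC TREE DATA = cluster_logsalary NCLUSTERS = 5 OUT = bins NOPRINT;
COPY logsalary;
RUN;

/* Smallest and largest logsalary within each bin */
PROC MEANS DATA = bins NWAY NOPRINT;
CLASS cluster;
VAR logsalary;
OUTPUT OUT = bin_ranges MIN = min_ls MAX = max_ls;
RUN;

/* Order the bins by their lower bound */
PROC SORT DATA = bin_ranges;
BY min_ls;
RUN;

/* Flag any bin that starts below the upper bound of the previous bin */
DATA bin_check;
SET bin_ranges;
prev_max = LAG(max_ls);
overlap = (NOT MISSING(prev_max) AND min_ls < prev_max);
RUN;
```

If overlap is 0 for every row of bin_check, the bins are contiguous ranges of logsalary and can be relabelled in increasing order; any row with overlap = 1 marks a bin that breaks the monotonic ordering.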