SAS Procedures

Posted 07-01-2019 11:04 AM

Hi,

I am trying to cluster a monotonically increasing variable into buckets. As an example, take logsalary from the sashelp.baseball dataset:

DATA baseball;
	SET sashelp.baseball;
	WHERE NOT MISSING(logsalary);
	KEEP logSalary;
RUN;

Sort the data from smallest to largest:

proc sort
	data = baseball
	out = FULL;
	by logsalary;
RUN;

Now, use PROC CLUSTER to create optimal bins for logsalary:

PROC CLUSTER 
	DATA = FULL
	OUTTREE = cluster_logsalary
	METHOD = ward
	CCC PSEUDO PRINT = 25; 
	VAR logsalary; 
RUN;

My issue is that the output creates bins, but within these bins logsalary is no longer monotonically increasing, which is an essential criterion for my binning. Is there a way to bin logsalary with minimised variance while keeping the monotonic characteristics?

PaigeMiller · Posted 07-01-2019 11:13 AM

It's not clear to me why you think this clustering of a single variable is not monotonic. What is the evidence in the output of PROC CLUSTER that says it is not monotonic? What even is the definition of monotonic here?
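One way to produce that evidence is to cut the tree into a fixed number of clusters and compare the value ranges of the clusters. The following is only a sketch: the choice of 5 clusters and the data set names clusters, ranges and check are assumptions for illustration, and it relies on the OUTTREE= data set from the PROC CLUSTER step above still being available.

```sas
/* Cut the Ward tree into 5 clusters (5 is an arbitrary choice) */
proc tree data=cluster_logsalary nclusters=5 out=clusters noprint;
   copy logsalary;   /* carry the clustered variable into OUT= */
run;

/* Summarise each cluster: size, range and median of logsalary */
proc means data=clusters noprint nway;
   class cluster;
   var logsalary;
   output out=ranges n=n_obs min=min_ls median=med_ls max=max_ls;
run;

/* Order the clusters by their smallest value */
proc sort data=ranges;
   by min_ls;
run;

/* Flag any cluster whose range starts at or below the previous maximum */
data check;
   set ranges;
   prev_max = lag(max_ls);
   overlap = (_n_ > 1 and min_ls <= prev_max);
run;

proc print data=check;
   var cluster n_obs min_ls med_ls max_ls overlap;
run;
```

If overlap is 0 in every row, the clusters are non-overlapping intervals and the medians increase from one row to the next. Note that the cluster numbers assigned by PROC TREE are not necessarily in order of logsalary, so the CLUSTER column itself may look unordered even when the bins are contiguous.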

In general, clustering is used on multiple X variables, and there is no such criterion as monotonicity that applies to multiple X variables in the clustering algorithms. Perhaps PROC HPBIN will give you what you want. As far as I know it should, since it operates on variables individually: if you have values 10 20 30 40 50 60, it should not bin 10 and 60 together (if that's what you mean by monotonic).

--
Paige Miller

Posted 07-01-2019 11:31 AM

I need the bins for logsalary to remain monotonically increasing, i.e. the median value of each bin increases.

PROC HPBIN does not work well with poorly distributed data, so you get equally spaced bins with one bin containing 99% of the data.

PaigeMiller · Posted 07-01-2019 11:34 AM

@782822 wrote:

"I need the bins for logsalary to remain monotonically increasing i.e. the median value of each bin increases."

I still have no idea what you see in the PROC CLUSTER output that indicates things are not monotonic, and so without further explanation, I consider this statement to be questionable and, in my opinion, incorrect.

"PROC HPBIN does not work well with poorly distributed data and so, you get equally spaced bins with 1 bin containing 99% of the data"

I think that the QUANTILE method in PROC HPBIN avoids this problem.
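A minimal sketch of what that could look like, assuming the work.baseball data set from the first post. The number of bins (5) and the output name hpbin_out are arbitrary, and the sketch assumes the binned variable is written to the output with a BIN_ prefix, so check the names in your own output before relying on them.

```sas
/* Quantile binning: roughly equal counts per bin instead of equal widths */
proc hpbin data=baseball output=hpbin_out numbin=5 quantile;
   input logsalary;
   id logsalary;     /* keep the original values next to the bin number */
run;

/* Size and range of each bin, to confirm the bins are ordered intervals */
proc means data=hpbin_out n min median max;
   class BIN_logsalary;
   var logsalary;
run;
```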

--
Paige Miller

Posted 07-01-2019 11:41 AM

I have also used QUANTILE in the past, but I would like an alternative to having an equal number of observations per bucket.
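One candidate that is not discussed in the replies is PROC FASTCLUS (k-means), which the thread title mentions. On a single variable, assigning each observation to its nearest cluster centre produces interval-shaped bins with unequal counts. The sketch below is an illustration only: MAXCLUSTERS=5, MAXITER=100 and the data set names kmeans_bins, centers and binned are assumptions, and k-means is not guaranteed to find the minimum-variance partition.

```sas
/* k-means on one variable; 5 clusters is an arbitrary choice */
proc fastclus data=baseball maxclusters=5 maxiter=100
              out=kmeans_bins noprint;
   var logsalary;
run;

/* Cluster numbers are not ordered by value, so rank them by their mean */
proc means data=kmeans_bins noprint nway;
   class cluster;
   var logsalary;
   output out=centers mean=center;
run;

proc sort data=centers;
   by center;
run;

data centers;
   set centers;
   bin = _n_;          /* bin 1 holds the smallest values */
   keep cluster bin;
run;

/* Attach the ordered bin number to every observation */
proc sort data=centers;     by cluster; run;
proc sort data=kmeans_bins; by cluster; run;

data binned;
   merge kmeans_bins centers;
   by cluster;
run;
```

After this, the variable bin increases with logsalary, which makes it easy to confirm that the median of each bin increases.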

PaigeMiller · Posted 07-01-2019 01:14 PM

So, honestly, I can't help because I still don't know what you mean by "monotonic" in this context, and I also don't know what criteria you do want (I know you don't want QUANTILE and you don't want BUCKET, which is not the same as stating what you do want).

--
Paige Miller

PGStats · Posted 07-01-2019 05:47 PM

Seems like you are looking for a way to estimate cutpoints on logsalary that would define homogeneous subsets. If the distribution of logsalary shows multiple modes, you might want to look at proc fmm to estimate the subcomponents of the logsalary distribution.
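A rough sketch of that idea, assuming normal components and letting PROC FMM compare models with 1 to 4 components. The KMIN=/KMAX= range and the names fmm_out and component are arbitrary choices for illustration.

```sas
/* Fit normal mixtures with 1 to 4 components to logsalary */
proc fmm data=baseball;
   model logsalary = / dist=normal kmin=1 kmax=4;
   /* component = most likely component for each observation */
   output out=fmm_out class=component;
run;

/* Range of logsalary within each estimated component */
proc means data=fmm_out n min median max;
   class component;
   var logsalary;
run;
```

Keep in mind that mixture components with different variances do not have to correspond to contiguous intervals, so the ranges from PROC MEANS should be checked for overlap before using them as cutpoints.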

PG

SAS Proc cluster or Fastclus with a monotonically increasing continuous variable