SAS Programming

Program elbow method

alexandraIFCT · Posted 11-08-2023 09:03 AM

Good morning,
I would like to program the elbow method to decide how many clusters to use when dichotomizing my quantitative variable. Could anyone help me?
Thanks in advance,
Sincerely,

PaigeMiller · Posted 11-08-2023 09:09 AM

I'm not aware of any programming of the elbow method in SAS. But maybe others know how it can be done.

However, here are two discussions about determining the number of clusters, both of which indicate that there is no universally agreed-upon method.

https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-does-clustering-node-have-elbow-method-to-sel...

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_introclus_sect010.htm

There is also the very simple idea of treating continuous variables as continuous variables, instead of categories, in whatever analysis you want to do, which is easier to do than creating clusters.

--
Paige Miller

ballardw · Posted 11-08-2023 10:55 AM

Dichotomous means two. So you have already decided there will be two clusters if you "dichotomize" anything.

So the question would be where the breakpoint should be. I would imagine PROC FREQ might give an idea of whether there is anything really worth treating as a "cluster".

alexandraIFCT · Posted 11-08-2023 11:05 AM

Excuse me, I used the wrong term. I don't necessarily want to make 2 groups. I wanted to use PROC FASTCLUS to determine the clusters, but it requires the number of clusters to be specified in advance, and that is the part I don't know how to choose.
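
For readers following the thread, the elbow idea itself can be sketched in a few lines: cluster the variable for k = 1, 2, 3, ... and record the within-cluster sum of squares (WSS) each time, then look for the k after which WSS stops dropping sharply. The sketch below is in Python purely for illustration. It uses a plain Lloyd's k-means on one variable as a stand-in and is not the FASTCLUS algorithm; the function names and the marker values are made up.

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on one variable; returns (centers, wss)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: seeds at evenly spaced quantiles of the data.
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    # Within-cluster sum of squares: squared distance to the nearest center.
    wss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wss


def elbow_table(values, max_k):
    """Within-cluster sum of squares for k = 1..max_k."""
    return [(k, kmeans_1d(values, k)[1]) for k in range(1, max_k + 1)]


# Hypothetical marker values with three visible groups (made up for illustration).
marker = [1.0, 1.3, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1, 9.0, 9.3, 8.8, 9.1]

table = elbow_table(marker, 6)
for k, wss in table:
    print(k, round(wss, 3))
```

With this made-up data the WSS falls steeply up to k = 3 and only marginally afterwards, which is the "elbow". On real data the bend is often much less clear, which is one reason there is no agreed-upon rule.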

PaigeMiller · Posted 11-08-2023 11:12 AM

People sometimes present a very narrow view of the problem ... "how do I determine the number of clusters?" I encourage you to present a wider view of the problem: "how do I determine the number of clusters if I want to perform analyses such as _____________ and ______________ on the clusters for data coming from the field of __________ "?

Context makes a difference. Depending on what you are doing, I could see different answers.

--
Paige Miller

alexandraIFCT · Posted 11-08-2023 11:16 AM

I have a biological marker on which I would like to carry out a prognostic survival analysis. The marker is a continuous variable, but medical interpretation is difficult on a continuous variable, hence my desire to make groups.

PaigeMiller · Posted 11-08-2023 12:36 PM

Thank you, that's very helpful to me. Some thoughts:

1. It sounds like your prognostic analysis of survival is a kind of prediction from a model of some sort. You could use a decision tree model to create buckets of the biological marker that are predictive, and obtain predictions. Creating buckets without regard to predictive ability (as it seems you were suggesting in your original message) sometimes leaves you with buckets that make sense from one point of view but are not as predictive as those you might get from a decision tree or similar model.
2. Despite the fact that "medical interpretation is difficult on a continuous variable", sometimes the best predictions come from models which treat the biological marker as continuous, rather than forcing arbitrary buckets onto this continuous variable. But I don't work in biological sciences or medicine, and so you have to do what will work in that field.
3. In my field, which has nothing to do with biological sciences or medicine, we sometimes simply use buckets that are meaningful to the people in the field, rather than have a statistical tool create buckets that have little meaning. Example: (I work in banking) people are happy with the pre-defined buckets for FICO of 700-719 and 720-739 and similar. I could use a statistical method that comes up with buckets like 683-717, but I doubt that would be acceptable or accepted.

Again, I don't work in your field and don't know what the norms are for this type of analysis, but I like the first choice above best, unless I felt I could sell people on the second choice, in which case I would do that (especially if the model predicted better using a continuous rather than a discrete variable).
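
To make the decision-tree idea concrete, here is a minimal sketch of the simplest possible tree: a single split on the marker, chosen so that the two resulting buckets are as pure as possible with respect to the outcome. It is written in Python for illustration only and rests on loud assumptions: the outcome is simplified to a binary event indicator (a real survival analysis would need a split criterion that handles censored follow-up times), Gini impurity is used as the purity measure, and the data and function names are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(marker, event):
    """Return (cutpoint, weighted impurity) of the best single split."""
    pairs = sorted(zip(marker, event))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal marker values
        left = [e for _, e in pairs[:i]]
        right = [e for _, e in pairs[i:]]
        # Impurity of the two buckets, weighted by bucket size.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Hypothetical data: marker values and whether the event occurred (1) or not (0).
marker = [0.8, 1.1, 1.5, 2.0, 2.4, 3.1, 3.6, 4.2]
event = [0, 0, 0, 0, 1, 0, 1, 1]

cut, impurity = best_split(marker, event)
print(cut, round(impurity, 3))
```

The point of the sketch is the contrast with clustering: the cutpoint here is driven by the outcome, whereas a clustering method looks only at the marker values. A cutpoint found this way is tuned to the data it was found on, so it should be checked on separate data before it is interpreted.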

--
Paige Miller
