alexandraIFCT
Calcite | Level 5

Good morning,
I am looking to program the elbow method to find out how many clusters to select in order to dichotomize my quantitative variable. Could anyone help me?
Thanks in advance,
Sincerely,

PaigeMiller
Diamond | Level 26

I'm not aware of any existing SAS program for the elbow method, but maybe others know how it can be done.

However, here are two discussions about determining the number of clusters, both of which indicate that there is no universally agreed-upon method.

https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-does-clustering-node-have-elbow-method-to-sel...

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_introclus_sect010.htm

There is also the very simple option of treating continuous variables as continuous, rather than as categories, in whatever analysis you want to do, which is easier than creating clusters.

--
Paige Miller
ballardw
Super User

<Pedantic mode: ON>

Dichotomous means two. So you have already decided there will be two clusters if you "dichotomize" anything.

<Pedantic mode: OFF>

So the question would be where the breakpoint should be. I would imagine Proc Freq might give an idea of whether there is anything really worth treating as a "cluster".

 



alexandraIFCT
Calcite | Level 5

Excuse me, I used the wrong term: I don't necessarily want to make 2 groups. I wanted to use PROC FASTCLUS to determine the clusters, but you have to specify the number of clusters you want, and that's where I don't know how to choose.
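The elbow method itself is straightforward to sketch: compute the within-cluster sum of squares (WCSS) for k = 1, 2, 3, and so on, plot it against k, and look for the point after which extra clusters stop reducing it much. Below is a minimal Python sketch (not SAS; the helper name `wcss_by_k` and the example values are hypothetical). It assumes k-means-style clusters, similar in spirit to what PROC FASTCLUS does, and uses the fact that for a single variable the optimal k-means clusters are contiguous ranges of the sorted values, so the optimum for each k can be found exactly by dynamic programming. In SAS, the analogous loop would rerun PROC FASTCLUS with MAXCLUSTERS= set to each candidate k and compare the fit statistics it reports.

```python
"""Elbow-method sketch for one quantitative variable (hypothetical helper).

Assumption: clusters are k-means clusters, i.e. they minimise the
within-cluster sum of squares (WCSS). In one dimension the optimal k-means
clusters are contiguous runs of the sorted values, so the optimum can be
found exactly with dynamic programming instead of a heuristic search.
"""


def wcss_by_k(values, max_k):
    """Return [minimal WCSS for k = 1, 2, ..., max_k]."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= max_k <= n:
        raise ValueError("max_k must be between 1 and the number of values")

    # Prefix sums give the SSE of any contiguous run xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """Sum of squared deviations from the mean of xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        # max() guards against tiny negative values from rounding.
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    # best[j] = minimal WCSS when xs[:j] is split into the current k clusters.
    best = [float("inf")] + [sse(0, j) for j in range(1, n + 1)]  # k = 1
    results = [best[n]]
    for k in range(2, max_k + 1):
        new = [float("inf")] * (n + 1)
        for j in range(k, n + 1):
            # Last cluster is xs[i:j]; the first k - 1 clusters cover xs[:i].
            new[j] = min(best[i] + sse(i, j) for i in range(k - 1, j))
        best = new
        results.append(best[n])
    return results


if __name__ == "__main__":
    marker = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 10.0, 10.1, 9.9]  # made-up values
    wcss = wcss_by_k(marker, max_k=5)
    for k, w in enumerate(wcss, start=1):
        # Share of the total variation explained by k clusters.
        print(f"k={k}  WCSS={w:8.3f}  R-squared={1 - w / wcss[0]:.3f}")
```

The printed table is what you would plot. Reading the elbow off that plot is still a judgment call; with the made-up values above, the curve drops sharply up to k = 3 and is nearly flat afterwards.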

PaigeMiller
Diamond | Level 26

People sometimes present a very narrow view of the problem: "How do I determine the number of clusters?" I encourage you to present a wider view of the problem: "How do I determine the number of clusters if I want to perform analyses such as _____________ and ______________ on the clusters, for data coming from the field of __________?"


Context makes a difference. Depending on what you are doing, I could see different answers.

alexandraIFCT
Calcite | Level 5

I have a biological marker on which I would like to carry out a prognostic survival analysis. The marker is a continuous variable, but medical interpretation of a continuous variable is difficult, hence my desire to make groups.

PaigeMiller
Diamond | Level 26

Thank you, that's very helpful to me. Some thoughts:

  • It sounds like your prognostic survival analysis is a kind of prediction from a model of some sort. You could use a decision tree model to create buckets of the biological marker that are predictive, and obtain predictions from it. Creating buckets without regard to predictive ability (as you seemed to suggest in your original message) sometimes leaves you with buckets that make sense from one point of view but are not as predictive as those you might get from a decision tree or similar model.
  • Despite the fact that "medical interpretation is difficult on a continuous variable", sometimes the best predictions come from models that treat the biological marker as continuous, rather than forcing arbitrary buckets onto it. But I don't work in the biological sciences or medicine, so you have to do what will work in that field.
  • In my field, which has nothing to do with biological sciences or medicine, we sometimes simply use buckets that are meaningful to the people in the field, rather than have a statistical tool create buckets that have little meaning. Example: I work in banking, and people are happy with the predefined FICO buckets of 700-719, 720-739, and so on. I could use a statistical method that comes up with buckets like 683-717, but I doubt that would be acceptable or accepted.
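The first bullet suggests letting a decision tree create predictive buckets. For a right-censored survival outcome, one concrete version (an assumption here, not something stated in the thread) is the first split of a survival tree that uses the log-rank statistic as its splitting rule: try every cutpoint of the marker and keep the one whose "low" and "high" groups have the most different survival. Below is a minimal Python sketch; the names `logrank_chi2`, `best_cutpoint` and `min_group` are hypothetical, and the data are assumed to be parallel lists of marker values, follow-up times, and event flags (1 = event observed, 0 = censored).

```python
"""Sketch: choose one cutpoint for a continuous marker by maximising the
log-rank chi-square between the 'low' and 'high' groups it creates.

Assumptions (hypothetical, for illustration): right-censored data supplied as
parallel lists markers, times, events, where events[i] is 1 for an observed
event and 0 for a censored time.
"""


def logrank_chi2(times, events, in_group):
    """Two-sample log-rank chi-square (1 df); in_group[i] is True for group 1."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, in_group):
            if ti >= t:                 # at risk just before time t
                n += 1
                n1 += gi
                if ti == t and ei:      # event observed at time t
                    d += 1
                    d1 += gi
        if n < 2:                       # a single subject adds no information
            continue
        p1 = n1 / n
        obs_minus_exp += d1 - d * p1
        variance += d * p1 * (1 - p1) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(markers, times, events, min_group=5):
    """Return (cut, chi2) where the 'low' group is marker <= cut.

    Cuts leaving fewer than min_group subjects on either side are skipped;
    cut is None if no admissible cut exists.
    """
    best_cut, best_chi2 = None, 0.0
    for cut in sorted(set(markers))[:-1]:    # the largest value would empty 'high'
        low = [m <= cut for m in markers]
        n_low = sum(low)
        if n_low < min_group or len(markers) - n_low < min_group:
            continue
        chi2 = logrank_chi2(times, events, low)
        if chi2 > best_chi2:
            best_cut, best_chi2 = cut, chi2
    return best_cut, best_chi2
```

The `min_group` guard matters because an extreme cut that isolates a handful of subjects can win by chance. Also, since the search tries many cuts, the chi-square of the chosen cut is biased upward and should not be referred to an ordinary chi-square table; a full survival tree would repeat the split recursively within each group.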

Again, I don't work in your field and don't know what the norms are for this type of analysis, but I like the first choice above best, unless I felt I could sell people on the second choice, in which case I would do that (especially if the model predicted better using a continuous rather than a discrete variable).


