Good morning,
I am looking to program the elbow method in order to know how many clusters to select to dichotomize my quantitative variable. Could anyone help me?
Thanks in advance,
Sincerely,
I'm not aware of any programming of the elbow method in SAS. But maybe others know how it can be done.
However, here are discussions about determining the number of clusters, both of which indicate that there is no universally agreed upon method.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_introclus_sect010.htm
There is also the very simple idea of treating continuous variables as continuous variables, instead of categories, in whatever analysis you want to do, which is easier to do than creating clusters.
Paige Miller
<Pedantic mode: ON>
Dichotomous means two. So you have already decided there will be two clusters if you "dichotomize" anything.
<Pedantic mode: OFF>
So the question would be where the breakpoint should be. I would imagine PROC FREQ might give an idea of whether there is anything really worth treating as a "cluster".
@alexandraIFCT wrote:
Good morning,
I am looking to program the elbow method in order to know how many clusters to select to dichotomize my quantitative variable. Could anyone help me?
Thanks in advance,
Sincerely,
Excuse me, I used the wrong term. I don't necessarily want to make two groups. I wanted to use PROC FASTCLUS to determine clusters, but you have to specify the number of clusters you want, and that is where I don't know how to choose.
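Because PROC FASTCLUS needs the number of clusters up front (its MAXCLUSTERS= option), the elbow idea comes down to running the clustering for several candidate values and looking for the point where the within-cluster sum of squares stops dropping sharply. The sketch below illustrates that logic in plain Python rather than SAS, on a single variable. The names `kmeans_1d` and `elbow_k`, the toy `marker` values, and the "farthest below the chord" rule used to pick the elbow are all assumptions made for illustration. The elbow is normally judged by eye on a plot, and this rule is only one possible way to automate that judgment.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on one variable; returns (centers, within-cluster SS)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    wss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wss


def elbow_k(wss_by_k):
    """Pick the k whose WSS lies farthest below the straight line joining
    the first and last points of the WSS curve (one heuristic among several)."""
    ks = sorted(wss_by_k)
    k_first, k_last = ks[0], ks[-1]
    w_first, w_last = wss_by_k[k_first], wss_by_k[k_last]
    slope = (w_last - w_first) / (k_last - k_first)

    def gap(k):
        return w_first + slope * (k - k_first) - wss_by_k[k]

    return max(ks, key=gap)


# Hypothetical marker values with three visible clumps (not real data).
marker = [1.0, 1.2, 0.9, 1.1, 1.3,
          5.0, 5.2, 4.9, 5.1, 4.8,
          9.0, 9.3, 8.9, 9.1, 9.2]

wss_by_k = {k: kmeans_1d(marker, k)[1] for k in range(1, 7)}
for k, w in wss_by_k.items():
    print(k, round(w, 3))

best_k = elbow_k(wss_by_k)
print("suggested number of clusters:", best_k)
```

In SAS, the same loop could presumably be built by rerunning PROC FASTCLUS with different MAXCLUSTERS= values and collecting a within-cluster variability measure from each run. That part is not shown here.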
People sometimes present a very narrow view of the problem ... "how do I determine the number of clusters?" I encourage you to present a wider view of the problem: "how do I determine the number of clusters if I want to perform analyses such as _____________ and ______________ on the clusters for data coming from the field of __________ "?
Context makes a difference. Depending on what you are doing, I could see different answers.
Paige Miller
I have a biological marker on which I would like to carry out a prognostic analysis of survival. This marker is a continuous variable, but medical interpretation is difficult on a continuous variable, hence my desire to make groups.
Thank you, that's very helpful to me. Some thoughts:
- It sounds like your prognostic analysis of survival is a kind of prediction in a model of some sort. You could use a decision tree model to create buckets of the biological marker that are predictive, and obtain predictions. Creating buckets without regard to predictive ability (as it seems you were suggesting in your original message) sometimes leaves you with buckets that make sense from one point of view but are not as predictive as those you might get from a decision tree or similar model.
- Despite the fact that "medical interpretation is difficult on a continuous variable", sometimes the best predictions come from models which treat the biological marker as continuous, rather than forcing arbitrary buckets onto this continuous variable. But I don't work in biological sciences or medicine, and so you have to do what will work in that field.
- In my field, which has nothing to do with biological sciences or medicine, we sometimes simply use buckets that are meaningful to the people in the field, rather than have a statistical tool create buckets that have little meaning. Example: (I work in banking) people are happy with the pre-defined buckets for FICO of 700-719 and 720-739 and similar. I could use a statistical method that comes up with buckets like 683-717, but I doubt that would be acceptable or accepted.
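The pre-defined buckets in the last point are just fixed-width bands, which need no statistical method at all. A minimal Python sketch, assuming integer scores and 20-point bands aligned at 700 as in the FICO example (the function name `score_band` and its defaults are illustrative assumptions):

```python
def score_band(score, width=20, origin=700):
    """Label the fixed-width band containing an integer score."""
    # Floor division keeps bands aligned to `origin`, also below it.
    low = origin + ((score - origin) // width) * width
    return f"{low}-{low + width - 1}"


for s in (705, 725, 683):
    print(s, "->", score_band(s))
```

The cut-offs here come from convention in the field rather than from the data, which is what makes them easy to explain.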
Again, I don't work in your field and don't know what the norms are for this type of analysis, but I like the first choice above best, unless I felt I could sell people on the second choice, in which case I would do that (especially if the model predicted better using a continuous rather than a discrete variable).
Paige Miller
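To make the decision-tree suggestion above concrete, here is a minimal Python sketch of the simplest possible tree: a single split on the marker, chosen to separate a binary outcome as cleanly as possible. It rests on several assumptions. The outcome is reduced to a 0/1 event indicator (for example, death within a fixed follow-up period), so censoring is ignored. A real survival analysis would need a survival-aware splitting rule. The names `gini` and `best_cutpoint` and all data values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_cutpoint(marker, event):
    """Return (cutpoint, weighted impurity) for the best single split."""
    pairs = sorted(zip(marker, event))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate tied marker values
        left = [e for _, e in pairs[:i]]
        right = [e for _, e in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score, best_cut = score, (lo + hi) / 2
    return best_cut, best_score


# Hypothetical data: marker value and whether the event occurred (1) or not (0).
marker = [0.8, 1.1, 1.5, 2.0, 2.4, 3.1, 3.6, 4.2, 4.9, 5.5]
event = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

cut, impurity = best_cutpoint(marker, event)
print("cutpoint:", cut, "weighted Gini:", round(impurity, 3))
```

Unlike clustering on the marker alone, the threshold here is chosen by looking at the outcome, which is the point of the first suggestion. Applying the same search again within each resulting group would give more than two buckets.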