10-07-2013 12:48 PM
Check out the section of the SAS/STAT documentation titled "Introduction to Clustering Procedures". There is a wealth of information on methods for clustering cases based on "similarity" across the variables at hand, and on methods for clustering variables across the cases at hand.
10-15-2013 01:37 PM
Thank you very much. I read the section, but I think I didn't explain myself well.
I am trying to predict the cancellation rate of policies by premium size. The clustering procedures helped to create homogeneous groups by premium size, but not by cancellation.
For example: 30 people canceled their policies; 20 of them paid a premium between 0-100, 5 between 100-1050, and 5 paid a premium over 10000.
I would like SAS to help me create these 3 groups.
Is there an automatic procedure to do that?
Thank you very much.
10-15-2013 02:12 PM
So, now I'm confused about what you want.
It seems you already have the groupings, you just explained what they were, so what exactly is your question?
10-15-2013 02:58 PM
You could do the clustering as described by Steve Denham, on the premium values.
I'm skeptical that this is a good approach, however. I tend to believe that any form of automatic grouping of a continuous variable is a poor approach, because it throws away the continuous information contained in the data. Furthermore, it sounds like you want to do the grouping without taking into account the relationship between premium size and number of cancellations, which may or may not be a good idea, but sounds to me like a bad idea.
I'm guessing that you want to determine a relationship between the percentage of people who cancel and the premium size. If that is the case, then logistic regression seems like a much better idea: it does not rely on grouping the data and throwing away its continuous nature, and it explicitly models the relationship between premium size and the percentage of people who cancel.
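To make the logistic regression idea concrete, here is a minimal sketch in plain Python (in SAS, PROC LOGISTIC fits this kind of model). Everything in it is an assumption for illustration: the premiums and cancellation flags are made-up numbers, not your data; the names `fit_logistic` and `predicted_cancel_prob` are hypothetical; and using the base-10 log of the premium as the predictor is my own choice because premiums tend to be heavily skewed.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), symmetric 2x2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predicted_cancel_prob(b0, b1, premium):
    """Predicted probability of cancellation for a given premium."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log10(premium))))

# Hypothetical data: one row per policy, cancelled = 1 if the policy was canceled.
premiums = [20, 40, 60, 80, 90, 150, 300, 500, 800, 1000,
            2000, 5000, 12000, 15000, 20000]
cancelled = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

log_premiums = [math.log10(v) for v in premiums]  # assumed transformation
b0, b1 = fit_logistic(log_premiums, cancelled)

for premium in (50, 500, 10000):
    prob = predicted_cancel_prob(b0, b1, premium)
    print(f"premium {premium}: predicted cancellation probability {prob:.2f}")
```

The fitted curve gives a predicted cancellation probability for any premium value, so no cut points have to be chosen in advance. If groups are still needed for reporting, they can be defined afterwards from the fitted curve rather than from the premium values alone.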