You could do the clustering on the premium values, as described by Steve Denham. However, I'm skeptical that this is a good approach: I tend to believe that any form of automatic grouping of a continuous variable throws away the continuous information contained in the data. Furthermore, it sounds like you want to do the grouping without taking into account the relationship between premium size and number of cancellations, which sounds to me like a bad idea.

I'm guessing that you want to determine the relationship between the percentage of people who cancel and the premium size. If that is the case, then logistic regression seems like a much better approach. It does not rely on grouping the data, so it keeps the continuous nature of premium size, and it explicitly models the probability that a policyholder cancels as a function of premium size.
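To make the logistic regression suggestion concrete, here is a minimal sketch in Python that fits a one-predictor logistic regression by Newton-Raphson. It is an illustration, not the original poster's data or code. The data are synthetic, and the premium units (hypothetical $100 units), the true coefficients, and the helper names `sigmoid` and `fit_logistic` are all assumptions made for the example. In practice you would use your statistics package's logistic regression procedure rather than hand-rolled code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit P(cancel | x) = sigmoid(b0 + b1 * x) by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix X^T W X
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Synthetic data (assumption): premiums in units of $100,
# true log-odds of cancelling = -3 + 0.2 * premium
random.seed(1)
TRUE_B0, TRUE_B1 = -3.0, 0.2
premium = [random.uniform(2, 20) for _ in range(5000)]
cancelled = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in premium]

b0, b1 = fit_logistic(premium, cancelled)
print(f"intercept = {b0:.3f}, slope per $100 = {b1:.3f}")

# Predicted cancellation rate at a few premium levels, read off the continuous curve
for level in (5, 10, 15):
    print(f"premium ${level * 100}: predicted cancellation rate {sigmoid(b0 + b1 * level):.1%}")
```

The fitted curve gives a predicted cancellation rate at every premium value, with no bin boundaries to choose, and the slope `b1` is the change in the log-odds of cancelling per $100 of premium.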