Programming the statistical procedures from SAS

automatic grouping

New Contributor
Posts: 3

automatic grouping

Hello,

Does anyone know how I can do automatic grouping in SAS (that is, create homogeneous groups)?

Thank you

Inbal

Respected Advisor
Posts: 2,655

Re: automatic grouping

Check out the section of the SAS/STAT documentation titled "Introduction to Clustering Procedures". There is a wealth of information on methods for clustering cases based on "similarity" across the variables at hand, and on methods for clustering variables across the cases at hand.

Steve Denham

New Contributor
Posts: 3

Re: automatic grouping

Thank you very much. I read the section, but I think I didn't explain myself well.

I am trying to predict the cancellation rate of policies by premium size. The clustering procedures helped to create homogeneous groups of premium size, but not groups based on cancellation.

For example: 30 people canceled their policies; 20 of them paid a premium between 0-100, 5 between 100-1050, and 5 paid a premium over 10000.

I would like SAS to help me create these 3 groups.

Is there an automatic procedure to do that?

Thank you very much.

Trusted Advisor
Posts: 1,432

Re: automatic grouping

So, now I'm confused about what you want.

It seems you already have the groupings, you just explained what they were, so what exactly is your question?

New Contributor
Posts: 3

Re: automatic grouping

I want SAS to recognize these groups. I gave an example, but I have a large dataset with continuous premium values.

Trusted Advisor
Posts: 1,432

Re: automatic grouping

You could do the clustering as described by Steve Denham, on the premium values.

I'm skeptical that this is a good approach, however. I tend to believe that any form of automatic grouping of continuous variables is a poor approach that throws away the continuous information contained in the data. Furthermore, it sounds like you want to do the grouping without taking into account the relationship between premium size and number of cancellations, which may or may not be a good idea, but sounds to me like a bad idea.

I'm guessing that you want to determine the relationship between the percentage of people who cancel and the premium size. If that is the case, then logistic regression seems like a much better idea: it does not rely on grouping the data and throwing away its continuous nature, and it explicitly models the relationship between premium size and the percentage of people who cancel.
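
To make the logistic regression suggestion a bit more concrete, here is a minimal sketch of the idea. It is written in plain Python rather than SAS so that it is self-contained; in SAS the corresponding procedure would be PROC LOGISTIC. Everything specific in it is an assumption for illustration only: the toy data values, the names `premium_hundreds` and `cancelled`, and the helpers `fit_logistic` and `predict_cancel_probability` are all made up and do not come from the original question.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, iterations=25, tol=1e-10):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    x is a continuous predictor (premium), y is a 0/1 outcome (cancelled).
    Hypothetical helper written for this sketch, not a library function.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # information matrix is singular; stop updating
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def predict_cancel_probability(b0, b1, premium):
    """Estimated probability of cancellation at a given premium."""
    return sigmoid(b0 + b1 * premium)


# Made-up toy data: premium in hundreds, and whether the policy was cancelled.
premium_hundreds = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
cancelled = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

b0, b1 = fit_logistic(premium_hundreds, cancelled)

# The fitted curve gives a cancellation probability for any premium value,
# with no need to cut the premium into groups first.
for premium in (0.5, 2.0, 8.0):
    print(premium, round(predict_cancel_probability(b0, b1, premium), 3))
```

The point of the sketch is that the premium stays continuous: the model returns an estimated cancellation probability at every premium value, and the sign of the slope `b1` tells you whether cancellations become more or less likely as the premium grows.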
