inbal
Calcite | Level 5

Hello,

Does anyone know how I can do automatic grouping in SAS (that is, create homogeneous groups)?

Thank you

Inbal

5 REPLIES
SteveDenham
Jade | Level 19

Check out the section of the SAS/STAT documentation titled "Introduction to Clustering Procedures". There is a wealth of information on methods for clustering cases based on "similarity" across the variables at hand, and on methods for clustering variables across the cases at hand.

Steve Denham

inbal
Calcite | Level 5

Thank you very much. I read the section, but I think I didn't explain myself well.

I am trying to predict the cancellation rate of policies by premium size. The clustering procedures helped to create homogeneous groups of premium size, but not groups that are homogeneous in cancellation.

For example, 30 people cancelled their policies: 20 of them paid a premium between 0 and 100, 5 paid between 100 and 1050, and 5 paid a premium over 10000.

I would like SAS to help me create these 3 groups.

Is there an automatic procedure to do that?

Thank you very much.

PaigeMiller
Diamond | Level 26

So, now I'm confused about what you want.

It seems you already have the groupings, you just explained what they were, so what exactly is your question?

--
Paige Miller
inbal
Calcite | Level 5

I want SAS to recognize these groups. I gave an example, but I have a large data set with continuous premium values.

PaigeMiller
Diamond | Level 26

You could do the clustering as described by Steve Denham, on the premium values.
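As a rough illustration of what clustering on the premium values alone does, here is a minimal sketch in plain Python rather than SAS (in SAS, a procedure such as PROC FASTCLUS covers this kind of k-means clustering). The premium values are invented to mimic the example above, and working on a log10 scale and asking for 3 clusters are assumptions of the sketch, not something stated in this thread.

```python
import math

def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means in one dimension; returns (centers, labels)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: place initial centers at evenly spaced quantiles.
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]

    def assign(cs):
        # Label each point with the index of its nearest center.
        return [min(range(k), key=lambda j: abs(x - cs[j])) for x in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)

# Hypothetical premiums of 30 cancelled policies (made-up numbers):
# 20 below 100, 5 between 100 and 1050, 5 above 10000.
premiums = [10 + 4.5 * i for i in range(20)]
premiums += [300, 450, 600, 800, 1000]
premiums += [12000, 15000, 20000, 25000, 40000]

# Assumption: cluster on log10(premium) because the values span several
# orders of magnitude.
log_premiums = [math.log10(p) for p in premiums]
centers, labels = kmeans_1d(log_premiums, k=3)
sizes = [labels.count(j) for j in range(3)]

for j in range(3):
    group = [p for p, lab in zip(premiums, labels) if lab == j]
    print(f"cluster {j}: n={len(group)}, premium range {min(group)} to {max(group)}")
```

Note that the clusters are formed from the premium values only. Cancellation status plays no part in where the boundaries fall.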

I'm skeptical that this is a good approach, however. I tend to believe that any form of automatic grouping of continuous variables is a poor approach that throws away the continuous information contained in the data. Furthermore, it sounds like you want to do the grouping without taking into account the relationship between premium size and number of cancellations, which may or may not be a good idea, but sounds to me like a bad idea.

I'm guessing that you want to determine a relationship between the percentage of people who cancel and the premium size. If that is the case, then logistic regression seems like a much better idea: it does not rely on grouping the data and throwing away the continuous nature of the data, and it explicitly models the relationship between premium size and the percentage of people who cancel.

--
Paige Miller
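
For readers who want to see what the logistic regression suggestion amounts to, the following is a minimal sketch in plain Python rather than SAS (in SAS, PROC LOGISTIC is the usual tool for this). Everything in it is an assumption made for illustration: the data are simulated, the predictor is log10 of the premium, and the "true" coefficients used to simulate cancellations are invented. It fits a one-predictor logistic model by Newton-Raphson and then reads off predicted cancellation rates at a few premium values, without ever grouping the premiums.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (2x2, symmetric).
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated policies (hypothetical data): premiums between 10 and 100000,
# with cancellation made less likely as the premium grows.
rng = random.Random(0)
premiums = [10 ** rng.uniform(1, 5) for _ in range(2000)]
x = [math.log10(p) for p in premiums]
cancelled = [1 if rng.random() < sigmoid(2.0 - 1.0 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, cancelled)

def predicted_rate(premium):
    """Predicted cancellation probability at a given premium."""
    return sigmoid(b0 + b1 * math.log10(premium))

for p in (50, 500, 5000, 50000):
    print(f"premium {p}: predicted cancellation rate {predicted_rate(p):.3f}")
```

The fitted curve gives a cancellation rate for any premium value. If groups are still needed for reporting, they can be cut from this curve afterwards instead of being imposed before modelling.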
