Programming the statistical procedures from SAS

Combine cluster analysis with proc GENMOD

Contributor
Posts: 25

Good afternoon, everyone!

I've tried to use cluster analysis to combine small groups of similar risks (same characteristics) so that they are easier to incorporate into GLMs (PROC GENMOD here).

I'm having difficulty linking step 1 to step 2. I have a traditional insurance table.

Question 1 (step 1): May I add the dependent variable freq_adj as a supplementary variable?

Question 2 (step 2): I obtain my four clusters, but how do I apply them in PROC GENMOD? Where can I integrate my clusters?

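/* STEP 1: MIXED CLUSTERING ANALYSIS */

/* k-means pre-clustering: up to 20 pre-clusters, numbered in the variable presegm */
PROC FASTCLUS DATA=ins.insurance MAXC=20 MAXITER=50 CONVERGE=0.01 MEAN=centres OUT=partial CLUSTER=presegm DELETE=5 DRIFT;
VAR ageconducteur region;
RUN;

/* Ward hierarchical clustering of the pre-cluster centres, carrying presegm along */
PROC CLUSTER DATA=centres OUTTREE=tree METHOD=ward CCC PSEUDO PRINT=10;
VAR ageconducteur region;
COPY presegm;
RUN;

PROC SORT DATA=tree;
BY _ncl_;
RUN;

/* Cut the tree at 4 clusters; the OUT= data set stores the final assignment in CLUSTER */
PROC TREE DATA=tree NCL=4 OUT=segm1;
COPY presegm;
RUN;

PROC SORT DATA=partial; BY presegm; RUN;
PROC SORT DATA=segm1; BY presegm; RUN;

/* Attach the final cluster to every original observation through its pre-cluster */
DATA segm;
MERGE partial segm1;
BY presegm;
RUN;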

/* STEP 2: PROC GENMOD */

PROC GENMOD DATA = ???;
ODS OUTPUT ParameterEstimates=Genmod1_Param;
CLASS ageconducteur;
WEIGHT exposition;
MODEL freq_adj = ageconducteur region / MAXITER=2000 DIST=poisson LINK=log;
FORMAT ageconducteur forage.;
OUTPUT OUT=poisson;
RUN; QUIT;

Thanks for your help.

This message was edited by: CHARBIT Jonathan

Trusted Advisor
Posts: 1,195

Hi,

What is the source of dataset segm1?

Contributor
Posts: 25

An oversight on my part, thanks. I have corrected it.

Trusted Advisor
Posts: 1,195

What is freq_adj? Is it a frequency variable based on the 4 clusters?

Contributor
Posts: 25

freq_adj is my dependent variable (number of claims). It was not included in the cluster analysis because I can't work out how to link the cluster analysis to PROC GENMOD, PROC GAM, and so on.

Trusted Advisor
Posts: 1,195

Please correct me if I am wrong.

freq_adj is included in the ins.insurance data, and you used only the predictors to run the cluster analysis, ending up with a four-cluster solution, right?

Now you want to run a model for freq_adj using the dataset that has the 4 clusters, right?

Contributor
Posts: 25

I'm lost.

Yes, freq_adj is included in the ins.insurance data.

"You used only the predictors to run the cluster analysis": must I run a regression model before the cluster analysis?

Yes, I want to run a model for freq_adj (number of claims) using the dataset that has the 4 clusters obtained from the cluster analysis.

Thanks for your time.

Trusted Advisor
Posts: 1,195

So you are trying to run 4 models for the 4 clusters, after merging the freq_adj variable into the cluster dataset, with the objective of producing better results within each cluster, right?

Contributor
Posts: 25

Yes, that was one of my ideas: combine groups of similar risks, then run PROC GENMOD for each cluster to extract predictors.

I don't know whether this is an acceptable method in insurance (or in any other field), or how to incorporate it into GLMs.
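The two ways of bringing the clusters into PROC GENMOD that are discussed in this thread could be sketched as follows. This is only a sketch under assumptions the thread does not confirm: that the merged data set segm from step 1 has one row per policy with freq_adj, exposition, and the final four-level assignment in a variable named cluster (the default name PROC TREE uses in its OUT= data set). The output data set names Genmod_Cluster_Param and Genmod_ByCluster_Param are made up for illustration.

```sas
/* Option A (assumed setup): one overall model, cluster as a categorical predictor */
proc genmod data=segm;
  class cluster;
  weight exposition;
  model freq_adj = cluster / dist=poisson link=log;
  ods output ParameterEstimates=Genmod_Cluster_Param;  /* hypothetical name */
run;

/* Option B (assumed setup): one model per cluster through BY-group processing */
proc sort data=segm;
  by cluster;
run;

proc genmod data=segm;
  by cluster;                      /* fits the same model separately in each cluster */
  class ageconducteur;
  weight exposition;
  model freq_adj = ageconducteur region / dist=poisson link=log;
  format ageconducteur forage.;    /* user-defined format taken from the original post */
  ods output ParameterEstimates=Genmod_ByCluster_Param;  /* hypothetical name */
run;
```

Option A keeps a single model and estimates one relativity per cluster. Option B estimates a separate set of coefficients in each cluster, and the ODS output data set then carries the cluster value on every row.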

Trusted Advisor
Posts: 1,195

The idea looks right, but clustering can produce better predictions than an overall model only if freq_adj differs significantly across the 4 clusters.

Contributor
Posts: 25

OK, I'm going to develop that idea.

In your experience, what is the best method to check whether there is significant heterogeneity across the 4 clusters? How can I compare an overall model (a single GLM) with 4 GLMs?

Have a good day.
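One possible way to compare a single GLM with four per-cluster GLMs, offered as a sketch rather than an established recipe, is to fit one model in which every effect is allowed to vary by cluster and to test the interaction terms. The data set segm and the variable cluster are assumptions (the merged step 1 output and the final four-level assignment); they are not confirmed in the thread.

```sas
/* Assumed: segm holds freq_adj, exposition, the predictors and a 4-level cluster */
proc genmod data=segm;
  class cluster ageconducteur;
  weight exposition;
  /* Main effects plus cluster interactions: equivalent in spirit to letting */
  /* each cluster have its own coefficients inside one model                 */
  model freq_adj = ageconducteur region cluster
                   cluster*ageconducteur cluster*region
                   / dist=poisson link=log type3;
  format ageconducteur forage.;   /* user-defined format from the original post */
run;
```

The TYPE3 option requests likelihood ratio tests for each term. If the interaction terms are not significant, fitting the clusters separately is unlikely to add much over the overall model. Because the clusters were built from the same two predictors, some interaction parameters may not be estimable.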

Trusted Advisor
Posts: 1,195

PROC ANOVA can be used to check for differences among the 4 clusters. To learn more: why did you first use PROC FASTCLUS and then PROC CLUSTER for the cluster solution, and why create only 4 clusters?

Contributor
Posts: 25

Because I have a large dataset (many clients), I began with PROC FASTCLUS and then passed the cluster means (MEAN=centres) to PROC CLUSTER. This is a mixed method: k-means followed by hierarchical agglomerative clustering (CAH in French).

I chose 4 clusters because of how I interpreted the CCC, the semi-partial R-squared, and the other indicators from PROC CLUSTER, together with what the dendrogram showed.

I will run PROC ANOVA to see if there is a significant difference between clusters.

So if the test is not significant, my per-cluster predictions will be less accurate than those of the overall model.

Step 1: Mixed clustering (CAH)

Step 2: PROC ANOVA

Step 3: GLMs for each cluster, if the ANOVA shows a significant difference between clusters.

Right?
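Step 2 of the plan above might look like the sketch below. It assumes, without confirmation from the thread, that the merged data set segm holds freq_adj together with the final four-level assignment in a variable named cluster.

```sas
/* One-way ANOVA of the claim variable across the final clusters (assumed names) */
proc anova data=segm;
  class cluster;
  model freq_adj = cluster;     /* F test for any difference in mean freq_adj */
  means cluster / tukey;        /* pairwise comparisons between clusters      */
run;
quit;
```

PROC ANOVA relies on roughly normal errors, which claim counts usually do not have, so the F test here is best read as a rough screening step.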


I think the number of claims or the claim costs can be very volatile between clusters, depending on the coverage. I'll look into it.

Thanks.

Trusted Advisor
Posts: 1,195

That seems like the right approach. What is step 3? Why are you using GLM?

Contributor
Posts: 25

I use GLMs to estimate the pure premium (claim frequency * claim cost). I started on this after reading the following text, which I found on casact.org (Casualty Actuarial Society):

"Cluster analysis applies a collection of different algorithms to group these units into clusters based on historical experience, modeled experience, or well-defined similarity rules. This allows easier incorporation into GLMs."

It is essential to take heterogeneity into account in pricing, yet I don't understand their reasoning.

1) Classical method (without step 1):

The average claim frequency for customers in Area A1 and age group 20-29 is then:

0.044 * 0.689 * 0.472 = 0.014

where 0.044 is the intercept.

In the same way, we calculate the average claim size for this group to be

61037 * 1.873 * 0.789 = 90211

The pure premium for this group is then 0.014 * 90211 = 1263.
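With a log link, the fitted mean is the exponential of the sum of the coefficients, so it can be written as a product of multiplicative factors, which is what the calculation above does. A minimal sketch of that arithmetic in a DATA step follows. The variable names are hypothetical, and the post does not say which factor belongs to the area and which to the age group, so they are simply numbered.

```sas
data _null_;
  /* Factors as quoted (rounded) in the post; names are illustrative only */
  freq_intercept = 0.044;  freq_factor1 = 0.689;  freq_factor2 = 0.472;
  sev_intercept  = 61037;  sev_factor1  = 1.873;  sev_factor2  = 0.789;

  frequency    = freq_intercept * freq_factor1 * freq_factor2;  /* expected claim frequency */
  severity     = sev_intercept  * sev_factor1  * sev_factor2;   /* expected claim size      */
  pure_premium = frequency * severity;                          /* frequency times severity */

  put frequency= severity= pure_premium=;   /* written to the SAS log */
run;
```

Because the factors printed in the post are rounded, and the post also rounds the frequency to 0.014 before multiplying, the values written to the log will differ slightly from the quoted 0.014, 90211 and 1263.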

(SAS source: http://www2.sas.com/proceedings/forum2008/333-2008.pdf)

Do you understand my questions?

Thanks for your help.
