Solved: Re: Simulating claims data using Poisson and Lognormal distributions

brophymj · Posted 09-08-2018 11:50 AM

Hi,

I am trying to simulate claim amounts and claim numbers over 10,000 simulations. I can figure out how to determine the number of claims but I want to then use that information to simulate the amount of each claim.

The code below simulates the number of claims for 10,000 observations. However, I want to go a step further and simulate claim amounts for each simulation. For example, if the Poisson generates 212 for the first observation, I want to produce 212 claim amounts from a lognormal distribution across the columns. My lognormal parameters are mu = 9.1 and shape = 1.5.

data poisson(keep = x);
call streaminit (4321);
lambda = 212;
do i = 1 to 10000;
x = rand("Poisson",lambda);
output;
end;
run;

The resulting dataset would be 10,000 rows by approximately 300 columns.

I would also like to impose caps on the severity but I think I can do this.

Thanks

PGStats · Posted 09-08-2018 02:24 PM

A long data format will be a lot simpler to use for almost any data manipulation or analysis. Start with this:

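/* Long format: one row per simulated claim */
data poisson(keep = simNo claimNo amount);
call streaminit (4321);          /* fixed seed for reproducibility */
lambda = 212;                    /* expected number of claims per simulation */
mu = 9.1;                        /* lognormal parameters (log scale) */
shape = 1.5;
do simNo = 1 to 10000;
    /* frequency: number of claims in this simulation */
    nbClaims = rand("Poisson",lambda);
    /* severity: one lognormal amount per claim */
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;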

PG
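
The cap on severity mentioned in the original question can be applied in the same loop. This is only a minimal sketch: the cap of 5,000,000 and the names cap, cappedAmount and the data set capped are hypothetical, chosen for illustration and not taken from the thread.

```sas
data capped(keep = simNo claimNo amount cappedAmount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
cap = 5e6;                       /* hypothetical cap, an assumption */
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        /* capped severity: the smaller of the raw amount and the cap */
        cappedAmount = min(amount, cap);
        output;
        end;
    end;
run;
```

Keeping both amount and cappedAmount makes it easy to check how many claims the cap actually affects.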

brophymj · Posted 09-08-2018 03:03 PM

Many thanks for this, PG. Is there an easy way to have the simulations as rows and the claims for each simulation as columns?

The final dataset would then be 10,000 by 250 (or thereabouts).

PGStats · Posted 09-08-2018 03:35 PM

Sure. But in the long run, a wide data structure like the one you propose is the wrong way to go. Anyway, this is how to get it:

data poisson(keep = simNo nbClaims claimNo amount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;

proc transpose data=poisson out=claims(drop=_name_) prefix=claim_;
by simNo nbClaims;
id claimNo;
var amount;
run;

PG

brophymj · Posted 09-09-2018 03:08 AM

Thanks PG. Why is the wide approach the wrong way to go? Just to give some background on what I’m doing: I’m trying to simulate large claims data for a policy. The policy has an expected number of claims of 212 per year and I’m trying to determine the average cost of claims in excess of €1m. Once I simulate the above I will impose a condition on the amount like “if amount < 1000000 then net = 0; else net = amount - 1000000;”. I will then sum the net amount over all claims in each simulation and average that sum over the 10,000 simulations. This will give me the expected claims cost in excess of €1m based on the assumptions I’ve used.

Rick_SAS · Posted 09-09-2018 06:44 AM

The long data format is preferred because it enables you to efficiently process each simulated sample by using BY-group processing in procedures such as PROC MEANS or in the DATA step. See "Simulation in SAS: The slow way or the BY way".
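
As a rough illustration of the BY-group idea, the sketch below builds a small long-format sample and lets PROC MEANS total the excess over €1m within each simulation. The data set names sim and netBySim, the variable net, and the reduced count of 1,000 simulations are assumptions made to keep the example short.

```sas
/* Small long-format sample: one row per claim, already ordered by simNo */
data sim(keep = simNo claimNo amount net);
call streaminit (4321);
do simNo = 1 to 1000;
    nbClaims = rand("Poisson", 212);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", 9.1, 1.5);
        net = max(amount - 1e6, 0);   /* excess over 1m, zero below it */
        output;
        end;
    end;
run;

/* One pass over the data: total excess per simulation via BY-group processing */
proc means data=sim noprint;
by simNo;
var net;
output out=netBySim(drop=_type_ _freq_) sum=netClaims;
run;

/* Average excess cost across simulations, with its standard error */
proc means data=netBySim mean stderr;
var netClaims;
run;
```

Because the DATA step writes rows in simNo order, no PROC SORT is needed before the BY statement.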

PGStats · Posted 09-09-2018 02:28 PM

See how it could be done with a long data format:

data poisson(keep = simNo nbClaims claimNo amount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;

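/* One row per simulation: average claim size and total excess over 1m */
proc sql;
create table exClaims as
select
    simNo,
    mean(amount) as meanClaim,
    /* only the part of each claim above 1e6 contributes */
    sum( case when amount > 1e6 then amount - 1e6 else 0 end ) as netClaims
from poisson
group by simNo;
quit;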

proc univariate data=exClaims;
var meanClaim netClaims;
histogram;
format meanClaim netClaims e7.2;
run;

PG
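
As a sanity check on the simulated figure, the expected excess cost per simulation can also be computed in closed form. For a Poisson claim count the expected total is lambda times the expected excess per claim, and for a lognormal severity the latter is exp(mu + sigma**2/2) * Phi((mu + sigma**2 - log(d))/sigma) - d * Phi((mu - log(d))/sigma), where Phi is the standard normal CDF. The sketch below assumes that shape plays the role of sigma (the standard deviation on the log scale) and uses d, excess and expected as illustrative names.

```sas
data _null_;
lambda = 212;
mu = 9.1;
shape = 1.5;                     /* assumed to be sigma on the log scale */
d = 1e6;                         /* excess point */
/* E[max(X - d, 0)] for a lognormal X */
excess = exp(mu + shape**2/2) * cdf("normal", (mu + shape**2 - log(d))/shape)
       - d * cdf("normal", (mu - log(d))/shape);
/* expected total excess for a Poisson number of claims */
expected = lambda * excess;
put "Analytic expected excess cost per simulation: " expected comma12.;
run;
```

The value written to the log should be close to the mean of netClaims in exClaims, up to simulation error.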

brophymj · Posted 09-10-2018 05:36 AM

Many thanks for your help, PG.

brophymj · Posted 09-09-2018 06:55 AM

Thanks Rick, that makes sense.
