brophymj
Quartz | Level 8

 

Hi,

I am trying to simulate claim amounts and claim numbers over 10,000 simulations. I can figure out how to determine the number of claims, but I then want to use that information to simulate the amount of each claim.

The code below simulates the number of claims for 10,000 observations. However, I want to go a step further and simulate claim amounts for each simulation. For example, if the Poisson generates 212 for the first observation, I want to produce 212 claim amounts with a lognormal distribution across the columns. My lognormal parameters are mu = 9.1 and shape = 1.5.

 

data poisson(keep = x);
call streaminit (4321);
lambda = 212;
do i = 1 to 10000;
x = rand("Poisson",lambda);
output;
end;
run;

 

The resulting dataset would be 10,000 rows by approximately 300 columns.

 

I would also like to impose caps on the severity but I think I can do this. 

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Sure. But in the long run, a wide data structure like the one you propose is the wrong way to go. Anyway, this is how to get it:

 

data poisson(keep = simNo nbClaims claimNo amount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;

proc transpose data=poisson out=claims(drop=_name_) prefix=claim_;
by simNo nbClaims;
id claimNo;
var amount;
run;
PG

8 REPLIES
PGStats
Opal | Level 21

A long data format will be a lot simpler to use for almost any data manipulation or analysis. Start with this:

 

data poisson(keep = simNo claimNo amount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;
PG
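
The original question also mentions capping severity. In the long layout that is a single assignment per claim. A minimal sketch, assuming the POISSON data set created above; the cap value and the names capped, cap and cappedAmount are hypothetical and not from the thread:

```sas
/* Sketch only: apply a per-claim cap to the long data set POISSON. */
data capped;
    set poisson;
    cap = 1e6;                        /* hypothetical cap, chosen for illustration */
    cappedAmount = min(amount, cap);  /* claim amount limited to the cap */
run;
```

Because every claim is its own row, no array over a variable number of claim columns is needed.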
brophymj
Quartz | Level 8

Many thanks for this, PG. Is there an easy way to have the simulations as rows and the claims for each simulation as columns?

So the final dataset would be 10,000 by 250 (or thereabouts).

 

 

brophymj
Quartz | Level 8
Thanks PG. Why is this approach the wrong way to go? Just to give some background on what I'm doing: I'm trying to simulate large claims data for a policy. The policy has an expected number of claims of 212 per year, and I'm trying to determine the average cost of claims in excess of €1m.

Once I simulate the above, I will impose a condition on the amount like "if amount < 1000000 then net = 0; else net = amount - 1000000;". I will then sum the net amount over all claims in each simulation and average that total over the 10,000 simulations. This will give me the expected claims cost in excess of €1m based on the assumptions I've used.
Rick_SAS
SAS Super FREQ

The long data format is preferred because it enables you to efficiently process each simulated sample by using BY-group processing in procedures such as PROC MEANS or in the DATA step. See "Simulation in SAS: The slow way or the BY way"
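
To illustrate the BY-group idea with the excess calculation described above, here is a minimal sketch. It assumes the long data set POISSON from the earlier reply (one row per claim, already in simNo order because of how it was generated); the names excess, net, simTotals and netClaims are chosen here for illustration:

```sas
/* Sketch only: per-simulation excess over 1,000,000 via BY-group processing. */
data excess;
    set poisson;
    /* amount above the 1,000,000 threshold; 0 for smaller claims */
    net = max(amount - 1e6, 0);
run;

/* One BY group per simulation: total excess cost in that simulation */
proc means data=excess noprint;
    by simNo;
    var net;
    output out=simTotals(drop=_type_ _freq_) sum=netClaims;
run;

/* Average (and standard error) of the total excess across simulations */
proc means data=simTotals mean stderr;
    var netClaims;
run;
```

The whole simulation is processed in one pass per step, with no macro loop over simulations and no array over claim columns.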

PGStats
Opal | Level 21

See how it could be done with a long data format:

 

data poisson(keep = simNo nbClaims claimNo amount);
call streaminit (4321);
lambda = 212;
mu = 9.1;
shape = 1.5;
do simNo = 1 to 10000;
    nbClaims = rand("Poisson",lambda);
    do claimNo = 1 to nbClaims;
        amount = rand("lognormal", mu, shape);
        output;
        end;
    end;
run;

proc sql;
create table exClaims as
select
    simNo,
    mean(amount) as meanClaim,
    sum( case when amount > 1e6 then amount - 1e6 else 0 end ) as netClaims
from poisson
group by simNo;
quit;

proc univariate data=exClaims;
var meanClaim netClaims;
histogram;
format meanClaim netClaims e7.2;
run;
PG
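
As a cross-check on the simulated mean of netClaims, the expected excess cost can also be computed in closed form. This is a sketch under stated assumptions, not part of the original reply: it assumes mu and shape are the mean and standard deviation of log(amount), and that claim counts and amounts are independent, so the expected total is lambda times E[max(X - d, 0)]. The names sigma, d, exMean and expected are hypothetical:

```sas
/* Sketch only: closed-form expected excess for a lognormal severity. */
data _null_;
    lambda = 212;    /* expected number of claims per simulation */
    mu     = 9.1;    /* assumed: mean of log(amount) */
    sigma  = 1.5;    /* assumed: std dev of log(amount), "shape" in the thread */
    d      = 1e6;    /* excess point */

    /* E[max(X - d, 0)] = E[X; X > d] - d * P(X > d) for lognormal X */
    exMean = exp(mu + sigma**2 / 2) * cdf("Normal", (mu + sigma**2 - log(d)) / sigma)
             - d * cdf("Normal", (mu - log(d)) / sigma);

    expected = lambda * exMean;   /* expected total excess per simulation */
    put expected=;
run;
```

If the simulated average of netClaims is far from this value relative to its standard error, the parameterization of the lognormal is the first thing to check.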
brophymj
Quartz | Level 8

Many thanks for your help, PG. 

brophymj
Quartz | Level 8
Thanks Rick, that makes sense.
