Hi,
I am trying to simulate claim amounts and claim numbers over 10,000 simulations. I can figure out how to determine the number of claims but I want to then use that information to simulate the amount of each claim.
The code below simulates the number of claims for 10,000 observations. However, I want to go a step further and simulate claim amounts for each simulation. For example, if the Poisson generates 212 for the first observation, I want to produce 212 claim amounts from a lognormal distribution across the columns. My lognormal parameters are mu = 9.1 and shape = 1.5.
data poisson(keep = x);
   call streaminit(4321);
   lambda = 212;
   do i = 1 to 10000;
      x = rand("Poisson", lambda);
      output;
   end;
run;
The resulting dataset would have 10,000 rows and approximately 300 columns.
I would also like to impose caps on the severity but I think I can do this.
Thanks
A long data format will be a lot simpler to use for almost any data manipulation or analysis. Start with this:
data poisson(keep = simNo claimNo amount);
   call streaminit(4321);
   lambda = 212;
   mu = 9.1;
   shape = 1.5;
   do simNo = 1 to 10000;
      nbClaims = rand("Poisson", lambda);
      do claimNo = 1 to nbClaims;
         amount = rand("lognormal", mu, shape);
         output;
      end;
   end;
run;
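For the cap on severity mentioned in the question, one possible sketch is to add a capped copy of each amount to the long data set. The cap of 1e6 and the names cappedClaims, cap, and cappedAmount are assumptions chosen for illustration; they are not taken from the original question.

```sas
/* Sketch only: apply a per-claim cap to the long data set POISSON.
   The cap value (1e6) and the names cappedClaims, cap, and
   cappedAmount are hypothetical. */
data cappedClaims(drop = cap);
   set poisson;
   cap = 1e6;                        /* assumed cap per claim */
   cappedAmount = min(amount, cap);  /* amounts above the cap are limited to it */
run;
```

Keeping both amount and cappedAmount makes it easy to compare gross and capped results for the same simulated claims.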
Many thanks for this, PG. Is there an easy way to have the simulations as rows and the claims for each simulation as columns, so that the final dataset would be 10,000 by 250 (or thereabouts)?
Sure. But in the long run, a wide data structure like the one you propose is the wrong way to go. Anyway, this is how to get it:
data poisson(keep = simNo nbClaims claimNo amount);
   call streaminit(4321);
   lambda = 212;
   mu = 9.1;
   shape = 1.5;
   do simNo = 1 to 10000;
      nbClaims = rand("Poisson", lambda);
      do claimNo = 1 to nbClaims;
         amount = rand("lognormal", mu, shape);
         output;
      end;
   end;
run;
proc transpose data=poisson out=claims(drop=_name_) prefix=claim_;
   by simNo nbClaims;
   id claimNo;
   var amount;
run;
The long data format is preferred because it lets you process each simulated sample efficiently by using BY-group processing, either in procedures such as PROC MEANS or in the DATA step. See "Simulation in SAS: The slow way or the BY way".
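As a minimal sketch of that BY-group approach, assuming the long data set poisson (with simNo and amount) created earlier in this thread, PROC MEANS can summarize every simulation in one pass. The output data set name simStats and the variable names nClaims, totalClaims, and meanClaim are made up for this example.

```sas
/* Sketch only: one summary row per simulation via BY-group processing.
   simStats, nClaims, totalClaims, and meanClaim are hypothetical names. */
proc means data=poisson noprint;
   by simNo;        /* POISSON is generated in simNo order, so no sort is needed */
   var amount;
   output out=simStats(drop=_type_ _freq_)
          n=nClaims sum=totalClaims mean=meanClaim;
run;
```

The result has one row per simNo, which is usually the shape needed for studying the distribution of aggregate claims across simulations.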
See how it could be done with a long data format:
data poisson(keep = simNo nbClaims claimNo amount);
   call streaminit(4321);
   lambda = 212;
   mu = 9.1;
   shape = 1.5;
   do simNo = 1 to 10000;
      nbClaims = rand("Poisson", lambda);
      do claimNo = 1 to nbClaims;
         amount = rand("lognormal", mu, shape);
         output;
      end;
   end;
run;
proc sql;
   create table exClaims as
   select
      simNo,
      mean(amount) as meanClaim,
      sum( case when amount > 1e6 then amount - 1e6 else 0 end ) as netClaims
   from poisson
   group by simNo;
quit;
proc univariate data=exClaims;
   var meanClaim netClaims;
   histogram;
   format meanClaim netClaims e7.2;
run;
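A quick sanity check on the simulated meanClaim values is to compare them with the theoretical lognormal mean, exp(mu + shape**2/2). The sketch below assumes that shape is the standard deviation of log(amount), which is how the parameters are used in this thread; the variable names meanSeverity and meanAggregate are made up for the example. With mu = 9.1 and shape = 1.5, the mean severity comes out at roughly 27,600.

```sas
/* Sketch only: theoretical mean severity and mean aggregate claims,
   assuming shape is the standard deviation of log(amount).
   meanSeverity and meanAggregate are hypothetical names. */
data _null_;
   lambda = 212;
   mu = 9.1;
   shape = 1.5;
   meanSeverity = exp(mu + shape**2 / 2);   /* mean of a lognormal variate */
   meanAggregate = lambda * meanSeverity;   /* expected total per simulation */
   put meanSeverity= comma12. meanAggregate= comma15.;
run;
```

If the histogram of meanClaim is not centered near this value, the parameterization of the lognormal call is the first thing to check.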
Many thanks for your help, PG.