Simulating claims data using Poisson and Lognormal distributions

Posted 09-08-2018 11:50 AM
(1881 views)

Hi,

I am trying to simulate claim amounts and claim numbers over 10,000 simulations. I can figure out how to determine the number of claims but I want to then use that information to simulate the amount of each claim.

The code below simulates the number of claims for 10,000 observations. However, I want to go a step further and simulate claim amounts for each simulation. For example, if the Poisson generates 212 for the first observation, I want to produce 212 claim amounts from a lognormal distribution across the columns. My lognormal parameters are mu = 9.1 and shape = 1.5.

```
data poisson(keep = x);
    call streaminit(4321);
    lambda = 212;
    do i = 1 to 10000;
        x = rand("Poisson", lambda);
        output;
    end;
run;
```

The resulting dataset would be 10,000 rows by approximately 300 columns.

I would also like to impose caps on the severity, but I think I can do this.

Thanks

A long data format will be a lot simpler to use for almost any data manipulation or analysis. Start with this:

```
data poisson(keep = simNo claimNo amount);
    call streaminit(4321);
    lambda = 212;   /* expected number of claims per simulation */
    mu = 9.1;       /* lognormal parameters */
    shape = 1.5;
    do simNo = 1 to 10000;
        nbClaims = rand("Poisson", lambda);
        do claimNo = 1 to nbClaims;
            amount = rand("lognormal", mu, shape);
            output;   /* one row per claim */
        end;
    end;
run;
```

PG

Many thanks for this, PG. Is there an easy way to have the simulations as rows and the claims for each simulation as columns?

The final dataset would then be 10,000 rows by 250 columns (or thereabouts).

Sure. But in the long run, a wide data structure like the one you propose is the wrong way to go. Anyway, this is how to get it:

```
data poisson(keep = simNo nbClaims claimNo amount);
    call streaminit(4321);
    lambda = 212;   /* expected number of claims per simulation */
    mu = 9.1;       /* lognormal parameters */
    shape = 1.5;
    do simNo = 1 to 10000;
        nbClaims = rand("Poisson", lambda);
        do claimNo = 1 to nbClaims;
            amount = rand("lognormal", mu, shape);
            output;   /* one row per claim */
        end;
    end;
run;

/* Pivot to one row per simulation, with columns claim_1, claim_2, ... */
proc transpose data=poisson out=claims(drop=_name_) prefix=claim_;
    by simNo nbClaims;
    id claimNo;
    var amount;
run;
```

PG

Thanks PG. Why is this approach the wrong way to go? Just to give some background on what I'm doing: I'm trying to simulate large claims data for a policy. The policy has an expected number of claims of 212 per year, and I'm trying to determine the average cost of claims in excess of €1m. Once I have simulated the above, I will impose a condition on the amount like `if amount < 1000000 then net = 0; else net = amount - 1000000;`. I will then sum the net amount over all claims in each simulation and average that sum over the 10,000 simulations. This will give me the expected claims cost in excess of €1m based on the assumptions I've used.
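
The quantity described here has a closed form that could serve as a rough check on the simulated average. This is only a sketch under stated assumptions: the claim count $N$ is Poisson with mean $\lambda = 212$, the claim amounts $X_i$ are independent lognormal draws that are also independent of $N$, and the "shape" parameter is taken to be the standard deviation $\sigma = 1.5$ of $\ln X$ (with $\mu = 9.1$ its mean). The symbol $d = 10^6$ for the €1m threshold is introduced here and does not appear in the thread.

$$
\begin{aligned}
S &= \sum_{i=1}^{N} (X_i - d)_+ , \\
\mathbb{E}[S] &= \lambda \, \mathbb{E}\big[(X - d)_+\big] , \\
\mathbb{E}\big[(X - d)_+\big] &= e^{\mu + \sigma^2/2} \, \Phi\!\left(\frac{\mu + \sigma^2 - \ln d}{\sigma}\right) - d \, \Phi\!\left(\frac{\mu - \ln d}{\sigma}\right) ,
\end{aligned}
$$

where $\Phi$ is the standard normal cumulative distribution function. If the assumptions hold, the mean of the simulated per-simulation excess totals should be close to $\mathbb{E}[S]$, up to Monte Carlo error.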

See how it could be done with a long data format:

```
data poisson(keep = simNo nbClaims claimNo amount);
    call streaminit(4321);
    lambda = 212;   /* expected number of claims per simulation */
    mu = 9.1;       /* lognormal parameters */
    shape = 1.5;
    do simNo = 1 to 10000;
        nbClaims = rand("Poisson", lambda);
        do claimNo = 1 to nbClaims;
            amount = rand("lognormal", mu, shape);
            output;   /* one row per claim */
        end;
    end;
run;

/* Per simulation: mean claim size and total cost in excess of 1m */
proc sql;
    create table exClaims as
    select
        simNo,
        mean(amount) as meanClaim,
        sum( case when amount > 1e6 then amount - 1e6 else 0 end ) as netClaims
    from poisson
    group by simNo;
quit;

/* Distribution of both summaries across the 10,000 simulations */
proc univariate data=exClaims;
    var meanClaim netClaims;
    histogram;
    format meanClaim netClaims e7.2;
run;
```

PG
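
If only the per-simulation totals are needed, the same calculation can probably be done in a single DATA step without storing every claim. The following is a minimal sketch rather than PG's code. The names `excess`, `sigma`, and `retention` are introduced here for illustration, and it assumes that the thread's `shape` is the standard deviation of the log amount, so that `exp(mu + sigma * rand("Normal"))` draws from the same lognormal distribution. Because the random numbers are consumed differently, individual results will not match the long-format run exactly.

```sas
data excess(keep = simNo nbClaims netClaims);
    call streaminit(4321);
    lambda = 212;        /* expected number of claims per simulation */
    mu = 9.1;            /* mean of log(amount) */
    sigma = 1.5;         /* assumed: std. dev. of log(amount) */
    retention = 1e6;     /* hypothetical name for the 1m threshold */
    do simNo = 1 to 10000;
        nbClaims = rand("Poisson", lambda);
        netClaims = 0;   /* reset the running total for this simulation */
        do claimNo = 1 to nbClaims;
            amount = exp(mu + sigma * rand("Normal"));
            /* add only the part of the claim above the retention */
            netClaims = netClaims + max(amount - retention, 0);
        end;
        output;          /* one row per simulation */
    end;
run;

/* Average excess cost across simulations, with its standard error */
proc means data=excess mean stderr;
    var netClaims;
run;
```

This keeps 10,000 rows instead of roughly 2.1 million, at the cost of discarding the individual claim amounts, so the long format remains the better choice if the claims themselves need further analysis.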

Many thanks for your help, PG.

Thanks Rick, that makes sense.
