05-07-2015 10:21 AM
We are in the middle of a large reserve analysis for my organization. Background: I am using decision trees to investigate the triggers of reserve increases. For example, a claims rep might estimate a claim at $10,000 when it first comes in, but I want to help them predict when it will "explode" and end up costing my organization much more money.
Building this will take multiple sub-models based on different time periods. A lot depends on the "maturity" of the particular claim. For example, a model may look drastically different when a claim is examined one month after an injury versus six months later; by six months the claimant may have gone through multiple operations, i.e. the claim has become much more expensive.
So, because I need different models for different time periods, this is where the community's expertise may help. I am doing a pre-run to assess where past claims typically show jumps in reserves. For example, one claim might start at $5,000, jump to $15,000 one month later, and then jump to $50,000 two months after that. Plotted, it would look a bit like a reverse scree plot. Prior analytical work done before I came here said that 15, 30, 45, 90, and 180 days were the magic numbers for breaking apart the different models.
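To make the pre-run concrete, here is the kind of thing I mean by measuring the jumps, sketched in Python/pandas (the claim numbers, dates, and amounts are made up; the buckets are the inherited 15/30/45/90/180-day breakpoints):

```python
import pandas as pd

# Hypothetical long-format reserve history: one row per claim per evaluation.
hist = pd.DataFrame({
    "claim":  ["A-001"] * 3 + ["A-002"] * 2,
    "date":   pd.to_datetime(["2014-01-15", "2014-02-15", "2014-03-15",
                              "2014-02-01", "2014-05-01"]),
    "amount": [5000, 15000, 50000, 8000, 12000],
})

hist = hist.sort_values(["claim", "date"]).reset_index(drop=True)

# Days since the claim's first reserve evaluation ("maturity" of the claim).
hist["days_open"] = (hist["date"]
                     - hist.groupby("claim")["date"].transform("min")).dt.days

# Jump = change relative to the prior reserve on the same claim.
hist["jump"] = hist.groupby("claim")["amount"].diff()

# Bucket the jumps by the inherited breakpoints to see where they concentrate.
bins = [0, 15, 30, 45, 90, 180, float("inf")]
hist["bucket"] = pd.cut(hist["days_open"], bins=bins, right=True)
summary = (hist.dropna(subset=["jump"])
               .groupby("bucket", observed=True)["jump"]
               .agg(["count", "mean"]))
```

A table like `summary` (jump count and average jump size per maturity bucket) is what I would use to sanity-check whether 15/30/45/90/180 are really where the action is.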
So I have arranged for an extra data pull from our IT department. Below is an example of what a typical claim in this data pull looks like:
Claim Number | Reserve1 Date | Reserve1 Amount | … | Reserve12 Date | Reserve12 Amount | Ultimate Reserve | Ultimate SD
Twelve is looking like our magic number, as about 95% of the claims collected need no more than 12 of these evaluation periods to be near closure.
For a recent 5-year period I have 10,000 claims. Each claim could be treated as its own time series, with at most 12 points per claim, but obviously I cannot repeat that analysis 10,000 times.
What do you recommend? I am thinking of "stacking" the data so that these 10,000 claims turn into up to 120,000 rows (assuming every claim had all 12 time periods). I know for sure I will take varying averages across different time periods, but can you think of anything else? Also, it does not appear that the ETS module for time-series analyses can accommodate different dates per row, or am I missing something?
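For clarity, here is a small sketch of the stacking I have in mind, going from the wide one-row-per-claim pull to one row per claim per reserve evaluation (Python/pandas; the column names and values are hypothetical, and only 3 of the 12 periods are shown):

```python
import pandas as pd

# Hypothetical wide-format pull: one row per claim, (date, amount) pairs.
wide = pd.DataFrame({
    "claim":     ["A-001", "A-002"],
    "res1_date": ["2014-01-15", "2014-02-01"],
    "res1_amt":  [5000, 8000],
    "res2_date": ["2014-02-15", "2014-03-01"],
    "res2_amt":  [15000, 8000],
    "res3_date": ["2014-03-15", None],   # A-002 has only 2 evaluations
    "res3_amt":  [50000, None],
})

# Stack: one slice per period, renamed to common columns, then concatenated.
frames = []
for k in range(1, 4):
    f = wide[["claim", f"res{k}_date", f"res{k}_amt"]].rename(
        columns={f"res{k}_date": "date", f"res{k}_amt": "amount"})
    f["period"] = k
    frames.append(f)

long = (pd.concat(frames, ignore_index=True)
          .dropna(subset=["amount"])        # drop unused periods
          .sort_values(["claim", "period"])
          .reset_index(drop=True))
long["date"] = pd.to_datetime(long["date"])
```

The long format keeps each evaluation's own date on its row, which is what I need given that claims are evaluated on irregular calendar dates.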
Thank you very much in advance. Any help or ideas are valued and appreciated, and sorry about the length.