BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
StatThesis4Help
Calcite | Level 5

Dear discussion board users,

Any information/help is welcome since I'm trying to get my data working for my thesis. I'm testing a Asset Liability program containing lots of data (over 1 million records) from a insurance company.

I would like to narrow the different files down by aggregating similar policies. So I have these data sets: (I could have left out the Inception date and maturity date on the 3 underlying data)

POLICY DATA

Group Policy_numberInception_dateMaturity_dateBirth_date_life_1Birth_date_life2Initial_Payment
21000101-07-199030-06-202201-07-196001-01-19000
21000201-07-199030-06-202201-07-196001-01-190010
21000301-07-199030-06-202201-07-196001-01-19000
31000501-07-199230-06-202901-07-196001-01-19001000
31000601-07-199230-06-202901-07-196001-01-19000

COVERAGE DATA

GroupPolicy_numberInception_dateMaturity_dateCoverage_ID*
21000101-07-199030-06-20221
21000101-07-199030-06-20222
21000201-07-199030-06-20221
21000301-07-199030-06-20221
31000501-07-199230-06-20291
31000601-07-199230-06-20291
31000601-07-199230-06-20292
31000601-07-199230-06-20293

FUND DATA

GroupPolicy_numberInception_dateMaturity_dateFUND_ID*
21000101-07-199030-06-20221
21000201-07-199030-06-20221
21000301-07-199030-06-20221
31000501-07-199230-06-20291
31000501-07-199230-06-20292
31000501-07-199230-06-20293
31000601-07-199230-06-20291

PAYMENTS DATA

GroupPolicy_numberInception_dateMaturity_datePayment_ID*
21000101-07-199030-06-20221
21000201-07-199030-06-20221
21000301-07-199030-06-20221
21000601-07-199230-06-20291
31000601-07-199230-06-20292

**10005 not included in Payments because there was a initial payment, recorded in POLICY.

So far this is een extract example with limited rows (varying from around 600.000) and columns (each file more than 20 columns).

The aggregation process normally would go with SQL GROUP BY etc... I know all that, however the policy information is lost and hence the linkage to the other data is lost.

What I would like to see, is aggregate on the policy data, leaving the other files intact. BUT, changing their corresponding policy number which were aggregated to something unique, put into the underlying 3 tables.

POLICY DATA_AGGREGATED_STEP1

Group Policy_numberInception_dateMaturity_dateBirth_date_life_1Birth_date_life2SUM_OF_Initial_Payment
2200001 /* (this was10001-10003) */01-07-199030-06-202201-07-196001-01-190010
3200002 /* (this was10005) */01-07-199230-06-202901-07-196001-01-19001000
3200003 /* (this was10006) */01-07-199230-06-202901-07-196001-01-19000

IMPORTANT: it is allowed to have non-unique policynumbers in the underlying files, but the POLICY data should consist of UNIQUE policies only.

So the only thing that should change now in the COVERAGE, FUND and PAYMENTS DATA is the Policy_number to their appointed Newly unique policy number. And normally with SQL which policy was linked to which is lost...

Any suggestions or questions are very much appreciated since I need to get on with the analyses!

Thanks again in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
StatThesis4Help
Calcite | Level 5

So what I did was created a SQL grouping the same policy, assigning them a unique number with MONOTONIC() and then rewriting these unique_id to the policies with the same properties with a inner join. So the mapping policy file and the policy file were combined. Aggregation with as much as 25.000 50%.

Thanks anyway for helping and good luck with this valuable information Smiley Happy

View solution in original post

3 REPLIES 3
Reeza
Super User

How are you pulling together policies 10001-10003? I don't see anything in the 3 files that links them (can be missing something) or says they are the same, besides the same dates and those don't seem like they'd be unique to me in a 1million records anyways.

StatThesis4Help
Calcite | Level 5

Reeza, thanks for your question.

So I chose for the policies to aggregate on some variables and SUM_OF_Initial_Payment just now leaving several grouping variables such as the dates.

The dates are the same, so they shouldn't be much trouble. The aggregation process could well be under 5% of the original depending on the variables chosen to group, although I need to represent the portfolio quiet representative.

So I'm figuring out how to aggregate this (this is possible with group), but leaving the underlying data linked to one another. Because SQL Group By, will want to summarize the policy numbers, since all of them are unique. None of them could incorporate assigning a new policy number and assign it to the underlying data.

StatThesis4Help
Calcite | Level 5

So what I did was created a SQL grouping the same policy, assigning them a unique number with MONOTONIC() and then rewriting these unique_id to the policies with the same properties with a inner join. So the mapping policy file and the policy file were combined. Aggregation with as much as 25.000 50%.

Thanks anyway for helping and good luck with this valuable information Smiley Happy

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 677 views
  • 0 likes
  • 2 in conversation