Dear discussion board users, Any information/help is welcome since I'm trying to get my data working for my thesis. I'm testing a Asset Liability program containing lots of data (over 1 million records) from a insurance company. I would like to narrow the different files down by aggregating similar policies. So I have these data sets: (I could have left out the Inception date and maturity date on the 3 underlying data) POLICY DATA Group Policy_number Inception_date Maturity_date Birth_date_life_1 Birth_date_life2 Initial_Payment 2 10001 01-07-1990 30-06-2022 01-07-1960 01-01-1900 0 2 10002 01-07-1990 30-06-2022 01-07-1960 01-01-1900 10 2 10003 01-07-1990 30-06-2022 01-07-1960 01-01-1900 0 3 10005 01-07-1992 30-06-2029 01-07-1960 01-01-1900 1000 3 10006 01-07-1992 30-06-2029 01-07-1960 01-01-1900 0 COVERAGE DATA Group Policy_number Inception_date Maturity_date Coverage_ID * 2 10001 01-07-1990 30-06-2022 1 2 10001 01-07-1990 30-06-2022 2 2 10002 01-07-1990 30-06-2022 1 2 10003 01-07-1990 30-06-2022 1 3 10005 01-07-1992 30-06-2029 1 3 10006 01-07-1992 30-06-2029 1 3 10006 01-07-1992 30-06-2029 2 3 10006 01-07-1992 30-06-2029 3 FUND DATA Group Policy_number Inception_date Maturity_date FUND_ID * 2 10001 01-07-1990 30-06-2022 1 2 10002 01-07-1990 30-06-2022 1 2 10003 01-07-1990 30-06-2022 1 3 10005 01-07-1992 30-06-2029 1 3 10005 01-07-1992 30-06-2029 2 3 10005 01-07-1992 30-06-2029 3 3 10006 01-07-1992 30-06-2029 1 PAYMENTS DATA Group Policy_number Inception_date Maturity_date Payment_ID * 2 10001 01-07-1990 30-06-2022 1 2 10002 01-07-1990 30-06-2022 1 2 10003 01-07-1990 30-06-2022 1 2 10006 01-07-1992 30-06-2029 1 3 10006 01-07-1992 30-06-2029 2 **10005 not included in Payments because there was a initial payment, recorded in POLICY. So far this is een extract example with limited rows (varying from around 600.000) and columns (each file more than 20 columns). The aggregation process normally would go with SQL GROUP BY etc... I know all that, however the policy information is lost and hence the linkage to the other data is lost. What I would like to see, is aggregate on the policy data, leaving the other files intact. BUT, changing their corresponding policy number which were aggregated to something unique, put into the underlying 3 tables. POLICY DATA_AGGREGATED_STEP1 Group Policy_number Inception_date Maturity_date Birth_date_life_1 Birth_date_life2 SUM_OF_Initial_Payment 2 200001 /* (this was10001-10003) */ 01-07-1990 30-06-2022 01-07-1960 01-01-1900 10 3 200002 /* (this was10005) */ 01-07-1992 30-06-2029 01-07-1960 01-01-1900 1000 3 200003 /* (this was10006) */ 01-07-1992 30-06-2029 01-07-1960 01-01-1900 0 IMPORTANT: it is allowed to have non-unique policynumbers in the underlying files, but the POLICY data should consist of UNIQUE policies only. So the only thing that should change now in the COVERAGE, FUND and PAYMENTS DATA is the Policy_number to their appointed Newly unique policy number. And normally with SQL which policy was linked to which is lost... Any suggestions or questions are very much appreciated since I need to get on with the analyses! Thanks again in advance.
... View more