This is in regards to insurance offer reporting. Due to flaws in the source system processes, duplicate offers can be generated. But they're not true duplicates, because they do get unique offer IDs, and can be generated up to 3 days apart. They're only duplicate in the sense that they should be considered the same offer from a business perspective.
Our current reporting counts these as individual offers. But now we need to ensure that the "duplicates" are no longer included.
HAVE
OFFER_ID
OFFER_DATE
CUSTOMER_ID
101
2022-05-19
12345
117
2022-05-19
12345
127
2022-05-21
12345
133
2022-06-13
12345
139
2022-05-01
84637
144
2022-06-03
87609
146
2022-06-03
87609
151
2022-05-14
24556
159
2022-05-18
24556
161
2022-05-18
24556
165
2022-05-19
24556
167
2022-05-22
24556
177
2022-06-27
12345
WANT
OFFER_ID
OFFER_DATE
CUSTOMER_ID
101
2022-05-19
12345
133
2022-06-13
12345
139
2022-05-01
84637
144
2022-06-03
87609
151
2022-05-14
24556
159
2022-05-18
24556
177
2022-06-27
12345
I basically want to select the first OFFER_ID & OFFER_DATE per CUSTOMER_ID. But only when the OFFER_DATE of the offers are within max 3 days of each other. If an offer was generated after 4+ days, it should be considered a unique new offer. This rule has been agreed upon among the business users.
Consider the WANT-rows for CUSTOMER_ID=24556. The offers generated on 2022-05-18 and 2022-05-22 are 4 days apart, yet they should still be treated as the same offer because they're "linked" by the offer generated on 2022-05-19. So basically: "2022-05-18 is within 3 days of 2022-05-19, which is within 3 days of 2022-05-22, therefore these 3 offers should be counted as 1".
Any advice on how to implement this? I imagine a self-join on CUSTOMER_ID is involved, but I'm unsure on how to group based on the 3-day-rule.
... View more