This is in regards to insurance offer reporting. Due to flaws in the source system processes, duplicate offers can be generated. But they're not true duplicates, because they do get unique offer IDs, and can be generated up to 3 days apart. They're only duplicate in the sense that they should be considered the same offer from a business perspective.
Our current reporting counts these as individual offers. But now we need to ensure that the "duplicates" are no longer included.
HAVE
OFFER_ID | OFFER_DATE | CUSTOMER_ID |
101 | 2022-05-19 | 12345 |
117 | 2022-05-19 | 12345 |
127 | 2022-05-21 | 12345 |
133 | 2022-06-13 | 12345 |
139 | 2022-05-01 | 84637 |
144 | 2022-06-03 | 87609 |
146 | 2022-06-03 | 87609 |
151 | 2022-05-14 | 24556 |
159 | 2022-05-18 | 24556 |
161 | 2022-05-18 | 24556 |
165 | 2022-05-19 | 24556 |
167 | 2022-05-22 | 24556 |
177 | 2022-06-27 | 12345 |
WANT
OFFER_ID | OFFER_DATE | CUSTOMER_ID |
101 | 2022-05-19 | 12345 |
133 | 2022-06-13 | 12345 |
139 | 2022-05-01 | 84637 |
144 | 2022-06-03 | 87609 |
151 | 2022-05-14 | 24556 |
159 | 2022-05-18 | 24556 |
177 | 2022-06-27 | 12345 |
I basically want to select the first OFFER_ID & OFFER_DATE per CUSTOMER_ID. But only when the OFFER_DATE of the offers are within max 3 days of each other. If an offer was generated after 4+ days, it should be considered a unique new offer. This rule has been agreed upon among the business users.
Consider the WANT-rows for CUSTOMER_ID=24556. The offers generated on 2022-05-18 and 2022-05-22 are 4 days apart, yet they should still be treated as the same offer because they're "linked" by the offer generated on 2022-05-19. So basically: "2022-05-18 is within 3 days of 2022-05-19, which is within 3 days of 2022-05-22, therefore these 3 offers should be counted as 1".
Any advice on how to implement this? I imagine a self-join on CUSTOMER_ID is involved, but I'm unsure on how to group based on the 3-day-rule.
Hello @EinarRoed,
Try this:
proc sort data=have;
by customer_id offer_date offer_id;
run;
data want;
set have;
by customer_id;
if first.customer_id | offer_date-3>lag(offer_date);
run;
proc sort data=want;
by offer_id;
run;
Hello @EinarRoed,
Try this:
proc sort data=have;
by customer_id offer_date offer_id;
run;
data want;
set have;
by customer_id;
if first.customer_id | offer_date-3>lag(offer_date);
run;
proc sort data=want;
by offer_id;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.