BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
EinarRoed
Pyrite | Level 9

This is in regards to insurance offer reporting. Due to flaws in the source system processes, duplicate offers can be generated. But they're not true duplicates, because they do get unique offer IDs, and can be generated up to 3 days apart. They're only duplicate in the sense that they should be considered the same offer from a business perspective.

 

Our current reporting counts these as individual offers. But now we need to ensure that the "duplicates" are no longer included.

 

HAVE

OFFER_ID OFFER_DATE CUSTOMER_ID
101 2022-05-19 12345
117 2022-05-19 12345
127 2022-05-21 12345
133 2022-06-13 12345
139 2022-05-01 84637
144 2022-06-03 87609
146 2022-06-03 87609
151 2022-05-14 24556
159 2022-05-18 24556
161 2022-05-18 24556
165 2022-05-19 24556
167 2022-05-22 24556
177 2022-06-27 12345

 

WANT

OFFER_ID OFFER_DATE CUSTOMER_ID
101 2022-05-19 12345
133 2022-06-13 12345
139 2022-05-01 84637
144 2022-06-03 87609
151 2022-05-14 24556
159 2022-05-18 24556
177 2022-06-27 12345

 

I basically want to select the first OFFER_ID & OFFER_DATE per CUSTOMER_ID. But only when the OFFER_DATE of the offers are within max 3 days of each other. If an offer was generated after 4+ days, it should be considered a unique new offer. This rule has been agreed upon among the business users.

 

Consider the WANT-rows for CUSTOMER_ID=24556. The offers generated on 2022-05-18 and 2022-05-22 are 4 days apart, yet they should still be treated as the same offer because they're "linked" by the offer generated on 2022-05-19. So basically: "2022-05-18 is within 3 days of 2022-05-19, which is within 3 days of 2022-05-22, therefore these 3 offers should be counted as 1".

 

Any advice on how to implement this? I imagine a self-join on CUSTOMER_ID is involved, but I'm unsure on how to group based on the 3-day-rule.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @EinarRoed,

 

Try this:

proc sort data=have;
by customer_id offer_date offer_id;
run;

data want;
set have;
by customer_id;
if first.customer_id | offer_date-3>lag(offer_date);
run;

proc sort data=want;
by offer_id;
run;

View solution in original post

2 REPLIES 2
FreelanceReinh
Jade | Level 19

Hello @EinarRoed,

 

Try this:

proc sort data=have;
by customer_id offer_date offer_id;
run;

data want;
set have;
by customer_id;
if first.customer_id | offer_date-3>lag(offer_date);
run;

proc sort data=want;
by offer_id;
run;
EinarRoed
Pyrite | Level 9
Works like a charm, thanks!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 294 views
  • 1 like
  • 2 in conversation