Hello,
I am looking for code to remove duplicate records with overlapping dates for same customer ID and keep the one with oldest Service_Date_From.
For e.g. If a customer has Service_Date_From as 10/04/18 and Service_Date_To as 10/12/18 then any other transaction of the same customer between these 2 dates mentioned above (be it Service_From_Date or Service_To_Date) should be considered as duplicate and hence dropped from the output data set.
Can someone please help with the code.
Preferably using the first. and last. options
data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12345 10/14/18 10/18/18
AB1 12345 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;
Result I want is this.
AB1 12345 10/12/18 10/14/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
Please try the first.
data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12346 10/14/18 10/18/18
AB1 12347 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;
proc sort data=have;
by Customer_ID $ Transaction_ID Service_Date_From;
run;
data want;
set have;
by Customer_ID $ Transaction_ID Service_Date_From;
if first.Transaction_ID;
run;
or simply the proc sort also works
proc sort data=have out=want nodupkey;
by Customer_ID Transaction_ID ;
run;
Why do you only keep one record for AB1?
AB1 12345 10/12/18 10/14/18
For the first record of AB1 , the service_date_to has 10/14 which overlaps with second record's service date from. Similarly, 2nd record has dates 10/14 to 10/18 which overlaps with 3rd record dates i.e. 10/15 and 10/16. I retain first record since it has the oldest date i.e. 10/12. I hope that answers your question. Let me know if you need more info on this. Thanks for looking into it.
2 questions:
1.If the dates are say:
Jan01-Jan03
Jan02-Jan04
Jan03-Jan05
Jan04-Jan26
which records do you keep?
2.Is the data, sorted i.e. does the first record encountered have any special reason to be kept rather than the second one?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.