Hello,
I am looking for code to remove duplicate records with overlapping dates for same customer ID and keep the one with oldest Service_Date_From.
For e.g. If a customer has Service_Date_From as 10/04/18 and Service_Date_To as 10/12/18 then any other transaction of the same customer between these 2 dates mentioned above (be it Service_From_Date or Service_To_Date) should be considered as duplicate and hence dropped from the output data set.
Can someone please help with the code.
Preferably using the first. and last. options
data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12345 10/14/18 10/18/18
AB1 12345 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;
Result I want is this.
AB1 12345 10/12/18 10/14/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
Please try the first.
data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12346 10/14/18 10/18/18
AB1 12347 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;
proc sort data=have;
by Customer_ID $ Transaction_ID Service_Date_From;
run;
data want;
set have;
by Customer_ID $ Transaction_ID Service_Date_From;
if first.Transaction_ID;
run;
or simply the proc sort also works
proc sort data=have out=want nodupkey;
by Customer_ID Transaction_ID ;
run;
Why do you only keep one record for AB1?
AB1 12345 10/12/18 10/14/18
For the first record of AB1 , the service_date_to has 10/14 which overlaps with second record's service date from. Similarly, 2nd record has dates 10/14 to 10/18 which overlaps with 3rd record dates i.e. 10/15 and 10/16. I retain first record since it has the oldest date i.e. 10/12. I hope that answers your question. Let me know if you need more info on this. Thanks for looking into it.
2 questions:
1.If the dates are say:
Jan01-Jan03
Jan02-Jan04
Jan03-Jan05
Jan04-Jan26
which records do you keep?
2.Is the data, sorted i.e. does the first record encountered have any special reason to be kept rather than the second one?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.