BookmarkSubscribeRSS Feed
VarunD
Obsidian | Level 7

Hello,

 

I am looking for code to remove duplicate records with overlapping dates for same customer ID and keep the one with oldest Service_Date_From.

For e.g. If a customer has Service_Date_From as 10/04/18 and Service_Date_To as 10/12/18 then any other transaction of the same customer between these 2 dates mentioned above (be it Service_From_Date or Service_To_Date) should be considered as duplicate and hence dropped from the output data set.

 

Can someone please help with the code.
Preferably using the first. and last. options

 

data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12345 10/14/18 10/18/18
AB1 12345 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;

Result I want is this.

AB1 12345 10/12/18 10/14/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19

 

9 REPLIES 9
Jagadishkatam
Amethyst | Level 16

Please try the  first.

 

data have;
infile datalines dlm='09'x dsd;
input Customer_ID $ Transaction_ID Service_Date_From mmddyy8 Service_Date_To mmddyy8.;
format Service_Date_From mmddyy10.;
datalines;
AB1 12345 10/12/18 10/14/18
AB1 12346 10/14/18 10/18/18
AB1 12347 10/15/18 10/16/18
AB2 22233 6/10/18 06/15/18
AB2 22234 9/12/18 9/13/18
AB3 35678 1/5/19 1/15/19
;
run;

proc sort data=have;
by  Customer_ID $ Transaction_ID Service_Date_From;
run;

data want;
set have;
by  Customer_ID $ Transaction_ID Service_Date_From;
if first.Transaction_ID;
run;
Thanks,
Jag
Jagadishkatam
Amethyst | Level 16

or simply the proc sort also works

 

 
proc sort data=have out=want nodupkey;
by Customer_ID Transaction_ID ;
run;
  

 

Thanks,
Jag
VarunD
Obsidian | Level 7
I am sorry that didn't seem to work. I guess there is drawback with the data I provided as well. Because Transaction ID is not unique. I will update the data now .
ChrisNZ
Tourmaline | Level 20

Why do you only keep one record for AB1?

AB1 12345 10/12/18 10/14/18

VarunD
Obsidian | Level 7

For the first record of AB1 , the service_date_to has 10/14 which overlaps with second record's service date from. Similarly, 2nd record has dates 10/14 to 10/18 which overlaps with 3rd record dates i.e. 10/15 and 10/16. I retain first record since it has the oldest date i.e. 10/12. I hope that answers your question. Let me know if you need more info on this. Thanks for looking into it.

ChrisNZ
Tourmaline | Level 20

2 questions:
1.If the dates are say:
Jan01-Jan03
Jan02-Jan04
Jan03-Jan05
Jan04-Jan26
which records do you keep?
2.Is the data, sorted i.e. does the first record encountered have any special reason to be kept rather than the second one?

VarunD
Obsidian | Level 7
I would keep Jan01-Jan03 since it has the oldest date i.e. Jan01.
Data is not sorted. Any record with the oldest date should be kept. Hope that answers your questions. Please let me know if you need more information on this. Again, thanks for looking into this. Have a good weekend !
Riteshdell
Quartz | Level 8
As per his Source data , and date guidelines, its not matching, kindly provide correct data
VarunD
Obsidian | Level 7
Thanks for looking into this. Could you please specify what is wrong with the data ?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1970 views
  • 0 likes
  • 4 in conversation