Hi All,
I have input tables that contains duplicates. I need to remove the duplicate to keep a distinct list of the first instance of the record.
However I have discovered that at times I actually need to keep the second record in certain circumstances.
I usually match the records by Connection and Customer_ID. But if the there is a duplicate and the first instance doesn't have all the information I need to keep the second or subsequent record.
In the example below I show Customer_Id appearing 3 times. wice for Connection 12345, which I need to keep the second record, and also the Connection 12349 as it is a different Connection.
I'm trying now to create test data but this is what I'm trying to achieve.
This is the data I have
Date | Connection | Cust_Id | Names | queue_status | Employee_ID | Agent__Group | First_Name | last_name | status |
14Mar2019 9:47:00 | 12345 | AAAA | Peaches | Not Picked | |||||
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 10:54:00 | 12346 | BBBB | Peaches | Picked | CCC333 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:01:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
This is the result I need
Date | Connection | Cust_Id | Names | Status | Employee_ID | Group | First_Name | last_name | status |
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:47:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
Any help appreciated.
Cheers
Dean.
Hi,
maybe this one will help:
data have;
infile cards dlm="#";
input Date : datetime19. Connection Cust_Id : $ Names : $ queue_status : $ 20. Employee_ID : $ Agent__Group: $ First_Name : $ last_name : $ status : $;
format Date datetime19.;
cards;
14Mar2019 9:47:00#12345#AAAA#Peaches#Not Picked# # # # #
14Mar2019 9:54:00#12345#AAAA#Peaches#Picked#AAA111#Fruit#Jane#Doe#Orchard
14Mar2019 10:47:00#12346#BBBB#Peaches#Picked#BBB222#Fruit#John#Doe#Orchard
14Mar2019 10:54:00#12346#BBBB#Peaches#Picked#CCC333#Fruit#John#Doe#Orchard
14Mar2019 11:30:00#12347#CCCC#Peaches#Picked#DDD444#Fruit#Kid#Doe#Orchard
14Mar2019 9:01:00#12348#DDDD#Peaches#Not Picked# # # # #
14Mar2019 11:54:00#12349#AAAA#Peaches#Picked#EEE555#Fruit#Little#Doe#Orchard
;
run;
data want;
/* check if group is duplicated */
if 0 then set have(keep = Connection Cust_Id);
length _counter_ 8; drop _counter_;
declare hash H();
H.defineKey("Connection","Cust_Id");
H.defineData("_counter_");
H.defineDone();
do until (eof);
set have(keep = Connection Cust_Id) end = eof;
by Connection Cust_Id;
if first.Cust_Id then _counter_ = 0;
_counter_ + 1;
_IORC_ = h.replace();
end;
/*H.output(dataset:"H");*/
eof = 0;
/* apply selection logic */
do until (eof);
set have end = eof;
by Connection Cust_Id;
if first.cust_id then _stop_ = 0; drop _stop_;
if _stop_ then continue;
_IORC_ = h.find();
_cMISS_ = cmiss(Agent__Group, First_Name, last_name, status); drop _cMISS_;
select;
/* first in the group and everything is ok */
when (first.Cust_Id and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* only one is the group */
when (first.Cust_Id and _counter_ = 1) do; output; _stop_ = 1; end;
/* first wsnt ok check next */
when ((not first.Cust_Id) and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* nothing was ok and we are in the last output last */
when (last.Cust_Id) do; output; _stop_ = 1; end;
otherwise;
end;
end;
stop;
run;
all the best
Bart
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.