Hi All,
I have input tables that contains duplicates. I need to remove the duplicate to keep a distinct list of the first instance of the record.
However I have discovered that at times I actually need to keep the second record in certain circumstances.
I usually match the records by Connection and Customer_ID. But if the there is a duplicate and the first instance doesn't have all the information I need to keep the second or subsequent record.
In the example below I show Customer_Id appearing 3 times. wice for Connection 12345, which I need to keep the second record, and also the Connection 12349 as it is a different Connection.
I'm trying now to create test data but this is what I'm trying to achieve.
This is the data I have
Date | Connection | Cust_Id | Names | queue_status | Employee_ID | Agent__Group | First_Name | last_name | status |
14Mar2019 9:47:00 | 12345 | AAAA | Peaches | Not Picked | |||||
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 10:54:00 | 12346 | BBBB | Peaches | Picked | CCC333 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:01:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
This is the result I need
Date | Connection | Cust_Id | Names | Status | Employee_ID | Group | First_Name | last_name | status |
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:47:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
Any help appreciated.
Cheers
Dean.
Hi,
maybe this one will help:
data have;
infile cards dlm="#";
input Date : datetime19. Connection Cust_Id : $ Names : $ queue_status : $ 20. Employee_ID : $ Agent__Group: $ First_Name : $ last_name : $ status : $;
format Date datetime19.;
cards;
14Mar2019 9:47:00#12345#AAAA#Peaches#Not Picked# # # # #
14Mar2019 9:54:00#12345#AAAA#Peaches#Picked#AAA111#Fruit#Jane#Doe#Orchard
14Mar2019 10:47:00#12346#BBBB#Peaches#Picked#BBB222#Fruit#John#Doe#Orchard
14Mar2019 10:54:00#12346#BBBB#Peaches#Picked#CCC333#Fruit#John#Doe#Orchard
14Mar2019 11:30:00#12347#CCCC#Peaches#Picked#DDD444#Fruit#Kid#Doe#Orchard
14Mar2019 9:01:00#12348#DDDD#Peaches#Not Picked# # # # #
14Mar2019 11:54:00#12349#AAAA#Peaches#Picked#EEE555#Fruit#Little#Doe#Orchard
;
run;
data want;
/* check if group is duplicated */
if 0 then set have(keep = Connection Cust_Id);
length _counter_ 8; drop _counter_;
declare hash H();
H.defineKey("Connection","Cust_Id");
H.defineData("_counter_");
H.defineDone();
do until (eof);
set have(keep = Connection Cust_Id) end = eof;
by Connection Cust_Id;
if first.Cust_Id then _counter_ = 0;
_counter_ + 1;
_IORC_ = h.replace();
end;
/*H.output(dataset:"H");*/
eof = 0;
/* apply selection logic */
do until (eof);
set have end = eof;
by Connection Cust_Id;
if first.cust_id then _stop_ = 0; drop _stop_;
if _stop_ then continue;
_IORC_ = h.find();
_cMISS_ = cmiss(Agent__Group, First_Name, last_name, status); drop _cMISS_;
select;
/* first in the group and everything is ok */
when (first.Cust_Id and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* only one is the group */
when (first.Cust_Id and _counter_ = 1) do; output; _stop_ = 1; end;
/* first wsnt ok check next */
when ((not first.Cust_Id) and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* nothing was ok and we are in the last output last */
when (last.Cust_Id) do; output; _stop_ = 1; end;
otherwise;
end;
end;
stop;
run;
all the best
Bart
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.