Hi All,
I have input tables that contains duplicates. I need to remove the duplicate to keep a distinct list of the first instance of the record.
However I have discovered that at times I actually need to keep the second record in certain circumstances.
I usually match the records by Connection and Customer_ID. But if the there is a duplicate and the first instance doesn't have all the information I need to keep the second or subsequent record.
In the example below I show Customer_Id appearing 3 times. wice for Connection 12345, which I need to keep the second record, and also the Connection 12349 as it is a different Connection.
I'm trying now to create test data but this is what I'm trying to achieve.
This is the data I have
Date | Connection | Cust_Id | Names | queue_status | Employee_ID | Agent__Group | First_Name | last_name | status |
14Mar2019 9:47:00 | 12345 | AAAA | Peaches | Not Picked | |||||
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 10:54:00 | 12346 | BBBB | Peaches | Picked | CCC333 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:01:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
This is the result I need
Date | Connection | Cust_Id | Names | Status | Employee_ID | Group | First_Name | last_name | status |
14Mar2019 9:54:00 | 12345 | AAAA | Peaches | Picked | AAA111 | Fruit | Jane | Doe | Orchard |
14Mar2019 10:47:00 | 12346 | BBBB | Peaches | Picked | BBB222 | Fruit | John | Doe | Orchard |
14Mar2019 11:30:00 | 12347 | CCCC | Peaches | Picked | DDD444 | Fruit | Kid | Doe | Orchard |
14Mar2019 9:47:00 | 12348 | DDDD | Peaches | Not Picked | |||||
14Mar2019 11:54:00 | 12349 | AAAA | Peaches | Picked | EEE555 | Fruit | Little | Doe | Orchard |
Any help appreciated.
Cheers
Dean.
Hi,
maybe this one will help:
data have;
infile cards dlm="#";
input Date : datetime19. Connection Cust_Id : $ Names : $ queue_status : $ 20. Employee_ID : $ Agent__Group: $ First_Name : $ last_name : $ status : $;
format Date datetime19.;
cards;
14Mar2019 9:47:00#12345#AAAA#Peaches#Not Picked# # # # #
14Mar2019 9:54:00#12345#AAAA#Peaches#Picked#AAA111#Fruit#Jane#Doe#Orchard
14Mar2019 10:47:00#12346#BBBB#Peaches#Picked#BBB222#Fruit#John#Doe#Orchard
14Mar2019 10:54:00#12346#BBBB#Peaches#Picked#CCC333#Fruit#John#Doe#Orchard
14Mar2019 11:30:00#12347#CCCC#Peaches#Picked#DDD444#Fruit#Kid#Doe#Orchard
14Mar2019 9:01:00#12348#DDDD#Peaches#Not Picked# # # # #
14Mar2019 11:54:00#12349#AAAA#Peaches#Picked#EEE555#Fruit#Little#Doe#Orchard
;
run;
data want;
/* check if group is duplicated */
if 0 then set have(keep = Connection Cust_Id);
length _counter_ 8; drop _counter_;
declare hash H();
H.defineKey("Connection","Cust_Id");
H.defineData("_counter_");
H.defineDone();
do until (eof);
set have(keep = Connection Cust_Id) end = eof;
by Connection Cust_Id;
if first.Cust_Id then _counter_ = 0;
_counter_ + 1;
_IORC_ = h.replace();
end;
/*H.output(dataset:"H");*/
eof = 0;
/* apply selection logic */
do until (eof);
set have end = eof;
by Connection Cust_Id;
if first.cust_id then _stop_ = 0; drop _stop_;
if _stop_ then continue;
_IORC_ = h.find();
_cMISS_ = cmiss(Agent__Group, First_Name, last_name, status); drop _cMISS_;
select;
/* first in the group and everything is ok */
when (first.Cust_Id and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* only one is the group */
when (first.Cust_Id and _counter_ = 1) do; output; _stop_ = 1; end;
/* first wsnt ok check next */
when ((not first.Cust_Id) and _cMISS_ eq 0) do; output; _stop_ = 1; end;
/* nothing was ok and we are in the last output last */
when (last.Cust_Id) do; output; _stop_ = 1; end;
otherwise;
end;
end;
stop;
run;
all the best
Bart
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.