BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DME790
Pyrite | Level 9

Hi All,

 

I have input tables that contains duplicates. I need to remove the duplicate to keep a distinct list of the first instance of the record.

However I have discovered that at times I actually need to keep the second record in certain circumstances.

 

I usually match the records by Connection and Customer_ID. But if the there is a duplicate and the first instance doesn't have all the information I need to keep the second or subsequent record.

 

In the example below I show Customer_Id appearing 3 times. wice for Connection 12345, which I need to keep the second record, and also the Connection 12349 as it is a different Connection. 

 

I'm trying now to create test data but this is what I'm trying to achieve.

 

This is the data I have

 

DateConnectionCust_IdNamesqueue_statusEmployee_IDAgent__GroupFirst_Namelast_namestatus
14Mar2019 9:47:0012345AAAAPeachesNot Picked     
14Mar2019 9:54:0012345AAAAPeachesPickedAAA111FruitJaneDoeOrchard
14Mar2019 10:47:0012346BBBBPeachesPickedBBB222FruitJohnDoeOrchard
14Mar2019 10:54:0012346BBBBPeachesPickedCCC333FruitJohnDoeOrchard
14Mar2019 11:30:0012347CCCCPeachesPickedDDD444FruitKidDoeOrchard
14Mar2019 9:01:0012348DDDDPeachesNot Picked     
14Mar2019 11:54:0012349AAAAPeachesPickedEEE555FruitLittleDoeOrchard

 

 

This is the result I need

 

DateConnectionCust_IdNamesStatusEmployee_IDGroupFirst_Namelast_namestatus
14Mar2019 9:54:0012345AAAAPeachesPickedAAA111FruitJaneDoeOrchard
14Mar2019 10:47:0012346BBBBPeachesPickedBBB222FruitJohnDoeOrchard
14Mar2019 11:30:0012347CCCCPeachesPickedDDD444FruitKidDoeOrchard
14Mar2019 9:47:0012348DDDDPeachesNot Picked     
14Mar2019 11:54:0012349AAAAPeachesPickedEEE555FruitLittleDoeOrchard

 

 

Any help appreciated.

 

Cheers

 

Dean.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star
Here is an approach that can even use values from the third record or beyond if both the first and second contain missing values for the same variable.

proc sort data=have;
by connection cust_id descending date;
run:

data want;
update have (obs=0) have;
by connection cust_id;
run;

In the right situations, UPDATE is magical. You don't even need to know all the variable names.

View solution in original post

3 REPLIES 3
Astounding
PROC Star
Here is an approach that can even use values from the third record or beyond if both the first and second contain missing values for the same variable.

proc sort data=have;
by connection cust_id descending date;
run:

data want;
update have (obs=0) have;
by connection cust_id;
run;

In the right situations, UPDATE is magical. You don't even need to know all the variable names.
DME790
Pyrite | Level 9
Thanks Astounding - works a treat.
yabwon
Amethyst | Level 16

Hi,

 

maybe this one will help:

 

data have;
infile cards dlm="#";
input Date : datetime19.	Connection	Cust_Id :	$ Names : $ queue_status : $ 20. Employee_ID : $ Agent__Group: $ First_Name : $ last_name : $ status : $;
format Date datetime19.;	
cards;
14Mar2019 9:47:00#12345#AAAA#Peaches#Not Picked# # # # # 
14Mar2019 9:54:00#12345#AAAA#Peaches#Picked#AAA111#Fruit#Jane#Doe#Orchard
14Mar2019 10:47:00#12346#BBBB#Peaches#Picked#BBB222#Fruit#John#Doe#Orchard
14Mar2019 10:54:00#12346#BBBB#Peaches#Picked#CCC333#Fruit#John#Doe#Orchard
14Mar2019 11:30:00#12347#CCCC#Peaches#Picked#DDD444#Fruit#Kid#Doe#Orchard
14Mar2019 9:01:00#12348#DDDD#Peaches#Not Picked# # # # # 
14Mar2019 11:54:00#12349#AAAA#Peaches#Picked#EEE555#Fruit#Little#Doe#Orchard
;
run;

data want;
  /* check if group is duplicated */
  if 0 then set have(keep = Connection Cust_Id); 
  length _counter_ 8; drop _counter_;

  declare hash H();
  H.defineKey("Connection","Cust_Id");
  H.defineData("_counter_");
  H.defineDone();

  do until (eof);
    set have(keep = Connection Cust_Id) end = eof;
    by Connection Cust_Id;
    if first.Cust_Id then _counter_ = 0;
    _counter_ + 1;
    _IORC_ = h.replace();
  end;
   /*H.output(dataset:"H");*/
  
  eof = 0;
  /* apply selection logic */
  do until (eof);
    set have end = eof;
    by Connection Cust_Id;
    if first.cust_id then _stop_ = 0; drop _stop_;
    
    if _stop_ then continue;
    _IORC_ = h.find();
    _cMISS_ = cmiss(Agent__Group, First_Name, last_name, status); drop _cMISS_;

    select; 
      /* first in the group and everything is ok */
      when (first.Cust_Id and _cMISS_ eq 0) do; output; _stop_ = 1; end;
      /* only one is the group */
      when (first.Cust_Id and _counter_ = 1) do; output; _stop_ = 1; end;
      /* first wsnt ok check next */
      when ((not first.Cust_Id) and _cMISS_ eq 0) do; output; _stop_ = 1; end;
      /* nothing was ok and we are in the last output last */
      when (last.Cust_Id) do; output; _stop_ = 1; end;
      otherwise;
    end;

  end;
stop;
run;

 

all the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1757 views
  • 1 like
  • 3 in conversation