<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: remove duplicates based on variables in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551307#M33467</link>
    <description>Here is an approach that can even use values from the third record or beyond if both the first and second contain missing values for the same variable.&lt;BR /&gt;&lt;BR /&gt;proc sort data=have;&lt;BR /&gt;by connection cust_id descending date;&lt;BR /&gt;run:&lt;BR /&gt;&lt;BR /&gt;data want;&lt;BR /&gt;update have (obs=0) have;&lt;BR /&gt;by connection cust_id;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;In the right situations, UPDATE is magical.  You don't even need to know all the variable names.</description>
    <pubDate>Tue, 16 Apr 2019 06:21:06 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2019-04-16T06:21:06Z</dc:date>
    <item>
      <title>remove duplicates based on variables</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551297#M33466</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have input tables that contains duplicates. I need to remove the duplicate to keep a distinct list of the first instance of the record.&lt;/P&gt;&lt;P&gt;However I have discovered that at times I actually need to keep the second record in certain circumstances.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I usually match the records by Connection and Customer_ID. But if the there is a duplicate and the first instance doesn't have all the information I need to keep the second or subsequent record.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the example below I show Customer_Id appearing 3 times. wice for Connection 12345, which I need to keep the second record, and also the Connection 12349 as it is a different Connection.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying now to create test data but this is what I'm trying to achieve.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the data I have&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Date&lt;/TD&gt;&lt;TD&gt;Connection&lt;/TD&gt;&lt;TD&gt;Cust_Id&lt;/TD&gt;&lt;TD&gt;Names&lt;/TD&gt;&lt;TD&gt;queue_status&lt;/TD&gt;&lt;TD&gt;Employee_ID&lt;/TD&gt;&lt;TD&gt;Agent__Group&lt;/TD&gt;&lt;TD&gt;First_Name&lt;/TD&gt;&lt;TD&gt;last_name&lt;/TD&gt;&lt;TD&gt;status&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 9:47:00&lt;/TD&gt;&lt;TD&gt;12345&lt;/TD&gt;&lt;TD&gt;AAAA&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Not Picked&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 9:54:00&lt;/TD&gt;&lt;TD&gt;12345&lt;/TD&gt;&lt;TD&gt;AAAA&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;AAA111&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Jane&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 10:47:00&lt;/TD&gt;&lt;TD&gt;12346&lt;/TD&gt;&lt;TD&gt;BBBB&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;BBB222&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 10:54:00&lt;/TD&gt;&lt;TD&gt;12346&lt;/TD&gt;&lt;TD&gt;BBBB&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;CCC333&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 11:30:00&lt;/TD&gt;&lt;TD&gt;12347&lt;/TD&gt;&lt;TD&gt;CCCC&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;DDD444&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Kid&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 9:01:00&lt;/TD&gt;&lt;TD&gt;12348&lt;/TD&gt;&lt;TD&gt;DDDD&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Not Picked&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 11:54:00&lt;/TD&gt;&lt;TD&gt;12349&lt;/TD&gt;&lt;TD&gt;AAAA&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;EEE555&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Little&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the result I need&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Date&lt;/TD&gt;&lt;TD&gt;Connection&lt;/TD&gt;&lt;TD&gt;Cust_Id&lt;/TD&gt;&lt;TD&gt;Names&lt;/TD&gt;&lt;TD&gt;Status&lt;/TD&gt;&lt;TD&gt;Employee_ID&lt;/TD&gt;&lt;TD&gt;Group&lt;/TD&gt;&lt;TD&gt;First_Name&lt;/TD&gt;&lt;TD&gt;last_name&lt;/TD&gt;&lt;TD&gt;status&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 9:54:00&lt;/TD&gt;&lt;TD&gt;12345&lt;/TD&gt;&lt;TD&gt;AAAA&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;AAA111&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Jane&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 10:47:00&lt;/TD&gt;&lt;TD&gt;12346&lt;/TD&gt;&lt;TD&gt;BBBB&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;BBB222&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 11:30:00&lt;/TD&gt;&lt;TD&gt;12347&lt;/TD&gt;&lt;TD&gt;CCCC&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;DDD444&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Kid&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 9:47:00&lt;/TD&gt;&lt;TD&gt;12348&lt;/TD&gt;&lt;TD&gt;DDDD&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Not Picked&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14Mar2019 11:54:00&lt;/TD&gt;&lt;TD&gt;12349&lt;/TD&gt;&lt;TD&gt;AAAA&lt;/TD&gt;&lt;TD&gt;Peaches&lt;/TD&gt;&lt;TD&gt;Picked&lt;/TD&gt;&lt;TD&gt;EEE555&lt;/TD&gt;&lt;TD&gt;Fruit&lt;/TD&gt;&lt;TD&gt;Little&lt;/TD&gt;&lt;TD&gt;Doe&lt;/TD&gt;&lt;TD&gt;Orchard&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dean.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Apr 2019 05:27:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551297#M33466</guid>
      <dc:creator>DME790</dc:creator>
      <dc:date>2019-04-16T05:27:46Z</dc:date>
    </item>
    <item>
      <title>Re: remove duplicates based on variables</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551307#M33467</link>
      <description>Here is an approach that can even use values from the third record or beyond if both the first and second contain missing values for the same variable.&lt;BR /&gt;&lt;BR /&gt;proc sort data=have;&lt;BR /&gt;by connection cust_id descending date;&lt;BR /&gt;run:&lt;BR /&gt;&lt;BR /&gt;data want;&lt;BR /&gt;update have (obs=0) have;&lt;BR /&gt;by connection cust_id;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;In the right situations, UPDATE is magical.  You don't even need to know all the variable names.</description>
      <pubDate>Tue, 16 Apr 2019 06:21:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551307#M33467</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-04-16T06:21:06Z</dc:date>
    </item>
    <item>
      <title>Re: remove duplicates based on variables</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551317#M33469</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;maybe this one will help:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards dlm="#";
input Date : datetime19.	Connection	Cust_Id :	$ Names : $ queue_status : $ 20. Employee_ID : $ Agent__Group: $ First_Name : $ last_name : $ status : $;
format Date datetime19.;	
cards;
14Mar2019 9:47:00#12345#AAAA#Peaches#Not Picked# # # # # 
14Mar2019 9:54:00#12345#AAAA#Peaches#Picked#AAA111#Fruit#Jane#Doe#Orchard
14Mar2019 10:47:00#12346#BBBB#Peaches#Picked#BBB222#Fruit#John#Doe#Orchard
14Mar2019 10:54:00#12346#BBBB#Peaches#Picked#CCC333#Fruit#John#Doe#Orchard
14Mar2019 11:30:00#12347#CCCC#Peaches#Picked#DDD444#Fruit#Kid#Doe#Orchard
14Mar2019 9:01:00#12348#DDDD#Peaches#Not Picked# # # # # 
14Mar2019 11:54:00#12349#AAAA#Peaches#Picked#EEE555#Fruit#Little#Doe#Orchard
;
run;

data want;
  /* check if group is duplicated */
  if 0 then set have(keep = Connection Cust_Id); 
  length _counter_ 8; drop _counter_;

  declare hash H();
  H.defineKey("Connection","Cust_Id");
  H.defineData("_counter_");
  H.defineDone();

  do until (eof);
    set have(keep = Connection Cust_Id) end = eof;
    by Connection Cust_Id;
    if first.Cust_Id then _counter_ = 0;
    _counter_ + 1;
    _IORC_ = h.replace();
  end;
   /*H.output(dataset:"H");*/
  
  eof = 0;
  /* apply selection logic */
  do until (eof);
    set have end = eof;
    by Connection Cust_Id;
    if first.cust_id then _stop_ = 0; drop _stop_;
    
    if _stop_ then continue;
    _IORC_ = h.find();
    _cMISS_ = cmiss(Agent__Group, First_Name, last_name, status); drop _cMISS_;

    select; 
      /* first in the group and everything is ok */
      when (first.Cust_Id and _cMISS_ eq 0) do; output; _stop_ = 1; end;
      /* only one is the group */
      when (first.Cust_Id and _counter_ = 1) do; output; _stop_ = 1; end;
      /* first wsnt ok check next */
      when ((not first.Cust_Id) and _cMISS_ eq 0) do; output; _stop_ = 1; end;
      /* nothing was ok and we are in the last output last */
      when (last.Cust_Id) do; output; _stop_ = 1; end;
      otherwise;
    end;

  end;
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;all the best&lt;/P&gt;&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 16 Apr 2019 07:51:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/551317#M33469</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2019-04-16T07:51:47Z</dc:date>
    </item>
    <item>
      <title>Re: remove duplicates based on variables</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/553104#M33528</link>
      <description>Thanks Astounding - works a treat.</description>
      <pubDate>Tue, 23 Apr 2019 01:51:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/remove-duplicates-based-on-variables/m-p/553104#M33528</guid>
      <dc:creator>DME790</dc:creator>
      <dc:date>2019-04-23T01:51:19Z</dc:date>
    </item>
  </channel>
</rss>

