<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Conditional Deduping in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574650#M162425</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/234586"&gt;@A_SAS_Man&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am attempting to dedupe some data, but how each group is deduped requires some extra logic than just comparing a few columns. Below is an example of my starting data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
1 10 line2 shoe 200
1 10 line3 shoe 200
2 10 line1 shoe 200
2 10 line2 shoe 200
2 10 line3 shoe 200
3 10 line4 tops 300
3 10 line5 tape 200
4 10 line6 tots 200
4 10 line7 limo 800
5 10 line8 hats 500
5 10 line9 pens 500
 ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;There are essentially three rules I'm trying to build into the deduping process.&lt;/P&gt;
&lt;P&gt;1. If order and line match, keep the product and product descriptor with the most dollars&lt;/P&gt;
&lt;P&gt;2. If order, line and product descriptor match, choose any 1 data line to keep (doesn't matter which one).&lt;/P&gt;
&lt;P&gt;3. If order, line and dollars match, keep none of the lines.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If these rules are followed, it should produce a dataset like the below&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
2 10 line1 shoe 200
3 10 line4 tops 300
4 10 line7 limo 800
 ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I am guessing I need to use some sort of macro to accomplish this? I've been playing around with a few ideas but so far I'm kind of stuck. Any help is appreciated!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I think that you need to indicate which is the hierarchy, ie. which rule goes first. Since every one of the records that match rule 2 and 3 would also satisfy rule 1 you might get solutions that in effect only implement rule 1 as that is the one listed first. It actually appears that you might want 1 as the last rule to check after the other two. And you have a similar issue of which of rule 2 and 3 should actually be applied first.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Not macro should be needed but since you need to look at multiple rows simultaneously this is likely NOT going to be a single data step solution either.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 18 Jul 2019 15:52:53 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2019-07-18T15:52:53Z</dc:date>
    <item>
      <title>Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574645#M162422</link>
      <description>&lt;P&gt;I am attempting to dedupe some data, but how each group is deduped requires some extra logic than just comparing a few columns. Below is an example of my starting data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
1 10 line2 shoe 200
1 10 line3 shoe 200
2 10 line1 shoe 200
2 10 line2 shoe 200
2 10 line3 shoe 200
3 10 line4 tops 300
3 10 line5 tape 200
4 10 line6 tots 200
4 10 line7 limo 800
5 10 line8 hats 500
5 10 line9 pens 500
 ;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;There are essentially three rules I'm trying to build into the deduping process.&lt;/P&gt;&lt;P&gt;1. If order and line match, keep the product and product descriptor with the most dollars&lt;/P&gt;&lt;P&gt;2. If order, line, dollars and product descriptor match, choose any 1 data line to keep (doesn't matter which one).&lt;/P&gt;&lt;P&gt;3. If order, line and dollars match but product descriptor is different, keep none of the lines.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If these rules are followed, it should produce a dataset like the below&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
2 10 line1 shoe 200
3 10 line4 tops 300
4 10 line7 limo 800
 ;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I am guessing I need to use some sort of macro to accomplish this? I've been playing around with a few ideas but so far I'm kind of stuck. Any help is appreciated!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Edit: I have added some more description to the three scenarios to illustrate how they differ.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 15:59:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574645#M162422</guid>
      <dc:creator>A_SAS_Man</dc:creator>
      <dc:date>2019-07-18T15:59:23Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574648#M162423</link>
      <description>&lt;P&gt;If it doesn't matter which one is chosen in (2), could the logic then be the same as in (1)?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 15:49:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574648#M162423</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-18T15:49:36Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574649#M162424</link>
      <description>Sorry, I had a typo on what two is. It is a slightly different scenario than one, please see updated post.</description>
      <pubDate>Thu, 18 Jul 2019 15:51:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574649#M162424</guid>
      <dc:creator>A_SAS_Man</dc:creator>
      <dc:date>2019-07-18T15:51:29Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574650#M162425</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/234586"&gt;@A_SAS_Man&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am attempting to dedupe some data, but how each group is deduped requires some extra logic than just comparing a few columns. Below is an example of my starting data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
1 10 line2 shoe 200
1 10 line3 shoe 200
2 10 line1 shoe 200
2 10 line2 shoe 200
2 10 line3 shoe 200
3 10 line4 tops 300
3 10 line5 tape 200
4 10 line6 tots 200
4 10 line7 limo 800
5 10 line8 hats 500
5 10 line9 pens 500
 ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;There are essentially three rules I'm trying to build into the deduping process.&lt;/P&gt;
&lt;P&gt;1. If order and line match, keep the product and product descriptor with the most dollars&lt;/P&gt;
&lt;P&gt;2. If order, line and product descriptor match, choose any 1 data line to keep (doesn't matter which one).&lt;/P&gt;
&lt;P&gt;3. If order, line and dollars match, keep none of the lines.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If these rules are followed, it should produce a dataset like the below&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
 input order line product $5. product_descriptor $5. dollars 4.;
 datalines;
1 10 line1 shoe 200
2 10 line1 shoe 200
3 10 line4 tops 300
4 10 line7 limo 800
 ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I am guessing I need to use some sort of macro to accomplish this? I've been playing around with a few ideas but so far I'm kind of stuck. Any help is appreciated!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I think that you need to indicate which is the hierarchy, ie. which rule goes first. Since every one of the records that match rule 2 and 3 would also satisfy rule 1 you might get solutions that in effect only implement rule 1 as that is the one listed first. It actually appears that you might want 1 as the last rule to check after the other two. And you have a similar issue of which of rule 2 and 3 should actually be applied first.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Not macro should be needed but since you need to look at multiple rows simultaneously this is likely NOT going to be a single data step solution either.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 15:52:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574650#M162425</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-07-18T15:52:53Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574652#M162427</link>
      <description>See updated post, I think the three scenarios should be distinct here. But if a hierarchy was needed I would want 1 to be the one used whenever possible, then 2, then 3.</description>
      <pubDate>Thu, 18 Jul 2019 15:55:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574652#M162427</guid>
      <dc:creator>A_SAS_Man</dc:creator>
      <dc:date>2019-07-18T15:55:37Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574662#M162434</link>
      <description>&lt;P&gt;Ok. Does this meet your needs?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=test out=temp;
   by order line product_descriptor dollars;
run;

data want;
   set test;
   by order;
   if first.order then do;
      _line=line;
      _pd=product_descriptor;
      _dollars=dollars;
   end;

   if last.order then do;
      if _line=line &amp;amp; _dollars=dollars &amp;amp; product_descriptor ne _pd then return;
      else output;
   end;
   retain _:;
   drop _:;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2019 16:13:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574662#M162434</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-18T16:13:02Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574666#M162436</link>
      <description>&lt;P&gt;So it definitely looks like it is getting close, I think maybe my descriptions for each scenario were not precise enough so I apologize if my edit is what messed you up. The difference is it looks like for order 3, your logic is keeping product line5 and product_descriptor "tape" when it should be retaining product line4 and "tops" (due to that line having a higher dollar amount).&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 16:24:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574666#M162436</guid>
      <dc:creator>A_SAS_Man</dc:creator>
      <dc:date>2019-07-18T16:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574669#M162438</link>
      <description>&lt;P&gt;Sorry, I used the wrong data in my data step. How about now?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=test out=temp;
   by order dollars;
run;

data want;
   set temp;
   by order;
   if first.order then do;
      _line=line;
      _pd=product_descriptor;
      _dollars=dollars;
   end;

   if last.order then do;
      if _line=line &amp;amp; _dollars=dollars &amp;amp; product_descriptor ne _pd then return;
      else output;
   end;
   retain _:;
   drop _:;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2019 16:30:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574669#M162438</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-18T16:30:08Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574676#M162441</link>
      <description>That does the trick, thank you!</description>
      <pubDate>Thu, 18 Jul 2019 16:58:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574676#M162441</guid>
      <dc:creator>A_SAS_Man</dc:creator>
      <dc:date>2019-07-18T16:58:27Z</dc:date>
    </item>
    <item>
      <title>Re: Conditional Deduping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574738#M162456</link>
      <description>&lt;P&gt;Anytime, glad to help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 19:15:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Conditional-Deduping/m-p/574738#M162456</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-18T19:15:07Z</dc:date>
    </item>
  </channel>
</rss>

