<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find Duplicates Based On Multiple Fields No Equal In One Field in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480040#M286576</link>
    <description>&lt;P&gt;You could also break it into multiple processing steps.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Group by the 20 variables counting the number of PID's in each set&lt;/LI&gt;&lt;LI&gt;Inner join that table to the original on the 20 variables, where the number of PID's is &amp;gt; 1&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token procnames"&gt;proc&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;sql&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
   create &lt;SPAN class="token statement"&gt;table&lt;/SPAN&gt; countGroups as 
       &lt;SPAN class="token statement"&gt;select&lt;/SPAN&gt; &lt;SPAN class="token punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;20&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;columns&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;}&lt;/SPAN&gt;
          &lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token function"&gt;count&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;(distinct PID&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; as numPIDs
       &lt;SPAN class="token keyword"&gt;from&lt;/SPAN&gt; have
       &lt;SPAN class="token keyword"&gt;group&lt;/SPAN&gt; &lt;SPAN class="token statement"&gt;by&lt;/SPAN&gt; &lt;SPAN class="token punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;20&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;columns&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;}&lt;/SPAN&gt;
   &lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token procnames"&gt;quit&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;   create table want as &lt;BR /&gt;      select t1.*&lt;BR /&gt;      from have as t1&lt;BR /&gt;      inner join countGroups as t2&lt;BR /&gt;         on t1.column1 = t2.column1&lt;BR /&gt;            and t1.column2 = t2.column2 &lt;BR /&gt;            ... &lt;BR /&gt;            and t1.column20 = t2.column20&lt;BR /&gt;      where t2.numPIDs &amp;gt; 1&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 20 Jul 2018 19:26:45 GMT</pubDate>
    <dc:creator>Urban_Science</dc:creator>
    <dc:date>2018-07-20T19:26:45Z</dc:date>
    <item>
      <title>Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480003#M286572</link>
      <description>&lt;P&gt;I have a fairly large table (30 fields by ~10M+ rows).&amp;nbsp; I need to find duplicates based on a grouping of 20 of the fields.&amp;nbsp; I was able to do this with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
     create table count as
     select *, 1 as line
     from orig_data
;
     create table dup_id as
     select *, sum(line) as linesum
     from count
     group by {list 20 fields}
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any rows with linesum&amp;gt;1&amp;nbsp;are flagged to have dups based on the 20 field grouping and I could check them out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My issue is that this group by statement does not exclude any of the other 10 fields from the table, within the grouping.&amp;nbsp; For 9 of them that is fine, but I need identify a grouping where one particular field (PIN) is different from the rest of the grouping above.&amp;nbsp;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;So essentially, I need a way to identify where those 20 fields are identical but PIN is different.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think I can accomplish this with a sort and data step using first. logic, but I wanted to see if this is the preferred approach?&amp;nbsp; Is there another way?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 18:11:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480003#M286572</guid>
      <dc:creator>Ins_Analytic</dc:creator>
      <dc:date>2018-07-20T18:11:12Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480018#M286573</link>
      <description>&lt;P&gt;I'm not a fan of clever solutions, but here I go... you could try&amp;nbsp;abusing&amp;nbsp;proc sql with the combination of merging statistics with a having clause.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   create table want as 
       select {20 columns}
          , PID
          , count(*) as numRows
       from have
       group by {20 columns}
       having numRows &amp;gt; 1
   ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;That should return a table with all sets of 20 columns that are duplicated across multiple columns.&amp;nbsp; It will throw a warning about having&amp;nbsp;PID in the select statement and not in the group by, but it should still run.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 18:44:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480018#M286573</guid>
      <dc:creator>Urban_Science</dc:creator>
      <dc:date>2018-07-20T18:44:17Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480033#M286574</link>
      <description>&lt;P&gt;Thank you for your response, I appreciate you looking into this problem!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am still getting the final grouping with the same PINs, unfortunately.&amp;nbsp; For my results, I'm looking for some way to identify any lines that&amp;nbsp;are identical&amp;nbsp;based on the group of 20 columns but have different PINs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 19:03:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480033#M286574</guid>
      <dc:creator>Ins_Analytic</dc:creator>
      <dc:date>2018-07-20T19:03:54Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480036#M286575</link>
      <description>&lt;P&gt;Something like this maybe? Test it on some small data sets and let me know if it works.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table want as
select &amp;lt;group 20 variables&amp;gt; , case when max(pin) ne min(pin) then 'Different Pins' else 'Same Pins' end as status
from have
group by &amp;lt;group 20 variables&amp;gt;
having count(*) &amp;gt; 1;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 19:07:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480036#M286575</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-07-20T19:07:20Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480040#M286576</link>
      <description>&lt;P&gt;You could also break it into multiple processing steps.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Group by the 20 variables counting the number of PID's in each set&lt;/LI&gt;&lt;LI&gt;Inner join that table to the original on the 20 variables, where the number of PID's is &amp;gt; 1&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token procnames"&gt;proc&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;sql&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
   create &lt;SPAN class="token statement"&gt;table&lt;/SPAN&gt; countGroups as 
       &lt;SPAN class="token statement"&gt;select&lt;/SPAN&gt; &lt;SPAN class="token punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;20&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;columns&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;}&lt;/SPAN&gt;
          &lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token function"&gt;count&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;(distinct PID&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; as numPIDs
       &lt;SPAN class="token keyword"&gt;from&lt;/SPAN&gt; have
       &lt;SPAN class="token keyword"&gt;group&lt;/SPAN&gt; &lt;SPAN class="token statement"&gt;by&lt;/SPAN&gt; &lt;SPAN class="token punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;20&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;columns&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;}&lt;/SPAN&gt;
   &lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token procnames"&gt;quit&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;   create table want as &lt;BR /&gt;      select t1.*&lt;BR /&gt;      from have as t1&lt;BR /&gt;      inner join countGroups as t2&lt;BR /&gt;         on t1.column1 = t2.column1&lt;BR /&gt;            and t1.column2 = t2.column2 &lt;BR /&gt;            ... &lt;BR /&gt;            and t1.column20 = t2.column20&lt;BR /&gt;      where t2.numPIDs &amp;gt; 1&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jul 2018 19:26:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480040#M286576</guid>
      <dc:creator>Urban_Science</dc:creator>
      <dc:date>2018-07-20T19:26:45Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480301#M286577</link>
      <description>&lt;P&gt;Unsure I fully understand your need. Like this?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE;
  GR1=1; GR20=1; PIN=1; output;
  GR1=1; GR20=1; PIN=1; output;
  GR1=2; GR20=1; PIN=1; output;
  GR1=2; GR20=1; PIN=2; output;
  GR1=3; GR20=1; PIN=1; output;
run;
proc sql;
  create table WANT as
  select GR1, GR20, count(distinct PIN) as NBPINS
  from HAVE
  group by GR1, GR20
  having count(*) &amp;gt; 1;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;TABLE style="border-collapse: collapse; width: 144pt;" border="0" width="192" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="64" height="20" style="height: 15.0pt; width: 48pt;"&gt;GR1&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;GR20&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;NBPINS&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;1&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;2&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jul 2018 23:25:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480301#M286577</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-07-22T23:25:46Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480338#M286578</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/206296"&gt;@Ins_Analytic&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If your source data is stored in a SAS table and not a data base then below code should be o.k.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=inter;
  by &amp;lt;col01 to col20&amp;gt; pin;
run;

data want;
  set inter;
  by &amp;lt;col01 to col20&amp;gt; pin;
  /* rows with identical col1 to col20 */
  if not (first.&amp;lt;col20&amp;gt; and last.&amp;lt;col20&amp;gt;) then
    do;
      /* rows with unique pin's within "identical" rows */
      if first.pin and last.pin then output;
    end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 23 Jul 2018 03:35:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480338#M286578</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-07-23T03:35:15Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480775#M286579</link>
      <description>&lt;P&gt;I need to find your top 2 lines where everything is the same.&amp;nbsp; I need to know that CID (part of 20 grouping fields) has a duplicate on those 20 fields, with the same PIN as it's duplicate.&amp;nbsp; So I need to extract the CID of the lines with duplicates on the 20 field group by, with the same PIN.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 12:26:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480775#M286579</guid>
      <dc:creator>Ins_Analytic</dc:creator>
      <dc:date>2018-07-24T12:26:12Z</dc:date>
    </item>
    <item>
      <title>Re: Find Duplicates Based On Multiple Fields No Equal In One Field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480958#M286580</link>
      <description>&lt;P&gt;I honestly think any of these solutions could have worked if I was savvy enough.&amp;nbsp; For the structure of the data, this is the one that worked best for me.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you everyone for the suggestions, I really appreciate the help.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 22:37:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-Duplicates-Based-On-Multiple-Fields-No-Equal-In-One-Field/m-p/480958#M286580</guid>
      <dc:creator>Ins_Analytic</dc:creator>
      <dc:date>2018-07-24T22:37:37Z</dc:date>
    </item>
  </channel>
</rss>

