<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicates list in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692656#M37335</link>
    <description>&lt;P&gt;A small (but realistic) example would be helpful. Please follow &lt;A href="https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/" target="_self"&gt;these instructions&lt;/A&gt; when providing data. Do not provide data as screen captures or as attachments. More context about the problem would also be helpful.&lt;/P&gt;</description>
    <pubDate>Mon, 19 Oct 2020 18:18:22 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2020-10-19T18:18:22Z</dc:date>
    <item>
      <title>Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692655#M37334</link>
      <description>Hey there,&lt;BR /&gt;&lt;BR /&gt;Novice to SAS so bear with me.&lt;BR /&gt;&lt;BR /&gt;I have a large data set from which I am trying to create a new data set of duplicates. I am using postcode as the duplicate key, but I also want to check whether the date of birth field matches on the same row.&lt;BR /&gt;&lt;BR /&gt;I tried a proc sort nodupkey but I think it's just giving me a clean list and then a duplicates list. I just want to see the rows that are duplicated on both criteria in one data set.&lt;BR /&gt;&lt;BR /&gt;Any help would be greatly appreciated.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;</description>
      <pubDate>Mon, 19 Oct 2020 18:15:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692655#M37334</guid>
      <dc:creator>danhopkinslewis</dc:creator>
      <dc:date>2020-10-19T18:15:46Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692656#M37335</link>
      <description>&lt;P&gt;A small (but realistic) example would be helpful. Please follow &lt;A href="https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/" target="_self"&gt;these instructions&lt;/A&gt; when providing data. Do not provide data as screen captures or as attachments. More context about the problem would also be helpful.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Oct 2020 18:18:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692656#M37335</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-10-19T18:18:22Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692678#M37337</link>
      <description>&lt;P&gt;One way that might get you started:&lt;/P&gt;
&lt;PRE&gt;Proc freq data=have noprint;
  tables var1*var2 / out=want(where=(count&amp;gt;1) drop=percent) ;
run;&lt;/PRE&gt;
&lt;P&gt;The noprint option suppresses the normal proc freq output. The out= option creates a data set named Want containing each combination of the variable values and its count, keeping only the combinations whose count is greater than 1, i.e. the duplicates.&lt;/P&gt;
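&lt;P&gt;As a minimal sketch of the step above on made-up data: the data set contents and the variable names postcode, dob and name are assumptions for illustration, not taken from the original question.&lt;/P&gt;
&lt;PRE&gt;/* Hypothetical sample data: two rows share postcode and date of birth */
data have;
  input postcode $ dob :date9. name $;
  format dob date9.;
  datalines;
AB12 01JAN1980 Ann
AB12 01JAN1980 Bob
AB12 15MAR1975 Cy
CD34 02FEB1990 Dee
;
run;

/* Count each postcode*dob combination and keep those seen more than once */
proc freq data=have noprint;
  tables postcode*dob / out=want(where=(count&amp;gt;1) drop=percent);
run;&lt;/PRE&gt;
&lt;P&gt;With this sample, Want should hold a single row for AB12 and 01JAN1980 with a count of 2.&lt;/P&gt;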
&lt;P&gt;If you want to treat a missing value as a valid combination add the keyword MISSING after the / .&lt;/P&gt;</description>
      <pubDate>Mon, 19 Oct 2020 19:25:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692678#M37337</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-19T19:25:22Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692768#M37339</link>
      <description>&lt;P&gt;Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Oct 2020 07:31:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692768#M37339</guid>
      <dc:creator>danhopkinslewis</dc:creator>
      <dc:date>2020-10-20T07:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692769#M37340</link>
      <description>&lt;P&gt;Sorry, unable to submit data as it's sensitive.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Oct 2020 07:32:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692769#M37340</guid>
      <dc:creator>danhopkinslewis</dc:creator>
      <dc:date>2020-10-20T07:32:41Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692807#M37342</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/351225"&gt;@danhopkinslewis&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Sorry, unable to submit data as it's sensitive.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Make up some data, as long as it represents the problem.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Oct 2020 10:36:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692807#M37342</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-10-20T10:36:55Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692808#M37343</link>
      <description>&lt;P&gt;Then make up some fake data that illustrates your issue. Just make sure that variable types and other attributes (length, format) are the same.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Oct 2020 10:38:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/692808#M37343</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-10-20T10:38:33Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates list</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/694602#M37383</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/351225"&gt;@danhopkinslewis&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thanks. So if I wanted to bring in all the other variable columns but not include them in the count, how would I do that?&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Do ALL of the other variables where you have duplicates have the same values?&lt;/P&gt;
&lt;P&gt;If not, you will need to decide which ones you want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the idea is to add a code indicating that the variable combination is a duplicate, then you could merge the counts back onto the original data.&lt;/P&gt;
&lt;P&gt;Again, this is pseudocode because you haven't mentioned the names of your data sets or variables. The code below renames count to dupecount to indicate that it is indeed a duplicate count.&lt;/P&gt;
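&lt;PRE&gt;/* Sort the original data by the same variables used in the TABLES statement */
Proc sort data=have;
   by var1 var2;
run;

/* Merge the duplicate counts back on; count is renamed to dupecount */
data final;
   merge have want(rename=(count=dupecount));
   by var1 var2;
run;&lt;/PRE&gt;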
&lt;P&gt;If you want a simple flag for "this record is part of a duplicate set" you could add something like:&lt;/P&gt;
&lt;P&gt;Flag = (dupecount&amp;gt;1);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS returns a numeric 1 when a logical comparison is true and 0 when it is false.&lt;/P&gt;
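&lt;P&gt;Put together, a sketch of the merge with the flag added might look like this. It uses the same placeholder names as above (have, want, var1, var2) and assumes Have is already sorted by var1 and var2.&lt;/P&gt;
&lt;PRE&gt;data final;
   merge have want(rename=(count=dupecount));
   by var1 var2;
   /* 1 when the combination occurs more than once, otherwise 0.  */
   /* Rows with no match in Want have a missing dupecount, which  */
   /* is not greater than 1, so they get 0.                       */
   flag = (dupecount&amp;gt;1);
run;&lt;/PRE&gt;
&lt;P&gt;Adding a subsetting statement such as "if flag;" before the run statement would keep only the rows that belong to a duplicate set.&lt;/P&gt;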
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Other logic could be used depending on what you want.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2020 15:04:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Duplicates-list/m-p/694602#M37383</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-27T15:04:15Z</dc:date>
    </item>
  </channel>
</rss>

