<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delete duplicate records. in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8305#M284</link>
    <description>Patrick,&lt;BR /&gt;
&lt;BR /&gt;
Your code will display the cases with duplicate IDs, but does not address duplicate content, which is what I read Q2 to ask.&lt;BR /&gt;
&lt;BR /&gt;
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
    <pubDate>Sun, 25 Apr 2010 13:37:56 GMT</pubDate>
    <dc:creator>Doc_Duke</dc:creator>
    <dc:date>2010-04-25T13:37:56Z</dc:date>
    <item>
      <title>Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8302#M281</link>
      <description>proc sort data=mydataset nodupkey;&lt;BR /&gt;
by _all_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
Q1: Will there be no identical records in mydataset after execution of the above code; without any exception?&lt;BR /&gt;
&lt;BR /&gt;
Q2: Is there any procedure to get a report on how many identical records there is &lt;B&gt; for each&lt;/B&gt; ID, where ID is a primary key in mydataset? (Yes, of course run before the above code.)</description>
      <pubDate>Thu, 22 Apr 2010 18:36:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8302#M281</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-04-22T18:36:50Z</dc:date>
    </item>
    <item>
      <title>Re: Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8303#M282</link>
      <description>My observation is to say "yes" to Q1.  As for Q2, I suggest you review the use of DUPOUT= (with PROC SORT), if you need to do some analysis, whether you need to generate a report - there is no built-in SAS "report" for analyzing with a duplicate-data condition.&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.</description>
      <pubDate>Thu, 22 Apr 2010 20:32:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8303#M282</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-04-22T20:32:52Z</dc:date>
    </item>
    <item>
      <title>Re: Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8304#M283</link>
      <description>For Q2:&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
  select id,count(*) as Duplicates&lt;BR /&gt;
    from mydataset&lt;BR /&gt;
    group by id&lt;BR /&gt;
      having count(*)&amp;gt;1&lt;BR /&gt;
  ;&lt;BR /&gt;
quit;</description>
      <pubDate>Sun, 25 Apr 2010 03:45:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8304#M283</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2010-04-25T03:45:55Z</dc:date>
    </item>
    <item>
      <title>Re: Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8305#M284</link>
      <description>Patrick,&lt;BR /&gt;
&lt;BR /&gt;
Your code will display the cases with duplicate IDs, but does not address duplicate content, which is what I read Q2 to ask.&lt;BR /&gt;
&lt;BR /&gt;
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
      <pubDate>Sun, 25 Apr 2010 13:37:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8305#M284</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2010-04-25T13:37:56Z</dc:date>
    </item>
    <item>
      <title>Re: Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8306#M285</link>
      <description>Ernesto&lt;BR /&gt;
&lt;BR /&gt;
Duke is of course right. The code I provided counts only duplicate ID's.&lt;BR /&gt;
&lt;BR /&gt;
The code below shows/counts now duplicate records.&lt;BR /&gt;
&lt;BR /&gt;
What confuses me:&lt;BR /&gt;
ID is defined as primary key and should therefore be unique. &lt;BR /&gt;
Just removing all duplicate records won't solve all possible issues with duplicate ID's. There could even be different sets of duplicate records per id as&lt;BR /&gt;
the sample data illustrates.&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
data mydataset;&lt;BR /&gt;
  keep id var1-var5;&lt;BR /&gt;
  length id 8;&lt;BR /&gt;
  array var {5} 8;&lt;BR /&gt;
  do id=1 to 2;&lt;BR /&gt;
    do i=1 to 20;&lt;BR /&gt;
      do j=1 to dim(var);&lt;BR /&gt;
        var{j}=round(ranuni(1));&lt;BR /&gt;
      end;&lt;BR /&gt;
      output;&lt;BR /&gt;
    end;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
/* variant1: count before duplicates got removed */&lt;BR /&gt;
proc sql;&lt;BR /&gt;
  select NAME INTO :col_list separated by ','&lt;BR /&gt;
    from dictionary.columns&lt;BR /&gt;
      where libname='WORK' and memname='MYDATASET';&lt;BR /&gt;
    ;&lt;BR /&gt;
  select count(*) as Duplicates,*&lt;BR /&gt;
    from mydataset&lt;BR /&gt;
    group by &amp;amp;col_list&lt;BR /&gt;
    having count(*)&amp;gt;1&lt;BR /&gt;
;&lt;BR /&gt;
quit; &lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
/* variant2: count after duplicates got removed */&lt;BR /&gt;
proc sort data=mydataset out=Unique_mydataset equals nodupkey dupout=Dup_mydataset;&lt;BR /&gt;
  by _all_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
  select name into :col_list separated by ','&lt;BR /&gt;
    from dictionary.columns&lt;BR /&gt;
      where libname='WORK' and memname='Dup_mydataset';&lt;BR /&gt;
    ;&lt;BR /&gt;
  select count(*)+1 as Duplicates,*&lt;BR /&gt;
    from Dup_mydataset&lt;BR /&gt;
    group by &amp;amp;col_list&lt;BR /&gt;
;&lt;BR /&gt;
quit; &lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick</description>
      <pubDate>Mon, 26 Apr 2010 02:42:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8306#M285</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2010-04-26T02:42:16Z</dc:date>
    </item>
    <item>
      <title>Re: Delete duplicate records.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8307#M286</link>
      <description>Correction: &lt;B&gt;ID is not a primary key.&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
Thank's for all answers.</description>
      <pubDate>Mon, 26 Apr 2010 19:34:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delete-duplicate-records/m-p/8307#M286</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-04-26T19:34:53Z</dc:date>
    </item>
  </channel>
</rss>

