<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Count Consecutive Duplicates in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5611#M1779</link>
    <description>I was hoping somebody would raise that issue (duplicate "records" versus duplicate counter). That's why I showed the differing values for AGE. What I showed was how to create the cumulative counter. Since the request was for the counter (and NOT the removal of the duplicated rows), I didn't compare anything except the NAME column or mention using PROC SORT.&lt;BR /&gt;
&lt;BR /&gt;
My solution is only correct if you want a dup counter on duplicates for 1 variable only, but as Doc pointed out, if you want to set the counter for duplicate "records" then you would need to compare every field and set the counter accordingly. If I had taken NAME and AGE into account, then every row would have been unique.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
    <pubDate>Thu, 22 Nov 2007 17:52:23 GMT</pubDate>
    <dc:creator>Cynthia_sas</dc:creator>
    <dc:date>2007-11-22T17:52:23Z</dc:date>
    <item>
      <title>Count Consecutive Duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5607#M1775</link>
      <description>I have found some code to split duplicate records into a seperate query, but I'm looking for a way to count the consecutive duplicate rows in a query without splitting them into seperate datasets. For example, I'm looking for an output like the "Dupicate_Count" column below:&lt;BR /&gt;
&lt;BR /&gt;
Name      Duplicate_Count&lt;BR /&gt;
Mary        0&lt;BR /&gt;
Mary       1&lt;BR /&gt;
Mary       2&lt;BR /&gt;
Bob         0&lt;BR /&gt;
Bob         1&lt;BR /&gt;
Bob         2&lt;BR /&gt;
Bob         3&lt;BR /&gt;
Fred        0&lt;BR /&gt;
Fred        1&lt;BR /&gt;
Fred        2</description>
      <pubDate>Wed, 21 Nov 2007 16:17:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5607#M1775</guid>
      <dc:creator>Amfam1</dc:creator>
      <dc:date>2007-11-21T16:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Count Consecutive Duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5608#M1776</link>
      <description>Hi:&lt;BR /&gt;
  I'm sure there is a way to do this in SQL, however, I can't think of it. I do have a DATA step solution to calculate a counter the way you want. First, I made some test data (called WORK.TESTDATA):&lt;BR /&gt;
[pre]&lt;BR /&gt;
Obs    Name       Age&lt;BR /&gt;
&lt;BR /&gt;
  1    Alfred      14&lt;BR /&gt;
  2    Alfred      19&lt;BR /&gt;
  3    Alfred      29&lt;BR /&gt;
  4    Alice       13&lt;BR /&gt;
  5    Alice       23&lt;BR /&gt;
  6    Barbara     13&lt;BR /&gt;
  7    Barbara     23&lt;BR /&gt;
  8    Carol       14&lt;BR /&gt;
  9    Carol       19&lt;BR /&gt;
 10    Carol       29&lt;BR /&gt;
 11    Henry       14&lt;BR /&gt;
 12    Henry       19&lt;BR /&gt;
 13    Henry       29&lt;BR /&gt;
 14    James       12&lt;BR /&gt;
 15    James       22&lt;BR /&gt;
 16    Jane        12&lt;BR /&gt;
 17    Jane        22&lt;BR /&gt;
 18    Janet       15&lt;BR /&gt;
 19    Janet       20&lt;BR /&gt;
 20    Janet       30&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
Next, I used this program to make the counter (called DUPCNT) and a copy of the original dataset with the DUPCNT variable added (data set is WORK.NEWTEST):&lt;BR /&gt;
[pre]&lt;BR /&gt;
data newtest;&lt;BR /&gt;
  retain dupcnt;&lt;BR /&gt;
  set testdata;&lt;BR /&gt;
  by name;&lt;BR /&gt;
  ** set counter to 0 for first name;&lt;BR /&gt;
  if first.name then dupcnt = 0;&lt;BR /&gt;
    &lt;BR /&gt;
  ** output the record;&lt;BR /&gt;
  output;&lt;BR /&gt;
   &lt;BR /&gt;
  ** increment the counter -- if next name is the same;&lt;BR /&gt;
  ** the new counter value will be used and output. Otherwise,;&lt;BR /&gt;
  ** if diff name, the counter value restarts at 0 for every new name;&lt;BR /&gt;
  ** the RETAIN statement keeps DUPCNT as a "running" total or cumulative counter;&lt;BR /&gt;
  dupcnt + 1;&lt;BR /&gt;
run;&lt;BR /&gt;
     &lt;BR /&gt;
proc print data=newtest;&lt;BR /&gt;
  title 'After creating Duplicate counter';&lt;BR /&gt;
  var name age dupcnt;&lt;BR /&gt;
run;&lt;BR /&gt;
   &lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
And the output looks like this:&lt;BR /&gt;
[pre]&lt;BR /&gt;
After creating Duplicate counter&lt;BR /&gt;
    &lt;BR /&gt;
Obs    Name       Age    dupcnt&lt;BR /&gt;
   &lt;BR /&gt;
  1    Alfred      14       0&lt;BR /&gt;
  2    Alfred      19       1&lt;BR /&gt;
  3    Alfred      29       2&lt;BR /&gt;
  4    Alice       13       0&lt;BR /&gt;
  5    Alice       23       1&lt;BR /&gt;
  6    Barbara     13       0&lt;BR /&gt;
  7    Barbara     23       1&lt;BR /&gt;
  8    Carol       14       0&lt;BR /&gt;
  9    Carol       19       1&lt;BR /&gt;
 10    Carol       29       2&lt;BR /&gt;
 11    Henry       14       0&lt;BR /&gt;
 12    Henry       19       1&lt;BR /&gt;
 13    Henry       29       2&lt;BR /&gt;
 14    James       12       0&lt;BR /&gt;
 15    James       22       1&lt;BR /&gt;
 16    Jane        12       0&lt;BR /&gt;
 17    Jane        22       1&lt;BR /&gt;
 18    Janet       15       0&lt;BR /&gt;
 19    Janet       20       1&lt;BR /&gt;
 20    Janet       30       2&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
If you need help with the DATA step program, you might consider contacting Tech Support. To use the program in EG, you'd have to put the code in a code node.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
      <pubDate>Wed, 21 Nov 2007 17:04:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5608#M1776</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2007-11-21T17:04:18Z</dc:date>
    </item>
    <item>
      <title>Re: Count Consecutive Duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5609#M1777</link>
      <description>One thing to always be aware of in removing or counting "duplicates" is the definition of "duplicate" and the process of obtaining it.  Your example and Cynthia's response will look at what are effectively duplicate KEYS, not duplicate RECORDS.  To look at duplicate RECORDS, you have to sort on all variables in the dataset, and compare on all values.  Recall this note from the PROC SORT documentation &lt;BR /&gt;
&lt;BR /&gt;
"Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations might remain in the output data set. You can remove all duplicates with this option by sorting on all variables."&lt;BR /&gt;
&lt;BR /&gt;
I believe SAS has had this "feature" since at least version 5.&lt;BR /&gt;
&lt;BR /&gt;
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
      <pubDate>Thu, 22 Nov 2007 05:09:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5609#M1777</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2007-11-22T05:09:56Z</dc:date>
    </item>
    <item>
      <title>Re: Count Consecutive Duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5610#M1778</link>
      <description>as a comparative exercise here is some SQL&lt;BR /&gt;
It is a little different in the result which is discussed below.&lt;BR /&gt;
Method:  join/merge in summary counts &lt;BR /&gt;
something like&lt;BR /&gt;
[pre]proc sql;&lt;BR /&gt;
create table ages as&lt;BR /&gt;
select name, age, ages&lt;BR /&gt;
  from testdata&lt;BR /&gt;
  join ( select name as namect, count( distinct age) as ages&lt;BR /&gt;
           from testdata&lt;BR /&gt;
       group by name )&lt;BR /&gt;
    on name= namect&lt;BR /&gt;
      ;&lt;BR /&gt;
quit;[/pre]&lt;BR /&gt;
&lt;BR /&gt;
of course, not really having "row order" information in sql, it seems a bit more complex to replicate the increasing counter within name.&lt;BR /&gt;
It returns the rows of result in the order of the rows of testdata, so it might be useful that it doesn't need to be in any special order.&lt;BR /&gt;
Drop the "distinct" word if all within name should be counted instead of counting unique ages .&lt;BR /&gt;
&lt;BR /&gt;
PeterC</description>
      <pubDate>Thu, 22 Nov 2007 16:02:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5610#M1778</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2007-11-22T16:02:37Z</dc:date>
    </item>
    <item>
      <title>Re: Count Consecutive Duplicates</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5611#M1779</link>
      <description>I was hoping somebody would raise that issue (duplicate "records" versus duplicate counter). That's why I showed the differing values for AGE. What I showed was how to create the cumulative counter. Since the request was for the counter (and NOT the removal of the duplicated rows), I didn't compare anything except the NAME column or mention using PROC SORT.&lt;BR /&gt;
&lt;BR /&gt;
My solution is only correct if you want a dup counter on duplicates for 1 variable only, but as Doc pointed out, if you want to set the counter for duplicate "records" then you would need to compare every field and set the counter accordingly. If I had taken NAME and AGE into account, then every row would have been unique.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
      <pubDate>Thu, 22 Nov 2007 17:52:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Count-Consecutive-Duplicates/m-p/5611#M1779</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2007-11-22T17:52:23Z</dc:date>
    </item>
  </channel>
</rss>

