<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic flagging duplicate data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584700#M166570</link>
    <description>&lt;P&gt;I have data where subjects come in for three separate visits during a study to track health issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to flag data that repeats the same issues for the same visit date of each subject.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EX:&lt;/P&gt;&lt;P&gt;Sub&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Visit&amp;nbsp; &amp;nbsp; Issue&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;D&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Subject 2's second visit should be flagged because there are 2 issues C's posted in the data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 28 Aug 2019 18:08:08 GMT</pubDate>
    <dc:creator>Rcrowder</dc:creator>
    <dc:date>2019-08-28T18:08:08Z</dc:date>
    <item>
      <title>flagging duplicate data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584700#M166570</link>
      <description>&lt;P&gt;I have data where subjects come in for three separate visits during a study to track health issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to flag data that repeats the same issues for the same visit date of each subject.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EX:&lt;/P&gt;&lt;P&gt;Sub&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Visit&amp;nbsp; &amp;nbsp; Issue&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;D&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Subject 2's second visit should be flagged because there are 2 issues C's posted in the data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 18:08:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584700#M166570</guid>
      <dc:creator>Rcrowder</dc:creator>
      <dc:date>2019-08-28T18:08:08Z</dc:date>
    </item>
    <item>
      <title>Re: flagging duplicate data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584707#M166573</link>
      <description>&lt;P&gt;Can you please post the expected output sample for the input sample? It leads me/perhaps some of us to assumptions that we can avoid&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 18:16:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584707#M166573</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-08-28T18:16:50Z</dc:date>
    </item>
    <item>
      <title>Re: flagging duplicate data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584724#M166586</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/281903"&gt;@Rcrowder&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I have data where subjects come in for three separate visits during a study to track health issues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need to flag data that repeats the same issues for the same visit date of each subject.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;EX:&lt;/P&gt;
&lt;P&gt;Sub&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Visit&amp;nbsp; &amp;nbsp; Issue&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A&lt;/P&gt;
&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;
&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;B&lt;/P&gt;
&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C&lt;/P&gt;
&lt;P&gt;2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;D&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Subject 2's second visit should be flagged because there are 2 issues C's posted in the data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Best is to provide example data in the form of a data step so we test code. Example:&lt;/P&gt;
&lt;PRE&gt;data work.have;
input Sub       Visit    Issue  $;
datalines;
1            1         A
1            2         B
1            2         C
1            3         A
2            1         A
2            2         C
2            2         B
2            2         C
2            3         D
;&lt;/PRE&gt;
&lt;P&gt;Paste into a code box opened with the forum's {I} or "running man" icons to preserve formatting. the message windows will reformat text and sometimes the reformatting will change the data enough that the data step will not run.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One easy way to get just the problem records:&lt;/P&gt;
&lt;PRE&gt;proc freq data=work.have noprint;
   tables sub*visit*issue/out=work.issuecount(where=(count&amp;gt;1));
run;&lt;/PRE&gt;
&lt;P&gt;Or to set a flag, 1=True 0=False for duplicates:&lt;/P&gt;
&lt;PRE&gt;proc sort data=work.have;
   by sub visit issue;
run;
data work.want;
   set work.have;
   by sub visit issue;
   flag=  not first.issue ;
run;&lt;/PRE&gt;
&lt;P&gt;A By statement in the data set creates a number of Boolean variables named First. and Last. for each variable on the by statement. If the record is the first at a given level then it is True (or 1). So if something is not the first record with the same Issue value then negating that variable gets the flag.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 18:36:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584724#M166586</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-08-28T18:36:22Z</dc:date>
    </item>
    <item>
      <title>Re: flagging duplicate data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584752#M166604</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/281903"&gt;@Rcrowder&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;Methinks the simplest is to use the SAS SQL's self-merge facility:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;                                               
  input Sub Visit Issue :$1. ;                            
  cards ;                                                 
1  1  A                                                   
1  2  B                                                   
1  2  C                                                   
1  3  A                                                   
2  1  A                                                   
2  2  C                                                   
2  2  B                                                   
2  2  C                                                   
2  3  D                                                   
;                                                         
run ;                                                     
                                                          
proc sql ;                                                
  create table want as                                       
  select *, count(issue) &amp;gt; count(distinct issue) as flag
  from   have                                             
  group  sub, visit                                       
  ;                                                       
quit ;                                                    
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note: The original order of the records within (sub,visit) may change, for example:&lt;/P&gt;
&lt;PRE&gt;Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        C        0 
 1       2        B        0 
 1       3        A        0 
 2       1        A        0 
 &lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;2       2        &lt;FONT color="#800080"&gt;B&lt;/FONT&gt;        1 
 2       2        &lt;FONT color="#800080"&gt;C&lt;/FONT&gt;        1 
 2       2        &lt;FONT color="#800080"&gt;C&lt;/FONT&gt;        1&lt;/FONT&gt;&lt;/STRONG&gt; 
 2       3        D        0 
&lt;/PRE&gt;
&lt;P&gt;But since it's SQL, this is an expected behavior.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OTOH, since your data appear to be sorted by (sub,visit), you can get what you want and preserve the original record order using a simple double DoW loop:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop = _:) ;                               
  do _n_ = 1 by 1 until (last.visit) ;    
    set have ;                            
    by sub visit ;                        
    if _n_ = 1 then _issue = issue ;      
    else if issue = _issue then flag = 1 ;
  end ;                                   
  do _n_ = 1 to _n_ ;                     
    set have ;                            
    flag = ^^ flag ;                      
    output ;                              
  end ;                                   
run ;                                     
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;What you get as a result in the output is:&lt;/P&gt;
&lt;PRE&gt;Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        B        0 
 1       2        C        0 
 1       3        A        0 
 2       1        A        0 
&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt; 2       2        &lt;FONT color="#800080"&gt;C&lt;/FONT&gt;        1 
 2       2        &lt;FONT color="#800080"&gt;B&lt;/FONT&gt;        1 
 2       2        &lt;FONT color="#800080"&gt;C&lt;/FONT&gt;        1&lt;/FONT&gt; &lt;/STRONG&gt;
 2       3        D        0 
&lt;/PRE&gt;
&lt;P&gt;However, your specs are rather vague, so there's a chance you want to flag only the records with the duplicate values of ISSUE rather than the entire (sub,visit) group where they are found. If this is the case, the hash object comes in handy, especially since its use doesn't assume that the input file is sorted - and it also preserves the original record order:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop = _:) ;                     
  if _n_ = 1 then do ;                      
    dcl hash h () ;                         
    h.definekey  ("sub", "visit", "issue") ;
    h.definedata ("_q") ;                   
    h.definedone () ;                       
    do until (z) ;                          
      set have end = z ;                    
      if h.find() ne 0 then _q = 1 ;        
      else _q + 1 ;                         
      h.replace() ;                         
    end ;                                   
  end ;                                     
  set have ;                                
  h.find() ;                                
  flag = _q &amp;gt; 1 ;                           
run ;                                       
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;As a result, you'll get:&lt;/P&gt;
&lt;PRE&gt;Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        B        0 
 1       2        C        0 
 1       3        A        0 
 2       1        A        0 
&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt; 2       2        C        &lt;FONT color="#800080"&gt;1&lt;/FONT&gt; 
 2       2        B        &lt;FONT color="#800080"&gt;0&lt;/FONT&gt; 
 2       2        C        &lt;FONT color="#800080"&gt;1 &lt;/FONT&gt;&lt;/FONT&gt;&lt;/STRONG&gt;
 2       3        D        0 &lt;/PRE&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 20:45:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/flagging-duplicate-data/m-p/584752#M166604</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-08-28T20:45:45Z</dc:date>
    </item>
  </channel>
</rss>

