<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What's the best way to see if a dataset has any duplicates? in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912691#M40872</link>
    <description>I was really hoping for a much shorter solution, lol...but this will definitely help me learn how to use HASH, thanks!</description>
    <pubDate>Tue, 23 Jan 2024 13:58:13 GMT</pubDate>
    <dc:creator>cosmid</dc:creator>
    <dc:date>2024-01-23T13:58:13Z</dc:date>
    <item>
      <title>What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912585#M40857</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is PROC SORT with NODUPKEY or NODUP the best way to check for duplicates, or is there a better way to quickly check whether a variable has duplicated values in a dataset?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 02:41:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912585#M40857</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-01-23T02:41:20Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912598#M40859</link>
      <description>&lt;P&gt;You can use an in-memory hash table to read the data and print an "ERROR" message at the first duplicate:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input x $1. @@;
if x ne " ";
cards;
qwertyuiopasdfghjklzxcvbnm1234567890q
;
run;
proc print;
run;


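/* test for dups */
data _null_;
  /* hash table keyed on x; add() fails when the key is already present */
  declare hash H();
  H.defineKey("x");
  H.defineDone();

  do until(eof);
    set HAVE end=eof curobs=curobs;
    rc=H.add();   /* non-zero return code: x is already in the table */
    if rc then 
      do;
        put "ERROR: Duplicate value: " x "detected in observation " curobs;
        stop;     /* quit at the first duplicate */
      end;
  end;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;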
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 07:46:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912598#M40859</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2024-01-23T07:46:38Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912602#M40860</link>
      <description>&lt;P&gt;Another hash solution (using the data provided by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   if 0 then set have;
   declare hash h(dataset: 'have', duplicate: 'e');
   h.defineKey('x');
   h.defineDone();
   stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 23 Jan 2024 07:58:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912602#M40860</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2024-01-23T07:58:52Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912606#M40861</link>
      <description>&lt;P&gt;And the cool thing is that it can easily be extended from a single-variable check to a check for duplicate rows:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input x $1. @@;
if x ne " ";
y=rank(x);
z=y*10;
cards;
qwertyuiopasdfghjklzxcvbnm1234567890q
;
run;
proc print;
run;

data _null_;
   if 0 then set have;
   declare hash h(dataset: 'have', duplicate: 'e');
   h.defineKey(all:'yes'); /* duplicated rows */
   h.defineDone();
   stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 08:10:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912606#M40861</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2024-01-23T08:10:45Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912607#M40862</link>
      <description>Define "best".&lt;BR /&gt;Apart from the suggested hash techniques, you could also use PROC SQL with HAVING and COUNT.&lt;BR /&gt;Or apply a unique index and see whether the operation succeeds.</description>
      <pubDate>Tue, 23 Jan 2024 08:11:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912607#M40862</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2024-01-23T08:11:06Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912690#M40871</link>
      <description>Hi andreas!&lt;BR /&gt;&lt;BR /&gt;I didn't know there's a DUPLICATE argument that can be used with HASH. &lt;BR /&gt;&lt;BR /&gt;I have seen a lot of programs with the IF statement:&lt;BR /&gt;if 0 then set data_set_name;&lt;BR /&gt;&lt;BR /&gt;I always wondered how that statement can execute, because I thought the default numeric value for FALSE is also 0. So the 0 here must mean something else?&lt;BR /&gt;</description>
      <pubDate>Tue, 23 Jan 2024 13:57:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912690#M40871</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-01-23T13:57:33Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912691#M40872</link>
      <description>I was really hoping for a much shorter solution, lol...but this will definitely help me learn how to use HASH, thanks!</description>
      <pubDate>Tue, 23 Jan 2024 13:58:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912691#M40872</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-01-23T13:58:13Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912692#M40873</link>
      <description>Hi LinusH,&lt;BR /&gt;&lt;BR /&gt;So, I was hoping for a built-in SAS function that I didn't know of, or something like a single line of code. The other solution I found besides PROC SORT was using FIRST. and LAST. and comparing them. PROC SORT creates another dataset, and FIRST. and LAST. involve more coding, so I was hoping for a shorter version of some sort. I thought one might exist, since checking for duplicates is such a common task.</description>
      <pubDate>Tue, 23 Jan 2024 14:02:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912692#M40873</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-01-23T14:02:29Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912763#M40877</link>
      <description>I understand the IF 0 now. It's used to set up the PDV and skip reading in the observations.&lt;BR /&gt;Sorry, I wanted to follow up because I asked about it in an earlier reply and I don't know how to delete that reply. So, to save you the time of answering it, I'll just explain here.&lt;BR /&gt;Thanks again for the help!</description>
      <pubDate>Tue, 23 Jan 2024 23:01:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912763#M40877</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-01-23T23:01:19Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912765#M40878</link>
      <description>&lt;P&gt;Personally I find it useful to create macros for common tasks like this. It means you can get your answer with just one statement. It also means the underlying method isn't so important.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SASKiwi_0-1706056615771.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92919iF3A791FC65734F63/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SASKiwi_0-1706056615771.png" alt="SASKiwi_0-1706056615771.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro Find_Dups ( dataset = 
                  ,byvar   =
                  ,dupvar  = 
                 );

%if &amp;amp;dupvar = %then %let dupvar = &amp;amp;byvar; 

proc sort data = &amp;amp;dataset 
          out = sorted
           ;
  by &amp;amp;byvar;
run;

data dups;
  set sorted;
   by &amp;amp;byvar;
  if not (first.&amp;amp;dupvar and last.&amp;amp;dupvar);
run;

%mend Find_Dups;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Although you will notice that I prefer to create a table with the duplicate rows.&lt;/P&gt;
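&lt;P&gt;A call might look like the following. This is only a sketch: the HAVE dataset and its variables id and visit are made up for illustration, and it assumes the macro above has already been compiled.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical test data: id A and id C each occur twice */
data have;
  input id $ visit;
cards;
A 1
A 2
B 1
C 1
C 1
;
run;

/* single BY variable: every row whose id occurs more than once goes to WORK.DUPS */
%Find_Dups(dataset = have, byvar = id)

/* several BY variables: name the last one in dupvar */
%Find_Dups(dataset = have, byvar = id visit, dupvar = visit)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With more than one BY variable, dupvar has to be supplied, because the default copies the whole BY list into first.&amp;amp;dupvar, which is not valid syntax. Each call overwrites WORK.SORTED and WORK.DUPS.&lt;/P&gt;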
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jan 2024 00:39:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/912765#M40878</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-24T00:39:39Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917813#M41080</link>
      <description>Thanks for the code! Is there a way for SAS to take parameters at the command line? I'm referring to a Linux environment. For example, if I wanted to check whether dataset sample.sas7bdat has any duplicates, I could just run the program with a command like SAS PROG.SAS sample var&lt;BR /&gt;And the program would take the first parameter as the dataset and the second parameter as the BY variable.</description>
      <pubDate>Sun, 25 Feb 2024 16:18:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917813#M41080</guid>
      <dc:creator>cosmid</dc:creator>
      <dc:date>2024-02-25T16:18:39Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917814#M41081</link>
      <description>&lt;P&gt;This thread seems to be devolving into a general discussion.&lt;/P&gt;
&lt;P&gt;Much better to post new questions on new threads.&amp;nbsp; You can always include a link to some older topic.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can use the old -sysparm option.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/mcrolref/3.2/p0ajr6rtdhuhzbn199hhpkak2v8p.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/mcrolref/3.2/p0ajr6rtdhuhzbn199hhpkak2v8p.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or you can take advantage of the new -set option.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostunx/n106qouqj0hfk5n1wgqpw8iovxy2.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostunx/n106qouqj0hfk5n1wgqpw8iovxy2.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 25 Feb 2024 17:07:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917814#M41081</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-02-25T17:07:19Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to see if a dataset has any duplicates?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917839#M41083</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/253026"&gt;@cosmid&lt;/a&gt;&amp;nbsp;- I suggest you follow&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;'s advice regarding the SET option, which creates environment variables you can read using %SYSGET or SYSGET in your SAS program.&lt;/P&gt;</description>
      <pubDate>Sun, 25 Feb 2024 22:06:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/What-s-the-best-way-to-see-if-a-dataset-has-any-duplicates/m-p/917839#M41083</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-02-25T22:06:27Z</dc:date>
    </item>
  </channel>
</rss>

