<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: effective way to count the number of uniqueness in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216198#M307666</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Is your ID, a number or character? If number, do have any idea about it Range.&lt;/P&gt;&lt;P&gt;If it is number, array will be fast.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 04 Aug 2015 00:12:31 GMT</pubDate>
    <dc:creator>KachiM</dc:creator>
    <dc:date>2015-08-04T00:12:31Z</dc:date>
    <item>
      <title>effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216195#M307663</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have about 10 million records SAS data set with ids. I want to check if the id is unique or not? What is the effective way to do this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I tried use proc sql to count the number but it takes long time. Should I create an index for the id? Please advise. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; select count(distinct id) from master;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 03 Aug 2015 21:37:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216195#M307663</guid>
      <dc:creator>chenyiwen1717</dc:creator>
      <dc:date>2015-08-03T21:37:21Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216196#M307664</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Your query will count the number of distinct IDs in the dataset. Is that what you want?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you want a list of records with non-unique IDs, you should use&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc sql;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;create table dups as&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;select * from master&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;group by ID&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;having count(*) &amp;gt; 1;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Adding an index to your dataset would help if you are going to be repeating this check often. It is not worth the effort if this is a one time verification.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 03 Aug 2015 22:05:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216196#M307664</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2015-08-03T22:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216197#M307665</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes. I just want to check if the IDs are unique or not. However my simple query runs a long time. Is there any other efficient way to check?Thanks.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 03 Aug 2015 23:04:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216197#M307665</guid>
      <dc:creator>chenyiwen1717</dc:creator>
      <dc:date>2015-08-03T23:04:06Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216198#M307666</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Is your ID, a number or character? If number, do have any idea about it Range.&lt;/P&gt;&lt;P&gt;If it is number, array will be fast.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 00:12:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216198#M307666</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2015-08-04T00:12:31Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216199#M307667</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Proc FREQ will do this much quicker.&amp;nbsp; Try this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;proc freq data=master&amp;nbsp;&amp;nbsp; noprint ;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp; table ID / out=UniqueIDs(where=(count=1));&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 00:26:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216199#M307667</guid>
      <dc:creator>JerryLeBreton</dc:creator>
      <dc:date>2015-08-04T00:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216200#M307668</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Depending on what else you are doing, you could use Proc SORT, which performs about the same as Proc FREQ.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;proc sort data=master(keep=ID)&amp;nbsp; nounikey &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out=duplicateIDs&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; uniout=uniqueIDs ;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; by ID;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And if you're not interested in the non-unique IDs&amp;nbsp; being written to a data set, you can use...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: courier new,courier;"&gt;out = _null_&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 01:08:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216200#M307668</guid>
      <dc:creator>JerryLeBreton</dc:creator>
      <dc:date>2015-08-04T01:08:12Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216201#M307669</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;May be first. and last. might be handy. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The below query will output only those id which are unique. For example in the below code only id 2 will be outputted. But may be i did not understand the query, please correct if i did not.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;input id;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1&lt;/P&gt;&lt;P&gt;1&lt;/P&gt;&lt;P&gt;1&lt;/P&gt;&lt;P&gt;1&lt;/P&gt;&lt;P&gt;2&lt;/P&gt;&lt;P&gt;3&lt;/P&gt;&lt;P&gt;3&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;set have;&lt;/P&gt;&lt;P&gt;by id;&lt;/P&gt;&lt;P&gt;if first.id and last.id;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc print;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jag&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 02:03:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216201#M307669</guid>
      <dc:creator>Jagadishkatam</dc:creator>
      <dc:date>2015-08-04T02:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216202#M307670</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have character IDs. Let say my data set has 10,000,000 records. I just want to know if the IDs are unique or not. That's why I use:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;proc sql;&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp; select count(distinct id) from master;&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;quit;&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;/P&gt;&lt;P style="font-size: 13px; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;If I can get the exact same number of count then I am able to know they are all unique. But this method takes a lot of time.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 03:27:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216202#M307670</guid>
      <dc:creator>chenyiwen1717</dc:creator>
      <dc:date>2015-08-04T03:27:12Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216203#M307671</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you just want know if ID is unique , don't care which ID is duplicated.&lt;/P&gt;&lt;P&gt;The Best way is using Hash Table . If there are some duplicated ID , "&lt;STRONG&gt;ha.add()"&amp;nbsp; &lt;/STRONG&gt;will eject ERROR info in LOG .&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data _null_;&lt;/P&gt;&lt;P&gt;if _n_=1 then do;&lt;/P&gt;&lt;P&gt;if 0 then set have;&lt;/P&gt;&lt;P&gt;declare hash ha();&lt;/P&gt;&lt;P&gt;ha.definekey('id');&lt;/P&gt;&lt;P&gt;ha.definedone();&lt;/P&gt;&lt;P&gt;end;&lt;/P&gt;&lt;P&gt;set have;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ha.add();&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Message was edited by: xia keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 12:06:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216203#M307671</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-08-04T12:06:05Z</dc:date>
    </item>
    <item>
      <title>Re: effective way to count the number of uniqueness</title>
      <link>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216204#M307672</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you consider that a unique ID is one that occurs once only, it is possible to find by using Hash objects. But how much time it will take compared to SQL can be found only by you, by running both ways.&lt;/P&gt;&lt;P&gt;Xia has already given a hash solution. It can not tell which ID occurs once only. Here is another hash way with an example, the length of ID need not be 6 in your case. It It collects all records having one ID into one data set(Unique) and the rest to another data set(Duplicates).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;input ID $6.;&lt;/P&gt;&lt;P&gt;datalines;&lt;/P&gt;&lt;P&gt;aaaaaa&lt;/P&gt;&lt;P&gt;aaaaaa&lt;/P&gt;&lt;P&gt;aaaaaa&lt;/P&gt;&lt;P&gt;bbbbbb&lt;/P&gt;&lt;P&gt;dddddd&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;data _null_;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if _n_ = 1 then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash h();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definekey('ID');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definedata('ID','freq');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.definedone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set have end = eof;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if h.find() ne 0 then freq = 0;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; freq + 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; h.replace();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if eof;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.output(dataset:'Unique (where=(freq=1)');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.output(dataset:'Duplicates (where=(freq &amp;gt; 1)');&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc print data = Unique;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc print data = Duplicates;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Aug 2015 14:23:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/effective-way-to-count-the-number-of-uniqueness/m-p/216204#M307672</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2015-08-04T14:23:19Z</dc:date>
    </item>
  </channel>
</rss>

