<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Most efficient way to eliminate duplicate in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176702#M45293</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There are others who know the syntax better than I do, but the best approach is likely to be a hash table.&amp;nbsp; Load the data into a hash table, using REPLACE for duplicate key values.&amp;nbsp; Then unload the hash table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All assuming that you have sufficient memory.&amp;nbsp; Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 08 Oct 2014 13:42:48 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2014-10-08T13:42:48Z</dc:date>
    <item>
      <title>SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176696#M45287</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;I&lt;SPAN style="line-height: 1.5em;"&gt; have a database with an identifier and declarations. Declarations are constructed as identifier + a letter. If the idendifier is 123456, declarations would then be "123456A", "123456B" and so on&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;I would like to select one observation for each identifier, with the declaration that is the one with the last letter, which is of course, not always the same.&lt;/P&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;I assume I can do that with a proc sort and then another one with nodupkey :&lt;/P&gt;&lt;PRE style="margin: 0 0 10px; padding: 5px; font-size: 14px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; color: #000000; background: #eeeeee;"&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;proc sort data=have out=have2; &lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; by identifier declaration /descending; *&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;run; &lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;proc sort data=have2 out=want nodupkey; &lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;by identifier; &lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;run; &lt;/CODE&gt;&lt;/P&gt;

&lt;/PRE&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;but as I have a relatively important database (tens of millions observations) I would like to know the best in sense of both better suited and fastest method if it is another one. Typically, if it is possible in one step, as for now it takes a lot of time.&lt;/P&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;Thanks&lt;/P&gt;&lt;P style="margin: 0 0 1em; font-size: 14px; color: #000000; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; background: #ffffff;"&gt;Edit1 : change a typo in program&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 11:59:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176696#M45287</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T11:59:52Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176697#M45288</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Nice question. But a lot more information is needed to choose a direction.&lt;/P&gt;&lt;P&gt;- What SAS version do you have?&lt;/P&gt;&lt;P&gt;- What system (Windows/Unix/Mainframe), and with what kind of resources (cores, memory, I/O)?&lt;/P&gt;&lt;P&gt;- Where is that database? Is it an external one (like Oracle) or SAS-dedicated?&lt;/P&gt;&lt;P&gt;- How fast should it run? (Seconds? Several minutes?)&lt;/P&gt;&lt;P&gt;Are both identifier and declaration character variables? Could the letter be separated from the rest? Is the identifier an integer of at most 10 digits?&lt;BR /&gt;Do you just want that one resulting dataset, or is there more?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 12:20:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176697#M45288</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-10-08T12:20:04Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176698#M45289</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So, in order:&lt;/P&gt;&lt;P&gt;-I have SAS 9.2&lt;/P&gt;&lt;P&gt;-I am on Windows 7, and I am working on a company cloud which I do not have much info about&lt;/P&gt;&lt;P&gt;-It is a SAS-dedicated database&lt;/P&gt;&lt;P&gt;-There are no strict constraints; faster is better, but it is part of a bigger program which is planned to run annually. The whole process can take a week or two.&lt;/P&gt;&lt;P&gt;-identifier is a fourteen-digit character variable, declaration a fifteen-character one&lt;/P&gt;&lt;P&gt;-I can get the letter with the SUBSTR function, but I guess you knew that, so I am not sure what you mean about the character that can be separated&lt;/P&gt;&lt;P&gt;-It is just a part of a big program (not sure that was the question, though)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thx in advance&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 12:43:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176698#M45289</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T12:43:52Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176699#M45290</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I don't actually have SAS right now, but my suggestion would be this.&amp;nbsp; First take a distinct list of identifiers, e.g.:&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table IDS as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; distinct&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; IDENTIFIER,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max(DECLARATION) as XX /* declaration = identifier + letter, so the max per identifier is the one with the last letter */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; YOUR_DATA&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by&amp;nbsp; IDENTIFIER;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;Then use that to create the new table:&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table WANT_DATA as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; YOUR_DATA&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DECLARATION in (select distinct ... from IDS);&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Don't know how quick it would be, though.&amp;nbsp; Maybe do it in pieces, e.g. one ID at a time.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:14:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176699#M45290</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-10-08T13:14:25Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176700#M45291</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Going for those tens of millions (20M? 60M?) of observations and those two variables, we are talking about a record length of about 30 bytes (latin1), which for 60M observations is 1.8 GB.&lt;BR /&gt;It is big, but with a modern PC equipped with 8 GB of RAM, 8 cores and 500 GB of disk (a quality laptop), it should all fit easily for in-memory processing.&lt;/P&gt;&lt;P&gt;When you are using SAS on Windows it will allow you to do that. Please check MEMSIZE (and the OS) with: " proc options; run;&amp;nbsp; %put _all_;&amp;nbsp; "&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sequential processing that reads all the data once is commonly faster than random I/O, unless you are using SSDs.&lt;BR /&gt;A fast solution could be building up the key as a hash. Declared with the ORDERED option, it will deliver a sorted table in memory. Working back with LAST/PREV (hash iterator) should easily give the highest value (case-sensitive) for every identifier.&lt;BR /&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002588805.htm" title="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002588805.htm"&gt;SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition&lt;/A&gt;&amp;nbsp; Only the fifteen-character key is needed in the hash, provided it is guaranteed that its first 14 characters equal the identifier.&lt;/P&gt;&lt;P&gt;In that case it would be one load, then reading back for selection.&amp;nbsp; Options:&lt;/P&gt;&lt;P&gt;a/ change/update the data in a dataset and do an output, creating a new dataset.&lt;/P&gt;&lt;P&gt;b/ modify the hash (deleting the lower values) and do a hash table save.&lt;/P&gt;&lt;P&gt;Both can be combined with other processing on the data, eliminating other steps.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just an annual run? Then it is also likely to change annually, so maintainability may be more important than speed of processing.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:15:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176700#M45291</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-10-08T13:15:12Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176701#M45292</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;To be more precise, I have to do that a hundred times on databases that have between 500,000 and two million observations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Memsize is 2147483648 (2 GB); not sure where to find the OS.&lt;/P&gt;&lt;P&gt;It will be just an annual run, so of course maintainability is key, but for now I have to run it a lot in order to test some features. Moreover, it is only a small part of my program, which is itself only a small part of the whole program that has to be run each year, so any time saved is still precious.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:36:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176701#M45292</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T13:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176702#M45293</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There are others who know the syntax better than I do, but the best approach is likely to be a hash table.&amp;nbsp; Load the data into a hash table, using REPLACE for duplicate key values.&amp;nbsp; Then unload the hash table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All assuming that you have sufficient memory.&amp;nbsp; Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:42:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176702#M45293</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2014-10-08T13:42:48Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176703#M45294</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You still haven't told us whether your data is stored as SAS files or resides in a database (and which one). This information is crucial to come up with a "best" solution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The sample code below is for data stored in SAS files (&lt;A __default_attr="8872" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;'s option b).&lt;/P&gt;&lt;P&gt;If your data is stored in a database then, depending on the database, using SQL analytic functions could be a very efficient way: &lt;A href="http://www.oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions.php" title="http://www.oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions.php"&gt;ORACLE-BASE - FIRST_VALUE and LAST_VALUE Analytic Functions&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;options fullstimer;&lt;/P&gt;&lt;P&gt;data have(keep=identifier declaration value);&lt;/P&gt;&lt;P&gt;&amp;nbsp; attrib identifier length=8 format=best32. 
declaration length=$9.;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; do identifier=1 to 10000000;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; _stop=ceil(ranuni(1)*10);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i=1 to _stop;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declaration=cats(identifier,byte(_i+64));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value+1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (drop=_:);&lt;/P&gt;&lt;P&gt;&amp;nbsp; set have end=last;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set have(keep=declaration rename=(declaration=_decl));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h1 (dataset:'have(keep=identifier declaration rename=(declaration=_decl))', hashexp:9);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineKey('identifier');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineData('_decl');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; call missing(_decl);&lt;/P&gt;&lt;P&gt;&amp;nbsp; _rc=h1.find();&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if declaration&amp;gt;_decl then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
h1.replace(key:identifier,data:declaration);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if last then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i=1 to _nobs;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set have nobs=_nobs point=_i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(_decl);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.find();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if declaration=_decl then output;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:44:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176703#M45294</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-10-08T13:44:24Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176704#M45295</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have, at least I think so: "&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;-It is a SAS-dedicated database"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;That is, I have a library pointing to a directory in which my databases are stored as .sas7bdat files.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:46:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176704#M45295</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T13:46:55Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176705#M45296</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Regardless of which approach you end up using, I don't think your logic is correct, and you don't need to write the 3rd file.&amp;nbsp; Aren't you really just trying to accomplish something like the following?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-weight: inherit; font-style: inherit; font-family: inherit;"&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;proc sort data=have out=want; &lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;&amp;nbsp; by identifier&lt;SPAN style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif;"&gt; descending&lt;/SPAN&gt; declaration;&lt;SPAN style="font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif;"&gt; /*&amp;lt;-I changed this statement*/&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;run; &lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-weight: inherit; font-style: inherit; font-family: inherit;"&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 
'Courier New', monospace, serif; background-position: initial;"&gt;proc sort data=want nodupkey; &lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;&amp;nbsp; by identifier; /*&amp;lt;-I changed this statement*/&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;run; &lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE style="font-weight: inherit; font-style: inherit; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; background-position: initial;"&gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:49:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176705#M45296</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2014-10-08T13:49:23Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176706#M45297</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;And here is Jaap's option a), I believe. It runs a bit longer and uses a bit more resources, but is maybe a bit easier to understand/maintain.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;options fullstimer;&lt;/P&gt;&lt;P&gt;data have(keep=identifier declaration value);&lt;/P&gt;&lt;P&gt;&amp;nbsp; attrib identifier length=8 format=best32. declaration length=$9.;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; do identifier=1 to 10000000;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; _stop=ceil(ranuni(1)*10);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i=1 to _stop;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declaration=cats(identifier,byte(_i+64));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; value+1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data _null_;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set have end=last;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set have(keep=declaration rename=(declaration=_decl));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h1 (dataset:'have(keep=identifier declaration rename=(declaration=_decl))');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineKey('identifier');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineData('identifier','_decl');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; call missing(_decl);&lt;/P&gt;&lt;P&gt;&amp;nbsp; _rc=h1.find();&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if declaration&amp;gt;_decl then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; h1.replace(key:identifier,data:identifier,data:declaration);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if last then h1.output(dataset:'selection(rename=(_decl=declaration))');&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set have;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;/P&gt;&lt;P&gt;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h1 (dataset:'selection');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineKey('identifier','declaration');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if h1.check()=0 then output;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 13:58:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176706#M45297</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-10-08T13:58:22Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176707#M45298</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You are right that I want to NODUPKEY by identifier; it is what I do for now.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can indeed skip a table (it can also be the first one: just sort it and keep it), but I do not know what is best in terms of time.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 14:01:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176707#M45298</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T14:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176708#M45299</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thx, I will look at it tomorrow when more focused.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 14:02:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176708#M45299</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-08T14:02:54Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176709#M45300</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, then a hash approach will be hard to beat. I've loaded 10M records into the hash, which consumed around 650 MB of RAM. So even if your key/data combination consumes more storage space, your volumes are quite a bit lower, and your memory setting indicates that you have up to 2 GB at your disposal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;"To be more precise, I have to do that a hundred times on databases that have between 500,000 and two million observations."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Then you definitely want to go for code that is as optimized as possible and then "wrap" it into a SAS macro, so that you have to maintain the "complexity" in a single place only.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;On a side note (and more for &lt;A __default_attr="8872" __jive_macro_name="user" class="jive_macro jive_macro_user" href="https://communities.sas.com/"&gt;&lt;/A&gt; actually): the code variant with a second data step and a hash with only a key defined (with 2 variables) consumes 800 MB of RAM. Not sure why; this may be the reason why Paul Dorfman mentioned in another discussion that we should define a single-byte variable as "data" for the hash in such cases.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 14:14:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176709#M45300</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-10-08T14:14:28Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176710#M45301</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Because your source data is in a DBMS, it might be most efficient to perform the processing there.&amp;nbsp; You can write your own SQL query and submit it using PROC SQL's explicit pass-through capability.&amp;nbsp; If you're not quite sure how to write that query, you can look at PROC SORT for an example.&amp;nbsp; When the NODUPKEY option is specified, PROC SORT can generate and submit SQL to a DBMS to perform the work.&amp;nbsp; Obviously, sorting BY identifier DESCENDING declaration with NODUPKEY isn't what you want, but the SQL it generates can serve as your starting point.&amp;nbsp; The flavor of SQL generated depends upon the DBMS being used.&amp;nbsp; You can print the generated SQL to the log by specifying the SQL_IP_TRACE=SOURCE option.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's an example:&lt;/P&gt;&lt;PRE&gt;libname dbms ... ; /* Modify as appropriate for your DBMS */

proc delete data=dbms.have;
run;

data dbms.have;
 length identifier declaration $ 10;
 input identifier declaration;
 cards;
Y 23456C
Y 23456B
Y 23456A
X 123456A
X 123456B
;
run;

options msglevel=i;
options sql_ip_trace=source;

/***************************************************
  Use the SQL generated here as a starting point
 ***************************************************/
proc sort data=dbms.have out=have2 nodupkey;
 by identifier descending declaration;
run;

proc print data=have2;
run;

/***************************************************
  For example, we take the generated SQL and submit
  it to the DBMS using explicit pass-through. This
  does not quite do what we want, though: it keeps
  one row per identifier/declaration pair.
 ***************************************************/
proc sql nowarn;
 drop table have2;
 connect using dbms as db;
 create table have2 as select * from connection to db
 (
   WITH "subquery0" ( "declaration", "identifier" ) AS (
     SELECT
       "declaration" AS "declaration",
       "identifier" AS "identifier"
     FROM "have"
   )
   SELECT
     "table0"."identifier",
     "table0"."declaration"
   FROM (
     SELECT "declaration", "identifier"
     FROM (
       SELECT
         "declaration",
         "identifier",
         ROW_NUMBER() OVER (
           PARTITION BY "identifier", "declaration"
           ORDER BY "identifier", "declaration" DESC
         ) AS "tempcol0"
       FROM "subquery0"
     ) AS "subquery1"
     WHERE ( "tempcol0" = 1 )
   ) AS "table0"
   ORDER BY
     "table0"."identifier",
     "table0"."declaration" DESC
 );
 disconnect from db;
quit;

proc print data=have2;
run;

/***************************************************
  But we can modify the SQL, removing the
  declaration column from the PARTITION BY clause,
  which should produce what we want: one row per
  identifier, holding its highest declaration.
 ***************************************************/
proc sql nowarn;
 drop table have2;
 connect using dbms as db;
 create table have2 as select * from connection to db
 (
   WITH "subquery0" ( "declaration", "identifier" ) AS (
     SELECT
       "declaration" AS "declaration",
       "identifier" AS "identifier"
     FROM "have"
   )
   SELECT
     "table0"."identifier",
     "table0"."declaration"
   FROM (
     SELECT "declaration", "identifier"
     FROM (
       SELECT
         "declaration",
         "identifier",
         ROW_NUMBER() OVER (
           PARTITION BY "identifier" /*, "declaration" */
           ORDER BY "identifier", "declaration" DESC
         ) AS "tempcol0"
       FROM "subquery0"
     ) AS "subquery1"
     WHERE ( "tempcol0" = 1 )
   ) AS "table0"
   ORDER BY
     "table0"."identifier",
     "table0"."declaration" DESC
 );
 disconnect from db;
quit;

proc print data=have2;
run;
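
/***************************************************
  Alternative sketch (not part of the approach
  above, and untested against a real DBMS): if
  IDENTIFIER and DECLARATION are the only columns
  you need, MAX() on the character column
  DECLARATION returns the declaration with the last
  letter for each identifier, because all
  declarations of one identifier share the same
  prefix. Whether PROC SQL passes this query to the
  DBMS (implicit pass-through) depends on your DBMS
  and SAS/ACCESS engine; the SQL_IP_TRACE output in
  the log may help you check. HAVE3 is a
  hypothetical output name. If you also need the
  other columns of that row, use the ROW_NUMBER()
  query above instead.
 ***************************************************/
proc sql;
 create table have3 as
 select identifier, max(declaration) as declaration
 from dbms.have
 group by identifier;
quit;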

&lt;/PRE&gt;&lt;P&gt;The last PROC PRINT produces:&lt;/P&gt;&lt;PRE&gt;Obs    identifier    declaration

 1     X             123456B
 2     Y             23456C

&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 14:17:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176710#M45301</guid>
      <dc:creator>scmebu</dc:creator>
      <dc:date>2014-10-08T14:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176711#M45302</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The one thing that makes me think twice is that you say it's about hundreds of tables AND end-of-year processing. That's normally the time when system resources are scarcest.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If the hash doesn't get the memory it needs, the job will fail. I'm not 100% sure that defining a MEMSIZE of 2GB guarantees you that amount of memory. I hope someone else can shed some light on this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I/O and disk space are normally also in high demand at the end of the year, but the effect of shared I/O is usually only a decrease in performance, not a job failure. So if you want to be on the very safe side, a sort is possibly the better choice for your situation. Performance will likely be worse due to the increased I/O, especially for tables with big records (lots of variables or very long variables).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below is code for such an approach. The sample code actually performs quite well, but the record size is also quite small (only one additional 8-byte variable).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;options fullstimer;

/* Sample data: 1 to 10 declarations (suffix A, B, ...) per identifier */
data have(keep=identifier declaration value);
  attrib identifier  length=8 format=best32.
         declaration length=$9.;

  do identifier=1 to 10000000;
    _stop=ceil(ranuni(1)*10);

    do _i=1 to _stop;
      declaration=cats(identifier,byte(_i+64));
      value+1;
      output;
    end;
  end;
run;

/* Sorted view: the sort happens when the view is read */
proc sql;
  create view v_have as
    select *
    from have
    order by identifier, declaration
    ;
quit;

/* Keep the last (highest) declaration per identifier */
data want;
  set v_have;
  by identifier declaration;
  if last.identifier;
run;
&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 14:45:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176711#M45302</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-10-08T14:45:53Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176712#M45303</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Just going back to the environment: %put _all_ includes the output of %put _automatic_, which shows the automatic system variables. The output below is from SAS University Edition (UE) on Windows.&lt;/P&gt;&lt;P&gt;SYSSCP and SYSSCPL tell you which operating system your SAS session is running on. Since you say it is a sas7bdat file, I am convinced it is all Windows desktop based, and that you are not using Enterprise Guide but the old classic Display Manager (DMS).&lt;/P&gt;&lt;P&gt;In that case all your processing is done by your desktop/laptop. A MEMSIZE of 2GB is rather small; maybe a 32-bit OS and SAS, or a VDI, is being used.&lt;/P&gt;&lt;P&gt;This size is normally the default setting for a server session.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;43&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; %put _automatic_ ;&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC AFDSID 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC AFDSNAME &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC AFLIB &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC AFSTR1 &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC AFSTR2 &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC FSPBDV &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSADDRBITS 64&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSBUFFR &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSCC 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSCHARWIDTH 1&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSCMD &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDATASTEPPHASE &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDATE 08OCT14&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDATE9 08OCT2014&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDAY Wednesday&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDEVIC &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSDMG 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC 
SYSDSN&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _NULL_&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSENCODING utf-8&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSENDIAN LITTLE&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSENV BACK&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSERR 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSERRORTEXT Shell escape is not valid in this SAS session.&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSFILRC 1&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSHOSTINFOLONG Linux LIN X64 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 CentOS release 6.5 &lt;/P&gt;&lt;P class="sasSource"&gt; (Final) &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSHOSTNAME localhost&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSINDEX 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSINFO 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSJOBID 24800&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSLAST _NULL_&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSLCKRC 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSLIBRC 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSLOGAPPLNAME &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSMACRONAME &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSMAXLONG 9007199254740992&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSMENV S&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSMSG &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSNCPU 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSNOBS 
58&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSODSESCAPECHAR&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSODSGRAPHICS 1&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSODSPATH&amp;nbsp; WORK.TEMPLAT(UPDATE) SASUSER.TEMPLAT(READ) SASHELP.TMPLMST(READ)&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPARM &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPBUFF &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPROCESSID 41D9C14253A7EC464018000000000000&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPROCESSMODE SAS Workspace Server&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPROCESSNAME Object Server&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSPROCNAME &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSRC 0&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSCP LIN X64&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSCPL Linux&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSITE 70068118&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSIZEOFLONG 8&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSIZEOFPTR 8&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSIZEOFUNICODE 4&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSTARTID &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSSTARTNAME &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSTCPIPHOSTNAME localhost.localdomain&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSTIME 10:28&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSTIMEZONE &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSTIMEZONEIDENT &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSTIMEZONEOFFSET -14400&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSUSERID sasdemo&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSVER 9.4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSVLONG 9.04.01M1P120413&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSVLONG4 9.04.01M1P12042013&lt;/P&gt;&lt;P class="sasSource"&gt; AUTOMATIC SYSWARNINGTEXT &lt;/P&gt;&lt;P 
class="sasSource"&gt; 44&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 15:12:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176712#M45303</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-10-08T15:12:37Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176713#M45304</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There is something I don't understand.&lt;/P&gt;&lt;P&gt;You have an annually updated table.&lt;/P&gt;&lt;P&gt;What are the other "hundreds" of programs you wish to run? It sounded like they were equal/similar.&lt;/P&gt;&lt;P&gt;If this is the "one" type of query you wish to optimize, just store the table sorted by identifier and declaration. Then you only need simple table scans using SET with BY and LAST. logic (or FIRST., depending on the table's sort order).&lt;/P&gt;&lt;P&gt;This sounds too simple, so I presume I missed something in this conversation?&lt;/P&gt;&lt;P&gt;To speed up table scans, try storing it in an SPDE library, using the fastest disks you can get.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 20:06:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176713#M45304</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2014-10-08T20:06:23Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176714#M45305</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have taken Patrick's test set and modified it so that identifier and declaration are both character variables, and I replaced the value variable with obspnt.&lt;/P&gt;&lt;P&gt;After the selection is done, you can use obspnt to relate each result back to its original record, even by reading that record directly with the POINT= option.&lt;/P&gt;&lt;P&gt;I added the log with FULLSTIMER output from a run in UE. The test data set has about 13M records and a size of about 500MB (created in about 15 seconds).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The goal is to get back the last record of each group of duplicates.&lt;/P&gt;&lt;P&gt;The first solution relies, as noted in the code comment, on the declarations being clustered and ordered within each identifier. BY processing with NOTSORTED does the job and runs in about 15 seconds.&lt;/P&gt;&lt;P&gt;If the data are not ordered that way, a sort step can solve that, at the cost of some additional time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The second solution uses a hash. Loading the hash (ORDERED:'yes') orders the data, so no sorting step is needed.&lt;/P&gt;&lt;P&gt;An iterator then reads that hash back from the end and fills a second hash with the last declaration for each 14-character identifier. This runs in about 1 minute 20 seconds.&lt;/P&gt;&lt;P&gt;I planned to remove keys from the first hash while iterating, but got a locking error because the iterator was locking its current position.&lt;/P&gt;&lt;P&gt;As shown, the regular DATA step processing is not used; you could do further processing there using the hashes in memory. About 1.5GB of memory was used.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I first made a mistake with the test data set and generated 10 times more records (135M). Creating it took 10 minutes and the LAST. processing 5 minutes; the hash approach could not run at that volume.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The more we know about your processing and requirements, the better the choices can be tailored to them.&lt;/P&gt;&lt;P&gt;Hundreds of programs/datasets? Then some effort on this is well worth it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;44&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; data test.have(keep=identifier declaration obspnt);&lt;/P&gt;&lt;P class="sasSource"&gt; 45&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; attrib identifier length=$14&amp;nbsp; declaration length=$15.;&lt;/P&gt;&lt;P class="sasSource"&gt; 46&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain obspnt 0;&lt;/P&gt;&lt;P class="sasSource"&gt; 47&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do i=1 to 1000000;&amp;nbsp; /* no of records */&lt;/P&gt;&lt;P class="sasSource"&gt; 48&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _value=ceil(ranuni(1)*12000000000);&lt;/P&gt;&lt;P class="sasSource"&gt; 49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; identifier=putn(_value,'z14.0');&lt;/P&gt;&lt;P class="sasSource"&gt; 50&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _stop=ceil(ranuni(2)*26 ); /* 26 letters at max, ascii letters latin-1 start at 40x */&lt;/P&gt;&lt;P class="sasSource"&gt; 51&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i=1 to _stop;&lt;/P&gt;&lt;P class="sasSource"&gt; 52&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declaration=cats(identifier,byte(64+_i));&lt;/P&gt;&lt;P class="sasSource"&gt; 53&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
obspnt+1;&lt;/P&gt;&lt;P class="sasSource"&gt; 54&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P class="sasSource"&gt; 55&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P class="sasSource"&gt; 56&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P class="sasSource"&gt; 57&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote1_1412800904732"&gt; NOTE: The data set TEST.HAVE has 13482024 observations and 3 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote2_1412800904732"&gt; NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 14.21 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; user cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.00 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; system cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.37 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 379.96k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OS Memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25236.00k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
Timestamp&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 08-10-2014 04:40:07 PM&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Step Count&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 42&amp;nbsp; Switch Count&amp;nbsp; 74&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Faults&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Reclaims&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 68&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Swaps&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Voluntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 272&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Involuntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 90&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Input 
Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Output Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1058064&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 58&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 59&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* unique indentity unsorted, but decalartion within that is clustered and oredered wantes as last */&lt;/P&gt;&lt;P class="sasSource"&gt; 60&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; data test.want1 ;&lt;/P&gt;&lt;P class="sasSource"&gt; 61&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set test.have ;&lt;/P&gt;&lt;P class="sasSource"&gt; 62&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; by identifier notsorted&amp;nbsp; ;&lt;/P&gt;&lt;P class="sasSource"&gt; 63&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if last.identifier;&lt;/P&gt;&lt;P class="sasSource"&gt; 64&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote3_1412800904732"&gt; NOTE: There were 13482024 observations read from the data set TEST.HAVE.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote4_1412800904732"&gt; NOTE: The data set TEST.WANT1 has 1000000 observations and 3 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote5_1412800904732"&gt; NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P 
class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.41 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; user cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.00 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; system cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.96 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 562.09k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OS Memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25236.00k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Timestamp&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 08-10-2014 04:40:23 PM&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Step Count&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 43&amp;nbsp; Switch Count&amp;nbsp; 40&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Faults&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P 
class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Reclaims&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 39&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Swaps&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Voluntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 105&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Involuntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 245&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Input Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Output Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 78992&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 65&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 66&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* using just decalaration to build hash wiht highest letter value */&lt;/P&gt;&lt;P class="sasSource"&gt; 67&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; data 
notwant (drop=_:);&lt;/P&gt;&lt;P class="sasSource"&gt; 68&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set sashelp.class end=last;&lt;/P&gt;&lt;P class="sasSource"&gt; 69&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 70&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _n_=1 then do;&lt;/P&gt;&lt;P class="sasSource"&gt; 71&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; length _declprvh _declprvt $14 ;&lt;/P&gt;&lt;P class="sasSource"&gt; 72&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set test.have(keep=declaration obspnt rename=(declaration=_decl));&lt;/P&gt;&lt;P class="sasSource"&gt; 73&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h1 (dataset:'test.have(keep=declaration obspnt rename=(declaration=_decl) obs=max ))', duplicate:'r',&lt;/P&gt;&lt;P class="sasSource"&gt; 73&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! 
ordered: 'yes', hashexp:20);&lt;/P&gt;&lt;P class="sasSource"&gt; 74&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hiter h1p('h1');&lt;/P&gt;&lt;P class="sasSource"&gt; 75&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineKey('_decl');&lt;/P&gt;&lt;P class="sasSource"&gt; 76&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineData('_decl','obspnt');&lt;/P&gt;&lt;P class="sasSource"&gt; 77&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineDone();&lt;/P&gt;&lt;P class="sasSource"&gt; 78&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h2 ( ordered: 'yes', duplicate:'r', hashexp:20);&lt;/P&gt;&lt;P class="sasSource"&gt; 79&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hiter h2p('h2');&lt;/P&gt;&lt;P class="sasSource"&gt; 80&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h2.defineKey('_decl');&lt;/P&gt;&lt;P class="sasSource"&gt; 81&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h2.defineData('_decl','obspnt');&lt;/P&gt;&lt;P class="sasSource"&gt; 82&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h2.defineDone();&lt;/P&gt;&lt;P class="sasSource"&gt; 
83&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(_decl,obspnt);&lt;/P&gt;&lt;P class="sasSource"&gt; 84&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 85&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rcp=h1p.last();&lt;/P&gt;&lt;P class="sasSource"&gt; 86&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while ( not _rcp) ;&lt;/P&gt;&lt;P class="sasSource"&gt; 87&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _declprvt=substr(_decl,1,14);&lt;/P&gt;&lt;P class="sasSource"&gt; 88&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; If _declprvt not = _declprvh then _rc=h2.add( );&lt;/P&gt;&lt;P class="sasSource"&gt; 89&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _declprvh=_declprvt;&lt;/P&gt;&lt;P class="sasSource"&gt; 90&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rcp=h1p.prev();&lt;/P&gt;&lt;P class="sasSource"&gt; 91&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P class="sasSource"&gt; 92&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Create output data set 
from hash object */&lt;/P&gt;&lt;P class="sasSource"&gt; 93&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc = h2.output(dataset:'test.want2(rename=(_decl=declaration))');&lt;/P&gt;&lt;P class="sasSource"&gt; 94&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P class="sasSource"&gt; 95&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P class="sasSource"&gt; 96&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* your program code on class dataset&amp;nbsp; */&lt;/P&gt;&lt;P class="sasSource"&gt; 97&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote6_1412800904732"&gt; NOTE: There were 13482024 observations read from the data set TEST.HAVE.&lt;/P&gt;&lt;P class="sasWarning" id="sasLogWarning1_1412800904732"&gt; WARNING: Hash Object DATASET option should be used when specifying the DUPLICATE option.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote7_1412800904732"&gt; NOTE: The data set TEST.WANT2 has 1000000 observations and 2 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote8_1412800904732"&gt; NOTE: There were 19 observations read from the data set SASHELP.CLASS.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote9_1412800904732"&gt; NOTE: The data set WORK.NOTWANT has 19 observations and 6 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote10_1412800904732"&gt; NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1:21.50&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; user cpu 
time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 30.95 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; system cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 47.99 seconds&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1523263.51k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OS Memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1547520.00k&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Timestamp&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 08-10-2014 04:41:44 PM&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Step Count&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 44&amp;nbsp; Switch Count&amp;nbsp; 154&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Faults&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Reclaims&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 98625&lt;/P&gt;&lt;P 
class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Page Swaps&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Voluntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 699&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Involuntary Context Switches&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1111&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Input Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Block Output Operations&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 47656&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 08 Oct 2014 20:53:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176714#M45305</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-10-08T20:53:01Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Most efficient way to eliminate duplicate</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176715#M45306</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I now realize I should have said this before, but I have to sort the table because I then merge it with another table by this identifier.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So, as I understand your message (I am doing my best, but it is a bit technical for me), the second method produces sorted output and the first does not, so if I add a proc sort to the first method, the second one will be more efficient?&lt;/P&gt;&lt;P&gt;Even so, will the first one followed by a proc sort still be more efficient than what I proposed?&lt;/P&gt;&lt;P&gt;I will try both, but I do not know at all whether the hash table solution will be considered understandable enough to be validated.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Oct 2014 06:41:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Most-efficient-way-to-eliminate-duplicate/m-p/176715#M45306</guid>
      <dc:creator>Aboiron</dc:creator>
      <dc:date>2014-10-09T06:41:14Z</dc:date>
    </item>
  </channel>
</rss>

