<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Request for logical dataset comparison help? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167878#M263953</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Vincent,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you so much indeed for the very quick response in providing me the much needed help. Your solution worked amazingly well and the &lt;STRONG&gt;performance&lt;/STRONG&gt; was &lt;STRONG&gt;simply great. &lt;/STRONG&gt; And sorry for the delayed reply reply from me&amp;nbsp; as I only got the chance to run your solution couple of hours ago.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Much Appreciate it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 10 Dec 2013 17:05:38 GMT</pubDate>
    <dc:creator>MarkWik</dc:creator>
    <dc:date>2013-12-10T17:05:38Z</dc:date>
    <item>
      <title>Request for logical dataset comparison help?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167876#M263951</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hey Folks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have got two datasets, Dataset 1 and Dataset 2&amp;nbsp; with the following variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt;Account_id&amp;nbsp; Markets&amp;nbsp; Sales&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Date&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;110032A&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Medium&amp;nbsp;&amp;nbsp; 200&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20/11/2013&lt;/P&gt;&lt;P&gt;110032A&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Small&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 150&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21/11/2013&lt;/P&gt;&lt;P&gt;110032A&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Large&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 250&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 22/11/2013&lt;/P&gt;&lt;P&gt;110021B&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Large&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 100&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20/11/20013&lt;/P&gt;&lt;P&gt;110022C&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Medium&amp;nbsp;&amp;nbsp;&amp;nbsp; 230&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25/11/2013&lt;/P&gt;&lt;P&gt;----and so on 25 million large dataset&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The Dataset 2 also has the &lt;STRONG&gt;same&lt;/STRONG&gt; variables as Dataset 1 and the &lt;STRONG&gt;requirement is that the records and values must match Dataset 1&lt;/STRONG&gt; as its a merely a rewrite from old process. However, I find there are &lt;STRONG&gt;more records&lt;/STRONG&gt; in&amp;nbsp; my Dataset 2 and I would like to make a comparison with dataset 1 to see and identity those records that contribute to &lt;STRONG&gt;excess&lt;/STRONG&gt; in dataset2.&amp;nbsp; There could some new account_id, or there could be new markets for the some existing account ids. The idea is to exclude the new records and create a separate report. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I tried proc compare, but of no use. I would appreciate a datastep pr proc sql logic. Thanks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;NOTE: The account_id s are not sorted&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 09 Dec 2013 17:48:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167876#M263951</guid>
      <dc:creator>MarkWik</dc:creator>
      <dc:date>2013-12-09T17:48:26Z</dc:date>
    </item>
    <item>
      <title>Re: Request for logical dataset comparison help?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167877#M263952</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I think proc compare requires each set to be sorted but don't take my word on this. I don't know what hardware you are working with but likely, 32 length or so variables could fit in the standard instal of -MEMSIZE 2G to use hash table to save proc sql from achieving a full sort on each set before joining.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So I suggest you try the following:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data old new;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _n_=1 then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash myhash(dataset: 'dataset1');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myhash.definekey('account_id', 'markets', 'sales', 'date');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myhash.definedata(); shouldn't need to definedata unless dataset1 isn't fully included in dataset2 and you wish to create a 3rd dataset for those obscure records */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; myhash.definedone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set dataset2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if myhash.find()=0 then output old;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else output new;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Provided you have enough room in memory to load such a big hash table (so technically if you really only have 2 strings and 2 numeric variables, 32 bytes * 25M records &amp;lt; 1.9G or so that is available to hashing in standard install conditions), this should save you a lot of time from sql joins.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;logic is if record from table2 is found in table1 you output old (all methods return code 0 if they are succesful)&lt;/P&gt;&lt;P&gt;otherwise output new&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Vincent&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 09 Dec 2013 18:00:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167877#M263952</guid>
      <dc:creator>Vince28_Statcan</dc:creator>
      <dc:date>2013-12-09T18:00:17Z</dc:date>
    </item>
    <item>
      <title>Re: Request for logical dataset comparison help?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167878#M263953</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Vincent,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you so much indeed for the very quick response in providing me the much needed help. Your solution worked amazingly well and the &lt;STRONG&gt;performance&lt;/STRONG&gt; was &lt;STRONG&gt;simply great. &lt;/STRONG&gt; And sorry for the delayed reply reply from me&amp;nbsp; as I only got the chance to run your solution couple of hours ago.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Much Appreciate it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 10 Dec 2013 17:05:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Request-for-logical-dataset-comparison-help/m-p/167878#M263953</guid>
      <dc:creator>MarkWik</dc:creator>
      <dc:date>2013-12-10T17:05:38Z</dc:date>
    </item>
  </channel>
</rss>

