<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to solve the lookup problem with a very large dataset? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236282#M308593</link>
    <description>&lt;P&gt;I'm not sure if I can help you, but I think it would be useful if you could provide the community with more information about your problem:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Which platform are you working on? Windows PC, Workstation, Unix server, Mainframe, ...?&lt;/LI&gt;
&lt;LI&gt;Which SAS products are installed at your site? Only Base SAS 9.2?&lt;/LI&gt;
&lt;LI&gt;How much RAM and disk space are available?&lt;/LI&gt;
&lt;LI&gt;What is the size of the two datasets in GB?&lt;/LI&gt;
&lt;LI&gt;Is either of them compressed?&lt;/LI&gt;
&lt;LI&gt;Can you give an estimate as to what percentage of the 250 million records are from CLIENT_IDs contained in dataset1?&lt;/LI&gt;
&lt;LI&gt;Is dataset1 a kind of master dataset (e.g. with names, addresses etc. of the clients) and, if so, why isn't it sorted by CLIENT_ID?&lt;/LI&gt;
&lt;LI&gt;Is this a one-time task or will it have to be done regularly?&lt;/LI&gt;
&lt;LI&gt;Do you have empirical values (run times, resource consumption) from similar tasks (but with smaller datasets) in the past?&lt;/LI&gt;
&lt;LI&gt;Have you considered the use of indexes?&lt;/LI&gt;
&lt;/OL&gt;</description>
    <pubDate>Tue, 24 Nov 2015 21:58:37 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2015-11-24T21:58:37Z</dc:date>
    <item>
      <title>How to solve the lookup problem with a very large dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236267#M308592</link>
      <description>&lt;P&gt;How to solve the lookup problem with a &lt;STRONG&gt;very large dataset&lt;/STRONG&gt;?&lt;/P&gt;
&lt;P&gt;I have two datasets, dataset1 and dataset2.&lt;/P&gt;
&lt;P&gt;dataset1 has&amp;nbsp;&lt;/P&gt;
&lt;P&gt;client_id var1-var25 (so 26 variables in total) /* client_id is the key and is unique, but not ordered */&lt;/P&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;6&lt;/P&gt;
&lt;P&gt;9&lt;/P&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;dataset2 has duplicate client_ids /* naturally, because the dataset holds the daily balances of bank customers for the last several months */&lt;/P&gt;
&lt;P&gt;client_id &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;date &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;daily_balances&lt;/P&gt;
&lt;P&gt;1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 14/10/2014 &amp;nbsp; &amp;nbsp;5000 &amp;nbsp; &amp;nbsp; /* sample dates and daily balances of clients continue until end of file */&lt;/P&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;3&lt;/P&gt;
&lt;P&gt;2&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Q1. I need a simple left join to get the daily balances into dataset1, because I want the daily balances of only those client_ids that appear in dataset1. The problem is that dataset2 has over 250 million records and dataset1 has a million records. Any &lt;STRONG&gt;efficient solution&lt;/STRONG&gt; with SAS 9.2, please?&lt;/P&gt;
&lt;P&gt;Q2. I want the last balance of each client_id; is it as simple as sorting and using last.client_id after the join?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have no clue how to manage the lookup from dataset1 &lt;STRONG&gt;(with 1 million records) against dataset2 (with 250 million records)&lt;/STRONG&gt;. I'd appreciate any help, please.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
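A hedged sketch of one way Q1 and Q2 could be handled in a SAS 9.2 DATA step; all library, dataset, and variable names (work.dataset1, work.dataset2, client_id, date) are assumptions taken from the post:

```sas
/* Q1: filter the 250-million-row dataset2 down to the client_ids present */
/* in dataset1 using an in-memory hash lookup; no sort of dataset2 needed */
data work.balances;
  if _n_ = 1 then do;
    declare hash h(dataset: "work.dataset1(keep=client_id)");
    h.defineKey("client_id");
    h.defineDone();
  end;
  set work.dataset2;
  if h.check() = 0;   /* keep the row only if client_id is in dataset1 */
run;

/* Q2: the last balance per client - sort the (much smaller) result and */
/* keep the final record in each BY group                               */
proc sort data=work.balances;
  by client_id date;
run;

data work.last_balance;
  set work.balances;
  by client_id;
  if last.client_id;
run;
```

The hash holds only the 1 million keys of dataset1 in memory, so dataset2 is read once sequentially; whether the keys fit in memory depends on the available RAM, which is one of the questions asked in the replies below.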
      <pubDate>Tue, 24 Nov 2015 20:02:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236267#M308592</guid>
      <dc:creator>CharlotteCain</dc:creator>
      <dc:date>2015-11-24T20:02:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to solve the lookup problem with a very large dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236282#M308593</link>
      <description>&lt;P&gt;I'm not sure if I can help you, but I think it would be useful if you could provide the community with more information about your problem:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Which platform are you working on? Windows PC, Workstation, Unix server, Mainframe, ...?&lt;/LI&gt;
&lt;LI&gt;Which SAS products are installed at your site? Only Base SAS 9.2?&lt;/LI&gt;
&lt;LI&gt;How much RAM and disk space are available?&lt;/LI&gt;
&lt;LI&gt;What is the size of the two datasets in GB?&lt;/LI&gt;
&lt;LI&gt;Is either of them compressed?&lt;/LI&gt;
&lt;LI&gt;Can you give an estimate as to what percentage of the 250 million records are from CLIENT_IDs contained in dataset1?&lt;/LI&gt;
&lt;LI&gt;Is dataset1 a kind of master dataset (e.g. with names, addresses etc. of the clients) and, if so, why isn't it sorted by CLIENT_ID?&lt;/LI&gt;
&lt;LI&gt;Is this a one-time task or will it have to be done regularly?&lt;/LI&gt;
&lt;LI&gt;Do you have empirical values (run times, resource consumption) from similar tasks (but with smaller datasets) in the past?&lt;/LI&gt;
&lt;LI&gt;Have you considered the use of indexes?&lt;/LI&gt;
&lt;/OL&gt;</description>
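Regarding the last question above, a minimal sketch of creating a simple index on CLIENT_ID with PROC DATASETS; the library and member names are assumptions taken from the post:

```sas
/* Create a simple index on client_id; a keyed lookup (e.g. via the   */
/* KEY= option on a SET statement) can then avoid a full sequential   */
/* scan of dataset2                                                   */
proc datasets library=work nolist;
  modify dataset2;
  index create client_id;
quit;
```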
      <pubDate>Tue, 24 Nov 2015 21:58:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236282#M308593</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2015-11-24T21:58:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to solve the lookup problem with a very large dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236284#M308594</link>
      <description>&lt;P&gt;There would be the option of creating indexes, and possibly also of storing the data using the SPDE engine.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But in the end I believe that taking a "traditional" approach and sorting/merging the data will be fastest in your case and give you the most flexibility to answer multiple questions downstream.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you only need the latest record per customer from dataset2, what could work is to first write the ID and latest date to a hash table (one pass through the data) and then, in a second pass, select only the records matching the hash. Write these records to a new table and then sort this new table with its reduced volume.&lt;/P&gt;</description>
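A sketch of the two-pass hash approach described above; the dataset and variable names (work.dataset2, client_id, date) are assumptions taken from the original post:

```sas
/* Pass 1: one sequential read of dataset2; keep the latest date per  */
/* client_id in an in-memory hash table and dump it to work.latest    */
data _null_;
  if _n_ = 1 then do;
    declare hash h();
    h.defineKey("client_id");
    h.defineData("client_id", "latest");
    h.defineDone();
    call missing(latest);
  end;
  set work.dataset2 end=eof;
  if h.find() ne 0 or date gt latest then do;
    latest = date;        /* new client_id, or a more recent date */
    h.replace();
  end;
  if eof then h.output(dataset: "work.latest");
run;

/* Pass 2: keep only the records whose (client_id, date) pair matches */
/* the latest date stored in the hash, i.e. the last balance per id   */
data work.last_balance;
  if _n_ = 1 then do;
    declare hash h(dataset: "work.latest(rename=(latest=date))");
    h.defineKey("client_id", "date");
    h.defineDone();
  end;
  set work.dataset2;
  if h.check() = 0;
run;
```

Note that if a client_id has several records on its latest date, pass 2 keeps all of them; a final sort plus last.client_id processing would reduce those ties to one row each.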
      <pubDate>Tue, 24 Nov 2015 22:12:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-the-solve-the-look-solution-of-this-problem-with-very/m-p/236284#M308594</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2015-11-24T22:12:32Z</dc:date>
    </item>
  </channel>
</rss>

