<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Most efficient way to merge large data sets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553198#M153803</link>
    <description>&lt;P&gt;I have several large datasets (30GB) and try to merge only a few variables from the second table into the first table, based on a key variable. Because the tables are so big, this takes a long time. Which method is the most efficient way to do this?&lt;/P&gt;</description>
    <pubDate>Tue, 23 Apr 2019 12:48:22 GMT</pubDate>
    <dc:creator>babu-in</dc:creator>
    <dc:date>2019-04-23T12:48:22Z</dc:date>
    <item>
      <title>Most efficient way to merge large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553198#M153803</link>
      <description>&lt;P&gt;I have several large datasets (30GB) and try to merge only a few variables from the second table into the first table, based on a key variable. Because the tables are so big, this takes a long time. Which method is the most efficient way to do this?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2019 12:48:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553198#M153803</guid>
      <dc:creator>babu-in</dc:creator>
      <dc:date>2019-04-23T12:48:22Z</dc:date>
    </item>
    <item>
      <title>Re: Most efficient way to merge large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553202#M153804</link>
      <description>&lt;P&gt;Are both tables of this size, or do you have one large table and a considerably smaller one used for a lookup?&lt;/P&gt;
&lt;P&gt;Do you have a many-to-one or a many-to-many relationship?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In any case, you need to strive to "travel light", meaning you only keep the variables you need when sorting in preparation for a merge.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2019 12:51:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553202#M153804</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-04-23T12:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: Most efficient way to merge large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553210#M153808</link>
      <description>&lt;P&gt;Yes, the tables are the same size, and the keys are unique in both tables.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2019 13:01:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553210#M153808</guid>
      <dc:creator>babu-in</dc:creator>
      <dc:date>2019-04-23T13:01:05Z</dc:date>
    </item>
    <item>
      <title>Re: Most efficient way to merge large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553213#M153810</link>
      <description>&lt;P&gt;Then take a closer look at your data. See if the datasets are compressed, and see if you need all columns in your output or if you can drop some.&lt;/P&gt;
&lt;P&gt;If in doubt, run proc contents on your datasets and post the output.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, please post the log of your merge code, using the {i} button (available when posting Rich Text).&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2019 13:05:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553213#M153810</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-04-23T13:05:31Z</dc:date>
    </item>
    <item>
      <title>Re: Most efficient way to merge large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553214#M153811</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271602"&gt;@babu-in&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At this level of detail, the best answer I can give is "it depends".&lt;/P&gt;
&lt;P&gt;Assuming both data sets are SAS tables, what normally takes the most time is sorting and writing data to disk. Your options are either a Proc Sort followed by a data step merge, or a SQL join. But the SQL join will usually also sort the data and write it to disk as intermediate temporary files in UTILLOC.&lt;/P&gt;
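&lt;P&gt;A minimal sketch of the Proc Sort plus data step merge route, keeping only the needed variables while sorting. All names here are hypothetical and not taken from the thread: tables work.big1 and work.big2, key variable id, and wanted variables var_a and var_b.&lt;/P&gt;

```sas
/* Hypothetical names: work.big1, work.big2, key id, variables var_a and var_b. */

/* Sort the second table, reading only the key and the wanted variables. */
proc sort data=work.big2 (keep=id var_a var_b) out=work.big2_slim;
  by id;
run;

/* Sort the first table by the same key; out= leaves the original untouched. */
proc sort data=work.big1 out=work.big1_sorted;
  by id;
run;

/* Match-merge by key and keep every row that exists in the first table. */
data work.merged;
  merge work.big1_sorted (in=in1)
        work.big2_slim;
  by id;
  if in1;
run;
```

&lt;P&gt;The keep= data set option is applied as the table is read, so only the key and the wanted variables are sorted and written to the intermediate table.&lt;/P&gt;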
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"&lt;EM&gt;and try to merge only&amp;nbsp;few variables from second table"&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;If you can fit these few variables into memory, then a data step hash lookup would most likely perform best, as it doesn't require any sort operations and reduces write operations to disk.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2019 13:05:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Most-efficient-way-to-merge-large-data-sets/m-p/553214#M153811</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-04-23T13:05:48Z</dc:date>
    </item>
  </channel>
</rss>

