<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Efficiently Merging Two Large Data Sets With Some Common Variables (SAS 9.4) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431464#M281714</link>
    <description>&lt;P&gt;I am trying to find an efficient way to merge two very large data sets. I am new to SAS and I have only ever merged small data sets, so I am not sure where to start.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first data set, home._2004, contains 45 variables and 67 million observations.&lt;/P&gt;&lt;P&gt;The second data set, home.c2final, contains 73 variables and 24 million observations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The common variables are Year, MSA, CountyCode, state, and Loan_Amount.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the code I would have used if the data wasn't so large:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA home.testmerge;
MERGE home._2004 home.c2final;
BY Year MSA CountyCode state Loan_Amount;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As you can see, I have very little experience with SAS. From the research I have done, there are more efficient ways to merge data of this size, but I do not know where to start with those methods. I was trying to find some way to merge the data using groups of observations from the home._2004 data, but I don't know how to set it up so that after it's done with one group of observations it moves to the next.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For further clarification, here are some sample observations from each set. As I mentioned, I am very new to SAS, so I did not know if there was an easier way to display the data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Home._2004 Sample:&lt;/P&gt;&lt;P&gt;file:///D:/Research/2004sample.html&lt;/P&gt;&lt;P&gt;Home.c2final Sample:&lt;/P&gt;&lt;P&gt;file:///D:/Research/c2finalsample.html&lt;/P&gt;</description>
    <pubDate>Fri, 26 Jan 2018 22:42:40 GMT</pubDate>
    <dc:creator>Bennettr99</dc:creator>
    <dc:date>2018-01-26T22:42:40Z</dc:date>
    <item>
      <title>Efficiently Merging Two Large Data Sets With Some Common Variables (SAS 9.4)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431464#M281714</link>
      <description>&lt;P&gt;I am trying to find an efficient way to merge two very large data sets. I am new to SAS and I have only ever merged small data sets, so I am not sure where to start.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first data set, home._2004, contains 45 variables and 67 million observations.&lt;/P&gt;&lt;P&gt;The second data set, home.c2final, contains 73 variables and 24 million observations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The common variables are Year, MSA, CountyCode, state, and Loan_Amount.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the code I would have used if the data wasn't so large:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA home.testmerge;
MERGE home._2004 home.c2final;
BY Year MSA CountyCode state Loan_Amount;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As you can see, I have very little experience with SAS. From the research I have done, there are more efficient ways to merge data of this size, but I do not know where to start with those methods. I was trying to find some way to merge the data using groups of observations from the home._2004 data, but I don't know how to set it up so that after it's done with one group of observations it moves to the next.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For further clarification, here are some sample observations from each set. As I mentioned, I am very new to SAS, so I did not know if there was an easier way to display the data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Home._2004 Sample:&lt;/P&gt;&lt;P&gt;file:///D:/Research/2004sample.html&lt;/P&gt;&lt;P&gt;Home.c2final Sample:&lt;/P&gt;&lt;P&gt;file:///D:/Research/c2finalsample.html&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2018 22:42:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431464#M281714</guid>
      <dc:creator>Bennettr99</dc:creator>
      <dc:date>2018-01-26T22:42:40Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Merging Two Large Data Sets With Some Common Variables (SAS 9.4)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431468#M281715</link>
      <description>&lt;P&gt;You wrote: &lt;EM&gt;"The common variables are Year, MSA, CountyCode, state, and Loan_Amount."&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Why are you merging only &lt;STRONG&gt;by Year, and not by Year, MSA, CountyCode, state, and Loan_Amount? (Is that the right order?)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you post a small sample of your requirement, say 5-6 records? Do you have any duplicate entries in either table? If you can tell us more, I am sure you will get very good responses.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2018 22:09:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431468#M281715</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-01-26T22:09:52Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently Merging Two Large Data Sets With Some Common Variables (SAS 9.4)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431524#M281717</link>
      <description>If both your data sets are sorted by your BY variables, MERGE is the most efficient way. &lt;BR /&gt;What is your expected hit rate on the merge, and do you wish to keep all observations from both data sets in the result?&lt;BR /&gt;Also, I wonder about your data: merging on individual loan amounts sounds a bit... odd. That sounds like a measure, not a key variable.</description>
      <pubDate>Sat, 27 Jan 2018 08:58:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-Merging-Two-Large-Data-Sets-With-Some-Common/m-p/431524#M281717</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-01-27T08:58:32Z</dc:date>
    </item>
  </channel>
</rss>

