<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Effective Merging Techniques in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369355#M88168</link>
    <description>&lt;P&gt;As &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt; said, we need to know more, especially the type of join.&lt;/P&gt;
&lt;P&gt;How many records do you expect in the output? 500k? 750m?&lt;/P&gt;
&lt;P&gt;Is the key in A unique?&lt;/P&gt;
&lt;P&gt;Are these SAS data sets?&lt;/P&gt;
&lt;P&gt;How many variables are in each input dataset? How many variables do you want out?&lt;/P&gt;
&lt;P&gt;Is neither data set sorted?&lt;/P&gt;
&lt;P&gt;Any indexes?&lt;/P&gt;</description>
    <pubDate>Wed, 21 Jun 2017 22:15:56 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2017-06-21T22:15:56Z</dc:date>
    <item>
      <title>Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369286#M88150</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have come across a situation where I need to merge two transactional datasets. Dataset A has about 500,000 observations, with ACCT_IND as its primary key. Dataset B has about 750 million observations. I need to merge them on ACCT_IND. Can someone suggest the best way to do this? A DATA step MERGE would require sorting both datasets first, and an SQL join on datasets this large would be very expensive.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can someone demonstrate a better method?&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jun 2017 19:11:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369286#M88150</guid>
      <dc:creator>adityaa9z</dc:creator>
      <dc:date>2017-06-21T19:11:35Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369314#M88155</link>
      <description>&lt;P&gt;Can you explain in more detail what you need?&lt;/P&gt;
&lt;P&gt;What type of merge do you need? For example, are you using the &lt;SPAN&gt;ACCT_IND from the second dataset to look up attributes of that account from the first dataset? Is it a one-to-one match, or one-to-many?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Is this just for a report that you need to run or are you creating and/or updating a new table based on the merge?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What do you mean by "transactional" datasets? Do you mean that they change frequently? Do you need to re-run the whole dataset, or can you operate on just the records that are changing? Or do you mean that the second dataset represents modifications (or transactions) to be applied to the first dataset?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Where are these datasets actually managed? Are they stored only in SAS datasets, or do you have some type of database or data warehouse where the data really lives?&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jun 2017 20:10:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369314#M88155</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2017-06-21T20:10:18Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369346#M88164</link>
      <description>&lt;P&gt;I am assuming that the smaller 500,000-observation dataset is unique at the ACCT_IND level.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The methodology is to split the larger file into 10 smaller datasets and merge the smaller dataset above with each piece. To test the code, I added names to the 'small' dataset and transaction values to the 'large' dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data small;
input ACCT_IND $ name $;
datalines;
111 jon
123 bob
124 mary
451 lisa
333 sue
;
run;

data large;
input ACCT_IND $ sales;
datalines;
111 10
111 15
111 13
123 18
123 16
123 17
124 19
124 22
451 11
451 8
451 17
451 15
333 14
333 12
333 26
333 22
333 25
;
run;

*Number of records in large dataset;
%let records = 17; *750000000;

*750 million record dataset - Split prior to merging to gain speed when merging;
data huge_dataset_0 huge_dataset_1 huge_dataset_2 huge_dataset_3 huge_dataset_4 huge_dataset_5 huge_dataset_6 huge_dataset_7 huge_dataset_8 huge_dataset_9;
set large;
if _n_ &amp;lt;= &amp;amp;records./10 then output huge_dataset_0;
else if _n_ &amp;lt;= 2/10*&amp;amp;records. then output huge_dataset_1;
else if _n_ &amp;lt;= 3/10*&amp;amp;records. then output huge_dataset_2;
else if _n_ &amp;lt;= 4/10*&amp;amp;records. then output huge_dataset_3;
else if _n_ &amp;lt;= 5/10*&amp;amp;records. then output huge_dataset_4;
else if _n_ &amp;lt;= 6/10*&amp;amp;records. then output huge_dataset_5;
else if _n_ &amp;lt;= 7/10*&amp;amp;records. then output huge_dataset_6;
else if _n_ &amp;lt;= 8/10*&amp;amp;records. then output huge_dataset_7;
else if _n_ &amp;lt;= 9/10*&amp;amp;records. then output huge_dataset_8;
else output huge_dataset_9;
run;

proc sort data = small;
by ACCT_IND;
run;

*Sort each chunk of the large dataset and merge the smaller dataset with it;
%macro sort(i);
proc sort data = huge_dataset_&amp;amp;i.;
by ACCT_IND;
run;

data large_&amp;amp;i.;
merge small ( in = a )
      huge_dataset_&amp;amp;i. ( in = b );
by ACCT_IND;
if b then output;
run;
%mend;

%sort(i=0); %sort(i=1); %sort(i=2); %sort(i=3); %sort(i=4); 
%sort(i=5); %sort(i=6); %sort(i=7); %sort(i=8); %sort(i=9);

*Bring the dataset back together;
data huge_dataset;
set large_0 large_1 large_2 large_3 large_4 
    large_5 large_6 large_7 large_8 large_9;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 21 Jun 2017 21:45:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369346#M88164</guid>
      <dc:creator>Rwon</dc:creator>
      <dc:date>2017-06-21T21:45:26Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369355#M88168</link>
      <description>&lt;P&gt;As &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt; said, we need to know more, especially the type of join.&lt;/P&gt;
&lt;P&gt;How many records do you expect in the output? 500k? 750m?&lt;/P&gt;
&lt;P&gt;Is the key in A unique?&lt;/P&gt;
&lt;P&gt;Are these SAS data sets?&lt;/P&gt;
&lt;P&gt;How many variables are in each input dataset? How many variables do you want out?&lt;/P&gt;
&lt;P&gt;Is neither data set sorted?&lt;/P&gt;
&lt;P&gt;Any indexes?&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jun 2017 22:15:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369355#M88168</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-06-21T22:15:56Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369605#M88271</link>
      <description>&lt;P&gt;See response below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How many records do you expect in the output? --- &lt;FONT color="#FF0000"&gt;500k&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Is the key in A unique? - &lt;FONT color="#FF0000"&gt;Key (ACCT_NO) in A is unique, and it has different transactions tied to it in B.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;These are SAS data sets? - &lt;FONT color="#FF0000"&gt;Yes&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;How many variables in each input dataset? How many variables do you want out? - &lt;FONT color="#FF0000"&gt;Close to 10 variables in A, 950 in B. I want all from A and 3-4 from B.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;None of the data sets is sorted? &lt;FONT color="#FF0000"&gt;- Nope.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Any indexes? - &lt;FONT color="#FF0000"&gt;Nope&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2017 17:50:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369605#M88271</guid>
      <dc:creator>adityaa9z</dc:creator>
      <dc:date>2017-06-22T17:50:40Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369607#M88272</link>
      <description>Wow. This is pretty close to what I was looking for. Thanks a bunch. I will try this.&lt;BR /&gt;&lt;BR /&gt;Can we merge using a hash technique?</description>
      <pubDate>Thu, 22 Jun 2017 17:53:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369607#M88272</guid>
      <dc:creator>adityaa9z</dc:creator>
      <dc:date>2017-06-22T17:53:02Z</dc:date>
    </item>
    <item>
      <title>Re: Effective Merging Techniques</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369730#M88306</link>
      <description>&lt;P&gt;This merge ran in 5 min on my machine.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A(keep=KEY A1-A10)
     B(keep=KEY B1-B10);
  length A1-A10 B1-B10 8;
  do I= 1 to 750e6;
    KEY=ranuni(0);
    if KEY &amp;lt; 500 / 750000 then output A; 
    output B;
  end;
run;

data WANT;
  if _N_=1 then do;
    dcl hash A(dataset:'A');
    A.definekey('KEY');
    A.definedata('A1','A2','A3','A4','A5','A6','A7','A8','A9','A10');
    A.definedone();
    if 0 then set A;
  end;
  set B(keep=KEY B1 B2);
  RC=A.find();
  if RC=0;
run;   &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;NOTE: There were 500029 observations read from the data set WORK.A.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;NOTE: There were 750000000 observations read from the data set WORK.B.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;NOTE: The data set WORK.WANT has 500029 observations and 14 variables.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;NOTE: DATA statement used (Total process time):&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5:48.32&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; user cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5:10.08&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; system cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 37.69 seconds&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 71702.09k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OS Memory&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 98908.00k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Timestamp&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 23/06/2017 10:57:57 AM&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Step Count&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4411&amp;nbsp; Switch Count&amp;nbsp; 665&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2017 23:05:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Effective-Merging-Techniques/m-p/369730#M88306</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-06-22T23:05:32Z</dc:date>
    </item>
  </channel>
</rss>

