<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merge in more efficient way in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773208#M245591</link>
    <description>What is better ?&lt;BR /&gt;left join ExtraInfo(where=(month=2012)) on a.CustID=d.CustID&lt;BR /&gt;&lt;BR /&gt;Or&lt;BR /&gt;left join ExtraInfo&lt;BR /&gt; on a.CustID=d.CustID&lt;BR /&gt;And a.month=2012&lt;BR /&gt;  &lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Sat, 09 Oct 2021 10:33:22 GMT</pubDate>
    <dc:creator>Ronein</dc:creator>
    <dc:date>2021-10-09T10:33:22Z</dc:date>
    <item>
      <title>Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772968#M245467</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;In real life I am trying to join the following data sets:&lt;/P&gt;
&lt;P&gt;data set "Have" has 2 million rows&lt;/P&gt;
&lt;P&gt;data set&amp;nbsp; "Extrainfo" has 50 million rows&lt;/P&gt;
&lt;P&gt;This query of merging taking long long time and I want to ask for offers how to make it more efficient with lower time.&lt;/P&gt;
&lt;P&gt;May you please also show the recommended code , thanks&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Have;
format date date9.;
Input CustID month date :date9. W1 W2 W3 month_H;
cards;
1 2101 15JAN2021 10 20 30 2011
1 2101 16JAN2021 11 12 13 2011
1 2102 13FEB2021 15 17 19 2012
2 2102 15FEB2021 18 16 14 2011
2 2103 03MAR2021 13 15 12 2101
3 2101 19JAN2021 11 18 15 2102
4 2103 17MAR2021 13 12 11 2101
4 2104 18APR2021 19 18 13 2102
Run;

Data ExtraInfo;
Input CustID month X1 X2 X3;
cards;
1 2011 30 40 50
1 2012 35 45 55
1 2101 25 20 30
1 2102 15 20 40
1 2103 30 23 43
1 2104 19 21 82
2 2011 12 15 17 
2 2012 32 54 28
2 2101 23 21 98
2 2102 43 54 12
2 2103 12 43 12 
2 2104 43 23 25
3 2011 23 42 98
3 2012 87 67 34
3 2101 34 42 21
3 2102 98 23 65
3 2103 23 24 26
3 2104 43 64 48
4 2011 43 54 38
4 2012 32 54 27
4 2101 23 65 28
4 2102 43 26 87
4 2103 54 87 29
4 2104 67 45 53
;
Run;


PROC SQL;
	create table  wanted   as
	select a.*,
     b.X1 as  X1_H ,
	 c.X2  as X2_C,
	 d.X3 as X3_2012 
	from  Have  as a
	left join 	ExtraInfo as b
	on a.CustID=b.CustID  and a.month_H=b.month

	left join 	ExtraInfo as c
	on a.CustID=c.CustID  and a.month=c.month

	left join 	(select * FROM ExtraInfo where month=2012) as d
	on a.CustID=d.CustID  
;
QUIT;
 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Oct 2021 08:59:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772968#M245467</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-10-08T08:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772974#M245470</link>
      <description>&lt;P&gt;1. Are the tables sorted by CUSTID and MONTH ?&lt;/P&gt;
&lt;P&gt;2. Have you tried hash tables?&lt;/P&gt;
&lt;P&gt;3. The last join is more likely to use sort order written as:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;left join ExtraInfo
	on a.CustID=d.CustID  and d.month=2012&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Oct 2021 09:49:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772974#M245470</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-10-08T09:49:48Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772975#M245471</link>
      <description>&lt;P&gt;If you're on sas viya then proc fedsql is really fast.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;here comes an example code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc fedsql sessref=mysession  _method exec;
create table "mkt"."testa" {options  replication=0 REPLACE=true } as select 
"T1".*, case when ("T2"."numbasti")='' then 0 else 1 end as renewed
from "dna"."from_ended" {options tableID=1} T1 left join "dna"."from_renewed" t2
on t1."numbasti"=t2."numbasti";
quit ;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Oct 2021 09:52:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772975#M245471</guid>
      <dc:creator>acordes</dc:creator>
      <dc:date>2021-10-08T09:52:09Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772988#M245481</link>
      <description>Thanks for your reply,&lt;BR /&gt;1. Are the tables sorted by CUSTID and MONTH ?No(tables are coming to me not sorted)&lt;BR /&gt;2. Have you tried hash tables? (No, may you show a code and I will check the efficient)&lt;BR /&gt;3-left join ExtraInfo&lt;BR /&gt;	on a.CustID=d.CustID  and d.month=2012&lt;BR /&gt;will it be better than the code I provided?</description>
      <pubDate>Fri, 08 Oct 2021 10:10:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772988#M245481</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-10-08T10:10:39Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772990#M245483</link>
      <description>&lt;P&gt;May you please show it (proc fedsql ) on my tables?&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 06:35:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772990#M245483</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-10-09T06:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772991#M245484</link>
      <description>Sort the tables before merging and use the  join I provided for 2012.</description>
      <pubDate>Fri, 08 Oct 2021 10:20:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772991#M245484</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-10-08T10:20:18Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772997#M245487</link>
      <description>And create an index on CUSID MONTH_H</description>
      <pubDate>Fri, 08 Oct 2021 10:42:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/772997#M245487</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-10-08T10:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773186#M245576</link>
      <description>&lt;P&gt;I suspect you can speed things up reading the small dataset (HAVE) twice, and the large dataset (EXTRAINFO) once.&amp;nbsp; The strategy is to store only the needed subset of extrainfo data (x1,x2, and x3 just for the HAVE-matched) in memory-resident hash objects.&amp;nbsp; Then the second reading of have retrieves data from these hash objects.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I.e.:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;read have and create a hash object of its CUSTID values.&amp;nbsp; Call it HAVE_CUSTID&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;read EXTRAINFO, and if the custid is found in HAVE_CUSTID, then
&lt;OL&gt;
&lt;LI&gt;Add its X1 value to hash object HX!,&amp;nbsp; keyed on custid and month_h (using MONTH from EXTRAINFO as MONTH_H in the object).&lt;/LI&gt;
&lt;LI&gt;Add its X2 value to hash object HX2, keyd on custid and month.&lt;/LI&gt;
&lt;LI&gt;If month=2012 then add X3 to hash object HX3, keyed on custid.&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;reread HAVE and retrieve X1 X2 and X3 from the relevant hash objects.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume that the CUSTIDs in HAVE are a relatively small subset of those in EXTRAINFO.&amp;nbsp; If not, there may be performance benefits in using the HASHEXP option in the hash declaration statements.&amp;nbsp; &amp;nbsp;You could experiment with hashexp values greater than the default 8.&amp;nbsp; See&amp;nbsp; &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lecompobjref/p00ilfw5pzcjvtn1nfya9863fozd.htm" target="_self"&gt;DECLARE Statement: Hash and Hash Iterator Objects&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&amp;nbsp;&lt;A href="https://sasnrd.com/sas-run-time-hash-object-hashexp-size" target="_self"&gt;Run Time Effect Of Hash Object HASHEXP Argument Size&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;More importantly, this also assumes that the observations in EXTRAINFO are unique for each CUSTID/MONTH value pair.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Have;
format date date9.;
Input CustID month date :date9. W1 W2 W3 month_H;
cards;
1 2101 15JAN2021 10 20 30 2011
1 2101 16JAN2021 11 12 13 2011
1 2102 13FEB2021 15 17 19 2012
2 2102 15FEB2021 18 16 14 2011
2 2103 03MAR2021 13 15 12 2101
3 2101 19JAN2021 11 18 15 2102
4 2103 17MAR2021 13 12 11 2101
4 2104 18APR2021 19 18 13 2102
Run;



Data ExtraInfo;
Input CustID month X1 X2 X3;
cards;
1 2011 30 40 50
1 2012 35 45 55
1 2101 25 20 30
1 2102 15 20 40
1 2103 30 23 43
1 2104 19 21 82
2 2011 12 15 17 
2 2012 32 54 28
2 2101 23 21 98
2 2102 43 54 12
2 2103 12 43 12 
2 2104 43 23 25
3 2011 23 42 98
3 2012 87 67 34
3 2101 34 42 21
3 2102 98 23 65
3 2103 23 24 26
3 2104 43 64 48
4 2011 43 54 38
4 2012 32 54 27
4 2101 23 65 28
4 2102 43 26 87
4 2103 54 87 29
4 2104 67 45 53
;
Run;


data want (rename=(x1=x1_h x2=x2_c x3=x3_2012));
  if _n_=1 then do;

    /* The DATASET option makes this the first reading of HAVE*/
    declare hash have_custid (dataset:'have (keep=custid)');
      have_custid.definekey('custid');
      have_custid.definedone();

    declare hash hx1 ();
      hx1.definekey('custid','month_h');
      hx1.definedata('x1');
      hx1.definedone();
    declare hash hx2 ();
      hx2.definekey('custid','month');
      hx2.definedata('x2');
      hx2.definedone();
    declare hash hx3 ();
      hx3.definekey('custid');
      hx3.definedata('x3');
      hx3.definedone();

    /*Now read EXTRAINFO and populate HX1, HX2, and HX3 */
    do until (end_of_extra);
      set extrainfo end=end_of_extra;
      if have_custid.check()^=0 then continue; /*Skip this iteration of the loop*/
      hx1.add(key:custid,key:month,data:x1);
      hx2.add();
      if month=2012 then hx3.add();
    end;
  end;

  set have;  /*This is the second reading of HAVE*/
  call missing(x1,x2,x3);
  hx1.find();
  hx2.find();
  hx3.find();
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 04:20:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773186#M245576</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2021-10-09T04:20:59Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773190#M245580</link>
      <description>Thanks to you &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt; I've finally understood the hash technique!&lt;BR /&gt;I don't think that I'll actually need it because cas actions cas data steps are really fast, but it's nice to have it in the toolset.</description>
      <pubDate>Sat, 09 Oct 2021 05:59:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773190#M245580</guid>
      <dc:creator>acordes</dc:creator>
      <dc:date>2021-10-09T05:59:13Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773191#M245581</link>
      <description>&lt;P&gt;As I understand from you the folowing changes will help:&lt;/P&gt;
&lt;P&gt;1-&lt;SPAN&gt;create an index on CUSID MONTH_H&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Should I create Index in both tables?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2-Sort the tables before merging.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;May I ask why?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Using proc sql join no need to sort before&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;3-use the different code related to 2012&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;May you please tell If I understand correctly and why should sort before using proc sql join&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 06:37:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773191#M245581</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-10-09T06:37:19Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773205#M245589</link>
      <description>&lt;P&gt;1. MONTH_H is only in one table isn't it? How could you create the index on both?&lt;/P&gt;
&lt;P&gt;2. Proc SQL does sort for you, but EXTRAINFO is used 3 times and proc SQL might be tempted to sort 3 times.&lt;BR /&gt;&amp;nbsp; &amp;nbsp;Also HAVE needs that sort order for the 3 joins, so you might as well do it once for all.&lt;BR /&gt;&amp;nbsp; &amp;nbsp;Same thing for the index: Since you need that sort order for the join, and since the current performance is not satisfying, add the index and be done with it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;3. I don't understand what the issue is. Just do it.&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Another syntax is better (as in faster) than yours is&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;left join ExtraInfo(where=(month=2012))
	on a.CustID=d.CustID&lt;/LI-CODE&gt;
&lt;P&gt;It's better because you point to the EXTRAINFO table for the join (and not to a select clause) so can use index or sort order.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Last but not least: the best way to learn is to do. &lt;BR /&gt;Get guidance by all means, but try and try again. You have 9 replies; what have what you tried?&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 10:17:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773205#M245589</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-10-09T10:17:50Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773208#M245591</link>
      <description>What is better ?&lt;BR /&gt;left join ExtraInfo(where=(month=2012)) on a.CustID=d.CustID&lt;BR /&gt;&lt;BR /&gt;Or&lt;BR /&gt;left join ExtraInfo&lt;BR /&gt; on a.CustID=d.CustID&lt;BR /&gt;And a.month=2012&lt;BR /&gt;  &lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 09 Oct 2021 10:33:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773208#M245591</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2021-10-09T10:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773237#M245602</link>
      <description>&lt;P&gt;The performance should be the same. The second syntax is fully SQL compliant.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 22:21:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773237#M245602</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-10-09T22:21:25Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773328#M245651</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I would use only one hash; keeping only one search tree should reduce the amount of memory needed.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
if _n_ = 1 then do;
  declare hash have_custid (dataset:'have (keep=custid)');
  have_custid.definekey('custid');
  have_custid.definedone();
  declare hash hx ();
  hx.definekey('custid','month');
  hx.definedata('x1','x2','x3');
  hx.definedone();
  do until (end_of_extra);
    set extrainfo end=end_of_extra;
    if have_custid.check() ^= 0 then continue;
    hx.add();
  end;
end;
set have;
if hx.find(key:custid,key:month_h) = 0 then x1_h = x1;
if hx.find(key:custid,key:month) = 0 then x2_c = x2;
if hx.find(key:custid,key:2012) = 0 then x3_2012 = x3;
drop x1 x2 x3;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 11 Oct 2021 09:01:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773328#M245651</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-10-11T09:01:32Z</dc:date>
    </item>
    <item>
      <title>Re: Merge in more efficient way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773447#M245713</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think you are right.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Oct 2021 18:33:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-in-more-efficient-way/m-p/773447#M245713</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2021-10-11T18:33:29Z</dc:date>
    </item>
  </channel>
</rss>

