<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Joining Large Tables in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522870#M4533</link>
    <description>&lt;P&gt;There are at least three occurences of "123" in your transaction dataset, so it surely is not unique there.&lt;/P&gt;
&lt;P&gt;By "unique" I mean that a proc sort nodupkey of the account table by account will not delete any observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The fastest method of joining that is not done with in-memory techniques is sort and data step merge, in my experience.&lt;/P&gt;
&lt;P&gt;If you want to go down the hash path, I suggest to load the smaller table into hash:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data account;
input Account :$3. Date_Open :mmddyy10. Product_Type :$2.;
format Date_Open yymmddd10.;
datalines;
123 1/1/2017 CC
234 2/8/2016 DC
345 3/9/2018 CC
;
run;

data transaction;
input Account :$3. Transaction_ID Transaction_Amt :comma. Posted_Date :mmddyy10. Tran_Code :$1.;
format Posted_Date yymmddd10.;
datalines;
123 1 30 3/4/2000 N
123 2 35 5/3/2001 A
123 3 1,244 4/7/2016 C
234 4 344 9/1/2011 N
234 5 678 4/6/2015 A
234 6 856 4/7/2015 A
345 7 332 9/30/2000 C
345 8 123 8/4/2005 A
345 9 765 4/9/2017 C
345 10 324 9/4/2018 N
345 11 2,344 12/12/2018 A
;
run;

data want;
set transaction;
if _n_ = 0 then set account; /* initializes the structure */
if _n_ = 1
then do;
  declare hash acc (dataset:'account');
  acc.definekey("account");
  acc.definedata("date_open","product_type");
  acc.definedone();
end;
if not acc.find() then output;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 20 Dec 2018 14:21:19 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2018-12-20T14:21:19Z</dc:date>
    <item>
      <title>Joining Large Tables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522842#M4524</link>
      <description>&lt;P&gt;Hi All-&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to join two large tables by a simple join but it has not been&amp;nbsp;possible since&amp;nbsp;the tables&amp;nbsp;are huge and the joining process is taking forever. I need a table with the number of transactions and the amount by account and month level. I have transaction data since 1998.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried to optimize a PROC SQL and also tried hash table but i am expert using hash table so not sure what is the best way to this.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help is much appreciated&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table 1: Account level data ( It has around 7 Millions of records)&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Account&lt;/TD&gt;&lt;TD&gt;Date_Open&lt;/TD&gt;&lt;TD&gt;Product_Type&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;1/1/2017&lt;/TD&gt;&lt;TD&gt;CC&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;234&lt;/TD&gt;&lt;TD&gt;2/8/2016&lt;/TD&gt;&lt;TD&gt;DC&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;3/9/2018&lt;/TD&gt;&lt;TD&gt;CC&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Table 1:Transaction Level Data (It has&amp;nbsp;millions of records. may have 10 times the account table)&amp;nbsp;.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Account&lt;/TD&gt;&lt;TD&gt;Transaction_ID&lt;/TD&gt;&lt;TD&gt;Transaction_Amt&lt;/TD&gt;&lt;TD&gt;Posted_Date&lt;/TD&gt;&lt;TD&gt;Tran_Code&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 30&lt;/TD&gt;&lt;TD&gt;3/4/2000&lt;/TD&gt;&lt;TD&gt;N&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 35&lt;/TD&gt;&lt;TD&gt;5/3/2001&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1,244&lt;/TD&gt;&lt;TD&gt;4/7/2016&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;234&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 344&lt;/TD&gt;&lt;TD&gt;9/1/2011&lt;/TD&gt;&lt;TD&gt;N&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;234&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 678&lt;/TD&gt;&lt;TD&gt;4/6/2015&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;234&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 856&lt;/TD&gt;&lt;TD&gt;4/7/2015&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 332&lt;/TD&gt;&lt;TD&gt;9/30/2000&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 123&lt;/TD&gt;&lt;TD&gt;8/4/2005&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 765&lt;/TD&gt;&lt;TD&gt;4/9/2017&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 324&lt;/TD&gt;&lt;TD&gt;9/4/2018&lt;/TD&gt;&lt;TD&gt;N&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;345&lt;/TD&gt;&lt;TD&gt;11&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;$&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2,344&lt;/TD&gt;&lt;TD&gt;12/12/2018&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;DATA test(drop=rc);&lt;BR /&gt;if 0 then set Transaction_Table(where=(PST50_TRAN_CD_EX IN(&amp;amp;TRA_CD.)));&lt;BR /&gt;declare Hash Trans (dataset:"Transaction_Table");&lt;BR /&gt;rc = Trans.DefineKey ('Account');&lt;BR /&gt;rc = Trans.DefineData ('Posted_Date','Transaction_Amt');&lt;BR /&gt;rc = Trans.DefineDone ();&lt;BR /&gt;do until (eof) ;&lt;BR /&gt;set acct end = eof;&lt;BR /&gt;call missing(Transaction_Table);&lt;BR /&gt;rc = Trans.find ();&lt;BR /&gt;if rc=0 then output;;&lt;BR /&gt;end;&lt;BR /&gt;stop;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;PROC SQL;&lt;BR /&gt;CREATE TABLE DD_TABLE_M AS SELECT&lt;BR /&gt;A.account,&lt;BR /&gt;N(B.Posted_Date) AS NUM_TX,&lt;BR /&gt;SUM(B.Transaction_Amt) AS AMT_TX,&lt;BR /&gt;D.YR_MTH_NBR&lt;BR /&gt;FROM acct A, Transaction_Table B, month D&lt;BR /&gt;WHERE A.account = B.account and D.TM_ID = B.TM_ID and&lt;BR /&gt;B.PST50_TRAN_CD_EX IN (SELECT DISTINCT Tran_Code FROM Tran_Code WHERE CUS IN('A','C'));&lt;BR /&gt;QUIT;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Dec 2018 13:02:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522842#M4524</guid>
      <dc:creator>DiegoDiaz</dc:creator>
      <dc:date>2018-12-20T13:02:07Z</dc:date>
    </item>
    <item>
      <title>Re: Joining Large Tables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522854#M4531</link>
      <description>&lt;P&gt;Is the account table unique with regard to Account? If yes, sort both tables and do a data step merge.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Dec 2018 13:28:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522854#M4531</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-12-20T13:28:10Z</dc:date>
    </item>
    <item>
      <title>Re: Joining Large Tables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522858#M4532</link>
      <description>&lt;P&gt;Yes, Account is unique id in both table,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data step merge option was not optimal for this since i had to sort the tables before and i took even more time,&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 20 Dec 2018 13:33:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522858#M4532</guid>
      <dc:creator>DiegoDiaz</dc:creator>
      <dc:date>2018-12-20T13:33:38Z</dc:date>
    </item>
    <item>
      <title>Re: Joining Large Tables</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522870#M4533</link>
      <description>&lt;P&gt;There are at least three occurences of "123" in your transaction dataset, so it surely is not unique there.&lt;/P&gt;
&lt;P&gt;By "unique" I mean that a proc sort nodupkey of the account table by account will not delete any observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The fastest method of joining that is not done with in-memory techniques is sort and data step merge, in my experience.&lt;/P&gt;
&lt;P&gt;If you want to go down the hash path, I suggest to load the smaller table into hash:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data account;
input Account :$3. Date_Open :mmddyy10. Product_Type :$2.;
format Date_Open yymmddd10.;
datalines;
123 1/1/2017 CC
234 2/8/2016 DC
345 3/9/2018 CC
;
run;

data transaction;
input Account :$3. Transaction_ID Transaction_Amt :comma. Posted_Date :mmddyy10. Tran_Code :$1.;
format Posted_Date yymmddd10.;
datalines;
123 1 30 3/4/2000 N
123 2 35 5/3/2001 A
123 3 1,244 4/7/2016 C
234 4 344 9/1/2011 N
234 5 678 4/6/2015 A
234 6 856 4/7/2015 A
345 7 332 9/30/2000 C
345 8 123 8/4/2005 A
345 9 765 4/9/2017 C
345 10 324 9/4/2018 N
345 11 2,344 12/12/2018 A
;
run;

data want;
set transaction;
if _n_ = 0 then set account; /* initializes the structure */
if _n_ = 1
then do;
  declare hash acc (dataset:'account');
  acc.definekey("account");
  acc.definedata("date_open","product_type");
  acc.definedone();
end;
if not acc.find() then output;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 20 Dec 2018 14:21:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Joining-Large-Tables/m-p/522870#M4533</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-12-20T14:21:19Z</dc:date>
    </item>
  </channel>
</rss>

