BookmarkSubscribeRSS Feed
DiegoDiaz
Calcite | Level 5

Hi All-

 

I have been trying to join two large tables by a simple join but it has not been possible since the tables are huge and the joining process is taking forever. I need a table with the number of transactions and the amount by account and month level. I have transaction data since 1998.

 

I tried to optimize a PROC SQL and also tried hash table but i am expert using hash table so not sure what is the best way to this. 

 

Any help is much appreciated

 

Thanks in advance. 

 

Table 1: Account level data ( It has around 7 Millions of records)

AccountDate_OpenProduct_Type
1231/1/2017CC
2342/8/2016DC
3453/9/2018CC

 

Table 1:Transaction Level Data (It has millions of records. may have 10 times the account table) .

AccountTransaction_IDTransaction_AmtPosted_DateTran_Code
1231 $                            303/4/2000N
1232 $                            355/3/2001A
1233 $                       1,2444/7/2016C
2344 $                          3449/1/2011N
2345 $                          6784/6/2015A
2346 $                          8564/7/2015A
3457 $                          3329/30/2000C
3458 $                          1238/4/2005A
3459 $                          7654/9/2017C
34510 $                          3249/4/2018N
34511 $                       2,34412/12/2018A

 

DATA test(drop=rc);
if 0 then set Transaction_Table(where=(PST50_TRAN_CD_EX IN(&TRA_CD.)));
declare Hash Trans (dataset:"Transaction_Table");
rc = Trans.DefineKey ('Account');
rc = Trans.DefineData ('Posted_Date','Transaction_Amt');
rc = Trans.DefineDone ();
do until (eof) ;
set acct end = eof;
call missing(Transaction_Table);
rc = Trans.find ();
if rc=0 then output;;
end;
stop;
run;


PROC SQL;
CREATE TABLE DD_TABLE_M AS SELECT
A.account,
N(B.Posted_Date) AS NUM_TX,
SUM(B.Transaction_Amt) AS AMT_TX,
D.YR_MTH_NBR
FROM acct A, Transaction_Table B, month D
WHERE A.account = B.account and D.TM_ID = B.TM_ID and
B.PST50_TRAN_CD_EX IN (SELECT DISTINCT Tran_Code FROM Tran_Code WHERE CUS IN('A','C'));
QUIT;

3 REPLIES 3
DiegoDiaz
Calcite | Level 5

Yes, Account is unique id in both table,

 

Data step merge option was not optimal for this since i had to sort the tables before and i took even more time,

Thanks

Kurt_Bremser
Super User

There are at least three occurences of "123" in your transaction dataset, so it surely is not unique there.

By "unique" I mean that a proc sort nodupkey of the account table by account will not delete any observations.

 

The fastest method of joining that is not done with in-memory techniques is sort and data step merge, in my experience.

If you want to go down the hash path, I suggest to load the smaller table into hash:

data account;
input Account :$3. Date_Open :mmddyy10. Product_Type :$2.;
format Date_Open yymmddd10.;
datalines;
123 1/1/2017 CC
234 2/8/2016 DC
345 3/9/2018 CC
;
run;

data transaction;
input Account :$3. Transaction_ID Transaction_Amt :comma. Posted_Date :mmddyy10. Tran_Code :$1.;
format Posted_Date yymmddd10.;
datalines;
123 1 30 3/4/2000 N
123 2 35 5/3/2001 A
123 3 1,244 4/7/2016 C
234 4 344 9/1/2011 N
234 5 678 4/6/2015 A
234 6 856 4/7/2015 A
345 7 332 9/30/2000 C
345 8 123 8/4/2005 A
345 9 765 4/9/2017 C
345 10 324 9/4/2018 N
345 11 2,344 12/12/2018 A
;
run;

data want;
set transaction;
if _n_ = 0 then set account; /* initializes the structure */
if _n_ = 1
then do;
  declare hash acc (dataset:'account');
  acc.definekey("account");
  acc.definedata("date_open","product_type");
  acc.definedone();
end;
if not acc.find() then output;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 828 views
  • 0 likes
  • 2 in conversation