Hi All-
I have been trying to join two large tables by a simple join but it has not been possible since the tables are huge and the joining process is taking forever. I need a table with the number of transactions and the amount by account and month level. I have transaction data since 1998.
I tried to optimize a PROC SQL and also tried hash table but i am expert using hash table so not sure what is the best way to this.
Any help is much appreciated
Thanks in advance.
Table 1: Account level data ( It has around 7 Millions of records)
Account | Date_Open | Product_Type |
123 | 1/1/2017 | CC |
234 | 2/8/2016 | DC |
345 | 3/9/2018 | CC |
Table 1:Transaction Level Data (It has millions of records. may have 10 times the account table) .
Account | Transaction_ID | Transaction_Amt | Posted_Date | Tran_Code |
123 | 1 | $ 30 | 3/4/2000 | N |
123 | 2 | $ 35 | 5/3/2001 | A |
123 | 3 | $ 1,244 | 4/7/2016 | C |
234 | 4 | $ 344 | 9/1/2011 | N |
234 | 5 | $ 678 | 4/6/2015 | A |
234 | 6 | $ 856 | 4/7/2015 | A |
345 | 7 | $ 332 | 9/30/2000 | C |
345 | 8 | $ 123 | 8/4/2005 | A |
345 | 9 | $ 765 | 4/9/2017 | C |
345 | 10 | $ 324 | 9/4/2018 | N |
345 | 11 | $ 2,344 | 12/12/2018 | A |
DATA test(drop=rc);
if 0 then set Transaction_Table(where=(PST50_TRAN_CD_EX IN(&TRA_CD.)));
declare Hash Trans (dataset:"Transaction_Table");
rc = Trans.DefineKey ('Account');
rc = Trans.DefineData ('Posted_Date','Transaction_Amt');
rc = Trans.DefineDone ();
do until (eof) ;
set acct end = eof;
call missing(Transaction_Table);
rc = Trans.find ();
if rc=0 then output;;
end;
stop;
run;
PROC SQL;
CREATE TABLE DD_TABLE_M AS SELECT
A.account,
N(B.Posted_Date) AS NUM_TX,
SUM(B.Transaction_Amt) AS AMT_TX,
D.YR_MTH_NBR
FROM acct A, Transaction_Table B, month D
WHERE A.account = B.account and D.TM_ID = B.TM_ID and
B.PST50_TRAN_CD_EX IN (SELECT DISTINCT Tran_Code FROM Tran_Code WHERE CUS IN('A','C'));
QUIT;