Thanks @Kurt_Bremser for kind reponse . Hi @Patrick , please find the details below 1. Name of source tables (assumed all stored as SAS tables, else please specify if any database involved) All stored as SAS tables 2. Number of rows per source table and eventually also the size per table Size Library of Size of Name Member Name File File ------------------------------------------------------------- XXX AAA 2GB 1931520K XXX BBB 102GB 1.0673E8K XXX CCC 1GB 1445376K XXX DDD 3GB 2657024K XXX EEE 1GB 1227008K XXX FFF 2GB 2466048K Table_Name Record count AAA - 22758772 BBB - 8.1021E8 CCC - 28754734 DDD - 27547919 EEE - 24549490 FFF - 24499890 3. The code of the actual join which you've got working already for lower volumes (so we can understand the join logic) tried joining small tables and placed in one permanent dataset and sorted large dataset and placed in one permanent dataset. and tried merging.but it takes more time and insufficient space issue code for actual join is PROC SQL;
CREATE TABLE joined1 AS
SELECT t1.AAAAAAAAAAA,
t1.PPPPPPPPPPPP,
t1.CCCCCCCCCCCC,
t1.DDDDDD,
t1.TTTTTTTTTTT,
t1.SSSSSSSSS,
t1.GGGGGGGGGGGGG,
t2.KKKKKKKKKKKK,
t2.FFFFFFFFFFFF,
t2.YYYYYYYY,
t2.RRRRRRR,
t2.EEEEEEEEEEEE,
t2.UUUUUUUUUUUUU,
t2.ZZZZZZZZZZZZZZZZ,
t2.LLLLLLLLLLLL,
t3.JJJJJJJJJ,
t3.NNNNNNNNNNNNNN,
t4.OOOOOOOOOOOO,
t4.VVVVVVVVVVVV,
FROM AAA t1
left JOIN FFF t2 ON (t1.AAAAAAAAAAA = t2.AAAAAAAAAAA)
left JOIN EEE t3 ON (t2.KKKKKKKKKKKK = t3.KKKKKKKKKKKK)
left JOIN CCC t4 ON (t3.JJJJJJJJJ =
t4.JJJJJJJJJ);
QUIT;
PROC SQL;
CREATE TABLE joined2 AS
SELECT t1.*,
t2.YYYYYYYYYYYYYYY,
t2.SAMPLE,
t2.SAMPLE1,
t2.SAMPLE2,
t2.SAMPLE3,
t2.SAMPLE4,
t2.SAMPLE5,
t2.SAMPLE6,
t2.SAMPLE7,
t2.SAMPLE8,
t2.SAMPLE9,
t2.SAMPLE10
FROM joined1 t1
left JOIN DDD t2 ON (t1.OOOOOOOOOOOO = t2.OOOOOOOOOOOO);
QUIT;
proc sort data=xx.BBB out=BBB;(large dataset)
by YYYYYYYYYYYYYYY QQQQQQQQQQ;
run;
proc sort data=joined2;
by YYYYYYYYYYYYYYY;
run;
data joined3;
merge joined2(in=ina) BBB(in=inb);
by YYYYYYYYYYYYYYY;
if ina;
run;
4. The length of the variables used for joining (so we can estimate memory consumption for any in-memory approach) the length of the columns i m using to join the columns are : 16,23,32,28,23 please excuse me for mentioning in code as table names and columns names as sudo values please 5. Is there any need for the resulting target table to be sorted in a specific way for further downstream processing Yes The resultant target table further needs to be subset-ted to 3 different dataset based on different conditions please
... View more