Dear Experts, I will have to join the following two tables. Table one (1 billion rows) Data one;
input snapshotdate sourcekey sourcesystemid value1;
datalines;
20200101 112 5 788
20200102 112 5 789
20200103 112 5 800
20200201 112 5 786
20200202 112 5 777
20200203 112 5 834
20200301 112 5 789
20200302 112 5 771
20200303 112 5 832
20200101 222 6 788
20200102 222 6 789
20200103 222 6 800
20200201 222 6 786
20200202 222 6 777
20200203 222 6 834
20200301 222 6 789
20200302 222 6 771
20200303 222 6 832
;
run; Table Two (32 million rows) Data two;
input period sourcekey sourcesystemid npl;
datalines;
202001 112 5 999
202002 112 5 988
202001 222 6 555
202002 222 6 556
;
run; I would like to get the joined table as below; snapshotdate sourcekey sourcesystemid value1 NPL 20200101 112 5 788 999 20200102 112 5 789 999 20200103 112 5 800 999 20200201 112 5 786 988 20200202 112 5 777 988 20200203 112 5 834 988 20200301 112 5 789 988 20200302 112 5 771 988 20200303 112 5 832 988 20200101 222 6 788 555 20200102 222 6 789 555 20200103 222 6 800 555 20200201 222 6 786 556 20200202 222 6 777 556 20200203 222 6 834 556 20200301 222 6 789 556 20200302 222 6 771 556 20200303 222 6 832 556 When there is a missing (year and month), it has to be filled with the latest available value (bolded), The code I have currently: (not working to replace the missing values) Proc SQL;
Create Table want as
Select
a.*,
b.npl
from one as a
left join two as b
on a.sourcekey =b.sourcekey and a.sourcesystemid = b.sourcesystemid and input(substr(put(a.snapshotdate,8.),1,6),6.) = b.period
order by a.sourcekey,a.snapshotdate
;
Quit; Since it involves large tables, I prefer this to be coded using HASH. I would like to use table two as HASH object. Thanks in advance. Regards, Myu
... View more