Proc sql;
create table JOIN as
select
a.key1, a.x1, a.x2, a.x3,
b.y1, b.y2, b.y3, b.y4,
c.z1, c.z2, c.z3
d.p1, d.p2
e.t1, e.t2, e.t3, e.t4,
f.n1, f.n2,
g.w1, g.w2, g.w3
from DATASET1 a
join DATASET2 b
on b.y1 between '01aug2017'd and '31aug2017'd
and a.key1 = b.key1
and b.y3 = 'D'
and b.y4 = 'P'
join DATASET3 c
on b.y3 = c.z1
join DATASET4 d
on c.key1 = d.key1
and d.p2 = '31jul2017'd
join DATASET5 e
on d.key2 = e.key1
join DATASET6 g
on g.w1 in ('123')
and e.t2 between '01aug2017'd and '31aug2017'd
join DATASET7 f
on c.key2 = f.key1
where a.x1 between '01aug2017'd and '31aug2017'd
and a.x2 = 'C'
and a.x3 in ('123');
Quit;
Hey everyone, I'm fairly new to SAS. I'm trying to join multiple datasets together on several different keys, and meeting certain criteria. I've posted my code and it does exactly what I want it to do... in about 80 seconds. Some of the tables (namely table e) are HUGE. Here's my issue: I need to cut down on the process time, because I need to loop this process several thousand times. I have the macro to do it, but it would take too long. I've read online that hash tables might be the solution but I can't wrap my head around how to code it. Any help would be appreciated! Thanks. (Using SAS EG 7.1)
Your problem is not the time of the join, but that you loop it 1000 times. There's a fair chance that what you do can be solved by by-group processing.
Your query can be rewritten as:
Proc sql;
create table JOIN as
select
a.key1, a.x1, a.x2, a.x3,
b.y1, b.y2, b.y3, b.y4,
c.z1, c.z2, c.z3
d.p1, d.p2
e.t1, e.t2, e.t3, e.t4,
f.n1, f.n2,
g.w1, g.w2, g.w3
from
DATASET1 a join
DATASET2 b on a.key1 = b.key1 join
DATASET3 c join
DATASET4 d on c.key1 = d.key1 join
DATASET5 e on d.key2 = e.key1 join
DATASET7 f on c.key2 = f.key1 join
DATASET6 g
where
a.x2 = 'C' and
b.y3 = 'D' and
b.y4 = 'P' and
c.z1 = 'D' and
a.x3 in ('123') and
g.w1 in ('123') and
d.p2 = '31jul2017'd and
a.x1 between '01aug2017'd and '31aug2017'd and
b.y1 between '01aug2017'd and '31aug2017'd and
e.t2 between '01aug2017'd and '31aug2017'd
;
Quit;
which means that you are actually asking for
(a*b) x (c*d*e*f) x g
where * is a join and x is a cartesian product. Are you certain that this is what you want?
I agree with @Kurt_Bremser.
* What varies between the 1000 runs? *
If you read the same tables several times, sort them (and/or index them, depending on what is needed) so they are ready to use by the merge process.
Without knowing the details of your data structures, at this point, your best bet would b to include indexes on join keys and on where condition variables, as @ChrisNZ suggested.
More importantly, you should reconsider your need to execute this query thousands of times. There is most likely a more efficient way.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.