BookmarkSubscribeRSS Feed
Jrunner1569
Calcite | Level 5
Proc sql;

create table JOIN as
select 
	a.key1, a.x1, a.x2, a.x3, 
	b.y1, b.y2, b.y3, b.y4,
	c.z1, c.z2, c.z3
	d.p1, d.p2
	e.t1, e.t2, e.t3, e.t4, 
	f.n1, f.n2, 
	g.w1, g.w2, g.w3

from DATASET1 a

	join DATASET2 b
	on b.y1  between '01aug2017'd and '31aug2017'd
	and a.key1 = b.key1
	and b.y3 = 'D'
	and b.y4 = 'P'

	join DATASET3 c
	on b.y3 = c.z1

	join DATASET4 d
	on c.key1 = d.key1
	and d.p2 = '31jul2017'd

	join DATASET5 e
	on d.key2 = e.key1

	join DATASET6 g
	on g.w1 in ('123')
	and e.t2 between '01aug2017'd and '31aug2017'd

	join DATASET7 f
	on c.key2 = f.key1

where a.x1 between '01aug2017'd and '31aug2017'd
and a.x2 = 'C'
and a.x3 in ('123');

Quit;

Hey everyone, I'm fairly new to SAS. I'm trying to join multiple datasets together on several different keys, and meeting certain criteria. I've posted my code and it does exactly what I want it to do... in about 80 seconds. Some of the tables (namely table e) are HUGE. Here's my issue: I need to cut down on the process time, because I need to loop this process several thousand times. I have the macro to do it, but it would take too long. I've read online that hash tables might be the solution but I can't wrap my head around how to code it. Any help would be appreciated! Thanks. (Using SAS EG 7.1)

 

6 REPLIES 6
Kurt_Bremser
Super User

Your problem is not the time of the join, but that you loop it 1000 times. There's a fair chance that what you do can be solved by by-group processing.

PGStats
Opal | Level 21

Your query can be rewritten as:

 

Proc sql;

create table JOIN as
select 
	a.key1, a.x1, a.x2, a.x3, 
	b.y1, b.y2, b.y3, b.y4,
	c.z1, c.z2, c.z3
	d.p1, d.p2
	e.t1, e.t2, e.t3, e.t4, 
	f.n1, f.n2, 
	g.w1, g.w2, g.w3

from 
    DATASET1 a                                join
    DATASET2 b on a.key1 = b.key1 join
    DATASET3 c                                join
    DATASET4 d on c.key1 = d.key1 join
    DATASET5 e on d.key2 = e.key1 join
    DATASET7 f on c.key2 = f.key1   join
    DATASET6 g
    
where 
    a.x2 = 'C'                                 and
    b.y3 = 'D'                                 and
    b.y4 = 'P'                                 and
    c.z1 = 'D'                                 and
    a.x3 in ('123')                            and
    g.w1 in ('123')                            and
    d.p2 = '31jul2017'd                        and
    a.x1 between '01aug2017'd and '31aug2017'd and
    b.y1 between '01aug2017'd and '31aug2017'd and  
    e.t2 between '01aug2017'd and '31aug2017'd
    ;    
Quit;

 which means that you are actually asking for

 

 

(a*b) x (c*d*e*f) x g

 

where * is a join and x is a cartesian product. Are you certain that this is what you want?

PG
Jrunner1569
Calcite | Level 5
So I took your code suggestion (added 'DATASET6 g on a.x3 = g.w1') and it runs about 30 seconds faster (great!) and produces the same result after doing a quick proc-compare. Are there further enhancements now that you know what I'm after?
ChrisNZ
Tourmaline | Level 20

I agree with @Kurt_Bremser.

* What varies between the 1000 runs? *

If you read the same tables several times, sort them (and/or index them, depending on what is needed) so they are ready to use by the merge process.

Jrunner1569
Calcite | Level 5
So essentially these tables are several billions of observations. I start with data from table A, but need the data from table E to be joined to it. E's data can only be gotten from a key field in table D. D's key field can only be gotten by matching on a key field in C....so on until I reach table A.
PGStats
Opal | Level 21

Without knowing the details of your data structures, at this point, your best bet would b to include indexes on join keys and on where condition variables, as @ChrisNZ suggested.

 

More importantly, you should reconsider your need to execute this query thousands of times. There is most likely a more efficient way.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 825 views
  • 0 likes
  • 4 in conversation