BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ADN
Fluorite | Level 6 ADN
Fluorite | Level 6

Hi,

 

I have 4 tables T1,T2,T3 and T4 and also a master table Z. The condition I have to join the tables to the master table is 

 

T1.ID1=Z.ID OR T1.ID2=Z.ID OR T1.ID3=Z.ID OR T1.ID4=Z.ID

 

Is there any way to optimize these type of joins? This join is taking too much time for me(> 1 hr).

 

The tables T1,T2,T3 and T4 have 100k obs and Z has 400k obs approx.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

If the ID's in T1-T4 are mutually exclusive (like they look in your example), I'd first concatenate those datasets:

data t1;
set t1-t4;
run;

and then sort/merge with Z:

proc sort data=T;
by id;
run;

proc sort data=Z;
by id;
run;

data want;
merge
  Z (in=in_z)
  T
;
by id;
if in_z;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

How is the relationship between those tables with regards to id? one-to-one, one-to-many, or many-to-many?

 

Depending on that, and the variables contained, sorting the tables by id and doing a data step merge might be the preferred solution.

 

Could you post some example data that shows the typical structures/contents (see my footnotes for converting datasets to data steps and posting the code). Just enough observations to clarify the issue.

ADN
Fluorite | Level 6 ADN
Fluorite | Level 6

The master table Z is a union of all the ID's present in tables T1/T2/T3 and T4. ID's present in all the 5 tables are disctinct.

 

Unfortunately the data belongs to my client and I cant post it. But here is some sample data for the same.

 

data T1;

input ID $20;

datalines;

1

2

3

;

run;

 

data T2;

input ID $20;

datalines;

4

5

6

;

run;

 

data T3;

input ID $20;

datalines;

7

8

9

;

run;

 

data T4;

input ID $20;

datalines;

10

11

12

;

run;

 

data Z;

input ID $20;

datalines;

1

2

3

4

5

6

7

8

9

10

11

12

;

run;

Kurt_Bremser
Super User

If the ID's in T1-T4 are mutually exclusive (like they look in your example), I'd first concatenate those datasets:

data t1;
set t1-t4;
run;

and then sort/merge with Z:

proc sort data=T;
by id;
run;

proc sort data=Z;
by id;
run;

data want;
merge
  Z (in=in_z)
  T
;
by id;
if in_z;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1786 views
  • 1 like
  • 2 in conversation