I have a join on a couple tables that takes FOREVER to run. Way way longer than it should and one of the notes I haven't noticed before states:
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
I'm not sure what that means exactly but a quick google indicates that it probably isn't what I'm trying to accomplish.
PROC SQL;
CREATE TABLE WORK.TABLE3 AS
SELECT t1.*,
t2.NAME,
t2.DATE
FROM WORK.TABLE1 t1 INNER JOIN
WORK.TABLE2 t2 ON (CASE WHEN t1.CAT1 NOT IN ('CATA', 'CATB') THEN (t1.ID1 = t2.ID1 AND t1.ID2= t2.ID2 AND t2.ID3 = t1.ID3)
WHEN t1.CAT1 IN ('CATA', 'CATB') THEN (t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
END);
QUIT;
Is this an acceptable way to join? Is there a better way?
Thanks in advance.
CASE in a join is not a good idea. Just do a normal join:
PROC SQL;
CREATE TABLE WORK.TABLE3 AS
SELECT t1.*,
t2.NAME,
t2.DATE
FROM WORK.TABLE1 t1
INNER JOIN WORK.TABLE2 t2
ON (t1.CAT1 NOT IN ('CATA', 'CATB') and t1.ID1 = t2.ID1 AND t1.ID2= t2.ID2 AND t2.ID3 = t1.ID3)
or (t1.CAT1 IN ('CATA', 'CATB') and t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
;
QUIT;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.