I have a join on a couple tables that takes FOREVER to run. Way way longer than it should and one of the notes I haven't noticed before states:
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
I'm not sure what that means exactly but a quick google indicates that it probably isn't what I'm trying to accomplish.
PROC SQL;
CREATE TABLE WORK.TABLE3 AS
SELECT t1.*,
t2.NAME,
t2.DATE
FROM WORK.TABLE1 t1 INNER JOIN
WORK.TABLE2 t2 ON (CASE WHEN t1.CAT1 NOT IN ('CATA', 'CATB') THEN (t1.ID1 = t2.ID1 AND t1.ID2= t2.ID2 AND t2.ID3 = t1.ID3)
WHEN t1.CAT1 IN ('CATA', 'CATB') THEN (t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
END);
QUIT;
Is this an acceptable way to join? Is there a better way?
Thanks in advance.
CASE in a join is not a good idea. Just do a normal join:
PROC SQL;
CREATE TABLE WORK.TABLE3 AS
SELECT t1.*,
t2.NAME,
t2.DATE
FROM WORK.TABLE1 t1
INNER JOIN WORK.TABLE2 t2
ON (t1.CAT1 NOT IN ('CATA', 'CATB') and t1.ID1 = t2.ID1 AND t1.ID2= t2.ID2 AND t2.ID3 = t1.ID3)
or (t1.CAT1 IN ('CATA', 'CATB') and t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
;
QUIT;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.