I have two tables that I want to join, call them ID_table and merge_table.
data Id_table;
input Name $ Parent_name $ var1 ;
datalines;
ABC ABC_Co 28.6
HLE ABC_Co 42.1
Ggl_inc google 18.8
APPLE APPL_Inc 123.4
;
run;
data merge_table;
input Comp_Name $ var2 ;
datalines;
ABC_Co 110
Ggl_inc 105.3
APPLE 112.1
XYZ 101
;
run;
I want sas to merge on either: Name = comp_name, or Parent_name = comp_name.
If I do this:
PROC SQL;
CREATE TABLE merge_test AS
SELECT l.*, r.*
FROM Id_table AS l LEFT JOIN merge_table AS r
ON (l.Name = r.Comp_Name or l.Parent_name = r.Comp_Name);
QUIT;
It works. However, my actual data set has 582k obs and 300k obs in both tables, and when I run the above command it seems to take forever and SAS times out.
Is there a more efficient way to program this?
Furthermore, I still want to keep unmerged matches in my data set, but SAS doesn't seem to be doing so in the above test sample (the values for comp_name = XYZ from merge_table are dropped). Is there a way it is included?
thanks!
Maxim 2: Read the Log.
The SQL procedure notifies you that it has to do cartesian joins, which is often a bad sign, performancewise.
Split your join into two, and concatenate the results:
proc sql;
create table merge_test as
select l.*, r.*
from Id_table as l left join merge_table as r
on l.Name = r.Comp_Name
where not missing(r.comp_name)
union
select l.*, r.*
from Id_table as l left join merge_table as r
on l.Parent_name = r.Comp_Name
where not missing(r.comp_name)
;
quit;
Maxim 2: Read the Log.
The SQL procedure notifies you that it has to do cartesian joins, which is often a bad sign, performancewise.
Split your join into two, and concatenate the results:
proc sql;
create table merge_test as
select l.*, r.*
from Id_table as l left join merge_table as r
on l.Name = r.Comp_Name
where not missing(r.comp_name)
union
select l.*, r.*
from Id_table as l left join merge_table as r
on l.Parent_name = r.Comp_Name
where not missing(r.comp_name)
;
quit;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.