Hi all! Is there a more efficient way to write this sql join? The tables I am querrying are not that large, but the processing time is taking forever. Basically, I have unique ids in table A (patid_copy). I want to pull all records from b,c,d,e,f - that have matching id (patid_copy=patuniq). Thanks!
proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from patid as a 
,sidr.mdrsidrfy10 as b
,sidr.mdrsidrfy11 as c
,sidr.mdrsidrfy12 as d
,sidr.mdrsidrfy13 as e
,sidr.mdrsidrfy14 as f
where a.patid_copy= b.patuniq or 
a.patid_copy= c.patuniq or
a.patid_copy= d.patuniq or
a.patid_copy= e.patuniq or
a.patid_copy= f.patuniq;
quit;
You apparently wnat all the records from b,c,d, or e whose PATID appears in A. And since you say they all have the same vars, I guess you don't need any data from A other than PATID. If so, then
data want;
if _n_=1 then do;
    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');
h.definekey('patid');
h.definedone();
end;
set b c d e;
rc=h.find();
Or perhaps you want the intersection:
data want;
if _n_=1 then do;
if 0 then set a b c d e;
declare hash ha (dataset:'a (keep=patid)'); ha.definekey('patid'); ha.definedone();
declare hash hb (dataset:'b (keep=patid)'); hb.definekey('patid'); hb.definedone();
declare hash hc (dataset:'c (keep=patid)'); hc.definekey('patid'); hc.definedone();
declare hash hd (dataset:'d (keep=patid)'); hd.definekey('patid'); hd.definedone();
declare hash he (dataset:'e (keep=patid)'); he.definekey('patid'); he.definedone();
end;
set b c d e;
rc=ha.find();
if rc=0 then rc=hb.find();
if rc=0 then rc=hc.find();
if rc=0 then rc=hd.find();
if rc=0 then rc=he.find();
if rc=0;
run;
Note this does not "join" B,C,D, and E. I think you intended an OUTE UNION of them, filtered for PATID found in A.
What is the relationship for the key variables? Are those unique in all datasets, or do you have m:n relationships?
Depending on that, I'd sort and use a data step merge.
It is also important to know which variables are present in the datasets. Variables common in several datasets might cause lost data.
AS written such as:
from patid as a 
,sidr.mdrsidrfy10 as b
This does a cartesian join or every record in Patid with every record in sidr.mdrsidrfy10 .
Each added data set is crossed with every single other record of the others.
So you have a*b*c*d*e*f numbers of records, if each of those data sets has 100 records that mean you have 1E12 generated records. Then each is compared to the where criteria to see if the result is kept.
So combining the mdrsidrfy sets with a data step Set statement and then maybe:
Proc sql;
create table want as
select a.*, b.*
from patid as a left join combined as b
on a.a.patid_copy=b.a.patid_copy;
quit;
You apparently wnat all the records from b,c,d, or e whose PATID appears in A. And since you say they all have the same vars, I guess you don't need any data from A other than PATID. If so, then
data want;
if _n_=1 then do;
    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');
h.definekey('patid');
h.definedone();
end;
set b c d e;
rc=h.find();
Or perhaps you want the intersection:
data want;
if _n_=1 then do;
if 0 then set a b c d e;
declare hash ha (dataset:'a (keep=patid)'); ha.definekey('patid'); ha.definedone();
declare hash hb (dataset:'b (keep=patid)'); hb.definekey('patid'); hb.definedone();
declare hash hc (dataset:'c (keep=patid)'); hc.definekey('patid'); hc.definedone();
declare hash hd (dataset:'d (keep=patid)'); hd.definekey('patid'); hd.definedone();
declare hash he (dataset:'e (keep=patid)'); he.definekey('patid'); he.definedone();
end;
set b c d e;
rc=ha.find();
if rc=0 then rc=hb.find();
if rc=0 then rc=hc.find();
if rc=0 then rc=hd.find();
if rc=0 then rc=he.find();
if rc=0;
run;
Note this does not "join" B,C,D, and E. I think you intended an OUTE UNION of them, filtered for PATID found in A.
An inner join should process it faster.
proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from  patid as a 
inner join  sidr.mdrsidrfy10 as b on a.patid_copy= b.patuniq
inner join sidr.mdrsidrfy11 as c on a.patid_copy= c.patuniq
inner join sidr.mdrsidrfy12 as d on a.patid_copy= d.patuniq
inner join sidr.mdrsidrfy13 as e on a.patid_copy= e.patuniq
inner join sidr.mdrsidrfy14 as f on .patid_copy= f.patuniq;
quit ;The SQL experts out there will need to fix this, but I think if you rely on no patients appearing in multiple data sets you can use something like this.
proc sql;
create table want as
select * from sidr.mdrsidrfy10
union
select * from sidr.mdrsidrfy11
union
select * from sidr.mdrsidrfy12
union
select * from sidr.mdrsidrfy13
where patuniq in (select patid_copy from patid);
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
