DATA Step, Macro, Functions and more

faster processing time for sql join

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 142
Accepted Solution

faster processing time for sql join

Hi all!  Is there a more efficient way to write this sql join?  The tables I am querrying are not that large, but the processing time is taking forever.  Basically, I have unique ids in table A (patid_copy).  I want to pull all records from b,c,d,e,f - that have matching id (patid_copy=patuniq).  Thanks!


proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from patid as a
,sidr.mdrsidrfy10 as b
,sidr.mdrsidrfy11 as c
,sidr.mdrsidrfy12 as d
,sidr.mdrsidrfy13 as e
,sidr.mdrsidrfy14 as f
where a.patid_copy= b.patuniq or
a.patid_copy= c.patuniq or
a.patid_copy= d.patuniq or
a.patid_copy= e.patuniq or
a.patid_copy= f.patuniq;
quit;


Accepted Solutions
Solution
‎02-09-2017 03:10 PM
Trusted Advisor
Posts: 1,022

Re: faster processing time for sql join

You apparently wnat all the records from b,c,d, or e whose PATID appears in A.  And since you say they all have the same vars, I guess you don't need any data from A other than PATID.  If so, then

 

data want;

  if _n_=1 then do;

    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');

      h.definekey('patid');

      h.definedone();

  end;

  set b c d e;

  rc=h.find();

 

Or perhaps you want the intersection:

 

data want;

  if _n_=1 then do;

    if 0 then set a b c d e;

    declare hash ha (dataset:'a (keep=patid)');    ha.definekey('patid');    ha.definedone();

    declare hash hb (dataset:'b (keep=patid)');    hb.definekey('patid');    hb.definedone();

    declare hash hc (dataset:'c (keep=patid)');    hc.definekey('patid');    hc.definedone();

    declare hash hd (dataset:'d (keep=patid)');    hd.definekey('patid');    hd.definedone();

    declare hash he (dataset:'e (keep=patid)');    he.definekey('patid');    he.definedone();

  end;

  set b c d e;

  rc=ha.find();

  if rc=0 then rc=hb.find();

  if rc=0 then rc=hc.find();

  if rc=0 then rc=hd.find();

  if rc=0 then rc=he.find();

  if rc=0;

run;

 

 

 

Note this does not "join" B,C,D, and E.   I think you intended an OUTE UNION of them, filtered for PATID found in A. 

View solution in original post


All Replies
Super User
Posts: 7,854

Re: faster processing time for sql join

What is the relationship for the key variables? Are those unique in all datasets, or do you have m:n relationships?

Depending on that, I'd sort and use a data step merge.

It is also important to know which variables are present in the datasets. Variables common in several datasets might cause lost data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 142

Re: faster processing time for sql join

Posted in reply to KurtBremser
all the variables in datasets are the same. so I should stack the tables in a data step and then pull out the observations needed using sql?
Super User
Posts: 11,343

Re: faster processing time for sql join

AS written such as:

from patid as a
,sidr.mdrsidrfy10 as b

 

This does a cartesian join or every record in Patid with every record in sidr.mdrsidrfy10 .

Each added data set is crossed with every single other record of the others.

So you have a*b*c*d*e*f  numbers of records, if each of those data sets has 100 records that mean you have 1E12 generated records. Then each is compared to the where criteria to see if the result is kept.

 

So combining the mdrsidrfy sets with a data step Set statement and then maybe:

 

Proc sql;

    create table want as

    select a.*, b.*

    from patid as a left join combined as b

    on a.a.patid_copy=b.a.patid_copy;

quit;

 

 

 

Solution
‎02-09-2017 03:10 PM
Trusted Advisor
Posts: 1,022

Re: faster processing time for sql join

You apparently wnat all the records from b,c,d, or e whose PATID appears in A.  And since you say they all have the same vars, I guess you don't need any data from A other than PATID.  If so, then

 

data want;

  if _n_=1 then do;

    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');

      h.definekey('patid');

      h.definedone();

  end;

  set b c d e;

  rc=h.find();

 

Or perhaps you want the intersection:

 

data want;

  if _n_=1 then do;

    if 0 then set a b c d e;

    declare hash ha (dataset:'a (keep=patid)');    ha.definekey('patid');    ha.definedone();

    declare hash hb (dataset:'b (keep=patid)');    hb.definekey('patid');    hb.definedone();

    declare hash hc (dataset:'c (keep=patid)');    hc.definekey('patid');    hc.definedone();

    declare hash hd (dataset:'d (keep=patid)');    hd.definekey('patid');    hd.definedone();

    declare hash he (dataset:'e (keep=patid)');    he.definekey('patid');    he.definedone();

  end;

  set b c d e;

  rc=ha.find();

  if rc=0 then rc=hb.find();

  if rc=0 then rc=hc.find();

  if rc=0 then rc=hd.find();

  if rc=0 then rc=he.find();

  if rc=0;

run;

 

 

 

Note this does not "join" B,C,D, and E.   I think you intended an OUTE UNION of them, filtered for PATID found in A. 

Frequent Contributor
Posts: 142

Re: faster processing time for sql join

@mkeintz The first hash work great and quick!! Thank you!
Contributor
Posts: 44

Re: faster processing time for sql join

An inner join should process it faster.

 

proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from  patid as a 
inner join  sidr.mdrsidrfy10 as b on a.patid_copy= b.patuniq
inner join sidr.mdrsidrfy11 as c on a.patid_copy= c.patuniq
inner join sidr.mdrsidrfy12 as d on a.patid_copy= d.patuniq
inner join sidr.mdrsidrfy13 as e on a.patid_copy= e.patuniq
inner join sidr.mdrsidrfy14 as f on .patid_copy= f.patuniq;
quit ;
Super User
Posts: 5,516

Re: faster processing time for sql join

The SQL experts out there will need to fix this, but I think if you rely on no patients appearing in multiple data sets you can use something like this.

 

proc sql;

create table want as

select * from sidr.mdrsidrfy10 

union

select * from sidr.mdrsidrfy11 

union

select * from sidr.mdrsidrfy12

union

select * from sidr.mdrsidrfy13

where patuniq in (select patid_copy from patid);

quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 225 views
  • 2 likes
  • 6 in conversation