BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

Hi all!  Is there a more efficient way to write this sql join?  The tables I am querrying are not that large, but the processing time is taking forever.  Basically, I have unique ids in table A (patid_copy).  I want to pull all records from b,c,d,e,f - that have matching id (patid_copy=patuniq).  Thanks!


proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from patid as a
,sidr.mdrsidrfy10 as b
,sidr.mdrsidrfy11 as c
,sidr.mdrsidrfy12 as d
,sidr.mdrsidrfy13 as e
,sidr.mdrsidrfy14 as f
where a.patid_copy= b.patuniq or
a.patid_copy= c.patuniq or
a.patid_copy= d.patuniq or
a.patid_copy= e.patuniq or
a.patid_copy= f.patuniq;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You apparently wnat all the records from b,c,d, or e whose PATID appears in A.  And since you say they all have the same vars, I guess you don't need any data from A other than PATID.  If so, then

 

data want;

  if _n_=1 then do;

    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');

      h.definekey('patid');

      h.definedone();

  end;

  set b c d e;

  rc=h.find();

 

Or perhaps you want the intersection:

 

data want;

  if _n_=1 then do;

    if 0 then set a b c d e;

    declare hash ha (dataset:'a (keep=patid)');    ha.definekey('patid');    ha.definedone();

    declare hash hb (dataset:'b (keep=patid)');    hb.definekey('patid');    hb.definedone();

    declare hash hc (dataset:'c (keep=patid)');    hc.definekey('patid');    hc.definedone();

    declare hash hd (dataset:'d (keep=patid)');    hd.definekey('patid');    hd.definedone();

    declare hash he (dataset:'e (keep=patid)');    he.definekey('patid');    he.definedone();

  end;

  set b c d e;

  rc=ha.find();

  if rc=0 then rc=hb.find();

  if rc=0 then rc=hc.find();

  if rc=0 then rc=hd.find();

  if rc=0 then rc=he.find();

  if rc=0;

run;

 

 

 

Note this does not "join" B,C,D, and E.   I think you intended an OUTE UNION of them, filtered for PATID found in A. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

What is the relationship for the key variables? Are those unique in all datasets, or do you have m:n relationships?

Depending on that, I'd sort and use a data step merge.

It is also important to know which variables are present in the datasets. Variables common in several datasets might cause lost data.

jenim514
Pyrite | Level 9
all the variables in datasets are the same. so I should stack the tables in a data step and then pull out the observations needed using sql?
ballardw
Super User

AS written such as:

from patid as a
,sidr.mdrsidrfy10 as b

 

This does a cartesian join or every record in Patid with every record in sidr.mdrsidrfy10 .

Each added data set is crossed with every single other record of the others.

So you have a*b*c*d*e*f  numbers of records, if each of those data sets has 100 records that mean you have 1E12 generated records. Then each is compared to the where criteria to see if the result is kept.

 

So combining the mdrsidrfy sets with a data step Set statement and then maybe:

 

Proc sql;

    create table want as

    select a.*, b.*

    from patid as a left join combined as b

    on a.a.patid_copy=b.a.patid_copy;

quit;

 

 

 

mkeintz
PROC Star

You apparently wnat all the records from b,c,d, or e whose PATID appears in A.  And since you say they all have the same vars, I guess you don't need any data from A other than PATID.  If so, then

 

data want;

  if _n_=1 then do;

    if 0 then set A;
    declare hash h (dataset:'A (keep=patid)');

      h.definekey('patid');

      h.definedone();

  end;

  set b c d e;

  rc=h.find();

 

Or perhaps you want the intersection:

 

data want;

  if _n_=1 then do;

    if 0 then set a b c d e;

    declare hash ha (dataset:'a (keep=patid)');    ha.definekey('patid');    ha.definedone();

    declare hash hb (dataset:'b (keep=patid)');    hb.definekey('patid');    hb.definedone();

    declare hash hc (dataset:'c (keep=patid)');    hc.definekey('patid');    hc.definedone();

    declare hash hd (dataset:'d (keep=patid)');    hd.definekey('patid');    hd.definedone();

    declare hash he (dataset:'e (keep=patid)');    he.definekey('patid');    he.definedone();

  end;

  set b c d e;

  rc=ha.find();

  if rc=0 then rc=hb.find();

  if rc=0 then rc=hc.find();

  if rc=0 then rc=hd.find();

  if rc=0 then rc=he.find();

  if rc=0;

run;

 

 

 

Note this does not "join" B,C,D, and E.   I think you intended an OUTE UNION of them, filtered for PATID found in A. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jenim514
Pyrite | Level 9
@mkeintz The first hash work great and quick!! Thank you!
anoopmohandas7
Quartz | Level 8

An inner join should process it faster.

 

proc sql;
create table  asthma.missing_sidr as
select a.*,b.*, c.*, d.*,e.*,f.*
from  patid as a 
inner join  sidr.mdrsidrfy10 as b on a.patid_copy= b.patuniq
inner join sidr.mdrsidrfy11 as c on a.patid_copy= c.patuniq
inner join sidr.mdrsidrfy12 as d on a.patid_copy= d.patuniq
inner join sidr.mdrsidrfy13 as e on a.patid_copy= e.patuniq
inner join sidr.mdrsidrfy14 as f on .patid_copy= f.patuniq;
quit ;
Astounding
PROC Star

The SQL experts out there will need to fix this, but I think if you rely on no patients appearing in multiple data sets you can use something like this.

 

proc sql;

create table want as

select * from sidr.mdrsidrfy10 

union

select * from sidr.mdrsidrfy11 

union

select * from sidr.mdrsidrfy12

union

select * from sidr.mdrsidrfy13

where patuniq in (select patid_copy from patid);

quit;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1980 views
  • 2 likes
  • 6 in conversation