Use of MERGE in DATA step to generate all combinations of matched reco...

Rodcjones · Posted 05-02-2017 01:02 PM

I would like to know if MERGE can be used to generate a resulting dataset with all combinations of matching records in >2 datasets. For example, here are my three datasets:

data ORDERS;

input ID ONUM;

cards;

1 99

1 88

;

data MEDS;

input ID MNUM;

cards;

1 22

1 33

1 66

;

data DRIPS;

input ID DNUM;

cards;

1 55

;

I desire a dataset called ALL that contains these records:

　
ID ONUM MNUM DNUM

1     99      22      55
1     99      33      55
1     99      66      55
1     88      22      55
1     88      33      55
1     88      66      55

However when I use Merge, like this . . .

DATA ALL;

MERGE ORDERS MEDS DRIPS; BY ID;RUN;

... unfortunately I get

ID ONUM MNUM DNUM

1     99      22      55
1     88      33      55
1     88      66      55

I understand that there is a way to achieve this in proc sql. But before I go that route I wanted to make sure that there is not a DATA step option that will give me the desired dataset.

Important aspects of this are that I will need this to work for >2 datasets at a time. I am using SAS Enterprise Guide 7.13 HF3 (7.100.3.5438)

Thank you!

Tom · Posted 05-02-2017 01:48 PM

You cannot do it with MERGE.

You can do it using SET and some complex programming.

For example look at the POINT= option on the SET statement.

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
2 1
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
2 4
;
data DRIPS;
input ID DNUM;
cards;
1 55
2 5
;

data want ;
  do until (last.id);
    merge orders(keep=id in=in1) meds(keep=id in=in2) drips(keep=id in=in3);
    by id;
    n1+in1;
    n2+in2;
    n3+in3;
    if first.id and in1 then s1=n1;
    if first.id and in2 then s2=n2;
    if first.id and in3 then s3=n3;
    call missing(of in1-in3);
  end;
  if s1 then do i=s1 to n1;
    set orders point=i;
    if s2 then do j=s2 to n2;
      set meds point=j;
      if s3 then do k=s3 to n3;
        set drips point=k;
        output;
      end;
    end;
  end;
  drop s1-s3 n1-n3 ;
run;

art297 · Posted 05-02-2017 02:06 PM

I agree with @Tom, but can be done a bit simpler if you don't need to do it separately for multiple IDs:

data want;
  set orders;
  do i=1 to n1;
    set meds point=i nobs=n1;
    do j=1 to n2;
      set drips point=j nobs=n2;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

mkeintz · Posted 05-02-2017 07:52 PM

Here's the kind of complex programming needed if you use SET with point, and have multiple ID levels. It does not have an intuitive look to it:

data want (drop=_:);
  retain _totalo _totalm _totald 0;

  _no=0; _nm=0; _nd=0;
  do until (last.id);
    set orders (in=ino)
        meds   (in=inm)
        drips  (in=ind);
    by id;
    _no+ino;
    _nm+inm;
    _nd+ind;
  end;


  do po=_totalo+min(1,_no) to _totalo+_no;
    if _no>0 then set orders point=po;
	do pm=_totalm+min(1,_nm) to _totalm+_nm;
      if _nm>0 then set meds point=pm;
	  do pd=_totald+min(1,_nd) to _totald+_nd;
	    if _nd>0 then set drips point=pd;
		output;
	  end;
	end;
  end;

  _totalo+_no;
  _totalm+_nm;
  _totald+_nd;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 05-02-2017 02:44 PM

Hash technique isn't too bad either:)

data want;

set orders;

if _N_ = 1 then do;

if 0 then do;

set meds;

set drips;

end;

declare hash myhash1(dataset:'meds',multidata:'YES' );

rc = myhash1.defineKey('id');

rc = myhash1.defineData('mnum');

rc = myhash1.defineDone( );

declare hash myhash2(dataset:'drips' );

rc = myhash2.defineKey('id');

rc = myhash2.defineData('dnum');

rc = myhash2.defineDone( );

end;

rc = myhash1.find();

rc1= myhash2.find();

if (rc = 0) then do;

output;

rc=myhash1.find_next();

do while(rc = 0);

output;

rc=myhash1.find_next();

end;

drop rc:;

run;

Regards,

Naveen Srinivasan

PGStats · Posted 05-02-2017 02:50 PM

There is always a way using keyed access, but as far as I can see, it isn't that simple

data ORDERS(index=(id));
input ID ONUM;
cards;
1 99
1 88
2 199
2 188
;
　
data MEDS(index=(id));
input ID MNUM;
cards;
1 22
1 33
1 66
2 122
2 133
2 166
;
　
data DRIPS(index=(id));
input ID DNUM;
cards;
1 55
2 155
;

data all;
set orders;
rmeds=1;
do until(medsDone);
    set meds key=id keyreset=rmeds;
    medsDone = _iorc_ ne 0;
    if not medsDone then do;
        rdrips=1; 
        do until (dripsDone);
            set drips key=id keyreset=rdrips;
            dripsDone = _iorc_ ne 0;
            if not dripsDone then output;
            end;
        end;
    end;
_error_ = 0;
drop medsDone dripsDone;
run;

proc print noobs; run;

/* Compare with */
proc sql;
create table simple as
select *
from 
    orders inner join 
    meds on orders.id = meds.id inner join
    drips on meds.id = drips.id;
select * from simple;
quit;

PG

Ksharp · Posted 05-03-2017 09:13 AM

Why not SQL's cartesian product ?

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
;
data DRIPS;
input ID DNUM;
cards;
1 55
;

proc sql;
create table want as
 select a.*,b.mnum,c.dnum
  from orders as a,meds as b,drips as c
   where a.id=b.id and a.id=c.id;
quit;

Patrick · Posted 05-03-2017 10:18 AM

@Rodcjones

That's a great question in the sense that the answer really highlights the difference between a SAS datasetp merge and a SQL join. Once you understand these differences you can take full advantage of it and use the technique which suits your needs best on a case by case basis.

For what you're asking for: You're after the result of a cartesian join as @Ksharp mentions and this is something easily done with SQL but can involve a lot of coding with data step SAS.

Rodcjones · Posted 05-03-2017 01:16 PM

Thank you to Tom, art297, mkeintz, novinosrin, PGStats, Ksharp for your helpful replies. I am working through the solutions. And am also concluding that proc sql is the better approach. My limitation was that MERGE was giving me unexpected results and I didn't realize how MERGE works. I also thought it would be more efficient to use a data step than build the output through successive SQL joins (since I'm sometimes dealing with a large number of tables). Thanks for your expertise and effort.

Patrick · Posted 05-03-2017 06:14 PM

@Rodcjones

"I also thought it would be more efficient to use a data step than build the output through successive SQL joins"

You can join multiple tables at once with SQL and don't need successive steps for this.

Tom · Posted 05-03-2017 06:45 PM

Also note that the NATURAL JOIN can make SQL code easier to write. Works well for nice clear data structures like your simple example.

proc sql ;
create table all as
  select * 
  from ORDERS
  natural join MEDS
  natural join DRIPS
  order by 1
;
quit;

Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Classroom Training Available!