Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Reply
Occasional Contributor
Posts: 5

Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

I would like to know if MERGE can be used to generate a resulting dataset with all combinations of matching records in >2 datasets. For example, here are my three datasets:

data ORDERS;

input ID ONUM;

cards;

1 99

1 88

;

 

data MEDS;

input ID MNUM;

cards;

1 22

1 33

1 66

;

 

data DRIPS;

input ID DNUM;

cards;

1 55

;

 

I desire a dataset called ALL that contains these records:

 
ID   ONUM    MNUM    DNUM

 1     99      22      55
 1     99      33      55
 1     99      66      55
 1     88      22      55
 1     88      33      55
 1     88      66      55

 

However when I use Merge, like this . . .

 

DATA ALL;

MERGE ORDERS MEDS DRIPS; BY ID;RUN;

 

... unfortunately I get

ID    ONUM    MNUM    DNUM

 1     99      22      55
 1     88      33      55
 1     88      66      55

 

I understand that there is a way to achieve this in proc sql. But before I go that route I wanted to make sure that there is not a DATA step option that will give me the desired dataset.

 

Important aspects of this are that I will need this to work for >2 datasets at a time. I am using SAS Enterprise Guide 7.13 HF3 (7.100.3.5438)

 

Thank you!

Super User
Super User
Posts: 6,351

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

You cannot do it with MERGE.

You can do it using SET and some complex programming.  

For example look at the POINT= option on the SET statement.

 

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
2 1
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
2 4
;
data DRIPS;
input ID DNUM;
cards;
1 55
2 5
;

data want ;
  do until (last.id);
    merge orders(keep=id in=in1) meds(keep=id in=in2) drips(keep=id in=in3);
    by id;
    n1+in1;
    n2+in2;
    n3+in3;
    if first.id and in1 then s1=n1;
    if first.id and in2 then s2=n2;
    if first.id and in3 then s3=n3;
    call missing(of in1-in3);
  end;
  if s1 then do i=s1 to n1;
    set orders point=i;
    if s2 then do j=s2 to n2;
      set meds point=j;
      if s3 then do k=s3 to n3;
        set drips point=k;
        output;
      end;
    end;
  end;
  drop s1-s3 n1-n3 ;
run;
Esteemed Advisor
Posts: 7,294

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

[ Edited ]

I agree with @Tom, but can be done a bit simpler if you don't need to do it separately for multiple IDs:

 

data want;
  set orders;
  do i=1 to n1;
    set meds point=i nobs=n1;
    do j=1 to n2;
      set drips point=j nobs=n2;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 789

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Here's the kind of complex programming needed if you use SET with point, and have multiple ID levels.  It does not have an intuitive look to it:

 

data want (drop=_:);
  retain _totalo _totalm _totald 0;

  _no=0; _nm=0; _nd=0;
  do until (last.id);
    set orders (in=ino)
        meds   (in=inm)
        drips  (in=ind);
    by id;
    _no+ino;
    _nm+inm;
    _nd+ind;
  end;


  do po=_totalo+min(1,_no) to _totalo+_no;
    if _no>0 then set orders point=po;
	do pm=_totalm+min(1,_nm) to _totalm+_nm;
      if _nm>0 then set meds point=pm;
	  do pd=_totald+min(1,_nd) to _totald+_nd;
	    if _nd>0 then set drips point=pd;
		output;
	  end;
	end;
  end;

  _totalo+_no;
  _totalm+_nm;
  _totald+_nd;
run;

Frequent Contributor
Posts: 107

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Hash technique isn't too bad eitherSmiley Happy

 

data want;

 set orders;

   if _N_ = 1 then do;

   if 0 then do;

   set meds;

   set drips;

   end;

     declare hash myhash1(dataset:'meds',multidata:'YES' );

      rc = myhash1.defineKey('id');

      rc = myhash1.defineData('mnum');

      rc = myhash1.defineDone( );

       declare hash myhash2(dataset:'drips' );

      rc = myhash2.defineKey('id');

      rc = myhash2.defineData('dnum');

      rc = myhash2.defineDone( );

   end;

      rc = myhash1.find();

     rc1= myhash2.find();

      if (rc = 0) then do;

       output;

         rc=myhash1.find_next();

         do while(rc = 0);

            output;

            rc=myhash1.find_next();

         end;

      end;

  drop rc:;

run;

 

Regards,

Naveen Srinivasan

Respected Advisor
Posts: 4,608

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

[ Edited ]

There is always a way using keyed access, but as far as I can see, it isn't that simple

 

data ORDERS(index=(id));
input ID ONUM;
cards;
1 99
1 88
2 199
2 188
;
 
data MEDS(index=(id));
input ID MNUM;
cards;
1 22
1 33
1 66
2 122
2 133
2 166
;
 
data DRIPS(index=(id));
input ID DNUM;
cards;
1 55
2 155
;

data all;
set orders;
rmeds=1;
do until(medsDone);
    set meds key=id keyreset=rmeds;
    medsDone = _iorc_ ne 0;
    if not medsDone then do;
        rdrips=1; 
        do until (dripsDone);
            set drips key=id keyreset=rdrips;
            dripsDone = _iorc_ ne 0;
            if not dripsDone then output;
            end;
        end;
    end;
_error_ = 0;
drop medsDone dripsDone;
run;

proc print noobs; run;

/* Compare with */
proc sql;
create table simple as
select *
from 
    orders inner join 
    meds on orders.id = meds.id inner join
    drips on meds.id = drips.id;
select * from simple;
quit;
PG
Grand Advisor
Posts: 9,578

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Why not SQL's cartesian product ?

 

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
;
data DRIPS;
input ID DNUM;
cards;
1 55
;

proc sql;
create table want as
 select a.*,b.mnum,c.dnum
  from orders as a,meds as b,drips as c
   where a.id=b.id and a.id=c.id;
quit;
Respected Advisor
Posts: 3,837

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

@Rodcjones

That's a great question in the sense that the answer really highlights the difference between a SAS datasetp merge and a SQL join. Once you understand these differences you can take full advantage of it and use the technique which suits your needs best on a case by case basis.

 

For what you're asking for: You're after the result of a cartesian join as @Ksharp mentions and this is something easily done with SQL but can involve a lot of coding with data step SAS.

 

Occasional Contributor
Posts: 5

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Thank you to Tom, art297, mkeintz, novinosrin, PGStats, Ksharp for your helpful replies. I am working through the solutions. And am also concluding that proc sql is the better approach. My limitation was that MERGE was giving me unexpected results and I didn't realize how MERGE works. I also thought it would be more efficient to use a data step than build the output through successive SQL joins (since I'm sometimes dealing with a large number of tables). Thanks for your expertise and effort.

Respected Advisor
Posts: 3,837

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

@Rodcjones

"I also thought it would be more efficient to use a data step than build the output through successive SQL joins"

You can join multiple tables at once with SQL and don't need successive steps for this.

 

Super User
Super User
Posts: 6,351

Re: Use of MERGE in DATA step to generate all combinations of matched records from >2 datasets

Also note that the NATURAL JOIN can make SQL code easier to write. Works well for nice clear data structures like your simple example.

 

proc sql ;
create table all as
  select * 
  from ORDERS
  natural join MEDS
  natural join DRIPS
  order by 1
;
quit;

 

 

Ask a Question
Discussion stats
  • 10 replies
  • 263 views
  • 9 likes
  • 8 in conversation