BookmarkSubscribeRSS Feed
Rodcjones
Obsidian | Level 7

I would like to know if MERGE can be used to generate a resulting dataset with all combinations of matching records in >2 datasets. For example, here are my three datasets:

data ORDERS;

input ID ONUM;

cards;

1 99

1 88

;

 

data MEDS;

input ID MNUM;

cards;

1 22

1 33

1 66

;

 

data DRIPS;

input ID DNUM;

cards;

1 55

;

 

I desire a dataset called ALL that contains these records:

 
ID   ONUM    MNUM    DNUM

 1     99      22      55
 1     99      33      55
 1     99      66      55
 1     88      22      55
 1     88      33      55
 1     88      66      55

 

However when I use Merge, like this . . .

 

DATA ALL;

MERGE ORDERS MEDS DRIPS; BY ID;RUN;

 

... unfortunately I get

ID    ONUM    MNUM    DNUM

 1     99      22      55
 1     88      33      55
 1     88      66      55

 

I understand that there is a way to achieve this in proc sql. But before I go that route I wanted to make sure that there is not a DATA step option that will give me the desired dataset.

 

Important aspects of this are that I will need this to work for >2 datasets at a time. I am using SAS Enterprise Guide 7.13 HF3 (7.100.3.5438)

 

Thank you!

10 REPLIES 10
Tom
Super User Tom
Super User

You cannot do it with MERGE.

You can do it using SET and some complex programming.  

For example look at the POINT= option on the SET statement.

 

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
2 1
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
2 4
;
data DRIPS;
input ID DNUM;
cards;
1 55
2 5
;

data want ;
  do until (last.id);
    merge orders(keep=id in=in1) meds(keep=id in=in2) drips(keep=id in=in3);
    by id;
    n1+in1;
    n2+in2;
    n3+in3;
    if first.id and in1 then s1=n1;
    if first.id and in2 then s2=n2;
    if first.id and in3 then s3=n3;
    call missing(of in1-in3);
  end;
  if s1 then do i=s1 to n1;
    set orders point=i;
    if s2 then do j=s2 to n2;
      set meds point=j;
      if s3 then do k=s3 to n3;
        set drips point=k;
        output;
      end;
    end;
  end;
  drop s1-s3 n1-n3 ;
run;
art297
Opal | Level 21

I agree with @Tom, but can be done a bit simpler if you don't need to do it separately for multiple IDs:

 

data want;
  set orders;
  do i=1 to n1;
    set meds point=i nobs=n1;
    do j=1 to n2;
      set drips point=j nobs=n2;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

mkeintz
PROC Star

Here's the kind of complex programming needed if you use SET with point, and have multiple ID levels.  It does not have an intuitive look to it:

 

data want (drop=_:);
  retain _totalo _totalm _totald 0;

  _no=0; _nm=0; _nd=0;
  do until (last.id);
    set orders (in=ino)
        meds   (in=inm)
        drips  (in=ind);
    by id;
    _no+ino;
    _nm+inm;
    _nd+ind;
  end;


  do po=_totalo+min(1,_no) to _totalo+_no;
    if _no>0 then set orders point=po;
	do pm=_totalm+min(1,_nm) to _totalm+_nm;
      if _nm>0 then set meds point=pm;
	  do pd=_totald+min(1,_nd) to _totald+_nd;
	    if _nd>0 then set drips point=pd;
		output;
	  end;
	end;
  end;

  _totalo+_no;
  _totalm+_nm;
  _totald+_nd;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Hash technique isn't too bad either:)

 

data want;

 set orders;

   if _N_ = 1 then do;

   if 0 then do;

   set meds;

   set drips;

   end;

     declare hash myhash1(dataset:'meds',multidata:'YES' );

      rc = myhash1.defineKey('id');

      rc = myhash1.defineData('mnum');

      rc = myhash1.defineDone( );

       declare hash myhash2(dataset:'drips' );

      rc = myhash2.defineKey('id');

      rc = myhash2.defineData('dnum');

      rc = myhash2.defineDone( );

   end;

      rc = myhash1.find();

     rc1= myhash2.find();

      if (rc = 0) then do;

       output;

         rc=myhash1.find_next();

         do while(rc = 0);

            output;

            rc=myhash1.find_next();

         end;

      end;

  drop rc:;

run;

 

Regards,

Naveen Srinivasan

PGStats
Opal | Level 21

There is always a way using keyed access, but as far as I can see, it isn't that simple

 

data ORDERS(index=(id));
input ID ONUM;
cards;
1 99
1 88
2 199
2 188
;
 
data MEDS(index=(id));
input ID MNUM;
cards;
1 22
1 33
1 66
2 122
2 133
2 166
;
 
data DRIPS(index=(id));
input ID DNUM;
cards;
1 55
2 155
;

data all;
set orders;
rmeds=1;
do until(medsDone);
    set meds key=id keyreset=rmeds;
    medsDone = _iorc_ ne 0;
    if not medsDone then do;
        rdrips=1; 
        do until (dripsDone);
            set drips key=id keyreset=rdrips;
            dripsDone = _iorc_ ne 0;
            if not dripsDone then output;
            end;
        end;
    end;
_error_ = 0;
drop medsDone dripsDone;
run;

proc print noobs; run;

/* Compare with */
proc sql;
create table simple as
select *
from 
    orders inner join 
    meds on orders.id = meds.id inner join
    drips on meds.id = drips.id;
select * from simple;
quit;
PG
Ksharp
Super User

Why not SQL's cartesian product ?

 

data ORDERS;
input ID ONUM;
cards;
1 99
1 88
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
;
data DRIPS;
input ID DNUM;
cards;
1 55
;

proc sql;
create table want as
 select a.*,b.mnum,c.dnum
  from orders as a,meds as b,drips as c
   where a.id=b.id and a.id=c.id;
quit;
Patrick
Opal | Level 21

@Rodcjones

That's a great question in the sense that the answer really highlights the difference between a SAS datasetp merge and a SQL join. Once you understand these differences you can take full advantage of it and use the technique which suits your needs best on a case by case basis.

 

For what you're asking for: You're after the result of a cartesian join as @Ksharp mentions and this is something easily done with SQL but can involve a lot of coding with data step SAS.

 

Rodcjones
Obsidian | Level 7

Thank you to Tom, art297, mkeintz, novinosrin, PGStats, Ksharp for your helpful replies. I am working through the solutions. And am also concluding that proc sql is the better approach. My limitation was that MERGE was giving me unexpected results and I didn't realize how MERGE works. I also thought it would be more efficient to use a data step than build the output through successive SQL joins (since I'm sometimes dealing with a large number of tables). Thanks for your expertise and effort.

Patrick
Opal | Level 21

@Rodcjones

"I also thought it would be more efficient to use a data step than build the output through successive SQL joins"

You can join multiple tables at once with SQL and don't need successive steps for this.

 

Tom
Super User Tom
Super User

Also note that the NATURAL JOIN can make SQL code easier to write. Works well for nice clear data structures like your simple example.

 

proc sql ;
create table all as
  select * 
  from ORDERS
  natural join MEDS
  natural join DRIPS
  order by 1
;
quit;

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 5979 views
  • 11 likes
  • 8 in conversation