I would like to know if MERGE can be used to generate a resulting dataset with all combinations of matching records in >2 datasets. For example, here are my three datasets:
data ORDERS;
input ID ONUM;
cards;
1 99
1 88
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
;
data DRIPS;
input ID DNUM;
cards;
1 55
;
I desire a dataset called ALL that contains these records:
ID ONUM MNUM DNUM
1 99 22 55
1 99 33 55
1 99 66 55
1 88 22 55
1 88 33 55
1 88 66 55
However when I use Merge, like this . . .
DATA ALL;
MERGE ORDERS MEDS DRIPS; BY ID;RUN;
... unfortunately I get
ID ONUM MNUM DNUM
1 99 22 55
1 88 33 55
1 88 66 55
I understand that there is a way to achieve this in proc sql. But before I go that route I wanted to make sure that there is not a DATA step option that will give me the desired dataset.
Important aspects of this are that I will need this to work for >2 datasets at a time. I am using SAS Enterprise Guide 7.13 HF3 (7.100.3.5438)
Thank you!
You cannot do it with MERGE.
You can do it using SET and some complex programming.
For example look at the POINT= option on the SET statement.
data ORDERS;
input ID ONUM;
cards;
1 99
1 88
2 1
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
2 4
;
data DRIPS;
input ID DNUM;
cards;
1 55
2 5
;
data want ;
do until (last.id);
merge orders(keep=id in=in1) meds(keep=id in=in2) drips(keep=id in=in3);
by id;
n1+in1;
n2+in2;
n3+in3;
if first.id and in1 then s1=n1;
if first.id and in2 then s2=n2;
if first.id and in3 then s3=n3;
call missing(of in1-in3);
end;
if s1 then do i=s1 to n1;
set orders point=i;
if s2 then do j=s2 to n2;
set meds point=j;
if s3 then do k=s3 to n3;
set drips point=k;
output;
end;
end;
end;
drop s1-s3 n1-n3 ;
run;
I agree with @Tom, but can be done a bit simpler if you don't need to do it separately for multiple IDs:
data want; set orders; do i=1 to n1; set meds point=i nobs=n1; do j=1 to n2; set drips point=j nobs=n2; output; end; end; run;
Art, CEO, AnalystFinder.com
Here's the kind of complex programming needed if you use SET with point, and have multiple ID levels. It does not have an intuitive look to it:
data want (drop=_:);
retain _totalo _totalm _totald 0;
_no=0; _nm=0; _nd=0;
do until (last.id);
set orders (in=ino)
meds (in=inm)
drips (in=ind);
by id;
_no+ino;
_nm+inm;
_nd+ind;
end;
do po=_totalo+min(1,_no) to _totalo+_no;
if _no>0 then set orders point=po;
do pm=_totalm+min(1,_nm) to _totalm+_nm;
if _nm>0 then set meds point=pm;
do pd=_totald+min(1,_nd) to _totald+_nd;
if _nd>0 then set drips point=pd;
output;
end;
end;
end;
_totalo+_no;
_totalm+_nm;
_totald+_nd;
run;
Hash technique isn't too bad either:)
data want;
set orders;
if _N_ = 1 then do;
if 0 then do;
set meds;
set drips;
end;
declare hash myhash1(dataset:'meds',multidata:'YES' );
rc = myhash1.defineKey('id');
rc = myhash1.defineData('mnum');
rc = myhash1.defineDone( );
declare hash myhash2(dataset:'drips' );
rc = myhash2.defineKey('id');
rc = myhash2.defineData('dnum');
rc = myhash2.defineDone( );
end;
rc = myhash1.find();
rc1= myhash2.find();
if (rc = 0) then do;
output;
rc=myhash1.find_next();
do while(rc = 0);
output;
rc=myhash1.find_next();
end;
end;
drop rc:;
run;
Regards,
Naveen Srinivasan
There is always a way using keyed access, but as far as I can see, it isn't that simple
data ORDERS(index=(id));
input ID ONUM;
cards;
1 99
1 88
2 199
2 188
;
data MEDS(index=(id));
input ID MNUM;
cards;
1 22
1 33
1 66
2 122
2 133
2 166
;
data DRIPS(index=(id));
input ID DNUM;
cards;
1 55
2 155
;
data all;
set orders;
rmeds=1;
do until(medsDone);
set meds key=id keyreset=rmeds;
medsDone = _iorc_ ne 0;
if not medsDone then do;
rdrips=1;
do until (dripsDone);
set drips key=id keyreset=rdrips;
dripsDone = _iorc_ ne 0;
if not dripsDone then output;
end;
end;
end;
_error_ = 0;
drop medsDone dripsDone;
run;
proc print noobs; run;
/* Compare with */
proc sql;
create table simple as
select *
from
orders inner join
meds on orders.id = meds.id inner join
drips on meds.id = drips.id;
select * from simple;
quit;
Why not SQL's cartesian product ?
data ORDERS;
input ID ONUM;
cards;
1 99
1 88
;
data MEDS;
input ID MNUM;
cards;
1 22
1 33
1 66
;
data DRIPS;
input ID DNUM;
cards;
1 55
;
proc sql;
create table want as
select a.*,b.mnum,c.dnum
from orders as a,meds as b,drips as c
where a.id=b.id and a.id=c.id;
quit;
That's a great question in the sense that the answer really highlights the difference between a SAS datasetp merge and a SQL join. Once you understand these differences you can take full advantage of it and use the technique which suits your needs best on a case by case basis.
For what you're asking for: You're after the result of a cartesian join as @Ksharp mentions and this is something easily done with SQL but can involve a lot of coding with data step SAS.
Thank you to Tom, art297, mkeintz, novinosrin, PGStats, Ksharp for your helpful replies. I am working through the solutions. And am also concluding that proc sql is the better approach. My limitation was that MERGE was giving me unexpected results and I didn't realize how MERGE works. I also thought it would be more efficient to use a data step than build the output through successive SQL joins (since I'm sometimes dealing with a large number of tables). Thanks for your expertise and effort.
"I also thought it would be more efficient to use a data step than build the output through successive SQL joins"
You can join multiple tables at once with SQL and don't need successive steps for this.
Also note that the NATURAL JOIN can make SQL code easier to write. Works well for nice clear data structures like your simple example.
proc sql ;
create table all as
select *
from ORDERS
natural join MEDS
natural join DRIPS
order by 1
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.