05-02-2017 01:02 PM
I would like to know if MERGE can be used to generate a resulting dataset with all combinations of matching records in >2 datasets. For example, here are my three datasets:
input ID ONUM;
input ID MNUM;
input ID DNUM;
I desire a dataset called ALL that contains these records:
ID ONUM MNUM DNUM
1 99 22 55
1 99 33 55
1 99 66 55
1 88 22 55
1 88 33 55
1 88 66 55
However when I use Merge, like this . . .
MERGE ORDERS MEDS DRIPS; BY ID;RUN;
... unfortunately I get
ID ONUM MNUM DNUM
1 99 22 55
1 88 33 55
1 88 66 55
I understand that there is a way to achieve this in proc sql. But before I go that route I wanted to make sure that there is not a DATA step option that will give me the desired dataset.
Important aspects of this are that I will need this to work for >2 datasets at a time. I am using SAS Enterprise Guide 7.13 HF3 (220.127.116.1138)
05-02-2017 01:48 PM
You cannot do it with MERGE.
You can do it using SET and some complex programming.
For example look at the POINT= option on the SET statement.
data ORDERS; input ID ONUM; cards; 1 99 1 88 2 1 ; data MEDS; input ID MNUM; cards; 1 22 1 33 1 66 2 4 ; data DRIPS; input ID DNUM; cards; 1 55 2 5 ; data want ; do until (last.id); merge orders(keep=id in=in1) meds(keep=id in=in2) drips(keep=id in=in3); by id; n1+in1; n2+in2; n3+in3; if first.id and in1 then s1=n1; if first.id and in2 then s2=n2; if first.id and in3 then s3=n3; call missing(of in1-in3); end; if s1 then do i=s1 to n1; set orders point=i; if s2 then do j=s2 to n2; set meds point=j; if s3 then do k=s3 to n3; set drips point=k; output; end; end; end; drop s1-s3 n1-n3 ; run;
05-02-2017 02:06 PM - edited 05-02-2017 02:08 PM
I agree with @Tom, but can be done a bit simpler if you don't need to do it separately for multiple IDs:
data want; set orders; do i=1 to n1; set meds point=i nobs=n1; do j=1 to n2; set drips point=j nobs=n2; output; end; end; run;
Art, CEO, AnalystFinder.com
05-02-2017 07:52 PM
Here's the kind of complex programming needed if you use SET with point, and have multiple ID levels. It does not have an intuitive look to it:
data want (drop=_:); retain _totalo _totalm _totald 0; _no=0; _nm=0; _nd=0; do until (last.id); set orders (in=ino) meds (in=inm) drips (in=ind); by id; _no+ino; _nm+inm; _nd+ind; end; do po=_totalo+min(1,_no) to _totalo+_no; if _no>0 then set orders point=po; do pm=_totalm+min(1,_nm) to _totalm+_nm; if _nm>0 then set meds point=pm; do pd=_totald+min(1,_nd) to _totald+_nd; if _nd>0 then set drips point=pd; output; end; end; end; _totalo+_no; _totalm+_nm; _totald+_nd; run;
05-02-2017 02:44 PM
Hash technique isn't too bad either
if _N_ = 1 then do;
if 0 then do;
declare hash myhash1(dataset:'meds',multidata:'YES' );
rc = myhash1.defineKey('id');
rc = myhash1.defineData('mnum');
rc = myhash1.defineDone( );
declare hash myhash2(dataset:'drips' );
rc = myhash2.defineKey('id');
rc = myhash2.defineData('dnum');
rc = myhash2.defineDone( );
rc = myhash1.find();
if (rc = 0) then do;
do while(rc = 0);
05-02-2017 02:50 PM - edited 05-02-2017 02:53 PM
There is always a way using keyed access, but as far as I can see, it isn't that simple
data ORDERS(index=(id)); input ID ONUM; cards; 1 99 1 88 2 199 2 188 ; data MEDS(index=(id)); input ID MNUM; cards; 1 22 1 33 1 66 2 122 2 133 2 166 ; data DRIPS(index=(id)); input ID DNUM; cards; 1 55 2 155 ; data all; set orders; rmeds=1; do until(medsDone); set meds key=id keyreset=rmeds; medsDone = _iorc_ ne 0; if not medsDone then do; rdrips=1; do until (dripsDone); set drips key=id keyreset=rdrips; dripsDone = _iorc_ ne 0; if not dripsDone then output; end; end; end; _error_ = 0; drop medsDone dripsDone; run; proc print noobs; run; /* Compare with */ proc sql; create table simple as select * from orders inner join meds on orders.id = meds.id inner join drips on meds.id = drips.id; select * from simple; quit;
05-03-2017 09:13 AM
Why not SQL's cartesian product ?
data ORDERS; input ID ONUM; cards; 1 99 1 88 ; data MEDS; input ID MNUM; cards; 1 22 1 33 1 66 ; data DRIPS; input ID DNUM; cards; 1 55 ; proc sql; create table want as select a.*,b.mnum,c.dnum from orders as a,meds as b,drips as c where a.id=b.id and a.id=c.id; quit;
05-03-2017 10:18 AM
That's a great question in the sense that the answer really highlights the difference between a SAS datasetp merge and a SQL join. Once you understand these differences you can take full advantage of it and use the technique which suits your needs best on a case by case basis.
For what you're asking for: You're after the result of a cartesian join as @Ksharp mentions and this is something easily done with SQL but can involve a lot of coding with data step SAS.
05-03-2017 01:16 PM
Thank you to Tom, art297, mkeintz, novinosrin, PGStats, Ksharp for your helpful replies. I am working through the solutions. And am also concluding that proc sql is the better approach. My limitation was that MERGE was giving me unexpected results and I didn't realize how MERGE works. I also thought it would be more efficient to use a data step than build the output through successive SQL joins (since I'm sometimes dealing with a large number of tables). Thanks for your expertise and effort.
05-03-2017 06:14 PM
"I also thought it would be more efficient to use a data step than build the output through successive SQL joins"
You can join multiple tables at once with SQL and don't need successive steps for this.
05-03-2017 06:45 PM
Also note that the NATURAL JOIN can make SQL code easier to write. Works well for nice clear data structures like your simple example.
proc sql ; create table all as select * from ORDERS natural join MEDS natural join DRIPS order by 1 ; quit;