Hello,
I have these two datasets:
data dsa;
input x y$ id;
datalines;
1 a 1
1 a 2
1 a 3
1 a 4
1 b 5
2 c 6
;run;
data dsb;
input x y$;
datalines;
1 a
1 a
1 b
2 c
;run;
I like to merge them somehow to get the following dataset (actually the id values for the first two rows can be any of 1, 2, 3, or 4 except they can not be same ) :
x y id
1 a 1
1 a 2
1 b 5
2 c 6
Seems it is very tricky. Thanks.
You could brute force it by adding a counter within the BY values to both tables and include that in the merge.
data a;
do row=1 by until(last.y);
set dsa;
by x y;
output;
end;
run;
Similarly for DSB.
data want;
merge A(in=in1) B(in=in2);
by x y ROW ;
if in1 and in2;
run;
You could also just take advantage of how SAS normally handles MANY to MANY merges. Only output when both datasets contribute a record and reset the IN= variables so that it is only true when the data set is contributing a NEW observation.
data dsa;
input x y $ id @@;
cards;
1 a 1 1 a 2 1 a 3 1 a 4 1 b 5 2 c 6
run;
data dsb;
input x y $ @@;
cards;
1 a 1 a 1 b 2 c
run;
data want ;
call missing(in1,in2);
merge dsa(in=in1) dsb(in=in2);
by x y ;
if in1 and in2 ;
run;
You could brute force it by adding a counter within the BY values to both tables and include that in the merge.
data a;
do row=1 by until(last.y);
set dsa;
by x y;
output;
end;
run;
Similarly for DSB.
data want;
merge A(in=in1) B(in=in2);
by x y ROW ;
if in1 and in2;
run;
You could also just take advantage of how SAS normally handles MANY to MANY merges. Only output when both datasets contribute a record and reset the IN= variables so that it is only true when the data set is contributing a NEW observation.
data dsa;
input x y $ id @@;
cards;
1 a 1 1 a 2 1 a 3 1 a 4 1 b 5 2 c 6
run;
data dsb;
input x y $ @@;
cards;
1 a 1 a 1 b 2 c
run;
data want ;
call missing(in1,in2);
merge dsa(in=in1) dsb(in=in2);
by x y ;
if in1 and in2 ;
run;
Thank you very much, Tom. Actually the method 2 works like dream!.
But the first method doesn't work, I know what you try to do, but your code may have some glitches. Seems my SAS (v9.3) doesn't recognize keyword UNTIL.
Should be
do row=1 by 1 until(last.y);
UNTIL was not the problem. It was the missing value for the BY .
Thanks Tom. Yes, I modified your code the same way, but it didn't merge the datasets as we expected. But your idea is a solution too, I think we just need to modify the code a little bit.
Generates the same output for me.
data a;
do row =1 by 1 until(last.y);
set dsa;
by x y;
output;
end;
run;
data b;
do row =1 by 1 until(last.y);
set dsb;
by x y;
output;
end;
run;
data want2 ;
merge a(in=in1) b(in=in2);
by x y ROW ;
if in1 and in2;
run;
proc compare data=want compare=want2;
run;
Thanks Tom. yap! it works! keyword OUTPUT that I missed in my code makes big different.
Thanks again for the awesome solutions.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.