Hello,
I have these two datasets:
data dsa;
input x y$ id;
datalines;
1 a 1
1 a 2
1 a 3
1 a 4
1 b 5
2 c 6
;run;
data dsb;
input x y$;
datalines;
1 a
1 a
1 b
2 c
;run;
I like to merge them somehow to get the following dataset (actually the id values for the first two rows can be any of 1, 2, 3, or 4 except they can not be same ) :
x y id
1 a 1
1 a 2
1 b 5
2 c 6
Seems it is very tricky. Thanks.
You could brute force it by adding a counter within the BY values to both tables and include that in the merge.
data a;
do row=1 by until(last.y);
set dsa;
by x y;
output;
end;
run;
Similarly for DSB.
data want;
merge A(in=in1) B(in=in2);
by x y ROW ;
if in1 and in2;
run;
You could also just take advantage of how SAS normally handles MANY to MANY merges. Only output when both datasets contribute a record and reset the IN= variables so that it is only true when the data set is contributing a NEW observation.
data dsa;
input x y $ id @@;
cards;
1 a 1 1 a 2 1 a 3 1 a 4 1 b 5 2 c 6
run;
data dsb;
input x y $ @@;
cards;
1 a 1 a 1 b 2 c
run;
data want ;
call missing(in1,in2);
merge dsa(in=in1) dsb(in=in2);
by x y ;
if in1 and in2 ;
run;
You could brute force it by adding a counter within the BY values to both tables and include that in the merge.
data a;
do row=1 by until(last.y);
set dsa;
by x y;
output;
end;
run;
Similarly for DSB.
data want;
merge A(in=in1) B(in=in2);
by x y ROW ;
if in1 and in2;
run;
You could also just take advantage of how SAS normally handles MANY to MANY merges. Only output when both datasets contribute a record and reset the IN= variables so that it is only true when the data set is contributing a NEW observation.
data dsa;
input x y $ id @@;
cards;
1 a 1 1 a 2 1 a 3 1 a 4 1 b 5 2 c 6
run;
data dsb;
input x y $ @@;
cards;
1 a 1 a 1 b 2 c
run;
data want ;
call missing(in1,in2);
merge dsa(in=in1) dsb(in=in2);
by x y ;
if in1 and in2 ;
run;
Thank you very much, Tom. Actually the method 2 works like dream!.
But the first method doesn't work, I know what you try to do, but your code may have some glitches. Seems my SAS (v9.3) doesn't recognize keyword UNTIL.
Should be
do row=1 by 1 until(last.y);
UNTIL was not the problem. It was the missing value for the BY .
Thanks Tom. Yes, I modified your code the same way, but it didn't merge the datasets as we expected. But your idea is a solution too, I think we just need to modify the code a little bit.
Generates the same output for me.
data a;
do row =1 by 1 until(last.y);
set dsa;
by x y;
output;
end;
run;
data b;
do row =1 by 1 until(last.y);
set dsb;
by x y;
output;
end;
run;
data want2 ;
merge a(in=in1) b(in=in2);
by x y ROW ;
if in1 and in2;
run;
proc compare data=want compare=want2;
run;
Thanks Tom. yap! it works! keyword OUTPUT that I missed in my code makes big different.
Thanks again for the awesome solutions.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.