i have 2 datasets :
A 1 p
B 2 p
C 1 p
D 1 p
and :
B 2 s
D 1 s
and want to merge to this :
A 1 p
B 2 s
C 1 p
D 1 s
this is the V1 is always populated with values from both tables A and B
data a;
input ID1 $ ID2 V1 $;
datalines;
A 1 p
B 2 p
C 1 p
D 1 p
;
run;
data b;
input ID1 $ ID2 V1 $;
datalines;
B 2 s
D 1 s
;
run;
proc sql;
create table c as
select
x.ID1,
x.ID2,
y.v1
from A as x left join B as y
on x.ID1 =y.ID1 and x.ID2 =y.ID2;
quit;
data a; input ID1 $ ID2 V1 $; datalines; A 1 p B 2 p C 1 p D 1 p ; run; data b; input ID1 $ ID2 V1 $; datalines; B 2 s D 1 s ; run; proc sql; create table c as select x.ID1, x.ID2, coalescec(y.v1,x.v1) as v1 from A as x left join B as y on x.ID1 =y.ID1 and x.ID2 =y.ID2; quit;
data a; input ID1 $ ID2 V1 $; datalines; A 1 p B 2 p C 1 p D 1 p ; run; data b; input ID1 $ ID2 V1 $; datalines; B 2 s D 1 s ; run; proc sql; create table c as select x.ID1, x.ID2, coalescec(y.v1,x.v1) as v1 from A as x left join B as y on x.ID1 =y.ID1 and x.ID2 =y.ID2; quit;
Thank you for providing all the sample data as fully working data steps, providing the SQL you've already got and also showing us the desired result. That made it really simple to "fill in" the last little gap which was that you didn't actually formulate your question.
What you're looking for is the coalesce() function which will pick the first non-missing value.
data a;
input ID1 $ ID2 V1 $;
datalines;
A 1 p
B 2 p
C 1 p
D 1 p
;
run;
data b;
input ID1 $ ID2 V1 $;
datalines;
B 2 s
D 1 s
;
run;
proc sql;
create table c as
select
x.ID1,
x.ID2,
coalesce(y.v1,x.v1) as v1
from A as x left join B as y
on x.ID1 =y.ID1 and x.ID2 =y.ID2;
quit;
For the posted code you could just use normal SAS code instead of SQL. The only reason to be forced to have to jury rig something using SQL would be if you wanted to do a many to many join.
If you want the values from B to overwrite the values from A then use a MERGE.
data want;
merge a b;
by id1;
run;
If you only want the non-missing values from B to override the values from A then use UPDATE instead.
data want;
update a b;
by id1;
run;
Generic comment: If there are no duplicates of the ID and ID2 combination in the first data set then this is actually an UPDATE, replacing a value on matching values.
data c; update a b; by id1 id2; run;
Though the data step UPDATE will require sets A and B to be sorted by the By variables if not actually in that order.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.