Solved: Proc SQL left join

Toni2 · Posted 05-26-2022 07:52 AM

i have 2 datasets :

A 1 p
B 2 p
C 1 p
D 1 p

and :

B 2 s
D 1 s

and want to merge to this :

A 1 p
B 2 s
C 1 p
D 1 s

this is the V1 is always populated with values from both tables A and B

data a;
input ID1 $ ID2 V1 $;
 datalines;
A 1 p 
B 2 p
C 1 p
D 1 p
;
run;

data b;
input ID1 $ ID2 V1 $;
 datalines;
B 2 s
D 1 s
;
run;


proc sql;
create table c  as
 select
 x.ID1,
 x.ID2,
 y.v1
  from A  as x left join B as y
on x.ID1 =y.ID1 and x.ID2 =y.ID2; 
quit;

Ksharp · Posted 05-26-2022 08:09 AM

data a;
input ID1 $ ID2 V1 $;
 datalines;
A 1 p 
B 2 p
C 1 p
D 1 p
;
run;

data b;
input ID1 $ ID2 V1 $;
 datalines;
B 2 s
D 1 s
;
run;
proc sql;
create table c  as
 select
 x.ID1,
 x.ID2,
 coalescec(y.v1,x.v1) as v1
  from A  as x left join B as y
on x.ID1 =y.ID1 and x.ID2 =y.ID2; 
quit;

View solution in original post

Ksharp · Posted 05-26-2022 08:09 AM

data a;
input ID1 $ ID2 V1 $;
 datalines;
A 1 p 
B 2 p
C 1 p
D 1 p
;
run;

data b;
input ID1 $ ID2 V1 $;
 datalines;
B 2 s
D 1 s
;
run;
proc sql;
create table c  as
 select
 x.ID1,
 x.ID2,
 coalescec(y.v1,x.v1) as v1
  from A  as x left join B as y
on x.ID1 =y.ID1 and x.ID2 =y.ID2; 
quit;

Patrick · Posted 05-26-2022 08:12 AM

Thank you for providing all the sample data as fully working data steps, providing the SQL you've already got and also showing us the desired result. That made it really simple to "fill in" the last little gap which was that you didn't actually formulate your question.

What you're looking for is the coalesce() function which will pick the first non-missing value.

data a;
  input ID1 $ ID2 V1 $;
  datalines;
A 1 p 
B 2 p
C 1 p
D 1 p
;
run;

data b;
  input ID1 $ ID2 V1 $;
  datalines;
B 2 s
D 1 s
;
run;

proc sql;
  create table c  as
    select
      x.ID1,
      x.ID2,
      coalesce(y.v1,x.v1) as v1
    from A  as x left join B as y
      on x.ID1 =y.ID1 and x.ID2 =y.ID2;
quit;

Tom · Posted 05-26-2022 08:49 AM

For the posted code you could just use normal SAS code instead of SQL. The only reason to be forced to have to jury rig something using SQL would be if you wanted to do a many to many join.

If you want the values from B to overwrite the values from A then use a MERGE.

data want;
  merge a b;
  by id1;
run;

If you only want the non-missing values from B to override the values from A then use UPDATE instead.

data want;
  update a b;
  by id1;
run;

ballardw · Posted 05-26-2022 10:08 AM

Generic comment: If there are no duplicates of the ID and ID2 combination in the first data set then this is actually an UPDATE, replacing a value on matching values.

data c;
  update a b;
  by id1 id2;
run;

Though the data step UPDATE will require sets A and B to be sorted by the By variables if not actually in that order.

Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

Re: Proc SQL left join

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!