Hello! How can I make this join between these 3 tables?
As others already shared first you need to have certainty how the SQL logic needs to look like ...and then you need to define this logic in DI Studio.
Based on the sample data @A_SAS_Man created for you below two SQL coding options
data a;
input Id A $1. ;
datalines;
1 X
2 X
3 X
4 X
5 X
6 X
8 X
;
data b;
input Id B $1.;
datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
;
data c;
input Id C $1.;
datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
;
/* option 1 */
proc sql;
create table want as
select
coalesce(a.id,b.id,c.id) as id,
a.a,
b.b,
c.c
from
a
full outer join b
on a.id=b.id
full outer join c
on a.id=c.id or b.id=c.id
order by id
;
quit;
/* option 2 */
proc sql;
create view v_inter as
select
coalesce(a.id,b.id) as id,
a.a,
b.b
from
a full outer join b
on a.id=b.id
;
quit;
proc sql;
create table want as
select
coalesce(i.id,c.id) as id,
i.a,
i.b,
c.c
from
v_inter i full outer join c
on i.id=c.id
order by id
;
quit;
As @ErikLund_Jensen hints it's sometimes not that intuitive to define a SQL using the SQL transformation even if you exactly know what syntax you want it to generate.
For this reason I sometimes used to "break" things up into more simple SQLs. In above code you could implement option 2 using two SQL transformation nodes - the first one creates the view, the second one then joins your 3rd table to the view to create the table you're after. Because the first SQL only creates a view going for two nodes in the DI flow won't have a negative impact on performance (a view is basically encapsulated SQL code that only gets executed when you use the view ...which is only in the 2nd SQL node).
data a;
input Id A $1. ;
datalines;
1 X
2 X
3 X
4 X
5 X
6 X
;
data b;
input Id B $1.;
datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
;
data c;
input Id C $1.;
datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
;
proc sql;
create table Output_Table as
select case when a.Id is not missing
then a.Id
when b.Id is not missing
then b.Id
when c.Id is not missing
then c.Id
end as Id,
a.A,
b.B,
c.C
from a full outer join b
on
(a.Id=b.Id)
full outer join c on
(a.Id=c.Id)
or
(b.Id=c.Id)
order by Id
;
quit;
If none of the data sets have repeats of the Id variable and are sorted by Id:
data want; merge a b c; by id; run;
given the example data. If there are other variables involved then more details.
If you wanted a point and click approach, can't help.
As shown in the previous answer, the code for a join of 3 tables is not complicated in itself. But building the code in Data Integration Studio's Join transformation is a different matter. It is more complicated, and it is harder to debug and maintain afterwards. So my advice is: Dont do it. Use two joins instead and make a full join of two tables, and then join the result with the third table.
Add the expression coalesce(tab1.ID, tab2.ID) in the mapping pane in order to get the ID in the output table's ID column whether the input for the current row comes from one or the other or both input tables.
If you insist on using one join-transformation only, there is an explanation in the DI Studio Help, in the page "Adding a Join to an SQL Query on the Designer Tab". But it is so much easier to use two joins instead.
As others already shared first you need to have certainty how the SQL logic needs to look like ...and then you need to define this logic in DI Studio.
Based on the sample data @A_SAS_Man created for you below two SQL coding options
data a;
input Id A $1. ;
datalines;
1 X
2 X
3 X
4 X
5 X
6 X
8 X
;
data b;
input Id B $1.;
datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
;
data c;
input Id C $1.;
datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
;
/* option 1 */
proc sql;
create table want as
select
coalesce(a.id,b.id,c.id) as id,
a.a,
b.b,
c.c
from
a
full outer join b
on a.id=b.id
full outer join c
on a.id=c.id or b.id=c.id
order by id
;
quit;
/* option 2 */
proc sql;
create view v_inter as
select
coalesce(a.id,b.id) as id,
a.a,
b.b
from
a full outer join b
on a.id=b.id
;
quit;
proc sql;
create table want as
select
coalesce(i.id,c.id) as id,
i.a,
i.b,
c.c
from
v_inter i full outer join c
on i.id=c.id
order by id
;
quit;
As @ErikLund_Jensen hints it's sometimes not that intuitive to define a SQL using the SQL transformation even if you exactly know what syntax you want it to generate.
For this reason I sometimes used to "break" things up into more simple SQLs. In above code you could implement option 2 using two SQL transformation nodes - the first one creates the view, the second one then joins your 3rd table to the view to create the table you're after. Because the first SQL only creates a view going for two nodes in the DI flow won't have a negative impact on performance (a view is basically encapsulated SQL code that only gets executed when you use the view ...which is only in the 2nd SQL node).
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.