Solved: Full join between 3 tables - SAS Data Integration Studio

InêsMaximiano · Posted 07-22-2022 11:46 AM

Hello! How can I make this join between these 3 tables?

Patrick · Posted 07-23-2022 07:10 AM

As others already shared first you need to have certainty how the SQL logic needs to look like ...and then you need to define this logic in DI Studio.

Based on the sample data @A_SAS_Man created for you below two SQL coding options

data a;
 input Id  A $1. ;
 datalines;
1 X
2 X
3 X
4 X
5 X
6 X
8 X
;

data b;
 input Id B $1.;
 datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
;

 data c;
 input Id C $1.;
 datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
;

/* option 1 */
proc sql;
  create table want as
    select 
    coalesce(a.id,b.id,c.id) as id,
    a.a,
    b.b,
    c.c
    from 
      a 
      full outer join b
        on a.id=b.id
      full outer join c
        on a.id=c.id or b.id=c.id
    order by id
  ;
quit;

/* option 2 */
proc sql;
  create view v_inter as
    select 
    coalesce(a.id,b.id) as id,
    a.a,
    b.b
    from 
      a full outer join b
        on a.id=b.id
  ;
quit;

proc sql;
  create table want as
    select 
    coalesce(i.id,c.id) as id,
    i.a,
    i.b,
    c.c
    from 
      v_inter i full outer join c
        on i.id=c.id
    order by id
  ;
quit;

As @ErikLund_Jensen hints it's sometimes not that intuitive to define a SQL using the SQL transformation even if you exactly know what syntax you want it to generate.

For this reason I sometimes used to "break" things up into more simple SQLs. In above code you could implement option 2 using two SQL transformation nodes - the first one creates the view, the second one then joins your 3rd table to the view to create the table you're after. Because the first SQL only creates a view going for two nodes in the DI flow won't have a negative impact on performance (a view is basically encapsulated SQL code that only gets executed when you use the view ...which is only in the 2nd SQL node).

View solution in original post

A_SAS_Man · Posted 07-22-2022 01:19 PM

data a;
 input Id  A $1. ;
 datalines;
1 X
2 X
3 X
4 X
5 X
6 X
 ;

data b;
 input Id B $1.;
 datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
 ;

 data c;
 input Id C $1.;
 datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
 ;

proc sql;
create table Output_Table as
	select case when a.Id is not missing
				then a.Id
				when b.Id is not missing
				then b.Id
				when c.Id is not missing
				then c.Id
			end as Id,
			a.A,
		   b.B,
		   c.C
	from a full outer join b 
		 on
		 (a.Id=b.Id)
		 full outer join c on
		 (a.Id=c.Id)
		 or
		 (b.Id=c.Id)
	order by Id
;
quit;

ballardw · Posted 07-22-2022 02:39 PM

If none of the data sets have repeats of the Id variable and are sorted by Id:

data want;
   merge a b c;
   by id;
run;

given the example data. If there are other variables involved then more details.

If you wanted a point and click approach, can't help.

ErikLund_Jensen · Posted 07-22-2022 02:42 PM

Hi @InêsMaximiano

As shown in the previous answer, the code for a join of 3 tables is not complicated in itself. But building the code in Data Integration Studio's Join transformation is a different matter. It is more complicated, and it is harder to debug and maintain afterwards. So my advice is: Dont do it. Use two joins instead and make a full join of two tables, and then join the result with the third table.

Add the expression coalesce(tab1.ID, tab2.ID) in the mapping pane in order to get the ID in the output table's ID column whether the input for the current row comes from one or the other or both input tables.

If you insist on using one join-transformation only, there is an explanation in the DI Studio Help, in the page "Adding a Join to an SQL Query on the Designer Tab". But it is so much easier to use two joins instead.

Patrick · Posted 07-23-2022 07:10 AM

As others already shared first you need to have certainty how the SQL logic needs to look like ...and then you need to define this logic in DI Studio.

Based on the sample data @A_SAS_Man created for you below two SQL coding options

data a;
 input Id  A $1. ;
 datalines;
1 X
2 X
3 X
4 X
5 X
6 X
8 X
;

data b;
 input Id B $1.;
 datalines;
4 Y
5 Y
6 Y
7 Y
8 Y
9 Y
;

 data c;
 input Id C $1.;
 datalines;
1 Z
2 Z
8 Z
9 Z
10 Z
11 Z
;

/* option 1 */
proc sql;
  create table want as
    select 
    coalesce(a.id,b.id,c.id) as id,
    a.a,
    b.b,
    c.c
    from 
      a 
      full outer join b
        on a.id=b.id
      full outer join c
        on a.id=c.id or b.id=c.id
    order by id
  ;
quit;

/* option 2 */
proc sql;
  create view v_inter as
    select 
    coalesce(a.id,b.id) as id,
    a.a,
    b.b
    from 
      a full outer join b
        on a.id=b.id
  ;
quit;

proc sql;
  create table want as
    select 
    coalesce(i.id,c.id) as id,
    i.a,
    i.b,
    c.c
    from 
      v_inter i full outer join c
        on i.id=c.id
    order by id
  ;
quit;

As @ErikLund_Jensen hints it's sometimes not that intuitive to define a SQL using the SQL transformation even if you exactly know what syntax you want it to generate.

For this reason I sometimes used to "break" things up into more simple SQLs. In above code you could implement option 2 using two SQL transformation nodes - the first one creates the view, the second one then joins your 3rd table to the view to create the table you're after. Because the first SQL only creates a view going for two nodes in the DI flow won't have a negative impact on performance (a view is basically encapsulated SQL code that only gets executed when you use the view ...which is only in the 2nd SQL node).

Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Re: Full join between 3 tables - SAS Data Integration Studio

Registration is open