DATA Step, Macro, Functions and more

Key for several tables in Proc SQL

Reply
Frequent Contributor
Posts: 142

Key for several tables in Proc SQL

[ Edited ]

Could you please remind how to join several tables 

 

proc sql;
      select key.x1, bb.x1, cc.x2, dd.x3, ee.x4
from x1,x2,x3,x4
where key.x1=key.x2=key.x3=key.x4
;
quit;

 What is the correct statement: key.x1=key.x2=key.x3=key.x4 ?

 

Respected Advisor
Posts: 4,173

Re: Key for several tables in Proc SQL

[ Edited ]
Posted in reply to DmytroYermak

@DmytroYermak

You need to refer to variables using <table name|table alias>.<variable name> so it should be x1.key and not key.x1

Assuming you want an inner join you need an AND between the conditions so something like:

x1.key=x2.key and x1.key=x3.key and x1.key=x4.key and x2.key=x3.key and x2.key=x4.key and x3.key=x4.key

 

Contributor
Posts: 20

Re: Key for several tables in Proc SQL

Frequent Contributor
Posts: 142

Re: Key for several tables in Proc SQL

Thank you, Patrick. I actually thought about this solution but wanted to find anything shorter. That is, actually, the case.
Frequent Contributor
Posts: 142

Re: Key for several tables in Proc SQL

You are right, it is my mistake, definitely xn.key=..
Super User
Posts: 19,878

Re: Key for several tables in Proc SQL

Posted in reply to DmytroYermak
Here's a link to a SAS SQL tutorial:
https://onlinecourses.science.psu.edu/stat482/node/39
Respected Advisor
Posts: 4,936

Re: Key for several tables in Proc SQL

Posted in reply to DmytroYermak

Sometimes there is a way to make a SAS/SQL join without having to name the variables. If the keys in your tables are the only column names your tables have in common, you can request natural joins.

 

proc sql;

select *

from x1 natural join x2 natural join x3 natural join x4;

quit;

 

The joins will be performed on all shared column names.

PG
Frequent Contributor
Posts: 142

Re: Key for several tables in Proc SQL

[ Edited ]
Posted in reply to DmytroYermak

I have to comment my question as I think that mislead you in someway.

I have the following tables:

6.jpg

I think that the optimal code is as mentioned here - https://communities.sas.com/t5/SAS-Data-Management/PROC-SQL-Joins-on-3-Tables/td-p/196808:

 

 

proc sql
  create table abc as
  select X
  from table1
  left join table2
  on table1.X = table2.X
  left join table3
  on table1.X = table3.X;
quit;

 

The question is how to eliminate columns C and D.

 

I understand that the solution was metioned above - to use just simple 'select...from' statement with the following 'where':

table1.x=table2.x and table1.x=table2.x and table1.x=table3.x. But would it be possible to use join and eliminate the appropriate columns C and D as in real task the number of tables is quite significant as well as the number of columns to be selected?

Respected Advisor
Posts: 4,936

Re: Key for several tables in Proc SQL

Posted in reply to DmytroYermak

select table1.x, table2.b, table3.d

from ...

PG
Frequent Contributor
Posts: 142

Re: Key for several tables in Proc SQL

Could you please write the 'where' statement as well?
Respected Advisor
Posts: 4,173

Re: Key for several tables in Proc SQL

Posted in reply to DmytroYermak

@DmytroYermak


DmytroYermak wrote:
Could you please write the 'where' statement as well?

The rest of the SQL code you've posted was fine for the example you've posted. Just combine the SELECT bit as posted by @PGStats with the other parts you've posted already.

Super User
Super User
Posts: 7,080

Re: Key for several tables in Proc SQL

Posted in reply to DmytroYermak

DmytroYermak wrote:

I have to comment my question as I think that mislead you in someway.

I have the following tables:

6.jpg

I think that the optimal code is as mentioned here - https://communities.sas.com/t5/SAS-Data-Management/PROC-SQL-Joins-on-3-Tables/td-p/196808:

 

 

proc sql
  create table abc as
  select X
  from table1
  left join table2
  on table1.X = table2.X
  left join table3
  on table1.X = table3.X;
quit;

 

The question is how to eliminate columns C and D.

 

I understand that the solution was metioned above - to use just simple 'select...from' statement with the following 'where':

table1.x=table2.x and table1.x=table2.x and table1.x=table3.x. But would it be possible to use join and eliminate the appropriate columns C and D as in real task the number of tables is quite significant as well as the number of columns to be selected?


Sometimes it is easier to take advantage of SAS dataset options.  

For example if you want everything but C and D then you drop them on the way out.

proc sql ;
create table abc(drop=C D) as
  select *
  from table1
  natural left join table2
  natural left join table3
;
quit;

Or drop them on the way in.

proc sql ;
create table abc as
  select *
  from table1
  natural left join table2(drop=C)
  natural left join table3(drop=D)
;
quit;
Respected Advisor
Posts: 4,936

Re: Key for several tables in Proc SQL

Hi @Tom, I also thought of this solution, but natural left joins are somewhat counter intuitive. There seems to be no way to force

 

select x, b, d
from t1 natural left join t2 natural left join t3;

 

to be like

 

select t1.x, b, d
from t1 left join t2 on t1.x=t2.x left join t3 on t1.x=t3.x;

 

it keeps being interpreted as

 

select t1.x, b, d
from t1 left join t2 on t1.x=t2.x left join t3 on t2.x=t3.x;

 

even with parentheses, which means that t1 cannot be treated as the key reference for both joins.

 

One way around this with natural joins is

 

data t1;

infile datalines missover;

input x a$;

datalines;

1

2 a

3 a

4 a

5 a

6

;

data t2;

infile datalines missover;

input x b$ c$;

datalines;

1 b c

3 b c

5 b c

;

data t3;

infile datalines missover;

input x d$ e$;

datalines;

2 d e

4 d e

6 d e

;

proc sql;

select x, b, d

from

( select *

from t1 natural left join t2 )

natural join

( select *

from t1 natural left join t3 )

;

quit;

 

PG
Super User
Super User
Posts: 7,080

Re: Key for several tables in Proc SQL

@PGStats

Good to know about how SAS performs the NATURAL joins.

 

In this case you can nest the queries to get SAS to do the same thing as it would in a normal merge.

Note that if you want the SQL process to put the variables in the same order as a normal merge would then you need to reference the tables in SQL query in the reverse order. 

data want ;
  merge t1(in=in1) t2 t3;
  by x ;
  if in1;
run;

proc sql ;
  create table test2 as
  select * 
  from (select * from t3 natural full join t2 ) c1
  natural right join t1
  ;
quit;
proc print data=want;
run;
proc print data=test2;
run;
proc compare data=want compare=test2 ; run;
Obs    x    a    b    c    d    e

 1     1         b    c
 2     2    a              d    e
 3     3    a    b    c
 4     4    a              d    e
 5     5    a    b    c
 6     6                   d    e


Obs    x    a    b    c    d    e

 1     1         b    c
 2     2    a              d    e
 3     3    a    b    c
 4     4    a              d    e
 5     5    a    b    c
 6     6                   d    e



The COMPARE Procedure
Comparison of WORK.WANT with WORK.TEST2
(Method=EXACT)

NOTE: No unequal values were found. All values compared are exactly equal.
Frequent Contributor
Posts: 142

Re: Key for several tables in Proc SQL

Thank you both for the detailed analysis. I wanted to put here my not fancy code but am seeing that have to investigatate yours that is working ).

Ask a Question
Discussion stats
  • 15 replies
  • 314 views
  • 0 likes
  • 6 in conversation