BookmarkSubscribeRSS Feed
optimist
Calcite | Level 5

I  thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I   am convinced that the following query should return three records, but it returns eleven.  Am I missing something basic or is SAS not following the rules?

 

DATA class_grades;
    INPUT student_id $ class $ grade $ ;
    DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1
  LEFT JOIN class_grades t2
    ON t1.student_id = t2.student_id
      AND t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

Title;

 

 

======== OUTPUT =========

obs  student_id   mth101_grade  mth102_grade
 --------------------  
 1 101 C  B
 2 101 B
 3 102 B  A
 4 102 A
 5 103 A  A
 6 103 A
 7 104 D
 8 105 C
 9 106 A
10 106 A
11 107 B

 

======== DESIRED OUTPUT =========

 student_id   mth101_grade  mth102_grade

101 C  B
102 B  A
103 A  A

104 D  <blank>      <- this was added later (Apologies to first respondent) 

105 C  <blank>      <- this was added later (Apologies to first respondent)

4 REPLIES 4
LinusH
Tourmaline | Level 20
SQL Join always result in a product which is filtered by any join/where clause. So if you have multiple rows for the join key in one table, the result will have it as well.
Data never sleeps
Reeza
Super User

A left join brings everything from the first table.

 

An inner join is what you want here, or use a WHERE to filter the data (ON won't filter, just controls the JOIN conditions). 

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1 
  left JOIN class_grades t2
    ON t1.student_id = t2.student_id 
      WHERE t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

 OR

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1 
inner JOIN class_grades t2
    ON t1.student_id = t2.student_id 
      and t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

@optimist wrote:

I  thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I   am convinced that the following query should return three records, but it returns eleven.  Am I missing something basic or is SAS not following the rules?

 

DATA class_grades;
    INPUT student_id $ class $ grade $ ;
    DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1
  LEFT JOIN class_grades t2
    ON t1.student_id = t2.student_id
      AND t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

Title;

 

 

======== OUTPUT =========

obs  student_id   mth101_grade  mth102_grade
 --------------------  
 1 101 C  B
 2 101 B
 3 102 B  A
 4 102 A
 5 103 A  A
 6 103 A
 7 104 D
 8 105 C
 9 106 A
10 106 A
11 107 B

 

======== DESIRED OUTPUT =========

 student_id   mth101_grade  mth102_grade

101 C  B
102 B  A
103 A  A


 

optimist
Calcite | Level 5

Reeza,

 

Thanks for the quick reply.   I made a mistake in saying what I wanted as my desired output. (Sorry!  I had to map my real query to the one shown below and screwed up on the desired results.)  I should have said I wanted to see: 

101 C  B
102 B  A
103 A  A

104 D  <Blank>
105 C  <Blank>

 

In code, I could have done it this way: 

 

PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades (WHERE=(class = 'MTH101'))  t1 
  LEFT JOIN class_grades (WHERE=(class = 'MTH102')) t2
    ON t1.student_id = t2.student_id ;
QUIT;

 

But I'm trying to find out if there is a problem with how I am thinking about joins. According to the reference I cited, I thought the first query I showed (without the SAS dataset options) should work for a self join.   

 

 

 

 

 

PGStats
Opal | Level 21

Applying conditions in outer joins is often tricky. The starting point is always a cartesian product. Then join conditions are applied. Then WHERE conditions do further subsetting. Not necessarily  what the programmer expected. You'll get your expected result by applying a WHERE condition first:

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM (select * from class_grades where class = 'MTH101') as t1 
  LEFT JOIN (select * from class_grades where class = 'MTH102') as t2
    ON t1.student_id = t2.student_id;
QUIT;
PG
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1230 views
  • 2 likes
  • 4 in conversation