BookmarkSubscribeRSS Feed
optimist
Calcite | Level 5

I  thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I   am convinced that the following query should return three records, but it returns eleven.  Am I missing something basic or is SAS not following the rules?

 

DATA class_grades;
    INPUT student_id $ class $ grade $ ;
    DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1
  LEFT JOIN class_grades t2
    ON t1.student_id = t2.student_id
      AND t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

Title;

 

 

======== OUTPUT =========

obs  student_id   mth101_grade  mth102_grade
 --------------------  
 1 101 C  B
 2 101 B
 3 102 B  A
 4 102 A
 5 103 A  A
 6 103 A
 7 104 D
 8 105 C
 9 106 A
10 106 A
11 107 B

 

======== DESIRED OUTPUT =========

 student_id   mth101_grade  mth102_grade

101 C  B
102 B  A
103 A  A

104 D  <blank>      <- this was added later (Apologies to first respondent) 

105 C  <blank>      <- this was added later (Apologies to first respondent)

4 REPLIES 4
LinusH
Tourmaline | Level 20
SQL Join always result in a product which is filtered by any join/where clause. So if you have multiple rows for the join key in one table, the result will have it as well.
Data never sleeps
Reeza
Super User

A left join brings everything from the first table.

 

An inner join is what you want here, or use a WHERE to filter the data (ON won't filter, just controls the JOIN conditions). 

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1 
  left JOIN class_grades t2
    ON t1.student_id = t2.student_id 
      WHERE t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

 OR

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1 
inner JOIN class_grades t2
    ON t1.student_id = t2.student_id 
      and t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

@optimist wrote:

I  thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I   am convinced that the following query should return three records, but it returns eleven.  Am I missing something basic or is SAS not following the rules?

 

DATA class_grades;
    INPUT student_id $ class $ grade $ ;
    DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades t1
  LEFT JOIN class_grades t2
    ON t1.student_id = t2.student_id
      AND t1.class = 'MTH101'
      AND t2.class = 'MTH102';
QUIT;

Title;

 

 

======== OUTPUT =========

obs  student_id   mth101_grade  mth102_grade
 --------------------  
 1 101 C  B
 2 101 B
 3 102 B  A
 4 102 A
 5 103 A  A
 6 103 A
 7 104 D
 8 105 C
 9 106 A
10 106 A
11 107 B

 

======== DESIRED OUTPUT =========

 student_id   mth101_grade  mth102_grade

101 C  B
102 B  A
103 A  A


 

optimist
Calcite | Level 5

Reeza,

 

Thanks for the quick reply.   I made a mistake in saying what I wanted as my desired output. (Sorry!  I had to map my real query to the one shown below and screwed up on the desired results.)  I should have said I wanted to see: 

101 C  B
102 B  A
103 A  A

104 D  <Blank>
105 C  <Blank>

 

In code, I could have done it this way: 

 

PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM class_grades (WHERE=(class = 'MTH101'))  t1 
  LEFT JOIN class_grades (WHERE=(class = 'MTH102')) t2
    ON t1.student_id = t2.student_id ;
QUIT;

 

But I'm trying to find out if there is a problem with how I am thinking about joins. According to the reference I cited, I thought the first query I showed (without the SAS dataset options) should work for a self join.   

 

 

 

 

 

PGStats
Opal | Level 21

Applying conditions in outer joins is often tricky. The starting point is always a cartesian product. Then join conditions are applied. Then WHERE conditions do further subsetting. Not necessarily  what the programmer expected. You'll get your expected result by applying a WHERE condition first:

 

TITLE 'Self Left Join';
PROC SQL;
  SELECT t1.student_id
    ,t1.grade AS calc1_grade
    ,t2.grade AS calc2_grade
  FROM (select * from class_grades where class = 'MTH101') as t1 
  LEFT JOIN (select * from class_grades where class = 'MTH102') as t2
    ON t1.student_id = t2.student_id;
QUIT;
PG

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 818 views
  • 2 likes
  • 4 in conversation