I thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I am convinced that the following query should return three records, but it returns eleven. Am I missing something basic or is SAS not following the rules?
DATA class_grades;
INPUT student_id $ class $ grade $ ;
DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
LEFT JOIN class_grades t2
ON t1.student_id = t2.student_id
AND t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
Title;
======== OUTPUT =========
obs student_id mth101_grade mth102_grade
--------------------
1 101 C B
2 101 B
3 102 B A
4 102 A
5 103 A A
6 103 A
7 104 D
8 105 C
9 106 A
10 106 A
11 107 B
======== DESIRED OUTPUT =========
student_id mth101_grade mth102_grade
101 C B
102 B A
103 A A
104 D <blank> <- this was added later (Apologies to first respondent)
105 C <blank> <- this was added later (Apologies to first respondent)
A left join brings everything from the first table.
An inner join is what you want here, or use a WHERE to filter the data (ON won't filter, just controls the JOIN conditions).
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
left JOIN class_grades t2
ON t1.student_id = t2.student_id
WHERE t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
OR
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
inner JOIN class_grades t2
ON t1.student_id = t2.student_id
and t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
@optimist wrote:
I thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I am convinced that the following query should return three records, but it returns eleven. Am I missing something basic or is SAS not following the rules?
DATA class_grades;
INPUT student_id $ class $ grade $ ;
DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
LEFT JOIN class_grades t2
ON t1.student_id = t2.student_id
AND t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;Title;
======== OUTPUT =========
obs student_id mth101_grade mth102_grade
--------------------
1 101 C B
2 101 B
3 102 B A
4 102 A
5 103 A A
6 103 A
7 104 D
8 105 C
9 106 A
10 106 A
11 107 B
======== DESIRED OUTPUT =========
student_id mth101_grade mth102_grade
101 C B
102 B A
103 A A
Reeza,
Thanks for the quick reply. I made a mistake in saying what I wanted as my desired output. (Sorry! I had to map my real query to the one shown below and screwed up on the desired results.) I should have said I wanted to see:
101 C B
102 B A
103 A A104 D <Blank>
105 C <Blank>
In code, I could have done it this way:
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades (WHERE=(class = 'MTH101')) t1
LEFT JOIN class_grades (WHERE=(class = 'MTH102')) t2
ON t1.student_id = t2.student_id ;
QUIT;
But I'm trying to find out if there is a problem with how I am thinking about joins. According to the reference I cited, I thought the first query I showed (without the SAS dataset options) should work for a self join.
Applying conditions in outer joins is often tricky. The starting point is always a cartesian product. Then join conditions are applied. Then WHERE conditions do further subsetting. Not necessarily what the programmer expected. You'll get your expected result by applying a WHERE condition first:
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM (select * from class_grades where class = 'MTH101') as t1
LEFT JOIN (select * from class_grades where class = 'MTH102') as t2
ON t1.student_id = t2.student_id;
QUIT;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.