I thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I am convinced that the following query should return three records, but it returns eleven. Am I missing something basic or is SAS not following the rules?
DATA class_grades;
INPUT student_id $ class $ grade $ ;
DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
LEFT JOIN class_grades t2
ON t1.student_id = t2.student_id
AND t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
Title;
======== OUTPUT =========
obs student_id mth101_grade mth102_grade
--------------------
1 101 C B
2 101 B
3 102 B A
4 102 A
5 103 A A
6 103 A
7 104 D
8 105 C
9 106 A
10 106 A
11 107 B
======== DESIRED OUTPUT =========
student_id mth101_grade mth102_grade
101 C B
102 B A
103 A A
104 D <blank> <- this was added later (Apologies to first respondent)
105 C <blank> <- this was added later (Apologies to first respondent)
A left join brings everything from the first table.
An inner join is what you want here, or use a WHERE to filter the data (ON won't filter, just controls the JOIN conditions).
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
left JOIN class_grades t2
ON t1.student_id = t2.student_id
WHERE t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
OR
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
inner JOIN class_grades t2
ON t1.student_id = t2.student_id
and t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;
@optimist wrote:
I thought I knew how outer joins worked in SQL, but this simple join has me stumped. After referring to SQL for Smarties: Advanced SQL Programming, 3rd Ed, by Joe Celko (2005), I am convinced that the following query should return three records, but it returns eleven. Am I missing something basic or is SAS not following the rules?
DATA class_grades;
INPUT student_id $ class $ grade $ ;
DATALINES;
101 MTH101 C
102 MTH101 B
103 MTH101 A
104 MTH101 D
105 MTH101 C
106 MTH103 A
101 MTH102 B
102 MTH102 A
103 MTH102 A
106 MTH201 A
107 MTH102 B
;
RUN;
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades t1
LEFT JOIN class_grades t2
ON t1.student_id = t2.student_id
AND t1.class = 'MTH101'
AND t2.class = 'MTH102';
QUIT;Title;
======== OUTPUT =========
obs student_id mth101_grade mth102_grade
--------------------
1 101 C B
2 101 B
3 102 B A
4 102 A
5 103 A A
6 103 A
7 104 D
8 105 C
9 106 A
10 106 A
11 107 B
======== DESIRED OUTPUT =========
student_id mth101_grade mth102_grade
101 C B
102 B A
103 A A
Reeza,
Thanks for the quick reply. I made a mistake in saying what I wanted as my desired output. (Sorry! I had to map my real query to the one shown below and screwed up on the desired results.) I should have said I wanted to see:
101 C B
102 B A
103 A A104 D <Blank>
105 C <Blank>
In code, I could have done it this way:
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM class_grades (WHERE=(class = 'MTH101')) t1
LEFT JOIN class_grades (WHERE=(class = 'MTH102')) t2
ON t1.student_id = t2.student_id ;
QUIT;
But I'm trying to find out if there is a problem with how I am thinking about joins. According to the reference I cited, I thought the first query I showed (without the SAS dataset options) should work for a self join.
Applying conditions in outer joins is often tricky. The starting point is always a cartesian product. Then join conditions are applied. Then WHERE conditions do further subsetting. Not necessarily what the programmer expected. You'll get your expected result by applying a WHERE condition first:
TITLE 'Self Left Join';
PROC SQL;
SELECT t1.student_id
,t1.grade AS calc1_grade
,t2.grade AS calc2_grade
FROM (select * from class_grades where class = 'MTH101') as t1
LEFT JOIN (select * from class_grades where class = 'MTH102') as t2
ON t1.student_id = t2.student_id;
QUIT;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.