I have a left table with 153 unduplicated observations and a right table with 15000+ observations.
If I use a where clause on the left table it only returns 153 records. If I remove the where clause and subset the data prior to query processing it returns the correct result (201 records). Why does this happen? Should it not return the same number of records? It acts like an inner join if the where clause is present regardless of the join used.
This returns 201 records (correct);
proc sql;
create table checkit as
select
a.an,
b.an as cfa,
b.load_date,
b.amount,
b.data_source,
b.fa_code
from
list a
left join
rfcorex b<<<-subset prior to query processing
on
a.an= b.an
;
This returns 153 records (inner join behavior - not correct)
proc sql;
create table checkit as
select
a.an,
b.an as cfa,
b.load_date,
b.amount,
b.data_source,
b.fa_code
from
list a
left join
rfcore b<<<-no subset prior to query processing
on
a.an= b.an
where b.data_source contains "XXXXX"
The Where clause you have is applied after the join in the second case so ONLY includes any records where the clause is true.
Left join in the first case can create multiple records from match AND all of the records from the A data set are included regardless of the value of any of the B variables, such as not present in the B set at all based on the ON criteria.
data work.one; input x y; datalines; 1 1 2 2 3 3 ; data work.two; input x y; datalines; 5 3 ; proc sql; create table work.merged as select a.x , b.y from work.one as a left join work.two as b on a.x=b.x ; quit; /* result 3 records, all from work.one*/ proc sql; create table work.merged2 as select a.x , b.y from work.one as a left join work.two as b on a.x=b.x where b.y=3 ; quit; /* result 0 records because none of x match and the where would only output matches where the work.two y value is 3*/
@Doug____ wrote:
I have a left table with 153 unduplicated observations and a right table with 15000+ observations.
If I use a where clause on the left table it only returns 153 records. If I remove the where clause and subset the data prior to query processing it returns the correct result (201 records). Why does this happen? Should it not return the same number of records? It acts like an inner join if the where clause is present regardless of the join used.
These questions are impossible to answer without having the exact data sets (and code to do the subsetting) that you are using.
Perhaps you could make a smaller example that illustrates the problem, say 10 to 20 records, and provide that to us following these instructions (do not skip this step): https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
WHERE filters on the way in.
If you want to filter after the join use HAVING.
The Where clause you have is applied after the join in the second case so ONLY includes any records where the clause is true.
Left join in the first case can create multiple records from match AND all of the records from the A data set are included regardless of the value of any of the B variables, such as not present in the B set at all based on the ON criteria.
data work.one; input x y; datalines; 1 1 2 2 3 3 ; data work.two; input x y; datalines; 5 3 ; proc sql; create table work.merged as select a.x , b.y from work.one as a left join work.two as b on a.x=b.x ; quit; /* result 3 records, all from work.one*/ proc sql; create table work.merged2 as select a.x , b.y from work.one as a left join work.two as b on a.x=b.x where b.y=3 ; quit; /* result 0 records because none of x match and the where would only output matches where the work.two y value is 3*/
I took the where clause off and just did the join first then selected the records I wanted which seems to work.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.