BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Doug____
Pyrite | Level 9

I have a left table with 153 unduplicated observations and a right table with 15000+ observations.

 

If I use a where clause on the left table it only returns 153 records. If I remove the where clause and subset the data prior to query processing it returns the correct result (201 records). Why does this happen? Should it not return the same number of records? It acts like an inner join if the where clause is present regardless of the join used.

 

This returns 201 records (correct);
proc sql;
create table checkit as
select
a.an,
b.an as cfa,
b.load_date,
b.amount,
b.data_source,
b.fa_code
from
list a
left join
rfcorex b<<<-subset prior to query processing
on
a.an= b.an
;

 

This returns 153 records (inner join behavior  - not correct)

proc sql;
create table checkit as
select
a.an,
b.an as cfa,
b.load_date,
b.amount,
b.data_source,
b.fa_code
from
list a
left join
rfcore b<<<-no subset prior to query processing
on
a.an= b.an

where b.data_source contains "XXXXX"

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The Where clause you have is applied after the join in the second case so ONLY includes any records where the clause is true.

 

Left join in the first case can create multiple records from match  AND all of the records from the A data set are included regardless of the value of any of the B variables, such as not present in the B set at all based on the ON criteria.

 

data work.one;
   input x y;
datalines;
1 1
2 2
3 3
;

data work.two;
   input x y;
datalines;
5  3
;

proc sql;
   create table work.merged as
   select a.x , b.y
   from work.one as a
       left join
       work.two as b
       on a.x=b.x
   ;
quit;
/* result 3 records, all from work.one*/
proc sql;
   create table work.merged2 as
   select a.x , b.y
   from work.one as a
       left join
       work.two as b
       on a.x=b.x
   where b.y=3         
   ;
quit;
/* result 0 records because none of x match and the where would only
   output matches where the work.two y value is 3*/

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

@Doug____ wrote:

I have a left table with 153 unduplicated observations and a right table with 15000+ observations.

 

If I use a where clause on the left table it only returns 153 records. If I remove the where clause and subset the data prior to query processing it returns the correct result (201 records). Why does this happen? Should it not return the same number of records? It acts like an inner join if the where clause is present regardless of the join used.


These questions are impossible to answer without having the exact data sets (and code to do the subsetting) that you are using.

 

Perhaps you could make a smaller example that illustrates the problem, say 10 to 20 records, and provide that to us following these instructions (do not skip this step): https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller
Tom
Super User Tom
Super User

WHERE filters on the way in.

If you want to filter after the join use HAVING.

ballardw
Super User

The Where clause you have is applied after the join in the second case so ONLY includes any records where the clause is true.

 

Left join in the first case can create multiple records from match  AND all of the records from the A data set are included regardless of the value of any of the B variables, such as not present in the B set at all based on the ON criteria.

 

data work.one;
   input x y;
datalines;
1 1
2 2
3 3
;

data work.two;
   input x y;
datalines;
5  3
;

proc sql;
   create table work.merged as
   select a.x , b.y
   from work.one as a
       left join
       work.two as b
       on a.x=b.x
   ;
quit;
/* result 3 records, all from work.one*/
proc sql;
   create table work.merged2 as
   select a.x , b.y
   from work.one as a
       left join
       work.two as b
       on a.x=b.x
   where b.y=3         
   ;
quit;
/* result 0 records because none of x match and the where would only
   output matches where the work.two y value is 3*/

Doug____
Pyrite | Level 9

I took the where clause off and just did the join first then selected the records I wanted which seems to work.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 936 views
  • 0 likes
  • 4 in conversation