BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dxiao2017
Lapis Lazuli | Level 10

This actually is not a question, but I was a little bit confused and frustrated by the very different thinking and coding habits people have. The practice question is as follows:

Untitled1.png

The question asks to combine two table using left join, and there are some entries exist in one table but not in the other, and these (i.e., merchants with no transactions) are what should be in the result table.

 

I tried twice using left join and did not come up with the correct answer. Then I tried except and still got no correct answer. And then I tried where var not in (subquery) and it works and got the same result as the above screen capture. My code is as follows:

proc sql;
select distinct merchantname 
         label='Merchant Name',
       merchantid
         label='Merchant ID',
       type
         label='Merchant Type',
       zip
         label='Merchant Zipcode'
   from sq.merchant
   where zip=10001 and 
         merchantid not in
            (select merchantid
               from sq.transaction)
    order by 2;
quit;

Then I was thinking how does the answer using left join to create this result. The code in the answer is as follows, and it creates the correct result (as shown in last screen capture):

dxiao2017_1-1759061059285.png

What makes me a little bit confuse and frustrated is: using left join in this situation. If the question did not ask to use left join, I would probably choose where var not in (subquery) first, because thinking this way is much easier for me. 

 

And I have two questions on using left join in the above code: (1) the query does not select any column from one of the table (i.e., table b, sq.transaction), I guess this is not the common practice of using left/right/inner join? Are we suppose to select columns from all tables when using left/right/inner join? (2) where b.merchantid is null;, this part is bit difficult to understand, also I do not know how this part work, because table b (i.e., the sq.transaction table) contains and only contains merchantid that HAS transactions, also for the merchantid column, there is no missing(or null) value, so how does SAS process where b.merchantid is null;?

 

Anyone did this question can offer some ideas? Thanks a lot!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

When you do a LEFT join it means that all observations from the LEFT table will make it into the intermediate results.  And when there is an observation that did not match any observation from the RIGHT table then all of the variables contributed by the RIGHT table will have a missing value.  That is why the WHERE condition works.

 

Personally I find your answer much easier to follow.  But if you need multiple variables to perform the join then you cannot use the IN operator. 

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

When you do a LEFT join it means that all observations from the LEFT table will make it into the intermediate results.  And when there is an observation that did not match any observation from the RIGHT table then all of the variables contributed by the RIGHT table will have a missing value.  That is why the WHERE condition works.

 

Personally I find your answer much easier to follow.  But if you need multiple variables to perform the join then you cannot use the IN operator. 

dxiao2017
Lapis Lazuli | Level 10
Thanks a lot, Tom! So the where condition works on the intermediate results.
Tom
Super User Tom
Super User

Note it is much easier to do in normal SAS instead of SQL.

data want;
  merge sq.merchant sq.transaction(in=in2);
  by merchantid;
  if not in2;
  keep merchantname merchantid type zip;
  label merchantname ='Merchant Name'
        merchantid='Merchant ID'
        type='Merchant Type'
        zip='Merchant Zipcode'
  ;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 201 views
  • 2 likes
  • 2 in conversation