This actually is not a question, but I was a little bit confused and frustrated by the very different thinking and coding habits people have. The practice question is as follows:
The question asks to combine two table using left join, and there are some entries exist in one table but not in the other, and these (i.e., merchants with no transactions) are what should be in the result table.
I tried twice using left join and did not come up with the correct answer. Then I tried except and still got no correct answer. And then I tried where var not in (subquery) and it works and got the same result as the above screen capture. My code is as follows:
proc sql;
select distinct merchantname
label='Merchant Name',
merchantid
label='Merchant ID',
type
label='Merchant Type',
zip
label='Merchant Zipcode'
from sq.merchant
where zip=10001 and
merchantid not in
(select merchantid
from sq.transaction)
order by 2;
quit;
Then I was thinking how does the answer using left join to create this result. The code in the answer is as follows, and it creates the correct result (as shown in last screen capture):
What makes me a little bit confuse and frustrated is: using left join in this situation. If the question did not ask to use left join, I would probably choose where var not in (subquery) first, because thinking this way is much easier for me.
And I have two questions on using left join in the above code: (1) the query does not select any column from one of the table (i.e., table b, sq.transaction), I guess this is not the common practice of using left/right/inner join? Are we suppose to select columns from all tables when using left/right/inner join? (2) where b.merchantid is null;, this part is bit difficult to understand, also I do not know how this part work, because table b (i.e., the sq.transaction table) contains and only contains merchantid that HAS transactions, also for the merchantid column, there is no missing(or null) value, so how does SAS process where b.merchantid is null;?
Anyone did this question can offer some ideas? Thanks a lot!
When you do a LEFT join it means that all observations from the LEFT table will make it into the intermediate results. And when there is an observation that did not match any observation from the RIGHT table then all of the variables contributed by the RIGHT table will have a missing value. That is why the WHERE condition works.
Personally I find your answer much easier to follow. But if you need multiple variables to perform the join then you cannot use the IN operator.
When you do a LEFT join it means that all observations from the LEFT table will make it into the intermediate results. And when there is an observation that did not match any observation from the RIGHT table then all of the variables contributed by the RIGHT table will have a missing value. That is why the WHERE condition works.
Personally I find your answer much easier to follow. But if you need multiple variables to perform the join then you cannot use the IN operator.
Note it is much easier to do in normal SAS instead of SQL.
data want;
merge sq.merchant sq.transaction(in=in2);
by merchantid;
if not in2;
keep merchantname merchantid type zip;
label merchantname ='Merchant Name'
merchantid='Merchant ID'
type='Merchant Type'
zip='Merchant Zipcode'
;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.
Find more tutorials on the SAS Users YouTube channel.