Here is a mistake I make too often, how would you fix it :
data cust;
length name $12;
input id name & @@;
datalines;
1 Joe Who 2 Joan Doe 3 Ben Hur 4 Lucy Brown
;
data buy;
length item $12;
input id item nb & @@;
datalines;
1 coat 1 1 scarf 2 1 gloves 1
2 socks 6
3 gloves 1 3 socks 6
4 coat 1
;
/* Intent: get a list of all customer IDs with the number of gloves they bought */
/* Why this left join doesn't work? */
proc sql;
select c.id, c.name, sum(b.nb) as nbGloves
from cust as c left join buy as b on c.id=b.id
where b.item="gloves"
group by c.id, c.name;
quit;
/* Expected answer :
id name nbGloves
--------------------------------
1 Joe Who 1
2 Joan Doe 0
3 Ben Hur 1
4 Lucy Brown 0
*/
/* How can that query be fixed? */
PG
The issue is the use of the WHERE condition in addition to the ON condition. This is removing the people without gloves even though you used a LEFT JOIN. You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable. You can fix that by using the COALESCE function.
proc sql;
select c.id
, c.name
, coalesce(sum(b.nb),0) as nbGloves
from cust as c
left join
buy as b
on c.id=b.id
and b.item="gloves"
group by c.id, c.name
;
quit;
PG, I am no where near SQL expert, what I am going to comment could be far off, so you are warned. From what I see, the 'where' clause is the game changer, even your intention is to limit the input from 'buy', it mutates the left join into an inner join. And I don't think you will get a '0' when actually it is missing. (update: unless you tell SQL to give you one)
proc sql;
select c.id, c.name, case when not missing (sum(b.nb)) then sum(b.nb) else 0 end as nbGloves
from cust as c left join buy as b on c.id=b.id
and b.item="gloves"
group by c.id, c.name;
quit;
Haikuo
Hi PG,
You can try this code. It's bit long, but it creates your desired dataset.
proc sql;
create table nb_gloves as
select a.*,b.item,b.nb from
cust as a left join buy as b
on a.id=b.id;
quit;
proc summary data=nb_gloves nway missing;
class id name item;
var nb;
output out=aa (drop=_:) sum(nb)=nb_gloves;
run;
data aa1(drop=item);
set aa;
if item ^='gloves' then delete;
run;
proc sql;
create table ab as
select a.*,b.nb_gloves from
cust as a left join aa1 as b
on a.id=b.id and a.name=b.name;
quit;
The issue is the use of the WHERE condition in addition to the ON condition. This is removing the people without gloves even though you used a LEFT JOIN. You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable. You can fix that by using the COALESCE function.
proc sql;
select c.id
, c.name
, coalesce(sum(b.nb),0) as nbGloves
from cust as c
left join
buy as b
on c.id=b.id
and b.item="gloves"
group by c.id, c.name
;
quit;
If you do this query, you can see where the problem arises:
select c.*, b.*
from cust as c left join buy as b on c.id=b.id
/* where b.item="gloves" */
/* group by c.id, c.name */;
The left join is working correctly, all of your records from cust are in the result. But the where clause is then throwing them out.
One way to fix it is to summarize on buy first, and then implement the left join:
proc sql;
select c.id, c.name, nbGloves
from cust as c left join
(select b.id as subid, sum(b.nb) as nbGloves
from buy as b
where b.item="gloves"
group by b.id) on c.id = subid
;
quit;
While I was working on this, Tom posted his much better solution, but I'll post this anyway.
Other Tom
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.