DATA Step, Macro, Functions and more

A simple SQL mistake

Accepted Solution Solved
Reply
Respected Advisor
Posts: 4,920
Accepted Solution

A simple SQL mistake

Here is a mistake I make too often, how would you fix it :

data cust;
length name $12;
input id name & @@;
datalines;
1 Joe Who  2 Joan Doe  3 Ben Hur  4 Lucy Brown
;

data buy;
length item $12;
input id item nb & @@;
datalines;
1 coat 1  1 scarf 2  1 gloves 1
2 socks 6
3 gloves 1  3 socks 6
4 coat 1
;

/* Intent: get a list of all customer IDs with the number of gloves they bought */
/* Why this left join doesn't work? */
proc sql;
select c.id, c.name, sum(b.nb) as nbGloves
from cust as c left join buy as b on c.id=b.id
where b.item="gloves"
group by c.id, c.name;
quit;

/* Expected answer :
          id  name          nbGloves
    --------------------------------
           1  Joe Who              1
           2  Joan Doe             0
           3  Ben Hur              1
           4  Lucy Brown           0

*/

/* How can that query be fixed? */

PG

PG

Accepted Solutions
Solution
‎12-30-2012 10:25 AM
Super User
Super User
Posts: 7,039

Re: A simple SQL mistake

The issue is the use of the WHERE condition in addition to the ON condition.  This is removing the people without gloves even though you used a LEFT JOIN.  You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable.  You can fix that by using the COALESCE function.

proc sql;

  select c.id

       , c.name

       , coalesce(sum(b.nb),0) as nbGloves

  from cust as c

       left join

       buy as b

  on c.id=b.id

  and b.item="gloves"

  group by c.id, c.name

  ;

quit;

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: A simple SQL mistake

PG, I am no where near SQL expert, what I am going to comment could be far off, so you are warned. From what I see, the 'where' clause is the game changer, even your intention is to limit the input from 'buy', it mutates the left join into an inner join. And I don't think you will get a '0' when actually it is missing. (update: unless you tell SQL to give you one)

proc sql;

select c.id, c.name, case when not missing (sum(b.nb)) then sum(b.nb) else 0 end as nbGloves

from cust as c left join buy as b on c.id=b.id

and b.item="gloves"

group by c.id, c.name;

quit;

Haikuo

Contributor
Posts: 22

Re: A simple SQL mistake

Hi PG,

You can try this code. It's bit long, but it creates your desired dataset.

proc sql;

    create table nb_gloves as

       select  a.*,b.item,b.nb from

            cust as a left join buy as b

                on a.id=b.id;

quit;

proc summary data=nb_gloves nway missing;

class id name item;

var nb;

output out=aa (drop=_Smiley Happy sum(nb)=nb_gloves;

run;

data aa1(drop=item);

set aa;

if item ^='gloves' then delete;

run;

proc sql;

    create table ab as

        select a.*,b.nb_gloves from

            cust as a left join aa1 as b

             on a.id=b.id and a.name=b.name;

quit;

Solution
‎12-30-2012 10:25 AM
Super User
Super User
Posts: 7,039

Re: A simple SQL mistake

The issue is the use of the WHERE condition in addition to the ON condition.  This is removing the people without gloves even though you used a LEFT JOIN.  You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable.  You can fix that by using the COALESCE function.

proc sql;

  select c.id

       , c.name

       , coalesce(sum(b.nb),0) as nbGloves

  from cust as c

       left join

       buy as b

  on c.id=b.id

  and b.item="gloves"

  group by c.id, c.name

  ;

quit;

Respected Advisor
Posts: 4,920

Re: A simple SQL mistake

Thanks .  I hadn't even tried this alternative. I had forgotten that WHERE conditions are supported by SAS in joins. They are not supported everywhere, for instance in MS-ACCESS-SQL. - PG

PG
PROC Star
Posts: 1,167

Re: A simple SQL mistake

If you do this query, you can see where the problem arises:

select c.*, b.*
from cust as c left join buy as b on c.id=b.id
/* where b.item="gloves" */
/* group by c.id, c.name */;

The left join is working correctly, all of your records from cust are in the result. But the where clause is then throwing them out.

One way to fix it is to summarize on buy first, and then implement the left join:

proc sql;
select c.id, c.name, nbGloves
  from cust as c left join
   (select b.id as subid, sum(b.nb) as nbGloves
    from buy as b
     where b.item="gloves"
      group by b.id) on c.id = subid
;
quit;

While I was working on this, Tom posted his much better solution, but I'll post this anyway.

Other Tom

Respected Advisor
Posts: 4,920

Re: A simple SQL mistake

Thank you , a subquery is also the workaround that I used :

proc sql;

select c.id, c.name, coalesce(sum(b.nb), 0) as nbGloves

from cust as c left join

     (select * from buy where item="gloves") as b on c.id=b.id

group by c.id, c.name;

quit;

but I agree with you that Tom's solution seems better.

PG

PG
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 506 views
  • 3 likes
  • 5 in conversation