BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PGStats
Opal | Level 21

Here is a mistake I make too often, how would you fix it :

data cust;
length name $12;
input id name & @@;
datalines;
1 Joe Who  2 Joan Doe  3 Ben Hur  4 Lucy Brown
;

data buy;
length item $12;
input id item nb & @@;
datalines;
1 coat 1  1 scarf 2  1 gloves 1
2 socks 6
3 gloves 1  3 socks 6
4 coat 1
;

/* Intent: get a list of all customer IDs with the number of gloves they bought */
/* Why this left join doesn't work? */
proc sql;
select c.id, c.name, sum(b.nb) as nbGloves
from cust as c left join buy as b on c.id=b.id
where b.item="gloves"
group by c.id, c.name;
quit;

/* Expected answer :
          id  name          nbGloves
    --------------------------------
           1  Joe Who              1
           2  Joan Doe             0
           3  Ben Hur              1
           4  Lucy Brown           0

*/

/* How can that query be fixed? */

PG

PG
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The issue is the use of the WHERE condition in addition to the ON condition.  This is removing the people without gloves even though you used a LEFT JOIN.  You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable.  You can fix that by using the COALESCE function.

proc sql;

  select c.id

       , c.name

       , coalesce(sum(b.nb),0) as nbGloves

  from cust as c

       left join

       buy as b

  on c.id=b.id

  and b.item="gloves"

  group by c.id, c.name

  ;

quit;

View solution in original post

6 REPLIES 6
Haikuo
Onyx | Level 15

PG, I am no where near SQL expert, what I am going to comment could be far off, so you are warned. From what I see, the 'where' clause is the game changer, even your intention is to limit the input from 'buy', it mutates the left join into an inner join. And I don't think you will get a '0' when actually it is missing. (update: unless you tell SQL to give you one)

proc sql;

select c.id, c.name, case when not missing (sum(b.nb)) then sum(b.nb) else 0 end as nbGloves

from cust as c left join buy as b on c.id=b.id

and b.item="gloves"

group by c.id, c.name;

quit;

Haikuo

newbie_ari
Fluorite | Level 6

Hi PG,

You can try this code. It's bit long, but it creates your desired dataset.

proc sql;

    create table nb_gloves as

       select  a.*,b.item,b.nb from

            cust as a left join buy as b

                on a.id=b.id;

quit;

proc summary data=nb_gloves nway missing;

class id name item;

var nb;

output out=aa (drop=_:) sum(nb)=nb_gloves;

run;

data aa1(drop=item);

set aa;

if item ^='gloves' then delete;

run;

proc sql;

    create table ab as

        select a.*,b.nb_gloves from

            cust as a left join aa1 as b

             on a.id=b.id and a.name=b.name;

quit;

Tom
Super User Tom
Super User

The issue is the use of the WHERE condition in addition to the ON condition.  This is removing the people without gloves even though you used a LEFT JOIN.  You can change the WHERE to an AND so that it becomes part of the condition used to find the matching records. If you just do this then the people without gloves will have missing values for the new nbGloves variable.  You can fix that by using the COALESCE function.

proc sql;

  select c.id

       , c.name

       , coalesce(sum(b.nb),0) as nbGloves

  from cust as c

       left join

       buy as b

  on c.id=b.id

  and b.item="gloves"

  group by c.id, c.name

  ;

quit;

PGStats
Opal | Level 21

Thanks .  I hadn't even tried this alternative. I had forgotten that WHERE conditions are supported by SAS in joins. They are not supported everywhere, for instance in MS-ACCESS-SQL. - PG

PG
TomKari
Onyx | Level 15

If you do this query, you can see where the problem arises:

select c.*, b.*
from cust as c left join buy as b on c.id=b.id
/* where b.item="gloves" */
/* group by c.id, c.name */;

The left join is working correctly, all of your records from cust are in the result. But the where clause is then throwing them out.

One way to fix it is to summarize on buy first, and then implement the left join:

proc sql;
select c.id, c.name, nbGloves
  from cust as c left join
   (select b.id as subid, sum(b.nb) as nbGloves
    from buy as b
     where b.item="gloves"
      group by b.id) on c.id = subid
;
quit;

While I was working on this, Tom posted his much better solution, but I'll post this anyway.

Other Tom

PGStats
Opal | Level 21

Thank you , a subquery is also the workaround that I used :

proc sql;

select c.id, c.name, coalesce(sum(b.nb), 0) as nbGloves

from cust as c left join

     (select * from buy where item="gloves") as b on c.id=b.id

group by c.id, c.name;

quit;

but I agree with you that Tom's solution seems better.

PG

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1620 views
  • 3 likes
  • 5 in conversation