DATA Step, Macro, Functions and more

SQL join question

Reply
Frequent Contributor
Posts: 77

SQL join question

Suppose I have two datasets like below:

CUSTOMER dataset:

customer_id, last_name

001, Natha

002, Cooper

003, David

VISIT dataset:

customer_id, visit_date

001, 01-MAR-2011

001, 03-APR-2011

001, 15-MAY-2011

002, 21-FEB-2011

I want to use one PROC SQL to produce the visit date count for each customer during APR-2011. Here is my query:

proc sql;

  create table report as

  select a.customer_id, count(distinct b.visit_date) as visit_cnt

  from customer a left join visit b on a.customer_id = b.customer_id

  where b.visit_date between '01apr2011'd and '30apr2011'd

  group by a.customer_id

; quit;

This produces:

001, 1

But the output I want is:

001, 1

002, 0

003, 0

Any suggestion for using one PROC SQL to produce the desired result?

PROC Star
Posts: 7,363

SQL join question

I think that the following will provide what you want:

proc sql;

  create table report as

    select a.customer_id,

     max(count(distinct b.visit_date),0) as visit_cnt

       from customer a

         left join (select visit_date from visit b

           where b.visit_date between '01apr2011'd and '30apr2011'd)

             on a.customer_id = b.customer_id

               group by a.customer_id

;

quit;

SAS Employee
Posts: 104

SQL join question

This should do it for you:

proc sql;
   create table report as
   select c.customer_id
          , coalesce(Count,0) as visit_cnt
      from customer c
         left join 
        (SELECT customer_id, count(*) as Count
            from visit 
            where visit_date between '01apr2011'd and '30apr2011'd
            group by 1) v
      on c.customer_id=v.customer_id
; 
quit;
Super User
Posts: 9,676

Re: SQL join question

Haha.

It has been mentioned by Howles.You should use 'and' instead of 'where' statement which will filter the obs.

proc sql;

  create table report as

  select a.customer_id, count(distinct b.visit_date) as visit_cnt

  from customer a left join visit b on a.customer_id = b.customer_id

  and b.visit_date between '01apr2011'd and '30apr2011'd

  group by a.customer_id

; quit;

data customer;
input customer_id $ last_name $;
cards;
001 Natha
002 Cooper
003 David
;
run;
data VISIT  ;
input customer_id $ visit_date date12.;
format visit_date date9.;
cards;
001 01-MAR-2011
001 03-APR-2011
001 15-MAY-2011
002 21-FEB-2011
;
run; 

proc sql;
  create table report as
  select a.customer_id, count(distinct b.visit_date) as visit_cnt
  from customer a left join visit b on a.customer_id = b.customer_id
  and b.visit_date between '01apr2011'd and '30apr2011'd
  group by a.customer_id
; quit;

Ksharp

Frequent Contributor
Posts: 77

Re: SQL join question

Ksharp,

Thanks for pointing out this trick.

Actually should we say 'and' enables filtering on visit dataset while 'where' filers on the join results?

Would this query executed in ANSI SQL behave the same?

Frequent Contributor
Posts: 77

Re: SQL join question

Ksharp, Could you tell me which section of Howes' sqlbook tak about this trick?

http://www.howles.com/sqlbook/

PROC Star
Posts: 7,363

Re: SQL join question

Marc,

Howard mentions it in one of his posts to the forum.  Take a look at: http://communities.sas.com/message/57333#57333

PROC Star
Posts: 7,363

Re: SQL join question

I also should mention that the link you provided was only to the examples from Howard's book.  If you are going to be using SQL, I strongly recommend Howard's book.  He is a master at understanding how to simplify almost any proc sql code.  As for the other question you asked, other than the SAS dates, I would think that the code does indeed conform to ANSI SQL.

Super User
Posts: 9,676

Re: SQL join question

In the url Art.T gave ,There is already answered your question.

Where statement will filter the observation after left join,

'and' will filter the observations during left join ( but filtering will keep all the observation from left dataset, just as left join operation required).

So 'and' in left jion will keep all the observations from left dataset,but where statement will filter observations ,even if obs are from left dataset.

Ksharp

Ask a Question
Discussion stats
  • 8 replies
  • 226 views
  • 0 likes
  • 4 in conversation