Suppose I have two datasets like below:
CUSTOMER dataset:
customer_id, last_name
001, Natha
002, Cooper
003, David
VISIT dataset:
customer_id, visit_date
001, 01-MAR-2011
001, 03-APR-2011
001, 15-MAY-2011
002, 21-FEB-2011
I want to use one PROC SQL to produce the visit date count for each customer during APR-2011. Here is my query:
proc sql;
create table report as
select a.customer_id, count(distinct b.visit_date) as visit_cnt
from customer a left join visit b on a.customer_id = b.customer_id
where b.visit_date between '01apr2011'd and '30apr2011'd
group by a.customer_id
; quit;
This produces:
001, 1
But the output I want is:
001, 1
002, 0
003, 0
Any suggestion for using one PROC SQL to produce the desired result?
I think that the following will provide what you want:
proc sql;
create table report as
select a.customer_id,
max(count(distinct b.visit_date),0) as visit_cnt
from customer a
left join (select visit_date from visit b
where b.visit_date between '01apr2011'd and '30apr2011'd)
on a.customer_id = b.customer_id
group by a.customer_id
;
quit;
This should do it for you:
proc sql; create table report as select c.customer_id , coalesce(Count,0) as visit_cnt from customer c left join (SELECT customer_id, count(*) as Count from visit where visit_date between '01apr2011'd and '30apr2011'd group by 1) v on c.customer_id=v.customer_id ; quit;
Haha.
It has been mentioned by Howles.You should use 'and' instead of 'where' statement which will filter the obs.
proc sql;
create table report as
select a.customer_id, count(distinct b.visit_date) as visit_cnt
from customer a left join visit b on a.customer_id = b.customer_id
and b.visit_date between '01apr2011'd and '30apr2011'd
group by a.customer_id
; quit;
data customer; input customer_id $ last_name $; cards; 001 Natha 002 Cooper 003 David ; run; data VISIT ; input customer_id $ visit_date date12.; format visit_date date9.; cards; 001 01-MAR-2011 001 03-APR-2011 001 15-MAY-2011 002 21-FEB-2011 ; run; proc sql; create table report as select a.customer_id, count(distinct b.visit_date) as visit_cnt from customer a left join visit b on a.customer_id = b.customer_id and b.visit_date between '01apr2011'd and '30apr2011'd group by a.customer_id ; quit;
Ksharp
Ksharp,
Thanks for pointing out this trick.
Actually should we say 'and' enables filtering on visit dataset while 'where' filers on the join results?
Would this query executed in ANSI SQL behave the same?
Ksharp, Could you tell me which section of Howes' sqlbook tak about this trick?
Marc,
Howard mentions it in one of his posts to the forum. Take a look at: http://communities.sas.com/message/57333#57333
I also should mention that the link you provided was only to the examples from Howard's book. If you are going to be using SQL, I strongly recommend Howard's book. He is a master at understanding how to simplify almost any proc sql code. As for the other question you asked, other than the SAS dates, I would think that the code does indeed conform to ANSI SQL.
In the url Art.T gave ,There is already answered your question.
Where statement will filter the observation after left join,
'and' will filter the observations during left join ( but filtering will keep all the observation from left dataset, just as left join operation required).
So 'and' in left jion will keep all the observations from left dataset,but where statement will filter observations ,even if obs are from left dataset.
Ksharp
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.