Help using Base SAS procedures

How PROC SQL uses DISTINCT and GROUP by

Reply
New Contributor
Posts: 3

How PROC SQL uses DISTINCT and GROUP by

This is a pretty basic question, but I wanted some confirmation since I haven't been able to find a clear answer. 

 

I have a database of users of a certain services we provide in a given year. Sometimes these users return for the same service or other services. I want to find how many distinct users I have of each service so I constructed this code:

 

proc sql;
create table SERVICES as
select
service,
count(distinct user_id) as Num_Users
from users
where year = 2017
group by service;
quit;

 

I want to know if this is counting unique users of each service or unique users from the total database. For example, if person A used Service1 and Service2 are they being counted in both service groupings or are they being counted as 1 unique user? Is this the correct interpretation of this code? 

Super User
Super User
Posts: 9,599

Re: How PROC SQL uses DISTINCT and GROUP by

Posted in reply to KellyJade

You wouldn't want to use a where clause with a group by.  In SQL you would use a having clause with a group by:

group by service having year=2017;

Or you could sub-query:

from (select * from users where year=2017)
group by service;
Super User
Posts: 13,583

Re: How PROC SQL uses DISTINCT and GROUP by

Posted in reply to KellyJade

A very small data set to test with such as:

data users;
   input user_id Service;
   year=2017;
datalines;
1  1
1  2
;
run;

should fairly easily answer this.

 

Note that the output of the Proc SQL for that example is 2 records with a count of one each. Since there are only two records input ...

Super User
Posts: 5,884

Re: How PROC SQL uses DISTINCT and GROUP by

Posted in reply to KellyJade
Sorry @RW9 I'm not sure I'm following you, why can't I filter the source data with a WHERE clause like this? Beside the logical part of it, HAVING will never use indexes for optimization.
I pretty much use HAVING only for filtering on aggregations.
Data never sleeps
Super User
Super User
Posts: 9,599

Re: How PROC SQL uses DISTINCT and GROUP by

I can't remember why exactly, must have had issues with it at some point in the past so now I totally avoid using where clauses on group by's and put them in subqueries or havings.  It may just be habit from my side.  Definately can't use use where for aggregates.

Ask a Question
Discussion stats
  • 4 replies
  • 181 views
  • 0 likes
  • 4 in conversation