01-11-2018 10:38 AM
This is a pretty basic question, but I wanted some confirmation since I haven't been able to find a clear answer.
I have a database of users of a certain services we provide in a given year. Sometimes these users return for the same service or other services. I want to find how many distinct users I have of each service so I constructed this code:
create table SERVICES as
count(distinct user_id) as Num_Users
where year = 2017
group by service;
I want to know if this is counting unique users of each service or unique users from the total database. For example, if person A used Service1 and Service2 are they being counted in both service groupings or are they being counted as 1 unique user? Is this the correct interpretation of this code?
01-11-2018 10:44 AM
You wouldn't want to use a where clause with a group by. In SQL you would use a having clause with a group by:
group by service having year=2017;
Or you could sub-query:
from (select * from users where year=2017) group by service;
01-11-2018 10:47 AM
A very small data set to test with such as:
data users; input user_id Service; year=2017; datalines; 1 1 1 2 ; run;
should fairly easily answer this.
Note that the output of the Proc SQL for that example is 2 records with a count of one each. Since there are only two records input ...
01-11-2018 11:58 AM
01-12-2018 03:57 AM
I can't remember why exactly, must have had issues with it at some point in the past so now I totally avoid using where clauses on group by's and put them in subqueries or havings. It may just be habit from my side. Definately can't use use where for aggregates.