This is a pretty basic question, but I wanted some confirmation since I haven't been able to find a clear answer.
I have a database of users of a certain services we provide in a given year. Sometimes these users return for the same service or other services. I want to find how many distinct users I have of each service so I constructed this code:
proc sql;
create table SERVICES as
select
service,
count(distinct user_id) as Num_Users
from users
where year = 2017
group by service;
quit;
I want to know if this is counting unique users of each service or unique users from the total database. For example, if person A used Service1 and Service2 are they being counted in both service groupings or are they being counted as 1 unique user? Is this the correct interpretation of this code?
You wouldn't want to use a where clause with a group by. In SQL you would use a having clause with a group by:
group by service having year=2017;
Or you could sub-query:
from (select * from users where year=2017) group by service;
A very small data set to test with such as:
data users; input user_id Service; year=2017; datalines; 1 1 1 2 ; run;
should fairly easily answer this.
Note that the output of the Proc SQL for that example is 2 records with a count of one each. Since there are only two records input ...
I can't remember why exactly, must have had issues with it at some point in the past so now I totally avoid using where clauses on group by's and put them in subqueries or havings. It may just be habit from my side. Definately can't use use where for aggregates.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
