What does "unique" do here for the count function alongside group by s...

newbatprocsql1 · Posted 08-10-2023 11:02 AM

I'm pretty new to proc sql, but I was trying to count the number of people. Each unique person can have multiple rows (multiple application submissions). An example of a table below is:

Person_ID	Category	Ref_type	Total_amt
100	Green	2	350
100	Blue	2	300
100	Red	3	100
200	Green	1	20
200	Black	3	500
300	Blue	2	200

I want to count the number of people in each Category*Ref_type, so it'll be

Category	Ref_type	No_ppl
Green	1	1
Green	2	2
Blue	2	2
Red	3	1
Black	3	1

I'm not at home right now so I can't check but I remember that I tried doing

proc sql;
create table want as
select Category
,Ref_type
,count(unique Person_ID)
from have 
group by Category, Ref_type
;quit;

which gave me not exactly what I wanted (note that this was for a large dataset, probably 2 million rows). When I removed the "unique" keyword, I got what I wanted.

Can someone explain to me what the code with the "unique" when I'm using it with the count function, and a group by statement, and also what the code without the "unique" function does when I'm counting a specific variable that's not in the group by statement?

JosvanderVelden · Posted 08-10-2023 11:08 AM

Have a look at the solution for this track: https://communities.sas.com/t5/SAS-Procedures/unique-function-in-proc-sql/td-p/251359

Tom · Posted 08-10-2023 11:17 AM

How many people do you think you have in your fist listing?

Person_ID Category Ref_type Total_amt

100 Green 2 350

100 Blue 2 300

100 Red 3 100

200 Green 1 20

200 Black 3 500

300 Blue 2 200

I see 6 observations for 3 people.

COUNT(*) will count how many observations.

COUNT(PERSON_ID) will count how many observations have a non-missing value of PERSON_ID.

COUNT(DISTINCT PERSON_ID) will count how many people.

newbatprocsql1 · Posted 08-10-2023 11:25 AM

That makes sense - so if PERSON_ID is always not missing then count(*) = count(person_id)? Also, how does that work with the group by statement? I see @JosvanderVelden 's reply, but the post they linked only says there's a caveat with the aggregate function, but doesn't say what the caveat is.

Tom · Posted 08-10-2023 11:47 AM

GROUP BY separates the observations into groups based on the values of the variables listed. When you have GROUP BY the aggregate functions, like COUNT(),MAX(),SUM(), etc operate over each group separately.

Also when you have GROUP BY you need to take care what columns (variables) you list in the SELECT clause.

data have;
  input Person_ID $ Category $ Ref_type $ Total_amt;
cards;
100 Green 2 350
100 Blue 2 300
100 Red 3 100
200 Green 1 20
200 Black 3 500
300 Blue 2 200
;

proc sql ;
 select category,ref_type
      , count(*) as nobs
      , count(person_id) as n_with_ids
      , count(person_id) as n_people
      , sum(total_amt) as sum_total_amt
   from have
   group by category,ref_type
 ;
quit;

Result

                                                      sum_total_
Category  Ref_type      nobs  n_with_ids  n_people           amt
----------------------------------------------------------------
Black     3                1           1         1           500
Blue      2                2           2         2           500
Green     1                1           1         1            20
Green     2                1           1         1           350
Red       3                1           1         1           100

But if you include a variable that is neither one of the grouping variables nor an aggregate result then SAS will re-merge the computed aggregate values back with all of the observations selected in that group. It will tell you this in the LOG.

1178  proc sql ;
1179   select category,ref_type,total_amt
1180        , count(*) as nobs
1181        , count(person_id) as n_with_ids
1182        , count(person_id) as n_people
1183        , sum(total_amt) as sum_total_amt
1184     from have
1185     group by category,ref_type
1186   ;
NOTE: The query requires remerging summary statistics back with the original data.
1187  quit;

Result

                                                                 sum_total_
Category  Ref_type  Total_amt      nobs  n_with_ids  n_people           amt
---------------------------------------------------------------------------
Black     3               500         1           1         1           500
Blue      2               200         2           2         2           500
Blue      2               300         2           2         2           500
Green     1                20         1           1         1            20
Green     2               350         1           1         1           350
Red       3               100         1           1         1           100

newbatprocsql1 · Posted 08-10-2023 11:55 AM

Thanks! That's really helpful. I think now I'm just not understanding what the difference between using
count(*), count(unique person_id) and count(person_id) is when using that "group by category, ref_type". It seems like it will always give the same result (but I know that count(unique person_id) and count(person_id) would give different results in general, I just can't see why, since we are using the group by statement).

newbatprocsql1 · Posted 08-10-2023 11:56 AM

I say that will give different results in general, because I remember taking away the "unique" keyword and it gave different results. Also, did you mean to type count(person_id) twice in the proc sql statements?

Tom · Posted 08-10-2023 12:00 PM

No. Second one should have the DISTINCT keyword.

They will yield the same count when the three variable combination of PERSON_ID, CATEGORY, and REF_TYPE uniquely identify the observation.

What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

Re: What does "unique" do here for the count function alongside group by statement?

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away