Re: Count ID once in a Category

d0816 · Posted 11-20-2018 03:56 PM

I have following dataset "Have". I want to create a table "want" where the ID is only counted once in a category ClassStartQuarter. If the ClassStartQuarter is missing then the individual did not take a class and therefore do not count in any quarter. Proc freq will give duplicated count, is there other option to get a distinct count for each quarter?

Thank you.

Dataset: Have
ID	ClassStartQuarter
1
1
1
1
1
2	2017Q3
2	2017Q3
2	2017Q3
2	2017Q3
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
6	2017Q2
7	2017Q2
7	2017Q3
7	2018Q3
7	2017Q2
7	2017Q3
7	2018Q3
7	2017Q2
7	2017Q3
7	2018Q3
7	2017Q2
7	2017Q3
7	2018Q3
7	2017Q2
7	2017Q3
7	2018Q3
8
8
8
8
8
8
9	2016Q1
9	2017Q1
9	2017Q4
9	2016Q1
9	2017Q1
9	2017Q4
9	2016Q1
9	2017Q1
9	2017Q4
9	2016Q1
9	2017Q1
9	2017Q4
9	2016Q1
9	2017Q1
9	2017Q4
9	2016Q1
9	2017Q1
9	2017Q4
10	2017Q4
10	2017Q4
10	2018Q2
10	2018Q2
10	2017Q4
10	2017Q4
10	2018Q2
10	2018Q2
11	2017Q4
11	2017Q4
11	2017Q4
11	2017Q4
12	2014Q3
12	2017Q1
12	2014Q3
12	2017Q1
12	2014Q3
12	2017Q1
12	2014Q3
12	2017Q1

Table: Want (This is just an example)
Quarter	No.of Distinct ID
2013Q1	0
2013Q2	0
2013Q3	12
2013Q4	4
2014Q1	4
2014Q2	3
2014Q3	16
2014Q4	28
2015Q1	20
2015Q2	21
2015Q3	0
2015Q4	15
2016Q1	28
2016Q2	29
2016Q3	0
2016Q4	0
2017Q1	28
2017Q2	27
2017Q3	60
2017Q4	25
2018Q1	27
2018Q2	33
2018Q3	13
2018Q4	13

data_null__ · Posted 11-20-2018 05:20 PM

This might work for you. It counts the Occurrences and the number if ID(subjects) per Quarter. You may not need the IDGROUP output.

data class;
   infile cards missover firstobs=2;
   input id Quarter:yyq.;
   format quarter yyq.;
   cards;
ID	ClassStartQuarter
1	
1	
1	
1	
1	
2	2017Q3
2	2017Q3
2	2017Q3
2	2017Q3
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
3	2018Q1
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
4	2018Q2
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
5	2015Q4
5	2017Q1
5	2017Q3
5	2018Q1
6	2017Q3
6	2017Q3
6	2018Q1
6	2018Q1
6	2018Q1
6	2018Q1
6	2018Q1
6	2018Q1
6	2018Q1
6	2018Q1
;;;;
   run;
proc print;
   run;
proc summary nway data=class missing;
   class id quarter;
   output out=u1(drop=_type_ rename=_freq_=occ);
   run;
proc print;
   run;
data quarter;
   do year=2015 to 2018;
      do q=1 to 4;
         Quarter=yyq(year,q);
         output;
         end;
      end;
   keep quarter;
   format quarter yyq.;
   run;

proc summary nway missing data=u1 classdata=quarter;
   class quarter;
   output out=u2(drop=_type_) sum(occ)= idgroup(out[5](id)=);
   run;
proc print label;
   label _freq_='No.of Distinct ID';
   run;

Astounding · Posted 11-20-2018 05:46 PM

PROC FREQ can do this. Start by getting the (unsatisfactory) duplicated counts:

proc freq data=have;

where ClassStartQuarter > ' ';

tables id * ClassStartQuarter / noprint out=unique;

run;

These counts contain duplicates, but the output also contains just one observation per ID. Just count the result:

proc freq data=unique;

tables ClassStartQuarter;

run;

Ksharp · Posted 11-21-2018 07:40 AM


proc sql;
create table want as
 select Quarter,count(distinct id) as n_distinct_id
  from class
   group by Quarter;
quit;

Count ID once in a Category