I have following dataset "Have". I want to create a table "want" where the ID is only counted once in a category ClassStartQuarter. If the ClassStartQuarter is missing then the individual did not take a class and therefore do not count in any quarter. Proc freq will give duplicated count, is there other option to get a distinct count for each quarter?
Thank you.
| Dataset: Have | |
| ID | ClassStartQuarter |
| 1 | |
| 1 | |
| 1 | |
| 1 | |
| 1 | |
| 2 | 2017Q3 |
| 2 | 2017Q3 |
| 2 | 2017Q3 |
| 2 | 2017Q3 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 3 | 2018Q1 |
| 4 | 2018Q2 |
| 4 | 2018Q2 |
| 4 | 2018Q2 |
| 4 | 2018Q2 |
| 4 | 2018Q2 |
| 4 | 2018Q2 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 5 | 2015Q4 |
| 5 | 2017Q1 |
| 5 | 2017Q3 |
| 5 | 2018Q1 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 6 | 2017Q2 |
| 7 | 2017Q2 |
| 7 | 2017Q3 |
| 7 | 2018Q3 |
| 7 | 2017Q2 |
| 7 | 2017Q3 |
| 7 | 2018Q3 |
| 7 | 2017Q2 |
| 7 | 2017Q3 |
| 7 | 2018Q3 |
| 7 | 2017Q2 |
| 7 | 2017Q3 |
| 7 | 2018Q3 |
| 7 | 2017Q2 |
| 7 | 2017Q3 |
| 7 | 2018Q3 |
| 8 | |
| 8 | |
| 8 | |
| 8 | |
| 8 | |
| 8 | |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 9 | 2016Q1 |
| 9 | 2017Q1 |
| 9 | 2017Q4 |
| 10 | 2017Q4 |
| 10 | 2017Q4 |
| 10 | 2018Q2 |
| 10 | 2018Q2 |
| 10 | 2017Q4 |
| 10 | 2017Q4 |
| 10 | 2018Q2 |
| 10 | 2018Q2 |
| 11 | 2017Q4 |
| 11 | 2017Q4 |
| 11 | 2017Q4 |
| 11 | 2017Q4 |
| 12 | 2014Q3 |
| 12 | 2017Q1 |
| 12 | 2014Q3 |
| 12 | 2017Q1 |
| 12 | 2014Q3 |
| 12 | 2017Q1 |
| 12 | 2014Q3 |
| 12 | 2017Q1 |
| Table: Want (This is just an example) | |
| Quarter | No.of Distinct ID |
| 2013Q1 | 0 |
| 2013Q2 | 0 |
| 2013Q3 | 12 |
| 2013Q4 | 4 |
| 2014Q1 | 4 |
| 2014Q2 | 3 |
| 2014Q3 | 16 |
| 2014Q4 | 28 |
| 2015Q1 | 20 |
| 2015Q2 | 21 |
| 2015Q3 | 0 |
| 2015Q4 | 15 |
| 2016Q1 | 28 |
| 2016Q2 | 29 |
| 2016Q3 | 0 |
| 2016Q4 | 0 |
| 2017Q1 | 28 |
| 2017Q2 | 27 |
| 2017Q3 | 60 |
| 2017Q4 | 25 |
| 2018Q1 | 27 |
| 2018Q2 | 33 |
| 2018Q3 | 13 |
| 2018Q4 | 13 |
This might work for you. It counts the Occurrences and the number if ID(subjects) per Quarter. You may not need the IDGROUP output.
data class;
infile cards missover firstobs=2;
input id Quarter:yyq.;
format quarter yyq.;
cards;
ID ClassStartQuarter
1
1
1
1
1
2 2017Q3
2 2017Q3
2 2017Q3
2 2017Q3
3 2018Q1
3 2018Q1
3 2018Q1
3 2018Q1
3 2018Q1
3 2018Q1
3 2018Q1
3 2018Q1
4 2018Q2
4 2018Q2
4 2018Q2
4 2018Q2
4 2018Q2
4 2018Q2
5 2015Q4
5 2017Q1
5 2017Q3
5 2018Q1
5 2015Q4
5 2017Q1
5 2017Q3
5 2018Q1
6 2017Q3
6 2017Q3
6 2018Q1
6 2018Q1
6 2018Q1
6 2018Q1
6 2018Q1
6 2018Q1
6 2018Q1
6 2018Q1
;;;;
run;
proc print;
run;
proc summary nway data=class missing;
class id quarter;
output out=u1(drop=_type_ rename=_freq_=occ);
run;
proc print;
run;
data quarter;
do year=2015 to 2018;
do q=1 to 4;
Quarter=yyq(year,q);
output;
end;
end;
keep quarter;
format quarter yyq.;
run;
proc summary nway missing data=u1 classdata=quarter;
class quarter;
output out=u2(drop=_type_) sum(occ)= idgroup(out[5](id)=);
run;
proc print label;
label _freq_='No.of Distinct ID';
run;
PROC FREQ can do this. Start by getting the (unsatisfactory) duplicated counts:
proc freq data=have;
where ClassStartQuarter > ' ';
tables id * ClassStartQuarter / noprint out=unique;
run;
These counts contain duplicates, but the output also contains just one observation per ID. Just count the result:
proc freq data=unique;
tables ClassStartQuarter;
run;
proc sql;
create table want as
select Quarter,count(distinct id) as n_distinct_id
from class
group by Quarter;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.