topic Re: Count obervations by id in SAS Programming

Count obervations by id

lulu3 — Tue, 12 Jan 2021 19:06:08 GMT

Hi expert,

I wanted to count the 'disease' variable but need to avoid misisng 'disease' and duplicate dates.

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

proc sort data=have;
by id date disease;
run;

output I need:

id	disease
1	2
2	3
4	1
5	1

I tried data step with first.id but it doesn't work.

Any advice would be greatly appreciated! Thank you!

Re: Count obervations by id

Shmuel — Tue, 12 Jan 2021 19:19:05 GMT

proc sql;
  create table temp as
  select distinct id, date,
     max(disease) as disease
	 from have(where=(disease ne .))
	 group by id,date;

  create table want
  as select distinct id,
     sum(disease) as disease
  from temp group by id;
quit;

Re: Count obervations by id

novinosrin — Tue, 12 Jan 2021 19:12:15 GMT



data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

data want;
 set have;
 by id date notsorted;
 if first.id then count=0;
 if first.date then count+disease;
 if last.id and count;
run;

Re: Count obervations by id

mkeintz — Wed, 13 Jan 2021 05:35:13 GMT

If the data are not sorted, then a recursive PROC FREQ might be the most efficient:

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
run;

proc freq data=have noprint;
  table id*date / out=need;
  where disease=1;
run;
proc freq data=need noprint;
  table id / out=want (keep=id count);
run;

The first proc freq generates dataset NEED with one observation per ID*DATE, with variables ID DATE COUNT and PERCENT. The WHERE statement forces it to ignore non-disease observations. The only variable in NEED that you care about is ID. Just do a proc freq of ID from NEED to generate the number of unique disease dates for each id.

Re: Count obervations by id

andreas_lds — Wed, 13 Jan 2021 06:52:57 GMT

Just to add another solution

proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
   by Id date;
run;

proc summary data=sorted nway;
   by Id;
   output out=want1(drop= _type_ rename=(_freq_=disease));
run;