Solved: Count observations by id

lulu3 · Posted 01-12-2021 02:06 PM

Hi expert,

I want to count the 'disease' variable by id, skipping missing 'disease' values and counting duplicate dates only once.

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

proc sort data=have;
by id date disease;
run;

Output I need:

id	disease
1	2
2	3
4	1
5	1

I tried a data step with first.id, but it doesn't work.

Any advice would be greatly appreciated! Thank you!

Shmuel · Posted 01-12-2021 02:11 PM

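proc sql;
  /* Step 1: one row per id and date, dropping missing disease */
  create table temp as
  select distinct id, date,
         max(disease) as disease
  from have(where=(disease ne .))
  group by id, date;

  /* Step 2: sum over the de-duplicated dates for each id */
  create table want as
  select distinct id,
         sum(disease) as disease
  from temp
  group by id;
quit;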

novinosrin · Posted 01-12-2021 02:12 PM



data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

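data want;
 set have;
 by id date notsorted;
 /* reset the running count at the start of each id */
 if first.id then count=0;
 /* add disease from the first row of each date; the sum statement ignores missing values */
 if first.date then count+disease;
 /* keep one row per id, and only when the count is non-zero */
 if last.id and count;
run;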

mkeintz · Posted 01-12-2021 03:54 PM

If the data are not sorted, then a two-step PROC FREQ (running PROC FREQ on its own output) might be the most efficient:

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

proc freq data=have noprint;
  table id*date / out=need;
  where disease=1;
run;
proc freq data=need noprint;
  table id / out=want (keep=id count);
run;

The first PROC FREQ generates dataset NEED with one observation per ID*DATE combination, with variables ID, DATE, COUNT, and PERCENT. The WHERE statement makes it ignore non-disease observations. The only variable in NEED that matters is ID: a PROC FREQ of ID from NEED gives the number of unique disease dates for each id.
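
For comparison, the same "unique disease dates per id" count can probably also be written as a single PROC SQL step using COUNT(DISTINCT ...). This is only a sketch under assumptions: it treats any non-missing disease value as a case (true for the sample data, where disease is 1 or missing), and the output name want_sql is made up for illustration.

```sas
proc sql;
  /* want_sql is a hypothetical output name */
  create table want_sql as
  select id,
         count(distinct date) as disease  /* each date counted once per id */
  from have
  where disease is not missing            /* assumption: non-missing = disease present */
  group by id;
quit;
```

Ids whose rows are all missing (id 3 in the sample) are filtered out by the WHERE clause, so they do not appear in the result.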

andreas_lds · Posted 01-13-2021 01:52 AM

Just to add another solution:

proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
   by Id date;
run;

proc summary data=sorted nway;
   by Id;
   output out=want1(drop= _type_ rename=(_freq_=disease));
run;
