Hi expert,
I wanted to count the 'disease' variable but need to avoid misisng 'disease' and duplicate dates.
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
proc sort data=have;
by id date disease;
run;
output I need:
| id | disease | 
| 1 | 2 | 
| 2 | 3 | 
| 4 | 1 | 
| 5 | 1 | 
I tried data step with first.id but it doesn't work.
Any advice would be greatly appreciated! Thank you!
proc sql;
  create table temp as
  select distinct id, date,
     max(disease) as disease
	 from have(where=(disease ne .))
	 group by id,date;
  create table want
  as select distinct id,
     sum(disease) as disease
  from temp group by id;
quit;
proc sql;
  create table temp as
  select distinct id, date,
     max(disease) as disease
	 from have(where=(disease ne .))
	 group by id,date;
  create table want
  as select distinct id,
     sum(disease) as disease
  from temp group by id;
quit;
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
data want;
 set have;
 by id date notsorted;
 if first.id then count=0;
 if first.date then count+disease;
 if last.id and count;
run;
If the data are not sorted, then a recursive PROC FREQ might be the most efficient:
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
run;
proc freq data=have noprint;
  table id*date / out=need;
  where disease=1;
run;
proc freq data=need noprint;
  table id / out=want (keep=id count);
run;The first proc freq generates dataset NEED with one observation per ID*DATE, with variables ID DATE COUNT and PERCENT. The WHERE statement forces it to ignore non-disease observations. The only variable in NEED that you care about is ID. Just do a proc freq of ID from NEED to generate the number of unique disease dates for each id.
Just to add another solution
proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
   by Id date;
run;
proc summary data=sorted nway;
   by Id;
   output out=want1(drop= _type_ rename=(_freq_=disease));
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
