Hi expert,
I wanted to count the 'disease' variable but need to avoid misisng 'disease' and duplicate dates.
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
proc sort data=have;
by id date disease;
run;
output I need:
id | disease |
1 | 2 |
2 | 3 |
4 | 1 |
5 | 1 |
I tried data step with first.id but it doesn't work.
Any advice would be greatly appreciated! Thank you!
proc sql;
create table temp as
select distinct id, date,
max(disease) as disease
from have(where=(disease ne .))
group by id,date;
create table want
as select distinct id,
sum(disease) as disease
from temp group by id;
quit;
proc sql;
create table temp as
select distinct id, date,
max(disease) as disease
from have(where=(disease ne .))
group by id,date;
create table want
as select distinct id,
sum(disease) as disease
from temp group by id;
quit;
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
data want;
set have;
by id date notsorted;
if first.id then count=0;
if first.date then count+disease;
if last.id and count;
run;
If the data are not sorted, then a recursive PROC FREQ might be the most efficient:
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
run;
proc freq data=have noprint;
table id*date / out=need;
where disease=1;
run;
proc freq data=need noprint;
table id / out=want (keep=id count);
run;
The first proc freq generates dataset NEED with one observation per ID*DATE, with variables ID DATE COUNT and PERCENT. The WHERE statement forces it to ignore non-disease observations. The only variable in NEED that you care about is ID. Just do a proc freq of ID from NEED to generate the number of unique disease dates for each id.
Just to add another solution
proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
by Id date;
run;
proc summary data=sorted nway;
by Id;
output out=want1(drop= _type_ rename=(_freq_=disease));
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.