Hi expert,
I wanted to count the 'disease' variable but need to avoid misisng 'disease' and duplicate dates.
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
proc sort data=have;
by id date disease;
run;
output I need:
id | disease |
1 | 2 |
2 | 3 |
4 | 1 |
5 | 1 |
I tried data step with first.id but it doesn't work.
Any advice would be greatly appreciated! Thank you!
proc sql;
create table temp as
select distinct id, date,
max(disease) as disease
from have(where=(disease ne .))
group by id,date;
create table want
as select distinct id,
sum(disease) as disease
from temp group by id;
quit;
proc sql;
create table temp as
select distinct id, date,
max(disease) as disease
from have(where=(disease ne .))
group by id,date;
create table want
as select distinct id,
sum(disease) as disease
from temp group by id;
quit;
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;
data want;
set have;
by id date notsorted;
if first.id then count=0;
if first.date then count+disease;
if last.id and count;
run;
If the data are not sorted, then a recursive PROC FREQ might be the most efficient:
data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
run;
proc freq data=have noprint;
table id*date / out=need;
where disease=1;
run;
proc freq data=need noprint;
table id / out=want (keep=id count);
run;
The first proc freq generates dataset NEED with one observation per ID*DATE, with variables ID DATE COUNT and PERCENT. The WHERE statement forces it to ignore non-disease observations. The only variable in NEED that you care about is ID. Just do a proc freq of ID from NEED to generate the number of unique disease dates for each id.
Just to add another solution
proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
by Id date;
run;
proc summary data=sorted nway;
by Id;
output out=want1(drop= _type_ rename=(_freq_=disease));
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.