BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lulu3
Obsidian | Level 7

Hi expert,

I wanted to count the 'disease' variable but need to avoid misisng 'disease' and duplicate dates.

 

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

 

proc sort data=have;
by id date disease;
run;

 

output I need:

iddisease
12
23
41
51

 

I tried data step with first.id but it doesn't work. 

 

Any advice would be greatly appreciated!  Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18
proc sql;
  create table temp as
  select distinct id, date,
     max(disease) as disease
	 from have(where=(disease ne .))
	 group by id,date;

  create table want
  as select distinct id,
     sum(disease) as disease
  from temp group by id;
quit;

View solution in original post

4 REPLIES 4
Shmuel
Garnet | Level 18
proc sql;
  create table temp as
  select distinct id, date,
     max(disease) as disease
	 from have(where=(disease ne .))
	 group by id,date;

  create table want
  as select distinct id,
     sum(disease) as disease
  from temp group by id;
quit;
novinosrin
Tourmaline | Level 20


data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
;
run;

data want;
 set have;
 by id date notsorted;
 if first.id then count=0;
 if first.date then count+disease;
 if last.id and count;
run;
mkeintz
PROC Star

If the data are not sorted, then a recursive PROC FREQ might be the most efficient:

 

data have;
input id $ date mmddyy10. disease @@;
format date mmddyy10.;
datalines;
1 01/08/2016 1
1 01/08/2016 1
1 01/10/2016 .
1 01/11/2016 1
2 01/13/2016 1
2 04/04/2016 1
2 06/05/2016 1
3 12/16/2016 .
3 11/17/2016 .
4 05/18/2016 1
4 05/18/2016 1
4 05/18/2016 1
4 03/16/2016 .
5 04/10/2016 .
5 04/16/2016 1
run;

proc freq data=have noprint;
  table id*date / out=need;
  where disease=1;
run;
proc freq data=need noprint;
  table id / out=want (keep=id count);
run;

The first proc freq generates dataset NEED with one observation per ID*DATE, with variables ID DATE COUNT and PERCENT.  The WHERE statement forces it to ignore non-disease observations.  The only variable in NEED that you care about is ID.  Just do a proc freq of ID from NEED to generate the number of unique disease dates for each id.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

Just to add another solution

proc sort data=have(where=(not missing(disease))) out=sorted nodupkey;
   by Id date;
run;

proc summary data=sorted nway;
   by Id;
   output out=want1(drop= _type_ rename=(_freq_=disease));
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1304 views
  • 5 likes
  • 5 in conversation