SQL is slightly better at this type of analysis.
proc sql;
create table want as
select ID, count(distinct indexdate) as num_dates
from have
where asthma=1
group by id
having num_dates >=2;
quit;
@lulu3 wrote:
Hi experts,
I have a dataset below and I would like to select subject who had ≥2 claims on separate dates for an asthma diagnosis, and for the final dataset, keep the earliest data as the indexdate. I had run my codes below, and it seems my codes were redundant. I was wondering if someone could help me to improve my codes?
Thank you in advance!
data have; input id $ indexdate : mmddyy10. asthma $; format indexdate mmddyy10.; datalines; 201 04/10/2013 1 201 04/11/2013 1 201 04/12/2013 1 202 02/21/2013 1 202 03/07/2013 1 202 03/07/2013 1 202 03/08/2013 1 202 03/09/2013 1 202 03/20/2013 1 203 02/14/2013 1 203 02/15/2013 1 204 01/23/2013 1 204 01/23/2013 1 204 01/23/2013 1 204 01/23/2013 1 204 01/23/2013 1 204 01/23/2013 1 205 03/18/2013 1 206 05/01/2013 1 206 05/01/2013 1 ; run;
proc sort data=have out=have_nodup nodupkey; by id indexdate; run;
proc sort data=have_nodup nouniquekeys out=have_a; by id; run;
proc sort data=have_a nodupkey out=want; by id; run;
... View more