Hi experts,
I have a dataset below and I would like to select subject who had ≥2 claims on separate dates for an asthma diagnosis, and for the final dataset, keep the earliest data as the indexdate. I had run my codes below, and it seems my codes were redundant. I was wondering if someone could help me to improve my codes?
Thank you in advance!
data have;
input id $ indexdate : mmddyy10. asthma $;
format indexdate mmddyy10.;
datalines;
201 04/10/2013 1
201 04/11/2013 1
201 04/12/2013 1
202 02/21/2013 1
202 03/07/2013 1
202 03/07/2013 1
202 03/08/2013 1
202 03/09/2013 1
202 03/20/2013 1
203 02/14/2013 1
203 02/15/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
205 03/18/2013 1
206 05/01/2013 1
206 05/01/2013 1
;
run;
proc sort data=have out=have_nodup nodupkey;
by id indexdate;
run;
proc sort data=have_nodup nouniquekeys out=have_a;
by id;
run;
proc sort data=have_a nodupkey out=want;
by id;
run;
SQL is slightly better at this type of analysis.
proc sql;
create table want as
select ID, count(distinct indexdate) as num_dates
from have
where asthma=1
group by id
having num_dates >=2;
quit;
@lulu3 wrote:
Hi experts,
I have a dataset below and I would like to select subject who had ≥2 claims on separate dates for an asthma diagnosis, and for the final dataset, keep the earliest data as the indexdate. I had run my codes below, and it seems my codes were redundant. I was wondering if someone could help me to improve my codes?
Thank you in advance!
data have;
input id $ indexdate : mmddyy10. asthma $;
format indexdate mmddyy10.;
datalines;
201 04/10/2013 1
201 04/11/2013 1
201 04/12/2013 1
202 02/21/2013 1
202 03/07/2013 1
202 03/07/2013 1
202 03/08/2013 1
202 03/09/2013 1
202 03/20/2013 1
203 02/14/2013 1
203 02/15/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
205 03/18/2013 1
206 05/01/2013 1
206 05/01/2013 1
;
run;
proc sort data=have out=have_nodup nodupkey;
by id indexdate;
run;proc sort data=have_nodup nouniquekeys out=have_a;
by id;
run;proc sort data=have_a nodupkey out=want;
by id;
run;
Is that what you are looking for:
proc sort data=have out=temp nodupkey;
by id indexdate;
run;
data want;
set temp;
by id indexdate;
if not (first.indexdate and last.indexdate); /* more then 1 claim */
if first.indexdate; /* the earliest claim */
run;
It would be easy to check if you post the wanted results.
SQL is slightly better at this type of analysis.
proc sql;
create table want as
select ID, count(distinct indexdate) as num_dates
from have
where asthma=1
group by id
having num_dates >=2;
quit;
@lulu3 wrote:
Hi experts,
I have a dataset below and I would like to select subject who had ≥2 claims on separate dates for an asthma diagnosis, and for the final dataset, keep the earliest data as the indexdate. I had run my codes below, and it seems my codes were redundant. I was wondering if someone could help me to improve my codes?
Thank you in advance!
data have;
input id $ indexdate : mmddyy10. asthma $;
format indexdate mmddyy10.;
datalines;
201 04/10/2013 1
201 04/11/2013 1
201 04/12/2013 1
202 02/21/2013 1
202 03/07/2013 1
202 03/07/2013 1
202 03/08/2013 1
202 03/09/2013 1
202 03/20/2013 1
203 02/14/2013 1
203 02/15/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
204 01/23/2013 1
205 03/18/2013 1
206 05/01/2013 1
206 05/01/2013 1
;
run;
proc sort data=have out=have_nodup nodupkey;
by id indexdate;
run;proc sort data=have_nodup nouniquekeys out=have_a;
by id;
run;proc sort data=have_a nodupkey out=want;
by id;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.