Hi - I am trying to count the number of medications in this sample data by id and within 90 days from the start date for each prescription.
For example: take the first record, add 90 days to get the end date as 04/19/12. Now start counting # of meds (med_cnt) until the start date reaches this end date 04/19/12, then go to the next record and start over again until last.id.
Any help is greatly appreciated!
Here is the sample data:
data med;
input id $ start_dt:mmddyy8. med;
cards;
001 01/20/12 1
001 01/31/12 1
001 02/17/12 1
001 04/25/12 1
002 02/02/12 1
002 05/01/12 2
;
The output would be something like this:
id start_dt med end_dt med_cnt
001 01/20/12 1 04/19/12 1
001 01/31/12 1 04/30/12 2
001 02/17/12 1 05/17/12 3
001 04/25/12 1 07/24/12 3 ==> as soon as start date 4/5 > the first end date 4/19, then go to the 2nd observation and start again.
001 10/11/12 1 01/09/14 1
002 02/02/12 1 05/03/12 1
002 05/01/13 2 07/30/12 3
A datastep alternative, using random access:
data want3;
set med;
do point=_n_-1 to 1 by -1;
set med(keep=id start_date med
rename=(id=same_id start_date=other_date med=other_med))
point=point;
if id ne same_id or
intck("DAY", other_date, start_date) > 89 then leave;
med + other_med;
end;
keep id start_date med;
rename med=total_med;
run;
PG
One way to go
data med;
input id $ start_date:mmddyy8. med;
format start_date date9.;
cards;
001 01/20/12 1
001 01/31/12 1
001 02/17/12 1
001 04/25/12 1
001 10/11/12 1
002 02/02/12 1
002 05/01/12 2
;
run;
proc sql;
create table want as
select o.*
,(select sum(i.med) from med i where o.id=i.id and 0 <= o.start_date-i.start_date <= 90) as med_count
from med o
;
quit;
Hi Patrick - thanks for your help, the code works well, but it's been running for hours with 1.2 milllions prescriptions.
I was trying to do it with the data step approach and it's been a challenge. Will a "do until loop" do the trick? Thanks again!
Not obvious at all. You may try the alternate query:
proc sql;
create table want2 as
select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount
from
med as m1 inner join
med as m2 on
m1.id=m2.id and
m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date
group by m1.id, m1.start_date, m1.med;
quit;
which may or may not be more optimizable by SAS.
Good luck.
PG
A datastep alternative, using random access:
data want3;
set med;
do point=_n_-1 to 1 by -1;
set med(keep=id start_date med
rename=(id=same_id start_date=other_date med=other_med))
point=point;
if id ne same_id or
intck("DAY", other_date, start_date) > 89 then leave;
med + other_med;
end;
keep id start_date med;
rename med=total_med;
run;
PG
Hi Kyle
The SQL I've posted is executing the expression in the Select clause for every single row. That is very inefficient. Sorry about that. I didn't expect you to have that many rows.
I've done some testing with 1.2M rows. The SQL Pierre posted is much more efficient (takes around 56 seconds real time on my laptop) but as expected the data step is much faster (even if you would have to pre-sort the data). The sort takes me only 0.18 seconds of real time and the data step 1.6 seconds.
Below the code I've used:
data med;
attrib id length=8 start_date format=date9. med length=8;
do id=1 to 10000;
start_date='01jan2000'd;
do while(start_date<today());
start_date=start_date+ceil(ranuni(1)*30);
med=ceil(ranuni(2)*1.2);
output;
_n+1;
end;
if _n>1200000 then
stop;
end;
run;
proc sql;
create table want2 as
select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount
from med as m1 inner join med as m2 on
m1.id=m2.id
and m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date
group by m1.id, m1.start_date, m1.med;
quit;
proc sort data=med out=med;
by id start_date;
run;
data want3;
set med;
do point=_n_-1 to 1 by -1;
set med(keep=id start_date med
rename=(id=same_id start_date=other_date med=other_med))
point=point;
if id ne same_id or intck("DAY", other_date, start_date) > 89 then
leave;
med + other_med;
end;
keep id start_date med;
rename med=total_med;
run;
Thanks
Patrick
I have tested all three methods and they all work fine. The last two were on a data sample, I have not had a chance to test them on the 1.2M data yet. I will let you know. Thank both so much for your help. Greatly appreciated!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.