Solved: Counting medications by id within 90 days

Kyle · Posted 05-22-2014 07:21 PM

Hi - I am trying to count the number of medications in this sample data by id and within 90 days from the start date for each prescription.

For example: take the first record, add 90 days to get the end date as 04/19/12. Now start counting # of meds (med_cnt) until the start date reaches this end date 04/19/12, then go to the next record and start over again until last.id.

Any help is greatly appreciated!

Here is the sample data:

data med;

input id $ start_dt:mmddyy8. med;

cards;

001 01/20/12 1

001 01/31/12 1

001 02/17/12 1

001 04/25/12 1

002 02/02/12 1

002 05/01/12 2

;

The output would be something like this:

id start_dt med end_dt med_cnt

001 01/20/12 1 04/19/12 1

001 01/31/12 1 04/30/12 2

001 02/17/12 1 05/17/12 3

001 04/25/12 1 07/24/12 3 ==> as soon as start date 4/5 > the first end date 4/19, then go to the 2nd observation and start again.

001 10/11/12 1 01/09/14 1

002 02/02/12 1 05/03/12 1

002 05/01/13 2 07/30/12 3

PGStats · Posted 05-23-2014 04:50 PM

A datastep alternative, using random access:

data want3;

set med;

do point=_n_-1 to 1 by -1;

set med(keep=id start_date med

rename=(id=same_id start_date=other_date med=other_med))

point=point;

if id ne same_id or

intck("DAY", other_date, start_date) > 89 then leave;

med + other_med;

end;

keep id start_date med;

rename med=total_med;

run;

PG

View solution in original post

Patrick · Posted 05-23-2014 12:35 AM

One way to go

data med;
input id $ start_date:mmddyy8. med;
format start_date date9.;
cards;
001 01/20/12 1
001 01/31/12 1
001 02/17/12 1
001 04/25/12 1
001 10/11/12 1
002 02/02/12 1
002 05/01/12 2
;
run;

proc sql;
create table want as
    select o.*
      ,(select sum(i.med) from med i where o.id=i.id and 0 <= o.start_date-i.start_date <= 90) as med_count
    from med o
;
quit;

Kyle · Posted 05-23-2014 03:47 PM

Hi Patrick - thanks for your help, the code works well, but it's been running for hours with 1.2 milllions prescriptions.

I was trying to do it with the data step approach and it's been a challenge. Will a "do until loop" do the trick? Thanks again!

PGStats · Posted 05-23-2014 04:07 PM

Not obvious at all. You may try the alternate query:

proc sql;

create table want2 as

select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount

from

med as m1 inner join

med as m2 on

m1.id=m2.id and

m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date

group by m1.id, m1.start_date, m1.med;

quit;

which may or may not be more optimizable by SAS.

Good luck.

PG

PGStats · Posted 05-23-2014 04:50 PM

A datastep alternative, using random access:

data want3;

set med;

do point=_n_-1 to 1 by -1;

set med(keep=id start_date med

rename=(id=same_id start_date=other_date med=other_med))

point=point;

if id ne same_id or

intck("DAY", other_date, start_date) > 89 then leave;

med + other_med;

end;

keep id start_date med;

rename med=total_med;

run;

PG

Patrick · Posted 05-23-2014 09:25 PM

Hi Kyle

The SQL I've posted is executing the expression in the Select clause for every single row. That is very inefficient. Sorry about that. I didn't expect you to have that many rows.

I've done some testing with 1.2M rows. The SQL Pierre posted is much more efficient (takes around 56 seconds real time on my laptop) but as expected the data step is much faster (even if you would have to pre-sort the data). The sort takes me only 0.18 seconds of real time and the data step 1.6 seconds.

Below the code I've used:

data med;
attrib id length=8 start_date format=date9. med length=8;

do id=1 to 10000;
start_date='01jan2000'd;

    do while(start_date<today());
      start_date=start_date+ceil(ranuni(1)*30);
      med=ceil(ranuni(2)*1.2);
      output;
      _n+1;
    end;

if _n>1200000 then
stop;
end;
run;

proc sql;
create table want2 as
select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount
from med as m1 inner join med as m2 on
m1.id=m2.id
and m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date
group by m1.id, m1.start_date, m1.med;
quit;

proc sort data=med out=med;
by id start_date;
run;

data want3;
set med;
do point=_n_-1 to 1 by -1;
    set med(keep=id start_date med
      rename=(id=same_id start_date=other_date med=other_med))
      point=point;
    if id ne same_id or intck("DAY", other_date, start_date) > 89 then
      leave;
    med + other_med;
end;
keep id start_date med;
rename med=total_med;
run;

Thanks

Patrick

PGStats · Posted 05-23-2014 10:14 PM

Thanks for the test . It confirms my intuition. - PG

PG

Kyle · Posted 05-23-2014 10:35 PM

I have tested all three methods and they all work fine. The last two were on a data sample, I have not had a chance to test them on the 1.2M data yet. I will let you know. Thank both so much for your help. Greatly appreciated!

Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Re: Counting medications by id within 90 days

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away