BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kyle
Calcite | Level 5

Hi - I am trying to count the number of medications in this sample data by id and within 90 days from the start date for each prescription.

For example: take the first record, add 90 days to get the end date as 04/19/12. Now start counting # of meds (med_cnt) until the start date reaches this end date 04/19/12, then go to the next record and start over again until last.id.

Any help is greatly appreciated!

Here is the sample data:

data med;

input id $ start_dt:mmddyy8. med;

cards;

001 01/20/12 1

001 01/31/12 1

001 02/17/12 1

001 04/25/12 1

002 02/02/12 1

002 05/01/12 2

;

The output would be something like this:

id        start_dt     med     end_dt     med_cnt

001     01/20/12     1     04/19/12     1

001     01/31/12     1     04/30/12     2

001     02/17/12     1     05/17/12     3

001     04/25/12     1     07/24/12     3   ==> as soon as start date 4/5 > the first end date 4/19, then go to the 2nd observation and start again.

001     10/11/12     1     01/09/14     1

002     02/02/12     1     05/03/12     1

002     05/01/13     2     07/30/12     3

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

A datastep alternative, using random access:

data want3;

set med;

do point=_n_-1 to 1 by -1;

  set med(keep=id start_date med

       rename=(id=same_id start_date=other_date med=other_med))

       point=point;

  if id ne same_id or

       intck("DAY", other_date, start_date) > 89 then leave;

  med + other_med;

  end;

keep id start_date med;

rename med=total_med;

run;

PG

PG

View solution in original post

7 REPLIES 7
Patrick
Opal | Level 21

One way to go

data med;
  input id $ start_date:mmddyy8. med;
  format start_date date9.;
  cards;
001 01/20/12 1
001 01/31/12 1
001 02/17/12 1
001 04/25/12 1
001 10/11/12 1
002 02/02/12 1
002 05/01/12 2
;
run;

proc sql;
  create table want as
    select o.*
      ,(select sum(i.med) from med i where o.id=i.id and 0 <= o.start_date-i.start_date <= 90) as med_count
    from med o
  ;
quit;

Kyle
Calcite | Level 5

Hi Patrick - thanks for your help, the code works well, but it's been running for hours with 1.2 milllions prescriptions.

I was trying to do it with the data step approach and it's been a challenge. Will  a "do until loop" do the trick? Thanks again!

PGStats
Opal | Level 21

Not obvious at all. You may try the alternate query:

proc sql;

create table want2 as

select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount

from

  med as m1 inner join

  med as m2 on

       m1.id=m2.id and

            m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date

group by m1.id, m1.start_date, m1.med;

quit;

which may or may not be more optimizable by SAS.

Good luck.

PG

PG
PGStats
Opal | Level 21

A datastep alternative, using random access:

data want3;

set med;

do point=_n_-1 to 1 by -1;

  set med(keep=id start_date med

       rename=(id=same_id start_date=other_date med=other_med))

       point=point;

  if id ne same_id or

       intck("DAY", other_date, start_date) > 89 then leave;

  med + other_med;

  end;

keep id start_date med;

rename med=total_med;

run;

PG

PG
Patrick
Opal | Level 21

Hi Kyle

The SQL I've posted is executing the expression in the Select clause for every single row. That is very inefficient. Sorry about that. I didn't expect you to have that many rows.

I've done some testing with 1.2M rows. The SQL Pierre posted is much more efficient (takes around 56 seconds real time on my laptop) but as expected the data step is much faster (even if you would have to pre-sort the data). The sort takes me only 0.18 seconds of real time and the data step 1.6 seconds.

Below the code I've used:

data med;
  attrib id length=8 start_date format=date9. med length=8;

  do id=1 to 10000;
    start_date='01jan2000'd;

    do while(start_date<today());
      start_date=start_date+ceil(ranuni(1)*30);
      med=ceil(ranuni(2)*1.2);
      output;
      _n+1;
    end;

    if _n>1200000 then
      stop;
  end;
run;

proc sql;
  create table want2 as
  select m1.id, m1.start_date, m1.med, sum(m2.med) as medCount
  from med as m1 inner join med as m2 on
    m1.id=m2.id
    and m2.start_date between intnx("DAY", m1.start_date, -90) and m1.start_date
  group by m1.id, m1.start_date, m1.med;
quit;

proc sort data=med out=med;
  by id start_date;
run;

data want3;
  set med;
  do point=_n_-1 to 1 by -1;
    set med(keep=id start_date med
      rename=(id=same_id start_date=other_date med=other_med))
      point=point;
    if id ne same_id or  intck("DAY", other_date, start_date) > 89 then
      leave;
    med + other_med;
  end;
  keep id start_date med;
  rename med=total_med;
run;

Thanks

Patrick

PGStats
Opal | Level 21

Thanks for the test . It confirms my intuition. - PG

PG
Kyle
Calcite | Level 5

I have tested all three methods and they all work fine. The last two were on a data sample, I have not had a chance to test them on the 1.2M data yet. I will let you know. Thank both so much for your help. Greatly appreciated!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2092 views
  • 7 likes
  • 3 in conversation