Running sum across observations for a specific time period

Reply
Occasional Contributor
Posts: 8

Running sum across observations for a specific time period

Hi,

I have a longitudinal dataset and would like to sum up a field (drug dose) for 2 and 4 hour intervals.  Essentially I want SAS to create logic to look back over the past 2 and 4 hour period and calculate a total.  I was able to set up the following code to add up doses for 2 and 4 hour intervals, however, this only calculates a total over the 2 or 4 hour period and then resets it (not a running sum. 

Thanks,  JH

data temp2;

  set temp;

  by episode_id datetime;

  retain sum_dose_2hr sum_dose_4hr begin_dose_dt_2hr begin_dose_dt_4hr;

  format begin_dose_dt_2hr datetime20.

    begin_dose_dt_4hr datetime20.;

  if first.episode_id then do;

     begin_dose_dt_2hr=datetime; begin_dose_dt_4hr=datetime;

  sum_dose_2hr=0; sum_dose_4hr=0;

  end;

  if ((datetime-begin_dose_dt_2hr)/3600)>2 then do;

  begin_dose_dt_2hr=datetime; sum_dose_2hr=0;

  end;

  if ((datetime-begin_dose_dt_4hr)/3600)>4 then do;

  begin_dose_dt_4hr=datetime; sum_dose_4hr=0;

  end;

  sum_dose_2hr=sum(sum_dose_2hr,IV_SOLUTIONS_drug_dose);

  sum_dose_4hr=sum(sum_dose_4hr,IV_SOLUTIONS_drug_dose);

run;

Super Contributor
Posts: 543

Re: Running sum across observations for a specific time period

Could you provide a small sample (fake) data?

Anca.

Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

Posted in reply to AncaTilea

Sure.  My data is set similar to the following

episode_id          datetime                 drug_dose

      1              01jan2013:12:00               10

      1              01jan2013:01:00               10

      1              01jan2013:02:00               10

      1              01jan2013:02:30               10

      1              01jan2013:04:30               10

Here is how I want my data to look

episode_id          datetime                 drug_dose          sum_drug_dose_2hr          sum_drug_dose_4hr

      1              01jan2013:12:00               10                            10                                     10

      1              01jan2013:01:00               10                            20                                     20 

      1              01jan2013:02:00               10                            30                                     30

      1              01jan2013:02:30               10                            30                                     40

      1              01jan2013:04:30               10                            20                                     40 

Super Contributor
Posts: 543

Re: Running sum across observations for a specific time period

Oi, I solved this in a non-elegant way.

I assume you need another flag to get the logic that you desire:

(The bold part in the code are my additions)

data temp2;

  set temp;

  by episode_id datetime;

  retain sum_dose_2hr sum_dose_4hr begin_dose_dt_2hr begin_dose_dt_4hr;

  format begin_dose_dt_2hr datetime20.

    begin_dose_dt_4hr datetime20.;

  if first.episode_id then do;

     begin_dose_dt_2hr=datetime; begin_dose_dt_4hr=datetime;

  sum_dose_2hr=0; sum_dose_4hr=0;

  end;

  if ((datetime-begin_dose_dt_2hr)/3600)>2 then do;

temp_2hr = sum_dose_2hr;begin_dose_dt_2hr=datetime; sum_dose_2hr=0;

  end;

  if ((datetime-begin_dose_dt_4hr)/3600)>4 then do;

temp_4hr = sum_dose_4hr;begin_dose_dt_4hr=datetime; sum_dose_4hr=0;

  end;

  sum_dose_2hr=sum(sum_dose_2hr,IV_SOLUTIONS_drug_dose);

  sum_dose_4hr=sum(sum_dose_4hr,IV_SOLUTIONS_drug_dose);

run;

data step;

  set temp2;

  if temp_2hr ne . then sum_dose_2hr = temp_2hr;else sum_dose_2hr = sum_dose_2hr;

  if temp_4hr ne . then sum_dose_4hr = temp_4hr;else sum_dose_4hr = sum_dose_4hr;

  drop temp_:;

run;

proc print;run;


Good luck!

Nice problem.

Anca.

PS: there are more elegant/efficient ways to solve this, maybe someone will provide a better solution

Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

Posted in reply to AncaTilea

Unfortunately your code does not seem to work.  The temp variables overwrite the sum_dose fields whenever the 2 hour or 4 hour times are exceeded. 

I was able to have a colleague help me solve this problem using proc expand to create dummy rows at the minute level; then calculate running sums.

JH

Super Contributor
Posts: 543

Re: Running sum across observations for a specific time period

I apologize if I didn't quite get it.

I thought that you wanted the sum to be carried over

1   01JAN13:00:00 1      10     10   10    .    1010
2   01JAN13:01:00 1      10     20   20    .    2020
3   01JAN13:02:00 1      10     30   30    .    3030
4   01JAN13:02:30 1      10     10   40    1    30--------------->>>this is more than 2 hrs, carry down the sum from above, while you 'reset' the sum and
5   01JAN13:04:30 1      10     20   10    .    20---------------->>> so this sum now it's 20 (10 + 10)
6   01JAN13:00:00 2      10     10   10    .    1010
7   01JAN13:01:00 2      10     20   20    .    2020
8   01JAN13:02:00 2      10     30   30    .    3030
9   01JAN13:02:30 2      10     10   40    1    3040
10   01JAN13:04:30 2      10     20   10    .    2040 ------>>> similarly, this is more than 4 hrs...so carry down the sum....

I do have a proc sort by id and datetime;

Given this dataset, what does your code result in?

data temp;

format datetime datetime15.;

input episode_id          datetime datetime15.                 iv_solutions_drug_dose;

datalines;

1 01jan2013:00:00 10

1 01jan2013:01:00 10

1 01jan2013:02:00 10

1 01jan2013:02:30 10

1 01jan2013:04:30 10

2 01jan2013:00:00 10

2 01jan2013:01:00 10

2 01jan2013:02:00 10

2 01jan2013:02:30 10

2 01jan2013:04:30 10

;

I am curious, that's all.

Thanks.

Anca.

Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

Posted in reply to AncaTilea

Sorry if my post was not clear.  I was trying to calculate a running sum of the dose for the previous 2 and 4 hours, not carry down the sum. 

I can post the results of my code later, but I need to test it out first.

Thanks,

JH

Respected Advisor
Posts: 4,920

Re: Running sum across observations for a specific time period

Are those datetimes from the second one on supposed to be PM?

PG
Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

All of the datetimes are military times (up to 24 hrs) in SAS, no AM or PM is denoted.

Respected Advisor
Posts: 4,920

Re: Running sum across observations for a specific time period

The datetimes in your example are not sorted then. Why sum_drug_dose_2hr=20 at 01jan2013:01:00? There is no datetime preceding it in your data. Anca's code assumes your data is sorted.

PG
Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

You are correct.  I made up the data for the example.  The 12:00 time should have read 00:00. 

Super Contributor
Posts: 543

Re: Running sum across observations for a specific time period

And I omitted to mention that I  changed the dates to be 13,14,14.30,and 16.30.

Sorry.

:-)

Respected Advisor
Posts: 4,920

Re: Running sum across observations for a specific time period

You may find SQL to be a slightly simpler solution:

proc sql;

create table want as

select a.*, sum_drug_dose_4h from    

     (select

           h0.*,

           sum(h2.drug_dose) as sum_drug_dose_2h

      from

           temp as h0 inner join

           temp as h2 on h0.episode_id=h2.episode_id and

                h2.datetime between intnx("HOUR", h0.datetime, -2, "SAME") and h0.datetime

      group by h0.episode_id, h0.datetime, h0.drug_dose) as a

natural join

     (select

           h0.episode_id,

           h0.datetime,

           sum(h4.drug_dose) as sum_drug_dose_4h

      from

           temp as h0 inner join

           temp as h4 on h0.episode_id=h4.episode_id and

                 h4.datetime between intnx("HOUR", h0.datetime, -4, "SAME") and h0.datetime

      group by h0.episode_id, h0.datetime);

select * from want;

quit;

I did say slightly?

PG

PG
Super User
Posts: 5,503

Re: Running sum across observations for a specific time period

It looks like you have a relatively limited number of records to sum.  If you were to store the most recent 10 doses, for example, that might be sufficient to calculate a total for the most recent 4 hours.  In that case, a set of two arrays (one with doses, one with datetimes)  would probably be the simplest solution.

If you can confirm that 10 doses would always be enough for a 4-hour time period (or if not, post an alternative maximum number of doses needed), I would be able to write the code later tonight.

Good luck.

Occasional Contributor
Posts: 8

Re: Running sum across observations for a specific time period

I appreciate all the help everyone!

JH

Ask a Question
Discussion stats
  • 16 replies
  • 443 views
  • 0 likes
  • 4 in conversation