Solved: Re: How to compute rolling sums based on fixed width moving window (eg...

arorata · Posted 04-28-2021 12:13 PM

Hi All,

I have a data that looks like this:

data have;
	informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
	format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
	input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
	first_lookup_date=bl_drug_start_date-89;
	label index_date  =  "Index date"
		  drug_start_date = "Prescription fill date"
		  drug_end_date = "Prescription end date"
		  bl_start_date = "Baseline start date"
		  first_lookup_date = "Index date - 89 days"
		  daily_dose = "Drug daily dose"
		  days_supply = "Days supply of prescription fill";
	datalines;
	A 05/18/2018 7 02/12/2018 5 05/17/2017 02/18/2018
	A 05/18/2018 5 03/01/2017 2.5 05/17/2017 03/05/2017
	A 05/18/2018 15 04/02/2017 5 05/17/2017 04/16/2017
	A 06/01/2018 5 03/01/2017 2.5 06/01/2017 03/05/2017
	A 06/01/2018 15 04/02/2017 5 06/01/2017  04/16/2017
	;
run;

I want to compute cumulative dose during a rolling 90 days period conditional on it reaching a threshold (say 90 mg). The rolling period starts on the bl_start_date and ends on index_date.

A few things to note:

A subject can have multiple index dates and can also have multiple prescription fills.

bl_start_date = index_date - 365 days

drug_end_date = drug_start_date + days_supply - 1.

If a subject has drug_end_date that is greater than or equal to first_lookup date and drug_start_date less than or equal to index_date for a given index date then that record is included.

In this example all variables in rows 2 and 3 except index_date, bl_start_date and first_lookup_date are repeated for both index_date where as row 1 is only included for index_date = 05/18/2018.

What I want:

I want to start from bl_start_date and look back 89 days and:

count the number of days exposed to the drug in that period.
then multiply that number of days by the daily dose.
retain the cumulative dose
if the cumulative dose meets or exceeds the threshold dose then set a flag and exit
if cumulative dose is less than the threshold dose then move then increment bl_start_date + 1 and again look back 89 days and repeat steps 1 to 5.

For index_date = 05/18/218 the counter starts from 05/18/2017 and 90 day lookback date is 02/18/2017. From the first row there is only one eligible day in the period so total dose = 5*1 = 5, which is less than 90mg so we keep on incrementing the counter until it reaches 04/16/2017 and the cumulative dose is now 92.5. We set the flag and exit.

Similarly for index_date - 06/01/2018 the counter starts from 06/01/2017 and 90 day lookback date is 03/04/2017. The first row for this index date only has 2 eligble days and total dose of 5 mg. The second row has all 16 eligible days and the total dose = 80 mg giving us a cumulative dose of 85mg. This subject is not eligible and the flag will not be set and loop will exit once the date counter reaches the index_date.

Here is the data I want:

 Obs    ID    index_date    flag
  1        A     05/18/2018      1
  2        A     06/18/2018      0

I can do this with creating a person-day file but really want to avoid it as the actual dataset has over a million subjects and creating potentially 455 records from (first_lookup_date through index_date) for each subject is an immense overhead and drain on limited resources. I would prefer a solution using temporary arrays and loops if possible. Any other efficiencies are most welcome.

Thank you so much in advance.

Kurt_Bremser · Posted 05-13-2021 06:53 AM

You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Ksharp · Posted 04-29-2021 09:22 AM

Here is an example :

data have;
call streaminit(1234);
do id='A','B' ;
  do date='01jan2018'd to '01jan2021'd;
    value=rand('uniform');output;
  end;
end;
format date date9.;
run;

proc sql;
create table want as
select *,(select sum(value) from have where id=a.id and date between a.date-90 and a.date) as rolling_sum
 from have as a;
quit;

arorata · Posted 04-29-2021 11:14 AM

Thanks for this. Is there a way to to avoid creating a "person-day" files as you do in your simulated data "have". Can this work with the example data that I posted? Thanks and appreciate your response.

Ksharp · Posted 04-30-2021 08:38 AM

Nopps. You need change your data structure into the same as mines .

arorata · Posted 05-07-2021 12:54 PM

Thanks for you reply. I have posted my code and would really appreciate any input from you.

Kurt_Bremser · Posted 04-29-2021 09:35 AM

If you can make sure that there is always one and only one observation per id and date, you can (after sorting per id and date) use a DATA step with a 90-member array to create the rolling window:

(using @Ksharp 's data)

data have;
call streaminit(1234);
do id='A','B' ;
  do date='01jan2018'd to '01jan2021'd;
    value=rand('uniform');output;
  end;
end;
format date date9.;
run;

%let rwindow_size=90;

data want;
set have;
by id;
array rolling{0:%eval(&rwindow_size.-1)} _temporary_;
if first.id
then do;
  counter = 0;
  call missing(of rolling{*});
end;
else counter + 1;
rolling{mod(counter,&rwindow_size.)} = value;
rolling_sum = sum(of rolling{*});
drop counter;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

arorata · Posted 04-29-2021 12:13 PM

Thanks for this. So when you say single record per ID per date, do you mean per index date of per drug start date? Also does the actual data needs to be reshaped to create a single row for each date between the baseline start and index date or can that be avoided? I would prefer to avoid that since I have millions of subjects. Appreciate your response.

arorata · Posted 05-06-2021 01:06 AM

Thanks @Ksharp and @Kurt_Bremser , here is my amateurish attempt using ideas from both of you so thanks for that.

data have;
	informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
	format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
	input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
	first_lookup_date=bl_start_date-89;
	label index_date  =  "Index date"
		  drug_start_date = "Prescription fill date"
		  drug_end_date = "Prescription end date"
		  bl_start_date = "Baseline start date"
		  first_lookup_date = "Index date - 89 days"
		  daily_dose = "Drug daily dose"
		  days_supply = "Days supply of prescription fill";
	datalines;
	A 05/18/2018 7 02/12/2017 5 05/18/2017 02/18/2017
	A 05/18/2018 5 03/01/2017 2.5 05/18/2017 03/05/2017
	A 05/18/2018 8 04/02/2017 5 05/18/2017 04/09/2017
	A 05/18/2018 3 04/20/2017 5 05/18/2017 04/22/2017
	A 05/18/2018 10 05/15/2017 5 05/18/2017 05/24/2017
	A 05/18/2018 13 06/18/2017 3 05/18/2017 06/30/2017
	A 05/18/2018 15 10/02/2017 5 05/18/2017  10/16/2017
	A 09/01/2018 13 06/18/2017 3 09/01/2017 06/30/2017
	A 09/01/2018 15 10/02/2017 5 09/01/2017  10/16/2017
	;
run;

I know the maximum number of prescription fills per ID and Index date combination and used that to reshape the data. For this example it is 7 for ID=A and index date=05/18/2018. Here is the code:

proc sort data=have;
	by id index_date drug_start_date;
run;

%let cum_dose=95;
%let max_fils=7; /*maximum no of unique prescription fill for each ID index date combination*/
%let window_width=89;/*gets 90 days counting the day itself*/

data want;
	set have;
	by id index_date drug_start_date;

	array start_date{&max_fils.} 4. _temporary_;
	retain start_date:;

	array end_date{&max_fils.} 4. _temporary_;
	retain end_date:;

	array dds{&max_fils.} _temporary_;
	retain dds:;

	if first.index_date then do;
		i=1;
		call missing(of start_date{*});
		call missing(of end_date{*});
		call missing(of dds{*});
	end;
	start_date(i)=drug_start_date;
	end_date(i)=drug_end_date;
	dds(i)=daily_dose;
	if last.index_date then do;;

		array date{455} 4. _temporary_;/*365 days of baseline + 89 days window*/
		retain date:;
		do dy=1 to dim(date);
			date[dy] = bl_start_date+dy-1;
		end;

		array cum{455} _temporary_;
		retain cum:;

		array ds{&max_fils.} _temporary_;
		retain ds:;

		found=0;
		index=1;
		cumulative_dose=0;
		do aa = index to 455 while (date[aa] le index_date+89);
			dt=date[aa]-89;
			cum[aa]=0;
			format dt mmddyy10.;
			do ab=1 to dim(dds);
	/*			put dt=;*/
	/*			put end_date1=;*/
	/*			put start_date1=;*/
	/*			put date1 mmddyy10.;*/
	/*			put eos_global=;*/
				if (end_date[ab] ge dt) and (start_date[ab] le date[aa]) then do;
					ds[ab] = min(end_date[ab]-dt+1, end_date[ab]-start_date[ab]+1, date[aa]-start_date[ab]+1, 90);
	/*				put ds[ab];*/
	/*				put dt=;*/
	/*				put date[aa] mmddyy10.;*/
	/*				put start_date[ab]=;*/
	/*				put end_date[ab]=;*/
	/*				put aa=;*/
					cum[aa] = cum[aa] + ds[ab]*dds[ab];
	/*				put cum[aa]=;*/
					if cum[aa] ge &cum_dose. then do;
						found=1;
						found_date=date[aa];
						cumulative_dose=cum[aa];
						goto exit;
					end;
				end;
			end;
		end;
		exit:
		output;
	end;
		i+1;
		format found_date mmddyy10.;
		keep ID index_date found cumulative_dose found_date first_look: last_look: bl_start:;
run;

This works for the example data and also on a subset of real data. My concern is if there are logical errors that I am not able to see. I would appreciate any suggestions to make this more elegant ("less brute force") and efficient. Thank you both for responding.

Kurt_Bremser · Posted 05-13-2021 06:53 AM

You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

arorata · Posted 05-13-2021 09:44 AM

Thanks for your suggestions. I think that will remove clutter from my code.

How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Re: How to compute rolling sums based on fixed width moving window (eg. 90 days moving each day)

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away