Hi All,
I have a data that looks like this:
data have;
informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
first_lookup_date=bl_drug_start_date-89;
label index_date = "Index date"
drug_start_date = "Prescription fill date"
drug_end_date = "Prescription end date"
bl_start_date = "Baseline start date"
first_lookup_date = "Index date - 89 days"
daily_dose = "Drug daily dose"
days_supply = "Days supply of prescription fill";
datalines;
A 05/18/2018 7 02/12/2018 5 05/17/2017 02/18/2018
A 05/18/2018 5 03/01/2017 2.5 05/17/2017 03/05/2017
A 05/18/2018 15 04/02/2017 5 05/17/2017 04/16/2017
A 06/01/2018 5 03/01/2017 2.5 06/01/2017 03/05/2017
A 06/01/2018 15 04/02/2017 5 06/01/2017 04/16/2017
;
run;
I want to compute cumulative dose during a rolling 90 days period conditional on it reaching a threshold (say 90 mg). The rolling period starts on the bl_start_date and ends on index_date.
A few things to note:
A subject can have multiple index dates and can also have multiple prescription fills.
bl_start_date = index_date - 365 days
drug_end_date = drug_start_date + days_supply - 1.
If a subject has drug_end_date that is greater than or equal to first_lookup date and drug_start_date less than or equal to index_date for a given index date then that record is included.
In this example all variables in rows 2 and 3 except index_date, bl_start_date and first_lookup_date are repeated for both index_date where as row 1 is only included for index_date = 05/18/2018.
What I want:
I want to start from bl_start_date and look back 89 days and:
For index_date = 05/18/218 the counter starts from 05/18/2017 and 90 day lookback date is 02/18/2017. From the first row there is only one eligible day in the period so total dose = 5*1 = 5, which is less than 90mg so we keep on incrementing the counter until it reaches 04/16/2017 and the cumulative dose is now 92.5. We set the flag and exit.
Similarly for index_date - 06/01/2018 the counter starts from 06/01/2017 and 90 day lookback date is 03/04/2017. The first row for this index date only has 2 eligble days and total dose of 5 mg. The second row has all 16 eligible days and the total dose = 80 mg giving us a cumulative dose of 85mg. This subject is not eligible and the flag will not be set and loop will exit once the date counter reaches the index_date.
Here is the data I want:
Obs ID index_date flag
1 A 05/18/2018 1
2 A 06/18/2018 0
I can do this with creating a person-day file but really want to avoid it as the actual dataset has over a million subjects and creating potentially 455 records from (first_lookup_date through index_date) for each subject is an immense overhead and drain on limited resources. I would prefer a solution using temporary arrays and loops if possible. Any other efficiencies are most welcome.
Thank you so much in advance.
You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.
Here is an example :
data have;
call streaminit(1234);
do id='A','B' ;
do date='01jan2018'd to '01jan2021'd;
value=rand('uniform');output;
end;
end;
format date date9.;
run;
proc sql;
create table want as
select *,(select sum(value) from have where id=a.id and date between a.date-90 and a.date) as rolling_sum
from have as a;
quit;
If you can make sure that there is always one and only one observation per id and date, you can (after sorting per id and date) use a DATA step with a 90-member array to create the rolling window:
(using @Ksharp 's data)
data have;
call streaminit(1234);
do id='A','B' ;
do date='01jan2018'd to '01jan2021'd;
value=rand('uniform');output;
end;
end;
format date date9.;
run;
%let rwindow_size=90;
data want;
set have;
by id;
array rolling{0:%eval(&rwindow_size.-1)} _temporary_;
if first.id
then do;
counter = 0;
call missing(of rolling{*});
end;
else counter + 1;
rolling{mod(counter,&rwindow_size.)} = value;
rolling_sum = sum(of rolling{*});
drop counter;
run;
Thanks @Ksharp and @Kurt_Bremser , here is my amateurish attempt using ideas from both of you so thanks for that.
data have;
informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
first_lookup_date=bl_start_date-89;
label index_date = "Index date"
drug_start_date = "Prescription fill date"
drug_end_date = "Prescription end date"
bl_start_date = "Baseline start date"
first_lookup_date = "Index date - 89 days"
daily_dose = "Drug daily dose"
days_supply = "Days supply of prescription fill";
datalines;
A 05/18/2018 7 02/12/2017 5 05/18/2017 02/18/2017
A 05/18/2018 5 03/01/2017 2.5 05/18/2017 03/05/2017
A 05/18/2018 8 04/02/2017 5 05/18/2017 04/09/2017
A 05/18/2018 3 04/20/2017 5 05/18/2017 04/22/2017
A 05/18/2018 10 05/15/2017 5 05/18/2017 05/24/2017
A 05/18/2018 13 06/18/2017 3 05/18/2017 06/30/2017
A 05/18/2018 15 10/02/2017 5 05/18/2017 10/16/2017
A 09/01/2018 13 06/18/2017 3 09/01/2017 06/30/2017
A 09/01/2018 15 10/02/2017 5 09/01/2017 10/16/2017
;
run;
I know the maximum number of prescription fills per ID and Index date combination and used that to reshape the data. For this example it is 7 for ID=A and index date=05/18/2018. Here is the code:
proc sort data=have;
by id index_date drug_start_date;
run;
%let cum_dose=95;
%let max_fils=7; /*maximum no of unique prescription fill for each ID index date combination*/
%let window_width=89;/*gets 90 days counting the day itself*/
data want;
set have;
by id index_date drug_start_date;
array start_date{&max_fils.} 4. _temporary_;
retain start_date:;
array end_date{&max_fils.} 4. _temporary_;
retain end_date:;
array dds{&max_fils.} _temporary_;
retain dds:;
if first.index_date then do;
i=1;
call missing(of start_date{*});
call missing(of end_date{*});
call missing(of dds{*});
end;
start_date(i)=drug_start_date;
end_date(i)=drug_end_date;
dds(i)=daily_dose;
if last.index_date then do;;
array date{455} 4. _temporary_;/*365 days of baseline + 89 days window*/
retain date:;
do dy=1 to dim(date);
date[dy] = bl_start_date+dy-1;
end;
array cum{455} _temporary_;
retain cum:;
array ds{&max_fils.} _temporary_;
retain ds:;
found=0;
index=1;
cumulative_dose=0;
do aa = index to 455 while (date[aa] le index_date+89);
dt=date[aa]-89;
cum[aa]=0;
format dt mmddyy10.;
do ab=1 to dim(dds);
/* put dt=;*/
/* put end_date1=;*/
/* put start_date1=;*/
/* put date1 mmddyy10.;*/
/* put eos_global=;*/
if (end_date[ab] ge dt) and (start_date[ab] le date[aa]) then do;
ds[ab] = min(end_date[ab]-dt+1, end_date[ab]-start_date[ab]+1, date[aa]-start_date[ab]+1, 90);
/* put ds[ab];*/
/* put dt=;*/
/* put date[aa] mmddyy10.;*/
/* put start_date[ab]=;*/
/* put end_date[ab]=;*/
/* put aa=;*/
cum[aa] = cum[aa] + ds[ab]*dds[ab];
/* put cum[aa]=;*/
if cum[aa] ge &cum_dose. then do;
found=1;
found_date=date[aa];
cumulative_dose=cum[aa];
goto exit;
end;
end;
end;
end;
exit:
output;
end;
i+1;
format found_date mmddyy10.;
keep ID index_date found cumulative_dose found_date first_look: last_look: bl_start:;
run;
This works for the example data and also on a subset of real data. My concern is if there are logical errors that I am not able to see. I would appreciate any suggestions to make this more elegant ("less brute force") and efficient. Thank you both for responding.
You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.