- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have a data that looks like this:
data have;
informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
first_lookup_date=bl_drug_start_date-89;
label index_date = "Index date"
drug_start_date = "Prescription fill date"
drug_end_date = "Prescription end date"
bl_start_date = "Baseline start date"
first_lookup_date = "Index date - 89 days"
daily_dose = "Drug daily dose"
days_supply = "Days supply of prescription fill";
datalines;
A 05/18/2018 7 02/12/2018 5 05/17/2017 02/18/2018
A 05/18/2018 5 03/01/2017 2.5 05/17/2017 03/05/2017
A 05/18/2018 15 04/02/2017 5 05/17/2017 04/16/2017
A 06/01/2018 5 03/01/2017 2.5 06/01/2017 03/05/2017
A 06/01/2018 15 04/02/2017 5 06/01/2017 04/16/2017
;
run;
I want to compute cumulative dose during a rolling 90 days period conditional on it reaching a threshold (say 90 mg). The rolling period starts on the bl_start_date and ends on index_date.
A few things to note:
A subject can have multiple index dates and can also have multiple prescription fills.
bl_start_date = index_date - 365 days
drug_end_date = drug_start_date + days_supply - 1.
If a subject has drug_end_date that is greater than or equal to first_lookup date and drug_start_date less than or equal to index_date for a given index date then that record is included.
In this example all variables in rows 2 and 3 except index_date, bl_start_date and first_lookup_date are repeated for both index_date where as row 1 is only included for index_date = 05/18/2018.
What I want:
I want to start from bl_start_date and look back 89 days and:
- count the number of days exposed to the drug in that period.
- then multiply that number of days by the daily dose.
- retain the cumulative dose
- if the cumulative dose meets or exceeds the threshold dose then set a flag and exit
- if cumulative dose is less than the threshold dose then move then increment bl_start_date + 1 and again look back 89 days and repeat steps 1 to 5.
For index_date = 05/18/218 the counter starts from 05/18/2017 and 90 day lookback date is 02/18/2017. From the first row there is only one eligible day in the period so total dose = 5*1 = 5, which is less than 90mg so we keep on incrementing the counter until it reaches 04/16/2017 and the cumulative dose is now 92.5. We set the flag and exit.
Similarly for index_date - 06/01/2018 the counter starts from 06/01/2017 and 90 day lookback date is 03/04/2017. The first row for this index date only has 2 eligble days and total dose of 5 mg. The second row has all 16 eligible days and the total dose = 80 mg giving us a cumulative dose of 85mg. This subject is not eligible and the flag will not be set and loop will exit once the date counter reaches the index_date.
Here is the data I want:
Obs ID index_date flag
1 A 05/18/2018 1
2 A 06/18/2018 0
I can do this with creating a person-day file but really want to avoid it as the actual dataset has over a million subjects and creating potentially 455 records from (first_lookup_date through index_date) for each subject is an immense overhead and drain on limited resources. I would prefer a solution using temporary arrays and loops if possible. Any other efficiencies are most welcome.
Thank you so much in advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is an example :
data have;
call streaminit(1234);
do id='A','B' ;
do date='01jan2018'd to '01jan2021'd;
value=rand('uniform');output;
end;
end;
format date date9.;
run;
proc sql;
create table want as
select *,(select sum(value) from have where id=a.id and date between a.date-90 and a.date) as rolling_sum
from have as a;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you can make sure that there is always one and only one observation per id and date, you can (after sorting per id and date) use a DATA step with a 90-member array to create the rolling window:
(using @Ksharp 's data)
data have;
call streaminit(1234);
do id='A','B' ;
do date='01jan2018'd to '01jan2021'd;
value=rand('uniform');output;
end;
end;
format date date9.;
run;
%let rwindow_size=90;
data want;
set have;
by id;
array rolling{0:%eval(&rwindow_size.-1)} _temporary_;
if first.id
then do;
counter = 0;
call missing(of rolling{*});
end;
else counter + 1;
rolling{mod(counter,&rwindow_size.)} = value;
rolling_sum = sum(of rolling{*});
drop counter;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Ksharp and @Kurt_Bremser , here is my amateurish attempt using ideas from both of you so thanks for that.
data have;
informat index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
format index_date drug_start_date bl_start_date drug_end_date first_lookup_date mmddyy10.;
input ID $ index_date days_supply drug_start_date daily_dose bl_start_date drug_end_date;
first_lookup_date=bl_start_date-89;
label index_date = "Index date"
drug_start_date = "Prescription fill date"
drug_end_date = "Prescription end date"
bl_start_date = "Baseline start date"
first_lookup_date = "Index date - 89 days"
daily_dose = "Drug daily dose"
days_supply = "Days supply of prescription fill";
datalines;
A 05/18/2018 7 02/12/2017 5 05/18/2017 02/18/2017
A 05/18/2018 5 03/01/2017 2.5 05/18/2017 03/05/2017
A 05/18/2018 8 04/02/2017 5 05/18/2017 04/09/2017
A 05/18/2018 3 04/20/2017 5 05/18/2017 04/22/2017
A 05/18/2018 10 05/15/2017 5 05/18/2017 05/24/2017
A 05/18/2018 13 06/18/2017 3 05/18/2017 06/30/2017
A 05/18/2018 15 10/02/2017 5 05/18/2017 10/16/2017
A 09/01/2018 13 06/18/2017 3 09/01/2017 06/30/2017
A 09/01/2018 15 10/02/2017 5 09/01/2017 10/16/2017
;
run;
I know the maximum number of prescription fills per ID and Index date combination and used that to reshape the data. For this example it is 7 for ID=A and index date=05/18/2018. Here is the code:
proc sort data=have;
by id index_date drug_start_date;
run;
%let cum_dose=95;
%let max_fils=7; /*maximum no of unique prescription fill for each ID index date combination*/
%let window_width=89;/*gets 90 days counting the day itself*/
data want;
set have;
by id index_date drug_start_date;
array start_date{&max_fils.} 4. _temporary_;
retain start_date:;
array end_date{&max_fils.} 4. _temporary_;
retain end_date:;
array dds{&max_fils.} _temporary_;
retain dds:;
if first.index_date then do;
i=1;
call missing(of start_date{*});
call missing(of end_date{*});
call missing(of dds{*});
end;
start_date(i)=drug_start_date;
end_date(i)=drug_end_date;
dds(i)=daily_dose;
if last.index_date then do;;
array date{455} 4. _temporary_;/*365 days of baseline + 89 days window*/
retain date:;
do dy=1 to dim(date);
date[dy] = bl_start_date+dy-1;
end;
array cum{455} _temporary_;
retain cum:;
array ds{&max_fils.} _temporary_;
retain ds:;
found=0;
index=1;
cumulative_dose=0;
do aa = index to 455 while (date[aa] le index_date+89);
dt=date[aa]-89;
cum[aa]=0;
format dt mmddyy10.;
do ab=1 to dim(dds);
/* put dt=;*/
/* put end_date1=;*/
/* put start_date1=;*/
/* put date1 mmddyy10.;*/
/* put eos_global=;*/
if (end_date[ab] ge dt) and (start_date[ab] le date[aa]) then do;
ds[ab] = min(end_date[ab]-dt+1, end_date[ab]-start_date[ab]+1, date[aa]-start_date[ab]+1, 90);
/* put ds[ab];*/
/* put dt=;*/
/* put date[aa] mmddyy10.;*/
/* put start_date[ab]=;*/
/* put end_date[ab]=;*/
/* put aa=;*/
cum[aa] = cum[aa] + ds[ab]*dds[ab];
/* put cum[aa]=;*/
if cum[aa] ge &cum_dose. then do;
found=1;
found_date=date[aa];
cumulative_dose=cum[aa];
goto exit;
end;
end;
end;
end;
exit:
output;
end;
i+1;
format found_date mmddyy10.;
keep ID index_date found cumulative_dose found_date first_look: last_look: bl_start:;
run;
This works for the example data and also on a subset of real data. My concern is if there are logical errors that I am not able to see. I would appreciate any suggestions to make this more elegant ("less brute force") and efficient. Thank you both for responding.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You should first use DO loops to transpose your data to a longitudinal format; if necessary, remove overlaps by summarizing per day. Then you can use the rather simple code I presented.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content