Appreciate your advice.
I have a dataset where I need a new flag variable for sequential or combination therapy:
Patient | Treatment | Start | End | Flag1 | Flag2 |
E1 | A | 10-Apr-17 | 26-Jun-17 | Seq | A |
E1 | B | 07-Jun-18 | 08-Aug-18 | Seq | A |
E2 | B | 06-Sep-16 | 20-Oct-16 | Seq | B |
E2 | A | 15-Nov-17 | 04-Oct-18 | Seq | B |
E3 | A | 07-Dec-10 | 08-Feb-11 | Seq | A |
E3 | A | 06-Sep-16 | 20-Oct-16 | Seq | A |
E3 | B | 15-Nov-17 | 04-Oct-18 | Seq | A |
E4 | B | 07-Dec-10 | 08-Feb-11 | Seq | B |
E4 | B | 06-Sep-16 | 20-Oct-16 | Seq | B |
E4 | A | 15-Nov-17 | 04-Oct-18 | Seq | B |
E5 | A | 27-Feb-18 | 20-Nov-18 | Combi | C |
E5 | B | 22-May-18 | 30-Oct-18 | Combi | C |
E7 | A | 01-Feb-16 | 28-Apr-16 | Seq | A |
E7 | A | 20-Apr-17 | 16-May-17 | Seq | A |
E7 | B | 21-Aug-17 | 02-Jan-19 | Seq | A |
E7 | A | 27-May-19 | 29-Jul-19 | Seq | A |
E8 | B | 01-Feb-16 | 28-Apr-16 | Seq | B |
E8 | B | 20-Apr-17 | 16-May-17 | Seq | B |
E8 | A | 21-Aug-17 | 02-Jan-19 | Seq | B |
E8 | B | 27-May-19 | 29-Jul-19 | Seq | B |
Treatment can be sequential (the previous treatment has ended before the next treatment starts).
Treatment can be combination (the previous treatment has not ended before the next treatment starts).
If the patient has an end date for one prior therapy that occurs on or before the start date of another prior therapy,
then assign A or B depending on which starts first.
Ex: scenarios 1, 2, 3 and 4
If the patient doesn’t have an end date for a prior therapy that occurs on or before the start date of another prior therapy,
then assign C.
Ex: scenario 5
Also, for scenarios 3 and 4,
we take the min(start) and max(end) per treatment for the comparison.
For example, for patient E3, treatment A has min(start) 07-Dec-10 and max(end) 20-Oct-16, so it ended before the min(start) of B (15-Nov-17).
However, for patients E7 and E8 we need to make a few exceptions.
When we take the min(start) and max(end) per treatment, they get flagged as combination although they are sequential.
For example, for patient E7, treatment A has min(start) 01-Feb-16 and max(end) 29-Jul-19.
So when the code compares treatment A's max(end) (29-Jul-19) with treatment B's min(start) (21-Aug-17), it concludes that A has not ended before B starts and flags the patient as combination.
The same happens for patient E8.
How can I tell the program to make an exception and not count this as combination?
It should be treated as sequential.
Similar exceptions should be made for sequences such as
A B B A, A B B A A, etc.
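The min(start)/max(end)-per-treatment comparison described above, and why it misflags E7, can be reproduced with a short PROC SQL sketch. The DEMO, SPAN and NAIVE table names are made up for illustration, the rows are copied from the E3, E5 and E7 records above, and end dates are compared with `<=` to follow the "on or before" rule:

```sas
/* Hypothetical demo data: the E3, E5 and E7 rows from the table above */
data demo;
  infile datalines truncover;
  input patient $ treatment $ start :date9. end :date9.;
  format start end date9.;
datalines;
E3 A 07-Dec-10 08-Feb-11
E3 A 06-Sep-16 20-Oct-16
E3 B 15-Nov-17 04-Oct-18
E5 A 27-Feb-18 20-Nov-18
E5 B 22-May-18 30-Oct-18
E7 A 01-Feb-16 28-Apr-16
E7 A 20-Apr-17 16-May-17
E7 B 21-Aug-17 02-Jan-19
E7 A 27-May-19 29-Jul-19
;
run;

proc sql;
  /* collapse each treatment to one span: earliest start, latest end */
  create table span as
  select patient, treatment,
         min(start) as first_start format=date9.,
         max(end)   as last_end    format=date9.
  from demo
  group by patient, treatment;

  /* compare the A span with the B span for each patient */
  create table naive as
  select a.patient,
         case
           when a.last_end <= b.first_start or b.last_end <= a.first_start
             then 'Seq'
           else 'Combi'
         end as naive_flag length=5
  from span(where=(treatment='A')) as a
       inner join
       span(where=(treatment='B')) as b
       on a.patient = b.patient
  order by a.patient;
quit;
```

E3 comes out as Seq and E5 as Combi, but E7 is also Combi: its A span (01-Feb-16 to 29-Jul-19) swallows the whole B span, which is exactly the false combination described above.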
See if this does it:
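data want;
  if 0 then set have;            /* define HAVE's variables before FLAG2 = TREATMENT */
  length flag1 $5;               /* long enough to hold "Combi" */
  flag1 = "Seq";
  /* pass 1: read one patient's records and work out the flags */
  do until (last.patient);
    set have;
    by patient notsorted;        /* records only need to be grouped by patient */
    /* run LAG on every record so its queue always holds the previous record */
    prev_trt   = lag(treatment);
    prev_start = lag(start);
    prev_end   = lag(end);
    if first.patient then flag2 = treatment;
    else if treatment ne prev_trt
      and (start lt prev_end or start eq prev_start)
    then do;
      flag1 = "Combi";
      flag2 = "C";
    end;
  end;
  /* pass 2: re-read the same records and output them with the flags */
  do until (last.patient);
    set have;
    by patient notsorted;
    output;
  end;
  drop prev_:;
run;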
If not, provide an example where it fails.
data have;
infile datalines truncover;   /* blank-delimited; the extra flag columns are ignored */
input
patient $
treatment $
start :date9.
end :date9.
;
format start end date9.;
datalines;
E1 A 10-Apr-17 26-Jun-17 Seq A
E1 B 07-Jun-18 08-Aug-18 Seq A
E2 B 06-Sep-16 20-Oct-16 Seq B
E2 A 15-Nov-17 04-Oct-18 Seq B
E3 A 07-Dec-10 08-Feb-11 Seq A
E3 A 06-Sep-16 20-Oct-16 Seq A
E3 B 15-Nov-17 04-Oct-18 Seq A
E4 B 07-Dec-10 08-Feb-11 Seq B
E4 B 06-Sep-16 20-Oct-16 Seq B
E4 A 15-Nov-17 04-Oct-18 Seq B
E5 A 27-Feb-18 20-Nov-18 Combi C
E5 B 22-May-18 30-Oct-18 Combi C
E7 A 01-Feb-16 28-Apr-16 Seq A
E7 A 20-Apr-17 16-May-17 Seq A
E7 B 21-Aug-17 02-Jan-19 Seq A
E7 A 27-May-19 29-Jul-19 Seq A
E8 B 01-Feb-16 28-Apr-16 Seq B
E8 B 20-Apr-17 16-May-17 Seq B
E8 A 21-Aug-17 02-Jan-19 Seq B
E8 B 27-May-19 29-Jul-19 Seq B
;
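data want;
  if 0 then set have;
  length flag1 $5;               /* long enough to hold "Combi" */
  flag1 = "Seq";
  do until (last.patient);       /* pass 1: determine the flags */
    set have;
    by patient;
    prev_trt = lag(treatment);   /* LAG on every record keeps the queue in step */
    prev_end = lag(end);
    if first.patient then flag2 = treatment;
    else if treatment ne prev_trt and start lt prev_end then do;
      flag1 = "Combi";
      flag2 = "C";
    end;
  end;
  do until (last.patient);       /* pass 2: output the records with the flags */
    set have;
    by patient;
    output;
  end;
  drop prev_:;
run;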
This gives the same result that you show in your post.
Please post example data as a DATA step with DATALINES in the future, like I do here.
If your actual data are more complicated than this (e.g., many kinds of treatments or other time-varying exposures), or if you need to assess length of overlap, gaps, adherence, etc., I would recommend converting the data into a 'counting process' format, in which each row represents a period of time during which the patient's exposure profile is static. I use a macro for this, but there are various ways to do it. Having the data in this format makes it very simple to answer your questions about combination vs. sequential therapy. Of course, it may be overkill if you really just have two drugs and simply want to know whether there was ever any overlap.
Here's an example, using @Kurt_Bremser's input dataset followed by conversion into an input dataset for the macro. Note that the STARTDATE/ENDDATE variables created here are just the earliest start and latest end for each patient, but that's not required; if that information is available, they should instead be the start and end of follow-up for the person.
proc datasets lib=work memtype=data nolist nodetails kill; run; quit;
data have;
infile datalines truncover;   /* blank-delimited; the extra flag columns are ignored */
input
patient $
treatment $
start :date9.
end :date9.
;
format start end date9.;
datalines;
E1 A 10-Apr-17 26-Jun-17 Seq A
E1 B 07-Jun-18 08-Aug-18 Seq A
E2 B 06-Sep-16 20-Oct-16 Seq B
E2 A 15-Nov-17 04-Oct-18 Seq B
E3 A 07-Dec-10 08-Feb-11 Seq A
E3 A 06-Sep-16 20-Oct-16 Seq A
E3 B 15-Nov-17 04-Oct-18 Seq A
E4 B 07-Dec-10 08-Feb-11 Seq B
E4 B 06-Sep-16 20-Oct-16 Seq B
E4 A 15-Nov-17 04-Oct-18 Seq B
E5 A 27-Feb-18 20-Nov-18 Combi C
E5 B 22-May-18 30-Oct-18 Combi C
E7 A 01-Feb-16 28-Apr-16 Seq A
E7 A 20-Apr-17 16-May-17 Seq A
E7 B 21-Aug-17 02-Jan-19 Seq A
E7 A 27-May-19 29-Jul-19 Seq A
E8 B 01-Feb-16 28-Apr-16 Seq B
E8 B 20-Apr-17 16-May-17 Seq B
E8 A 21-Aug-17 02-Jan-19 Seq B
E8 B 27-May-19 29-Jul-19 Seq B
;
run;
proc sql;
create table forCP as
select a.patient, a.startdate, a.enddate,
b.treatment as event, b.start as edate length=4 format=date9.,
b.end-b.start as days length=4
from
(select patient, min(start) as startdate length=4 format=date9.,
max(end) as enddate length=4 format=date9. from have group by patient) A
left join
have B
on a.patient=b.patient
order by a.patient, edate, event;
quit;
%include "/path/to/macro/cp.sas";
%cp(
forCP,
ptid=patient
);
title 'first 50 obs of output data';
proc print data=cp (obs=50) width=min; run;
title;
In the PROC PRINT output, combination therapy corresponds simply to rows where both A and B are 1. LEN gives the length of each window, and WINSTART/WINEND are the bounds of that window.
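If the macro is not at hand, the core idea of the counting-process layout can be sketched with plain DATA steps and PROC SQL. This is not the cp.sas macro and its columns are not guaranteed to match the macro's output. The DEMO, DAILY, PROFILE, PROFILE2 and WINDOWS names are hypothetical, only two treatments (A and B) are assumed, and end dates are treated as inclusive, so a treatment that ends on the day another starts counts as a one-day overlap:

```sas
/* Hypothetical demo data: the E5 and E7 rows from the thread */
data demo;
  infile datalines truncover;
  input patient $ treatment $ start :date9. end :date9.;
  format start end date9.;
datalines;
E5 A 27-Feb-18 20-Nov-18
E5 B 22-May-18 30-Oct-18
E7 A 01-Feb-16 28-Apr-16
E7 A 20-Apr-17 16-May-17
E7 B 21-Aug-17 02-Jan-19
E7 A 27-May-19 29-Jul-19
;
run;

/* 1. one row per patient, treatment and exposed day (end date inclusive) */
data daily;
  set demo;
  do day = start to end;
    output;
  end;
  format day date9.;
  keep patient treatment day;
run;

/* 2. exposure profile per patient-day: A and B are 1/0 indicators */
proc sql;
  create table profile as
  select patient, day,
         max(treatment='A') as a,
         max(treatment='B') as b
  from daily
  group by patient, day
  order by patient, day;
quit;

/* 3. start a new window when the profile changes or a gap in days appears */
data profile2;
  set profile;
  by patient;
  prev_day = lag(day);
  prev_a   = lag(a);
  prev_b   = lag(b);
  if first.patient or a ne prev_a or b ne prev_b or day ne prev_day + 1
    then win + 1;      /* sum statement: WIN is retained across rows */
  drop prev_:;
run;

/* 4. one row per window; combination windows are those with a=1 and b=1 */
proc sql;
  create table windows as
  select patient, win, a, b,
         min(day) as winstart format=date9.,
         max(day) as winend   format=date9.,
         count(*) as len
  from profile2
  group by patient, win, a, b
  order by patient, winstart;
quit;
```

For E5 this yields one window with both A and B (22-May-18 to 30-Oct-18), while E7 has none. Expanding to one row per day is simple but can get large for long follow-up; the macro avoids that.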
Appreciate the response.
In this instance, I only have 2 treatments, but it looks like I have to introduce a 30-day gap rule. Could you please tell me how to access the cp.sas program that creates WINSTART, WINEND, etc.?
How do I ignore the first line of treatment, since there is a gap of more than 30 days between the end of the first treatment and the start of the second, and only consider the second A treatment for flagging purposes?
For example, patient E10:
Patient | Treatment | Start | End |
E10 | A | 13-Jul-15 | 21-Aug-15 |
E10 | A | 27-Apr-21 | 27-Apr-21 |
E10 | B | 27-Apr-21 | 27-Sep-21 |
Also, I need a rule that trumps everything else and finds combination first, e.g., when the start dates match.
For example, E9:
Patient | Treatment | Start | End |
E9 | A | 01-Jan-21 | 01-Jan-21 |
E9 | B | 01-Jan-21 | 01-Jan-21 |
Thanks
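One way to apply a "start dates match" rule before anything else is to look for two different treatments that start on the same day anywhere in a patient's history, not only on adjacent records. A minimal PROC SQL self-join sketch; the DEMO and SAME_START names are made up for illustration, and the rows are the E9 and E10 records above plus two E7 rows as a negative case:

```sas
/* Hypothetical demo data taken from the rows in the thread */
data demo;
  infile datalines truncover;
  input patient $ treatment $ start :date9. end :date9.;
  format start end date9.;
datalines;
E9 A 01-Jan-21 01-Jan-21
E9 B 01-Jan-21 01-Jan-21
E10 A 13-Jul-15 21-Aug-15
E10 A 27-Apr-21 27-Apr-21
E10 B 27-Apr-21 27-Sep-21
E7 A 01-Feb-16 28-Apr-16
E7 B 21-Aug-17 02-Jan-19
;
run;

/* patients where two different treatments start on the same date */
proc sql;
  create table same_start as
  select distinct x.patient
  from demo as x
       inner join
       demo as y
       on  x.patient   = y.patient
       and x.start     = y.start
       and x.treatment < y.treatment   /* different treatments, each pair once */
  order by x.patient;
quit;
```

E9 and E10 are returned and E7 is not; patients in SAME_START could then be flagged Combi/C before the sequential rules are applied.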
Quote from myself:
Please post example data as a DATA step with DATALINES in the future
Apologies.
data have;
infile datalines truncover;   /* blank-delimited; the extra flag columns are ignored */
input
patient $
treatment $
start :date9.
end :date9.
;
format start end date9.;
datalines;
E1 A 10-Apr-17 26-Jun-17 Seq A
E1 B 07-Jun-18 08-Aug-18 Seq A
E2 B 06-Sep-16 20-Oct-16 Seq B
E2 A 15-Nov-17 04-Oct-18 Seq B
E3 A 07-Dec-10 08-Feb-11 Seq A
E3 A 06-Sep-16 20-Oct-16 Seq A
E3 B 15-Nov-17 04-Oct-18 Seq A
E4 B 07-Dec-10 08-Feb-11 Seq B
E4 B 06-Sep-16 20-Oct-16 Seq B
E4 A 15-Nov-17 04-Oct-18 Seq B
E5 A 27-Feb-18 20-Nov-18 Combi C
E5 B 22-May-18 30-Oct-18 Combi C
E7 A 01-Feb-16 28-Apr-16 Seq A
E7 A 20-Apr-17 16-May-17 Seq A
E7 B 21-Aug-17 02-Jan-19 Seq A
E7 A 27-May-19 29-Jul-19 Seq A
E8 B 01-Feb-16 28-Apr-16 Seq B
E8 B 20-Apr-17 16-May-17 Seq B
E8 A 21-Aug-17 02-Jan-19 Seq B
E8 B 27-May-19 29-Jul-19 Seq B
E9 A 01-Jan-21 01-Jan-21 Combi C
E9 B 01-Jan-21 01-Jan-21 Combi C
E10 A 13-Jul-15 21-Aug-15 Combi C
E10 A 27-Apr-21 27-Apr-21 Combi C
E10 B 27-Apr-21 27-Sep-21 Combi C
;
run;
This is the counting process macro I'm using:
https://github.com/Jeremy-Smith5/CEP-public/blob/main/SAS/cp.sas
...it's old, and a bit of a Rube Goldberg contraption, but it works as long as you follow the instructions. The key requirement is that the values you provide in the EVENT variable must themselves be valid (version 7) SAS variable names. In other words, if your unique events are DrugA, DrugB, DrugC, HospStay and Pneumonia, those names are fine, but Drug A, Hospital Stay, etc. will not work with the current setup. In my view, the counting process data format, however you choose to create it, is transformative for longitudinal work, especially pharmacoepidemiology.
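A quick way to check event labels against that requirement is the NVALID function with the 'V7' option. The cleanup shown (blanks replaced by underscores with TRANSLATE) is an assumption about how one might fix the labels, not part of cp.sas, and the EVENT_NAMES table and its columns are hypothetical:

```sas
data event_names;
  length raw $40 event $32;
  do raw = 'DrugA', 'Drug A', 'Hospital Stay', 'Pneumonia';
    valid_raw   = nvalid(strip(raw), 'V7');     /* 1 if usable as a V7 name */
    event       = translate(strip(raw), '_', ' '); /* blanks -> underscores */
    valid_event = nvalid(strip(event), 'V7');
    output;
  end;
run;
```

'Drug A' and 'Hospital Stay' fail the check as written but pass as Drug_A and Hospital_Stay. Labels that start with a digit or exceed 32 characters would still need separate handling.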
Appreciate the response. Very useful.
The rules are assigning the correct flags, but there are a couple of scenarios to consider. Apologies, I did not foresee these exceptions.
For example, in the scenario below the current code flags it as A, but when the start dates match we need a rule that trumps this, finds combination first, and flags it as C.
Patient | Treatment | Start | End |
E9 | A | 01-Jan-21 | 01-Jan-21 |
E9 | B | 01-Jan-21 | 01-Jan-21 |
For the second scenario,
it looks like I have to introduce a 30-day gap rule.
In the example below, I need to ignore the first line of treatment, as the gap between the end of the first treatment and the start of the second is more than 30 days, and only consider the second A treatment for flagging purposes. The current code flags it as A, but if we ignore the first line of treatment, the start dates match, so it should be combination C.
Patient | Treatment | Start | End |
E10 | A | 13-Jul-15 | 21-Aug-15 |
E10 | A | 27-Apr-21 | 27-Apr-21 |
E10 | B | 27-Apr-21 | 27-Sep-21 |
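A 30-day rule needs the gap between the end of one record and the start of the next. A minimal sketch, assuming the records are in date order within each patient; the GAP and EPISODE variables and the MAXGAP macro variable are hypothetical, and which episode(s) should then drive the flag is left open, since the rule is not fully pinned down above:

```sas
/* Hypothetical demo data: the E10 rows above */
data demo;
  infile datalines truncover;
  input patient $ treatment $ start :date9. end :date9.;
  format start end date9.;
datalines;
E10 A 13-Jul-15 21-Aug-15
E10 A 27-Apr-21 27-Apr-21
E10 B 27-Apr-21 27-Sep-21
;
run;

%let maxgap = 30;   /* hypothetical threshold in days */

data episodes;
  set demo;
  by patient notsorted;
  prev_end = lag(end);                 /* LAG on every row keeps the queue in step */
  if first.patient then do;
    gap = .;
    episode = 1;
  end;
  else do;
    gap = start - prev_end;            /* days from previous end to this start; negative = overlap */
    if gap > &maxgap then episode + 1; /* long gap: start a new episode (sum statement retains EPISODE) */
  end;
  drop prev_end;
run;
```

For E10 the first A record is alone in episode 1, and the second A and the B record form episode 2 with a gap of 0 days, i.e. the same-start combination case.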
You can set up a HISTORY array (one element per date, from the earliest possible to the latest possible date). Pass through each patient twice. Initialize each patient to class='Seq' and flag=treatment of the first record.
During the first pass, update the history array. If a date is encountered that has more than one treatment, set class to 'Combi' and flag to 'C', and stop monitoring dates; you won't be going back from Combi to Seq.
During the second pass, do nothing but let the observations be output, using the CLASS and FLAG values retained from the first pass:
data have;
  infile datalines truncover;
  input
    patient :$3. treatment :$1. start :date9. end :date9.
    _class :$5. _flag :$1. ;     /* _CLASS and _FLAG hold the expected results */
  format start end date9.;
datalines;
E1 A 10Apr2017 26Jun2017 Seq A
E1 B 07Jun2018 08Aug2018 Seq A
E2 B 06Sep2016 20Oct2016 Seq B
E2 A 15Nov2017 04Oct2018 Seq B
E3 A 07Dec2010 08Feb2011 Seq A
E3 A 06Sep2016 20Oct2016 Seq A
E3 B 15Nov2017 04Oct2018 Seq A
E4 B 07Dec2010 08Feb2011 Seq B
E4 B 06Sep2016 20Oct2016 Seq B
E4 A 15Nov2017 04Oct2018 Seq B
E5 A 27Feb2018 20Nov2018 Combi C
E5 B 22May2018 30Oct2018 Combi C
E7 A 01Feb2016 28Apr2016 Seq A
E7 A 20Apr2017 16May2017 Seq A
E7 B 21Aug2017 02Jan2019 Seq A
E7 A 27May2019 29Jul2019 Seq A
E8 B 01Feb2016 28Apr2016 Seq B
E8 B 20Apr2017 16May2017 Seq B
E8 A 21Aug2017 02Jan2019 Seq B
E8 B 27May2019 29Jul2019 Seq B
;
run;
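%let beg=01jan2010;   /* earliest possible date */
%let end=31dec2019;   /* latest possible date; widen for later data such as E9/E10 (2021) */

data want (drop=d);
  set have (in=firstpass) have (in=secondpass);   /* each patient is read twice */
  by patient;                                      /* data must be sorted by PATIENT */
  length class $5 flag $1;                         /* room for 'Combi' */
  retain class flag;
  /* one element per calendar day: how many treatments cover that day */
  array history {%sysevalf("&beg"d):%sysevalf("&end"d)} _temporary_;
  if first.patient then do;
    call missing(of history{*});
    class='Seq';
    flag=treatment;
  end;
  /* first pass: count coverage per day; a day covered twice means combination */
  if firstpass=1 and class='Seq' then do d=start to end while (class='Seq');
    history{d}+1;
    if history{d}>1 then do;
      class='Combi';
      flag='C';
    end;
  end;
  /* second pass: output the records with the retained CLASS and FLAG */
  if secondpass;
run;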