anandrc
Obsidian | Level 7

I would appreciate your advice.

I have a dataset where I need a new flag variable for sequential or combination therapy.

 

Patient  Treatment  Start      End        Flag

E1       A          10-Apr-17  26-Jun-17  Seq A
E1       B          07-Jun-18  08-Aug-18  Seq A

E2       B          06-Sep-16  20-Oct-16  Seq B
E2       A          15-Nov-17  04-Oct-18  Seq B

E3       A          07-Dec-10  08-Feb-11  Seq A
E3       A          06-Sep-16  20-Oct-16  Seq A
E3       B          15-Nov-17  04-Oct-18  Seq A

E4       B          07-Dec-10  08-Feb-11  Seq B
E4       B          06-Sep-16  20-Oct-16  Seq B
E4       A          15-Nov-17  04-Oct-18  Seq B

E5       A          27-Feb-18  20-Nov-18  Combi C
E5       B          22-May-18  30-Oct-18  Combi C

E7       A          01-Feb-16  28-Apr-16  Seq A
E7       A          20-Apr-17  16-May-17  Seq A
E7       B          21-Aug-17  02-Jan-19  Seq A
E7       A          27-May-19  29-Jul-19  Seq A

E8       B          01-Feb-16  28-Apr-16  Seq B
E8       B          20-Apr-17  16-May-17  Seq B
E8       A          21-Aug-17  02-Jan-19  Seq B
E8       B          27-May-19  29-Jul-19  Seq B

 

Treatment can be sequential (the previous treatment has ended before the start of the next treatment).
Treatment can be combination (the previous treatment has not ended before the start of the next treatment).


If the patient has an end date for one prior therapy that occurs on or before the start date of another prior therapy,
then assign A or B depending on which starts first
Ex: scenarios 1,2,3 and 4

If the patient doesn’t have an end date for a prior therapy that occurs on or before the start date of another prior therapy,
then assign C.
Ex: scenario 5


Also, for scenarios 3 and 4,
we take the min(start) and max(end) per treatment for the comparison.
For example, for patient E3, treatment A has min(start) 07-Dec-10 and max(end) 20-Oct-16, which is before the min(start) of B (15-Nov-17).


However, for the last two scenarios (patients E7 and E8) we need to make a few exceptions.

When we take the min(start) and max(end) per treatment, these patients get flagged as combination although they are sequential.
For example, for patient E7, treatment A has min(start) 01-Feb-16 and max(end) 29-Jul-19.
So when the code compares treatment A max(end) 29-Jul-19 with treatment B min(start) 21-Aug-17, it concludes that A has not ended before the start of B and flags the patient as combination.

The same applies to patient E8.

How can I tell the program to make an exception and not count this as combination?
It should be treated as sequential.

Similar exceptions should be made for patterns such as
A B B A, A B B A A, etc.

 

Accepted Solution
Kurt_Bremser
Super User

See if this does it:

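data want;
if 0 then set have;
flag1 = "Seq  ";
/* first pass over the patient: decide the flags */
do until (last.patient);
  set have;
  by patient notsorted;
  /* call LAG unconditionally so its queue is updated on every observation */
  prev_trt = lag(treatment);
  prev_start = lag(start);
  prev_end = lag(end);
  if first.patient then flag2 = treatment;
  if
    not first.patient
    and treatment ne prev_trt
    and (start lt prev_end or start eq prev_start)
  then do;
    flag1 = "Combi";
    flag2 = "C";
  end;
end;
/* second pass over the same patient: write the rows with the final flags */
do until (last.patient);
  set have;
  by patient notsorted;
  output;
end;
drop prev_:;
run;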

If not, provide an example where it fails.

Kurt_Bremser
Super User
data have;
infile datalines dlm='09'x dsd truncover;
input
  patient $
  treatment $
  start :date9.
  end :date9.
;
format start end date9.;
datalines;
E1	A	10-Apr-17	26-Jun-17	Seq	A
E1	B	07-Jun-18	08-Aug-18	Seq	A
E2	B	06-Sep-16	20-Oct-16	Seq	B
E2	A	15-Nov-17	04-Oct-18	Seq	B
E3	A	07-Dec-10	08-Feb-11	Seq	A
E3	A	06-Sep-16	20-Oct-16	Seq	A
E3	B	15-Nov-17	04-Oct-18	Seq	A
E4	B	07-Dec-10	08-Feb-11	Seq	B
E4	B	06-Sep-16	20-Oct-16	Seq	B
E4	A	15-Nov-17	04-Oct-18	Seq	B
E5	A	27-Feb-18	20-Nov-18	Combi	C
E5	B	22-May-18	30-Oct-18	Combi	C
E7	A	01-Feb-16	28-Apr-16	Seq	A
E7	A	20-Apr-17	16-May-17	Seq	A
E7	B	21-Aug-17	02-Jan-19	Seq	A
E7	A	27-May-19	29-Jul-19	Seq	A
E8	B	01-Feb-16	28-Apr-16	Seq	B
E8	B	20-Apr-17	16-May-17	Seq	B
E8	A	21-Aug-17	02-Jan-19	Seq	B
E8	B	27-May-19	29-Jul-19	Seq	B
;

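data want;
if 0 then set have;
flag1 = "Seq  ";
/* first pass over the patient: decide the flags */
do until (last.patient);
  set have;
  by patient;
  /* call LAG unconditionally so its queue is updated on every observation */
  prev_trt = lag(treatment);
  prev_end = lag(end);
  if first.patient then flag2 = treatment;
  if
    not first.patient
    and treatment ne prev_trt
    and start lt prev_end
  then do;
    flag1 = "Combi";
    flag2 = "C";
  end;
end;
/* second pass over the same patient: write the rows with the final flags */
do until (last.patient);
  set have;
  by patient;
  output;
end;
drop prev_:;
run;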

Gives the same result that you show in your post.
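
One quick way to check this, as a minimal sketch that assumes only the WANT dataset and the FLAG1/FLAG2 variables created by the step above, is to list one row per patient and flag combination:

```sas
/* One row per patient and flag combination; each patient should appear exactly once */
proc sql;
  select distinct patient, flag1, flag2
  from want
  order by patient;
quit;
```

If a patient shows up more than once, the flags were not constant within that patient and the step needs another look.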

 

Please post example data as a DATA step with DATALINES in the future, like I do here.

quickbluefish
Barite | Level 11

If your actual data are more complicated than this, e.g., many kinds of treatments or other time-varying exposures, or if you need to assess length of overlap, gaps, adherence, etc., I would recommend you convert this into a 'counting process' format wherein each row represents a period of time during which the exposure profile of a patient is static.  I use a macro for this, but there are various ways out there to do this.  Having data in this format will make it very simple to answer your questions about combination vs. sequential therapy.  It may be overkill if you really just have two drugs and simply want to know whether there was ever any overlap, of course.  Here's an example, using @Kurt_Bremser's input dataset followed by conversion into an input dataset for the macro.  Note that the startdate/enddate variables that are being created in this case are just the earliest start and latest end for each patient, but that's not required - they should instead be the start / end of follow-up for the person if that information is available.

proc datasets lib=work memtype=data nolist nodetails kill; run; quit;

data have;
infile datalines dlm='09'x dsd truncover;
input
  patient $
  treatment $
  start :date9.
  end :date9.
;
format start end date9.;
datalines;
E1	A	10-Apr-17	26-Jun-17	Seq	A
E1	B	07-Jun-18	08-Aug-18	Seq	A
E2	B	06-Sep-16	20-Oct-16	Seq	B
E2	A	15-Nov-17	04-Oct-18	Seq	B
E3	A	07-Dec-10	08-Feb-11	Seq	A
E3	A	06-Sep-16	20-Oct-16	Seq	A
E3	B	15-Nov-17	04-Oct-18	Seq	A
E4	B	07-Dec-10	08-Feb-11	Seq	B
E4	B	06-Sep-16	20-Oct-16	Seq	B
E4	A	15-Nov-17	04-Oct-18	Seq	B
E5	A	27-Feb-18	20-Nov-18	Combi	C
E5	B	22-May-18	30-Oct-18	Combi	C
E7	A	01-Feb-16	28-Apr-16	Seq	A
E7	A	20-Apr-17	16-May-17	Seq	A
E7	B	21-Aug-17	02-Jan-19	Seq	A
E7	A	27-May-19	29-Jul-19	Seq	A
E8	B	01-Feb-16	28-Apr-16	Seq	B
E8	B	20-Apr-17	16-May-17	Seq	B
E8	A	21-Aug-17	02-Jan-19	Seq	B
E8	B	27-May-19	29-Jul-19	Seq	B
;
run;

proc sql;
create table forCP as
select a.patient, a.startdate, a.enddate, 
b.treatment as event, b.start as edate length=4 format=date9.,
b.end-b.start as days length=4
from
	(select patient, min(start) as startdate length=4 format=date9.,
	max(end) as enddate length=4 format=date9. from have group by patient) A
	left join
	have B
	on a.patient=b.patient
order by a.patient, edate, event;
quit;

%include "/path/to/macro/cp.sas";

%cp(
	forCP,
	ptid=patient
	);
	
title 'first 50 obs of output data';
proc print data=cp (obs=50) width=min; run;
title;

Output from proc print looks like this -- combination therapy, in this case, are simply rows where both A and B are 1.  Length of the window is given by LEN and winstart/winend are the bounds of that window.

[Screenshot of the PROC PRINT output: quickbluefish_0-1744107236200.png]
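
For readers without the macro, the underlying idea can be approximated with a much cruder person-day expansion. The following is only a sketch and is not what cp.sas does: it assumes the HAVE dataset from above with non-missing START and END dates, treats both dates as inclusive, and the names DAYS, DAY_COUNTS, OVERLAP, N_TRT, ANY_COMBI and COMBI_DAYS are made up for illustration.

```sas
/* Sketch only: expand each treatment record to one row per day on treatment */
data days;
  set have;
  do day = start to end;
    output;
  end;
  format day date9.;
  keep patient treatment day;
run;

proc sql;
  /* number of distinct treatments a patient is on for each calendar day */
  create table day_counts as
  select patient, day, count(distinct treatment) as n_trt
  from days
  group by patient, day;

  /* any_combi = 1 if at least one day has more than one treatment;
     combi_days = number of such days */
  create table overlap as
  select patient,
         max(n_trt) > 1 as any_combi,
         sum(n_trt > 1) as combi_days
  from day_counts
  group by patient;
quit;
```

Because it counts distinct treatments per day, repeated courses of the same drug do not count as overlap, and two treatments that share a single day do. It creates one row per patient-day, so it may be too heavy for long follow-up or large cohorts, and it does not implement any gap or grace-period rule.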

 

anandrc
Obsidian | Level 7

Appreciate the response.

In this instance I have only 2 treatments, but it looks like I have to introduce a 30-day overlap rule. How can I access the cp.sas program that creates winstart, winend, etc.?

 

How do I ignore the first line of treatment when more than 30 days lie between the end of the first treatment and the start of the second treatment, so that only the second A treatment is considered for flagging?

I have an instance for example E10 patient listed below -

 

Patient Treatment Start End
E10 A 13-Jul-15 21-Aug-15
E10 A 27-Apr-21 27-Apr-21
E10 B 27-Apr-21 27-Sep-21

 

Also, I need a rule that trumps everything else when a combination is found first, for example when the start dates match.
For example, for E9:

Patient Treatment Start End
E9 A 01-Jan-21 01-Jan-21
E9 B 01-Jan-21 01-Jan-21

 

Thanks

anandrc
Obsidian | Level 7

Apologies.

 

data have;
infile datalines truncover; /* the datalines below are space-delimited */
input
patient $
treatment $
start :date9.
end :date9.
;
format start end date9.;
datalines;
E1 A 10-Apr-17 26-Jun-17 Seq A
E1 B 07-Jun-18 08-Aug-18 Seq A
E2 B 06-Sep-16 20-Oct-16 Seq B
E2 A 15-Nov-17 04-Oct-18 Seq B
E3 A 07-Dec-10 08-Feb-11 Seq A
E3 A 06-Sep-16 20-Oct-16 Seq A
E3 B 15-Nov-17 04-Oct-18 Seq A
E4 B 07-Dec-10 08-Feb-11 Seq B
E4 B 06-Sep-16 20-Oct-16 Seq B
E4 A 15-Nov-17 04-Oct-18 Seq B
E5 A 27-Feb-18 20-Nov-18 Combi C
E5 B 22-May-18 30-Oct-18 Combi C
E7 A 01-Feb-16 28-Apr-16 Seq A
E7 A 20-Apr-17 16-May-17 Seq A
E7 B 21-Aug-17 02-Jan-19 Seq A
E7 A 27-May-19 29-Jul-19 Seq A
E8 B 01-Feb-16 28-Apr-16 Seq B
E8 B 20-Apr-17 16-May-17 Seq B
E8 A 21-Aug-17 02-Jan-19 Seq B
E8 B 27-May-19 29-Jul-19 Seq B
E9 A 01-Jan-21 01-Jan-21 Combi C
E9 B 01-Jan-21 01-Jan-21 Combi C
E10 A 13-Jul-15 21-Aug-15 Combi C
E10 A 27-Apr-21 27-Apr-21 Combi C
E10 B 27-Apr-21 27-Sep-21 Combi C
;
run;

quickbluefish
Barite | Level 11

This is the counting process macro I'm using:

https://github.com/Jeremy-Smith5/CEP-public/blob/main/SAS/cp.sas

...it's old, and a bit of a Rube Goldberg contraption, but works as long as you follow the instructions.  The key thing is that the things you provide in the 'EVENT' variable must themselves be named in such a way that they could be valid (version 7) variable names.  In other words, if your unique events are: DrugA, DrugB, DrugC, HospStay, Pneumonia - those are fine as names.  But Drug A, Hospital Stay, etc. will not work with the current set up.  The counting process data format, however you choose to go about creating it, is transformative for longitudinal work, esp. pharmepi, in my view.  

anandrc
Obsidian | Level 7
Thank you for the suggestion. I will try this for cases with 3 or more treatments.
I really appreciate it.
anandrc
Obsidian | Level 7

Appreciate the response. Very useful.

The rules are assigning the correct flag, but I do have a couple of scenarios to consider. Apologies, I did not foresee these exceptions.

For example, in the scenario below the current code flags it as A. When the start dates match, we need a rule that takes precedence, finds the combination first, and flags it as C.

Patient Treatment Start End
E9 A 01-Jan-21 01-Jan-21
E9 B 01-Jan-21 01-Jan-21

 

For the second scenario, it looks like I have to introduce a 30-day overlap rule.

In the example below, I need to ignore the first line of treatment, because more than 30 days lie between the end of the first treatment and the start of the second treatment, and consider only the second A treatment for flagging. The current code flags it as A, but if we ignore the first line of treatment, the start dates match and it should be combination C.

Patient Treatment Start End
E10 A 13-Jul-15 21-Aug-15
E10 A 27-Apr-21 27-Apr-21
E10 B 27-Apr-21 27-Sep-21
mkeintz
PROC Star

You can set up a HISTORY array (one element per date from the earliest possible to latest possible date).  Pass through each patient twice.  Initialize each patient to class='Seq  '  and flag=treatment of the first record.

 

During the first pass, update the history array.  If a date is encountered that has more than one treatment, set class to 'Combi' and flag to 'C', ... and stop monitoring dates - you won't be going back from Combi to Seq.

 

During the second pass, do nothing but permit the observations to be output, using the CLASS and FLAG values retained from the first pass:

 

data have;
infile datalines;
input
  patient $2.  treatment :$1.  start :date9.  end :date9.   
     _class :$5.   _flag :$1. ;
format start end date9.;
datalines;
E1 A 10Apr2017 26Jun2017 Seq A
E1 B 07Jun2018 08Aug2018 Seq A
E2 B 06Sep2016 20Oct2016 Seq B
E2 A 15Nov2017 04Oct2018 Seq B
E3 A 07Dec2010 08Feb2011 Seq A
E3 A 06Sep2016 20Oct2016 Seq A
E3 B 15Nov2017 04Oct2018 Seq A
E4 B 07Dec2010 08Feb2011 Seq B
E4 B 06Sep2016 20Oct2016 Seq B
E4 A 15Nov2017 04Oct2018 Seq B
E5 A 27Feb2018 20Nov2018 Combi C
E5 B 22May2018 30Oct2018 Combi C
E7 A 01Feb2016 28Apr2016 Seq A
E7 A 20Apr2017 16May2017 Seq A
E7 B 21Aug2017 02Jan2019 Seq A
E7 A 27May2019 29Jul2019 Seq A
E8 B 01Feb2016 28Apr2016 Seq B
E8 B 20Apr2017 16May2017 Seq B
E8 A 21Aug2017 02Jan2019 Seq B
E8 B 27May2019 29Jul2019 Seq B
;
run;

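%let beg=01jan2010;
%let end=31dec2019;
data want (drop=d);
  /* interleave HAVE with itself: per patient, all rows once, then all rows again */
  set have (in=firstpass) have (in=secondpass);
  by patient;

  retain class '     ' Flag ' ' ;
  /* one element per calendar date; START and END must lie within &beg..&end */
  array history {%sysevalf("&beg"d):%sysevalf("&end"d)}  _temporary_;

  if first.patient then do;
    call missing(of history{*});
    class='Seq  ';
    flag=treatment;
  end;

  /* first pass: count treatment records per date until an overlap is found */
  if firstpass=1 and class='Seq' then do d=start to end while (class='Seq');
    history{d}+1;
    if history{d}>1 then do;
      class='Combi';
      flag='C';
    end;
  end;
  /* second pass: output the rows with the retained CLASS and FLAG */
  if secondpass;
run;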
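
The fixed &beg and &end bounds cover the sample data, but any date outside them (for example a treatment in 2021) would fall outside the HISTORY array. As a hedged sketch, the bounds could instead be taken from the data; BEG_N and END_N are hypothetical macro variable names, and the TRIMMED option assumes SAS 9.3 or later:

```sas
/* Sketch: derive the array bounds from HAVE as plain SAS date numbers */
proc sql noprint;
  select min(start) format=8., max(end) format=8.
    into :beg_n trimmed, :end_n trimmed
  from have;
quit;

/* Write the resolved bounds to the log for a quick sanity check */
%put NOTE: history array would span &beg_n to &end_n;
```

The ARRAY statement in the step above could then be written as `array history {&beg_n:&end_n} _temporary_;`, so that every date in the data has an element.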

 

 

anandrc
Obsidian | Level 7
Thank you for the suggestion. I will save this and try this solution.
