BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hendrixl114
Calcite | Level 5

I have a dataset containing dates of treatment for cancer patients.  A patient can have radiation, surgery, brachytherapy, hormone treatment, or no treatment. I need to find the first treatment that occurred in a time window of diagnosis date + 180 days to diagnosis date + 3 years, then determine the sequence of all other 'first' treatments that occurred within 1 year after the first treatment and construct a variable that indicates the treatment sequence within this time window.. 

My dataset looks like this:

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01JUN2008                                                  .                  01MAY2008

2               01MAY2004     01MAY2009         15JUN2004               01SEP2004          01JUN2004

3               01SEP2003     01MAR2004           01DEC2003                                        15SEP2003

4               01JUL2005                                                                                             15JUL2005

5.              01OCT2001                                   01OCT2007

6.              01JAN2002     15JUL2002               15JUL2007

7.              01NOV2009     01DEC2009               01DEC2013

What I need to end up with is a variable for treatment sequence, including the first treatment, within this first treatment-365day window that looks like

patient ID     txseq

1                    HT,RT

2                    HT,SURG,BT

3                    HT,SURG,RT

4                   HT

5.                  NO TREATMENT

6.                  RT

7.                  NO TREATMENT

I'd be very grateful for any help!

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It is more clear . If I understand what you mean ,this time .

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01AUG2008    .             .                  01JUL2008
2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004
3               01SEP2003     01FEB2005           01DEC2004           .          01APR2004
4               01JUL2005       .               01SEP2005           .                01FEB2006
5              01OCT2001      .               01OCT2007         .  .
6              01JAN2002     01JUL2002               15JUL2007   .  .
7              01NOV2009     01DEC2009               01DEC2013   .  .
;
run;
data temp(keep=id dxdate vname date start end);
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 start=dxdate+180; end=intnx('year',dxdate,3,'s');
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i};  output;
  end;
 end;
 format date start end date9.;
run;
data temp1(where=(dif is not missing));
 set temp; 
 if start le date le end then dif=date-dxdate;
run;

proc sort data=temp1 ; by id  dif;run;
data temp2(keep=id date rename=(date=_date));
 set temp1;
 by id;
 if first.id;
run;
data temp3;
 merge temp temp2;
 by id;
 if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ;
run;
proc sort data=temp3;by id date;run;
data temp4(keep=id want);
 set temp3;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) temp4 ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;

Xia Keshan

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Should be straightforward, five if statements:

data have;

patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;   

run;

data want;

  set have;

  length txseq $200;

  if first_htdate ne . and dxdate <= first_htdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"HT");

  if first_surgdate ne . and dxdate <= first_surgdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"SURG");

  /* Same for other columns */

  if strip(txseq)="" then txseq="NO TREATMENT";

run;

Ksharp
Super User

I don't understand why id=7 has no treatment ? should be RT ?

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01JUN2008            .                        .                       01MAY2008
2               01MAY2004     01MAY2009         15JUN2004               01SEP2004          01JUN2004
3               01SEP2003     01MAR2004           01DEC2003                   .                     15SEP2003
4               01JUL2005          .                  .                       .                15JUL2005
5.              01OCT2001          .                 01OCT2007          .     .
6.              01JAN2002     15JUL2002               15JUL2007      .          .
7.              01NOV2009     01DEC2009               01DEC2013          .        .
;
run;
data temp(keep=id dxdate vname date );
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i}; if x{i} le  dxdate+180 then output;
  end;
 end;
 format date start end date9.;
run;
data temp;
 set temp; 
  dif=date-dxdate;
run;

proc sort data=temp ; by id  date;run;

data want(keep=id want);
 set temp;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) want ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;



Xia Keshan

hendrixl114
Calcite | Level 5

Thanks for your help! 

Patient 7 has no treatment because the only treatment date, for RT, was only 1 month after diagnosis.  The earliest treatment date in my 'window' should be dxdate +180.  I am only interested in treatment dates dx+180 days to dxdate +3 years.

In other words, the 'initial' window is dx+180<txdate<3yrs.  Once I determine the first treatment within that window, I need to set another window, for all subsequent treatments within 1 year of that first treatment date.  Then I need to create the txseq variable (which will include the initial treatment date).

Laura

Ksharp
Super User

So id=1 should not fit your condition , right ?

dxdate   =01JAN2008  

dxdate  +180=29JUN2008 

dxdate  +3yrs=01JAN2011 

  01JUN2008 is not in this range . Can you explain it ?

hendrixl114
Calcite | Level 5

Yes, sorry!  Those dates are for a 3-month window!  Let me do this again, changing the dates to be correct this time...i think these are right now

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01AUG2008                                                  .                  01JUL2008

2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004

3               01SEP2003     01FEB2005           01DEC2004                                        01APR2004

4               01JUL2005                                  01SEP2005                                         01FEB2006

5.              01OCT2001                                   01OCT2007

6.              01JAN2002     01JUL2002               15JUL2007

7.              01NOV2009     01DEC2009               01DEC2013

patient ID     txseq

1                    HT,RT

2                    HT,SURG,BT

3                    HT,SURG,RT

4                   HT

5.                  NO TREATMENT

6.                  RT

7.                  NO TREATMENT

Ksharp
Super User

It is more clear . If I understand what you mean ,this time .

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01AUG2008    .             .                  01JUL2008
2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004
3               01SEP2003     01FEB2005           01DEC2004           .          01APR2004
4               01JUL2005       .               01SEP2005           .                01FEB2006
5              01OCT2001      .               01OCT2007         .  .
6              01JAN2002     01JUL2002               15JUL2007   .  .
7              01NOV2009     01DEC2009               01DEC2013   .  .
;
run;
data temp(keep=id dxdate vname date start end);
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 start=dxdate+180; end=intnx('year',dxdate,3,'s');
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i};  output;
  end;
 end;
 format date start end date9.;
run;
data temp1(where=(dif is not missing));
 set temp; 
 if start le date le end then dif=date-dxdate;
run;

proc sort data=temp1 ; by id  dif;run;
data temp2(keep=id date rename=(date=_date));
 set temp1;
 by id;
 if first.id;
run;
data temp3;
 merge temp temp2;
 by id;
 if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ;
run;
proc sort data=temp3;by id date;run;
data temp4(keep=id want);
 set temp3;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) temp4 ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;

Xia Keshan

hendrixl114
Calcite | Level 5

This works!!! Thank you so much, you have no idea how much I appreciate your help!!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Errm, that doesn't match you required output then. 

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01JUN2008                                                  .                  01MAY2008

DXDATE + 180 = 29JUN2008, therefore no treatments within the window DXDATE + 180 <= date < DSDATE + 3 years?

Anyways, I am leaving soon so will leave you with where I got to.  I started by normalizing (reverse transpose) the data, this gives one column so you can order the dates and drop missing.  Its then a matter of going through line by line and checking your window.  When a window is hit, set dxdate to the limit for next window.  You can then just rollup the results.

data have;

patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;   

run;

proc transpose data=have out=t_have;

  by patient_id dxdate;

  var first_rtdate first_surgdate first_btdate first_htdate;

run;

proc sort data=t_have;

  by col1;

  where col1 ne .;

run;

data want;

  set t_have;

  format dxdate col1 date9.;

  retain dxdate;

  if dxdate + 180 <= col1 < intnx('years',dxdate+180,3,'same') then row="Window1";

run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Health and Life Sciences Learning

 

Need courses to help you with SAS Life Sciences Analytics Framework, SAS Health Cohort Builder, or other topics? Check out the Health and Life Sciences learning path for all of the offerings.

LEARN MORE

Discussion stats
  • 8 replies
  • 4824 views
  • 0 likes
  • 3 in conversation