BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hendrixl114
Calcite | Level 5

I have a dataset containing dates of treatment for cancer patients.  A patient can have radiation, surgery, brachytherapy, hormone treatment, or no treatment. I need to find the first treatment that occurred in a time window of diagnosis date + 180 days to diagnosis date + 3 years, then determine the sequence of all other 'first' treatments that occurred within 1 year after the first treatment and construct a variable that indicates the treatment sequence within this time window.. 

My dataset looks like this:

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01JUN2008                                                  .                  01MAY2008

2               01MAY2004     01MAY2009         15JUN2004               01SEP2004          01JUN2004

3               01SEP2003     01MAR2004           01DEC2003                                        15SEP2003

4               01JUL2005                                                                                             15JUL2005

5.              01OCT2001                                   01OCT2007

6.              01JAN2002     15JUL2002               15JUL2007

7.              01NOV2009     01DEC2009               01DEC2013

What I need to end up with is a variable for treatment sequence, including the first treatment, within this first treatment-365day window that looks like

patient ID     txseq

1                    HT,RT

2                    HT,SURG,BT

3                    HT,SURG,RT

4                   HT

5.                  NO TREATMENT

6.                  RT

7.                  NO TREATMENT

I'd be very grateful for any help!

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It is more clear . If I understand what you mean ,this time .

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01AUG2008    .             .                  01JUL2008
2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004
3               01SEP2003     01FEB2005           01DEC2004           .          01APR2004
4               01JUL2005       .               01SEP2005           .                01FEB2006
5              01OCT2001      .               01OCT2007         .  .
6              01JAN2002     01JUL2002               15JUL2007   .  .
7              01NOV2009     01DEC2009               01DEC2013   .  .
;
run;
data temp(keep=id dxdate vname date start end);
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 start=dxdate+180; end=intnx('year',dxdate,3,'s');
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i};  output;
  end;
 end;
 format date start end date9.;
run;
data temp1(where=(dif is not missing));
 set temp; 
 if start le date le end then dif=date-dxdate;
run;

proc sort data=temp1 ; by id  dif;run;
data temp2(keep=id date rename=(date=_date));
 set temp1;
 by id;
 if first.id;
run;
data temp3;
 merge temp temp2;
 by id;
 if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ;
run;
proc sort data=temp3;by id date;run;
data temp4(keep=id want);
 set temp3;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) temp4 ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;

Xia Keshan

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Should be straightforward, five if statements:

data have;

patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;   

run;

data want;

  set have;

  length txseq $200;

  if first_htdate ne . and dxdate <= first_htdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"HT");

  if first_surgdate ne . and dxdate <= first_surgdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"SURG");

  /* Same for other columns */

  if strip(txseq)="" then txseq="NO TREATMENT";

run;

Ksharp
Super User

I don't understand why id=7 has no treatment ? should be RT ?

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01JUN2008            .                        .                       01MAY2008
2               01MAY2004     01MAY2009         15JUN2004               01SEP2004          01JUN2004
3               01SEP2003     01MAR2004           01DEC2003                   .                     15SEP2003
4               01JUL2005          .                  .                       .                15JUL2005
5.              01OCT2001          .                 01OCT2007          .     .
6.              01JAN2002     15JUL2002               15JUL2007      .          .
7.              01NOV2009     01DEC2009               01DEC2013          .        .
;
run;
data temp(keep=id dxdate vname date );
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i}; if x{i} le  dxdate+180 then output;
  end;
 end;
 format date start end date9.;
run;
data temp;
 set temp; 
  dif=date-dxdate;
run;

proc sort data=temp ; by id  date;run;

data want(keep=id want);
 set temp;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) want ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;



Xia Keshan

hendrixl114
Calcite | Level 5

Thanks for your help! 

Patient 7 has no treatment because the only treatment date, for RT, was only 1 month after diagnosis.  The earliest treatment date in my 'window' should be dxdate +180.  I am only interested in treatment dates dx+180 days to dxdate +3 years.

In other words, the 'initial' window is dx+180<txdate<3yrs.  Once I determine the first treatment within that window, I need to set another window, for all subsequent treatments within 1 year of that first treatment date.  Then I need to create the txseq variable (which will include the initial treatment date).

Laura

Ksharp
Super User

So id=1 should not fit your condition , right ?

dxdate   =01JAN2008  

dxdate  +180=29JUN2008 

dxdate  +3yrs=01JAN2011 

  01JUN2008 is not in this range . Can you explain it ?

hendrixl114
Calcite | Level 5

Yes, sorry!  Those dates are for a 3-month window!  Let me do this again, changing the dates to be correct this time...i think these are right now

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01AUG2008                                                  .                  01JUL2008

2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004

3               01SEP2003     01FEB2005           01DEC2004                                        01APR2004

4               01JUL2005                                  01SEP2005                                         01FEB2006

5.              01OCT2001                                   01OCT2007

6.              01JAN2002     01JUL2002               15JUL2007

7.              01NOV2009     01DEC2009               01DEC2013

patient ID     txseq

1                    HT,RT

2                    HT,SURG,BT

3                    HT,SURG,RT

4                   HT

5.                  NO TREATMENT

6.                  RT

7.                  NO TREATMENT

Ksharp
Super User

It is more clear . If I understand what you mean ,this time .

data have;
input ID     dxdate   : date9.       first_rtdate     : date9.       first_surgdate    : date9.            first_btdate   : date9.        first_htdate  : date9.  ;
format  dxdate first_rtdate         first_surgdate       first_btdate      first_htdate   date9.  ;
cards; 
1                01JAN2008     01AUG2008    .             .                  01JUL2008
2               01MAY2004     01NOV2009         01DEC2004               15DEC2004          01NOV2004
3               01SEP2003     01FEB2005           01DEC2004           .          01APR2004
4               01JUL2005       .               01SEP2005           .                01FEB2006
5              01OCT2001      .               01OCT2007         .  .
6              01JAN2002     01JUL2002               15JUL2007   .  .
7              01NOV2009     01DEC2009               01DEC2013   .  .
;
run;
data temp(keep=id dxdate vname date start end);
 set have;
 array x{*}  first_rtdate         first_surgdate       first_btdate      first_htdate;
 start=dxdate+180; end=intnx('year',dxdate,3,'s');
 do i=1 to dim(x);
  if not missing(x{i}) then do;
    vname=vname(x{i});date=x{i};  output;
  end;
 end;
 format date start end date9.;
run;
data temp1(where=(dif is not missing));
 set temp; 
 if start le date le end then dif=date-dxdate;
run;

proc sort data=temp1 ; by id  dif;run;
data temp2(keep=id date rename=(date=_date));
 set temp1;
 by id;
 if first.id;
run;
data temp3;
 merge temp temp2;
 by id;
 if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ;
run;
proc sort data=temp3;by id date;run;
data temp4(keep=id want);
 set temp3;
 by id ;
 length want $ 100;
 retain want;
 want=catx(',',want,substr(scan(vname,-1,'_'),1,2) );
 if last.id then do;output;call missing(want);end;
run;
data want;
 merge have(keep=id) temp4 ;
 by id ;
 if missing(want) then want='NO TREATMENT';
run;

Xia Keshan

hendrixl114
Calcite | Level 5

This works!!! Thank you so much, you have no idea how much I appreciate your help!!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Errm, that doesn't match you required output then. 

patient ID     dxdate          first_rtdate          first_surgdate              first_btdate         first_htdate    

1                01JAN2008     01JUN2008                                                  .                  01MAY2008

DXDATE + 180 = 29JUN2008, therefore no treatments within the window DXDATE + 180 <= date < DSDATE + 3 years?

Anyways, I am leaving soon so will leave you with where I got to.  I started by normalizing (reverse transpose) the data, this gives one column so you can order the dates and drop missing.  Its then a matter of going through line by line and checking your window.  When a window is hit, set dxdate to the limit for next window.  You can then just rollup the results.

data have;

patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;   

run;

proc transpose data=have out=t_have;

  by patient_id dxdate;

  var first_rtdate first_surgdate first_btdate first_htdate;

run;

proc sort data=t_have;

  by col1;

  where col1 ne .;

run;

data want;

  set t_have;

  format dxdate col1 date9.;

  retain dxdate;

  if dxdate + 180 <= col1 < intnx('years',dxdate+180,3,'same') then row="Window1";

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 8 replies
  • 4389 views
  • 0 likes
  • 3 in conversation