I have a dataset containing dates of treatment for cancer patients. A patient can have radiation, surgery, brachytherapy, hormone treatment, or no treatment. I need to find the first treatment that occurred in a time window of diagnosis date + 180 days to diagnosis date + 3 years, then determine the sequence of all other 'first' treatments that occurred within 1 year after the first treatment and construct a variable that indicates the treatment sequence within this time window..
My dataset looks like this:
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01JUN2008 . 01MAY2008
2 01MAY2004 01MAY2009 15JUN2004 01SEP2004 01JUN2004
3 01SEP2003 01MAR2004 01DEC2003 15SEP2003
4 01JUL2005 15JUL2005
5. 01OCT2001 01OCT2007
6. 01JAN2002 15JUL2002 15JUL2007
7. 01NOV2009 01DEC2009 01DEC2013
What I need to end up with is a variable for treatment sequence, including the first treatment, within this first treatment-365day window that looks like
patient ID txseq
1 HT,RT
2 HT,SURG,BT
3 HT,SURG,RT
4 HT
5. NO TREATMENT
6. RT
7. NO TREATMENT
I'd be very grateful for any help!
Thank you
It is more clear . If I understand what you mean ,this time .
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01AUG2008 . . 01JUL2008 2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004 3 01SEP2003 01FEB2005 01DEC2004 . 01APR2004 4 01JUL2005 . 01SEP2005 . 01FEB2006 5 01OCT2001 . 01OCT2007 . . 6 01JAN2002 01JUL2002 15JUL2007 . . 7 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date start end); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; start=dxdate+180; end=intnx('year',dxdate,3,'s'); do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; output; end; end; format date start end date9.; run; data temp1(where=(dif is not missing)); set temp; if start le date le end then dif=date-dxdate; run; proc sort data=temp1 ; by id dif;run; data temp2(keep=id date rename=(date=_date)); set temp1; by id; if first.id; run; data temp3; merge temp temp2; by id; if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ; run; proc sort data=temp3;by id date;run; data temp4(keep=id want); set temp3; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) temp4 ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
Should be straightforward, five if statements:
data have;
patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;
run;
data want;
set have;
length txseq $200;
if first_htdate ne . and dxdate <= first_htdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"HT");
if first_surgdate ne . and dxdate <= first_surgdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"SURG");
/* Same for other columns */
if strip(txseq)="" then txseq="NO TREATMENT";
run;
I don't understand why id=7 has no treatment ? should be RT ?
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01JUN2008 . . 01MAY2008 2 01MAY2004 01MAY2009 15JUN2004 01SEP2004 01JUN2004 3 01SEP2003 01MAR2004 01DEC2003 . 15SEP2003 4 01JUL2005 . . . 15JUL2005 5. 01OCT2001 . 01OCT2007 . . 6. 01JAN2002 15JUL2002 15JUL2007 . . 7. 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date ); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; if x{i} le dxdate+180 then output; end; end; format date start end date9.; run; data temp; set temp; dif=date-dxdate; run; proc sort data=temp ; by id date;run; data want(keep=id want); set temp; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) want ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
Thanks for your help!
Patient 7 has no treatment because the only treatment date, for RT, was only 1 month after diagnosis. The earliest treatment date in my 'window' should be dxdate +180. I am only interested in treatment dates dx+180 days to dxdate +3 years.
In other words, the 'initial' window is dx+180<txdate<3yrs. Once I determine the first treatment within that window, I need to set another window, for all subsequent treatments within 1 year of that first treatment date. Then I need to create the txseq variable (which will include the initial treatment date).
Laura
So id=1 should not fit your condition , right ?
dxdate =01JAN2008
dxdate +180=29JUN2008
dxdate +3yrs=01JAN2011
01JUN2008 is not in this range . Can you explain it ?
Yes, sorry! Those dates are for a 3-month window! Let me do this again, changing the dates to be correct this time...i think these are right now
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01AUG2008 . 01JUL2008
2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004
3 01SEP2003 01FEB2005 01DEC2004 01APR2004
4 01JUL2005 01SEP2005 01FEB2006
5. 01OCT2001 01OCT2007
6. 01JAN2002 01JUL2002 15JUL2007
7. 01NOV2009 01DEC2009 01DEC2013
patient ID txseq
1 HT,RT
2 HT,SURG,BT
3 HT,SURG,RT
4 HT
5. NO TREATMENT
6. RT
7. NO TREATMENT
It is more clear . If I understand what you mean ,this time .
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01AUG2008 . . 01JUL2008 2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004 3 01SEP2003 01FEB2005 01DEC2004 . 01APR2004 4 01JUL2005 . 01SEP2005 . 01FEB2006 5 01OCT2001 . 01OCT2007 . . 6 01JAN2002 01JUL2002 15JUL2007 . . 7 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date start end); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; start=dxdate+180; end=intnx('year',dxdate,3,'s'); do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; output; end; end; format date start end date9.; run; data temp1(where=(dif is not missing)); set temp; if start le date le end then dif=date-dxdate; run; proc sort data=temp1 ; by id dif;run; data temp2(keep=id date rename=(date=_date)); set temp1; by id; if first.id; run; data temp3; merge temp temp2; by id; if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ; run; proc sort data=temp3;by id date;run; data temp4(keep=id want); set temp3; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) temp4 ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
This works!!! Thank you so much, you have no idea how much I appreciate your help!!
Errm, that doesn't match you required output then.
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01JUN2008 . 01MAY2008
DXDATE + 180 = 29JUN2008, therefore no treatments within the window DXDATE + 180 <= date < DSDATE + 3 years?
Anyways, I am leaving soon so will leave you with where I got to. I started by normalizing (reverse transpose) the data, this gives one column so you can order the dates and drop missing. Its then a matter of going through line by line and checking your window. When a window is hit, set dxdate to the limit for next window. You can then just rollup the results.
data have;
patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;
run;
proc transpose data=have out=t_have;
by patient_id dxdate;
var first_rtdate first_surgdate first_btdate first_htdate;
run;
proc sort data=t_have;
by col1;
where col1 ne .;
run;
data want;
set t_have;
format dxdate col1 date9.;
retain dxdate;
if dxdate + 180 <= col1 < intnx('years',dxdate+180,3,'same') then row="Window1";
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need courses to help you with SAS Life Sciences Analytics Framework, SAS Health Cohort Builder, or other topics? Check out the Health and Life Sciences learning path for all of the offerings.