I have a dataset containing dates of treatment for cancer patients. A patient can have radiation, surgery, brachytherapy, hormone treatment, or no treatment. I need to find the first treatment that occurred in a time window of diagnosis date + 180 days to diagnosis date + 3 years, then determine the sequence of all other 'first' treatments that occurred within 1 year after the first treatment and construct a variable that indicates the treatment sequence within this time window..
My dataset looks like this:
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01JUN2008 . 01MAY2008
2 01MAY2004 01MAY2009 15JUN2004 01SEP2004 01JUN2004
3 01SEP2003 01MAR2004 01DEC2003 15SEP2003
4 01JUL2005 15JUL2005
5. 01OCT2001 01OCT2007
6. 01JAN2002 15JUL2002 15JUL2007
7. 01NOV2009 01DEC2009 01DEC2013
What I need to end up with is a variable for treatment sequence, including the first treatment, within this first treatment-365day window that looks like
patient ID txseq
1 HT,RT
2 HT,SURG,BT
3 HT,SURG,RT
4 HT
5. NO TREATMENT
6. RT
7. NO TREATMENT
I'd be very grateful for any help!
Thank you
It is more clear . If I understand what you mean ,this time .
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01AUG2008 . . 01JUL2008 2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004 3 01SEP2003 01FEB2005 01DEC2004 . 01APR2004 4 01JUL2005 . 01SEP2005 . 01FEB2006 5 01OCT2001 . 01OCT2007 . . 6 01JAN2002 01JUL2002 15JUL2007 . . 7 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date start end); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; start=dxdate+180; end=intnx('year',dxdate,3,'s'); do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; output; end; end; format date start end date9.; run; data temp1(where=(dif is not missing)); set temp; if start le date le end then dif=date-dxdate; run; proc sort data=temp1 ; by id dif;run; data temp2(keep=id date rename=(date=_date)); set temp1; by id; if first.id; run; data temp3; merge temp temp2; by id; if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ; run; proc sort data=temp3;by id date;run; data temp4(keep=id want); set temp3; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) temp4 ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
Should be straightforward, five if statements:
data have;
patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;
run;
data want;
set have;
length txseq $200;
if first_htdate ne . and dxdate <= first_htdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"HT");
if first_surgdate ne . and dxdate <= first_surgdate < intnx('days',dxdate,180,'same') then txseq=catx(",",strip(txseq),"SURG");
/* Same for other columns */
if strip(txseq)="" then txseq="NO TREATMENT";
run;
I don't understand why id=7 has no treatment ? should be RT ?
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01JUN2008 . . 01MAY2008 2 01MAY2004 01MAY2009 15JUN2004 01SEP2004 01JUN2004 3 01SEP2003 01MAR2004 01DEC2003 . 15SEP2003 4 01JUL2005 . . . 15JUL2005 5. 01OCT2001 . 01OCT2007 . . 6. 01JAN2002 15JUL2002 15JUL2007 . . 7. 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date ); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; if x{i} le dxdate+180 then output; end; end; format date start end date9.; run; data temp; set temp; dif=date-dxdate; run; proc sort data=temp ; by id date;run; data want(keep=id want); set temp; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) want ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
Thanks for your help!
Patient 7 has no treatment because the only treatment date, for RT, was only 1 month after diagnosis. The earliest treatment date in my 'window' should be dxdate +180. I am only interested in treatment dates dx+180 days to dxdate +3 years.
In other words, the 'initial' window is dx+180<txdate<3yrs. Once I determine the first treatment within that window, I need to set another window, for all subsequent treatments within 1 year of that first treatment date. Then I need to create the txseq variable (which will include the initial treatment date).
Laura
So id=1 should not fit your condition , right ?
dxdate =01JAN2008
dxdate +180=29JUN2008
dxdate +3yrs=01JAN2011
01JUN2008 is not in this range . Can you explain it ?
Yes, sorry! Those dates are for a 3-month window! Let me do this again, changing the dates to be correct this time...i think these are right now
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01AUG2008 . 01JUL2008
2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004
3 01SEP2003 01FEB2005 01DEC2004 01APR2004
4 01JUL2005 01SEP2005 01FEB2006
5. 01OCT2001 01OCT2007
6. 01JAN2002 01JUL2002 15JUL2007
7. 01NOV2009 01DEC2009 01DEC2013
patient ID txseq
1 HT,RT
2 HT,SURG,BT
3 HT,SURG,RT
4 HT
5. NO TREATMENT
6. RT
7. NO TREATMENT
It is more clear . If I understand what you mean ,this time .
data have; input ID dxdate : date9. first_rtdate : date9. first_surgdate : date9. first_btdate : date9. first_htdate : date9. ; format dxdate first_rtdate first_surgdate first_btdate first_htdate date9. ; cards; 1 01JAN2008 01AUG2008 . . 01JUL2008 2 01MAY2004 01NOV2009 01DEC2004 15DEC2004 01NOV2004 3 01SEP2003 01FEB2005 01DEC2004 . 01APR2004 4 01JUL2005 . 01SEP2005 . 01FEB2006 5 01OCT2001 . 01OCT2007 . . 6 01JAN2002 01JUL2002 15JUL2007 . . 7 01NOV2009 01DEC2009 01DEC2013 . . ; run; data temp(keep=id dxdate vname date start end); set have; array x{*} first_rtdate first_surgdate first_btdate first_htdate; start=dxdate+180; end=intnx('year',dxdate,3,'s'); do i=1 to dim(x); if not missing(x{i}) then do; vname=vname(x{i});date=x{i}; output; end; end; format date start end date9.; run; data temp1(where=(dif is not missing)); set temp; if start le date le end then dif=date-dxdate; run; proc sort data=temp1 ; by id dif;run; data temp2(keep=id date rename=(date=_date)); set temp1; by id; if first.id; run; data temp3; merge temp temp2; by id; if not missing(_date) then do; if _date le date le intnx('year',_date,1,'s') then output;end ; run; proc sort data=temp3;by id date;run; data temp4(keep=id want); set temp3; by id ; length want $ 100; retain want; want=catx(',',want,substr(scan(vname,-1,'_'),1,2) ); if last.id then do;output;call missing(want);end; run; data want; merge have(keep=id) temp4 ; by id ; if missing(want) then want='NO TREATMENT'; run;
Xia Keshan
This works!!! Thank you so much, you have no idea how much I appreciate your help!!
Errm, that doesn't match you required output then.
patient ID dxdate first_rtdate first_surgdate first_btdate first_htdate
1 01JAN2008 01JUN2008 . 01MAY2008
DXDATE + 180 = 29JUN2008, therefore no treatments within the window DXDATE + 180 <= date < DSDATE + 3 years?
Anyways, I am leaving soon so will leave you with where I got to. I started by normalizing (reverse transpose) the data, this gives one column so you can order the dates and drop missing. Its then a matter of going through line by line and checking your window. When a window is hit, set dxdate to the limit for next window. You can then just rollup the results.
data have;
patient_ID=1; dxdate="01JAN2008"d; first_rtdate="01JUN2008"d; first_surgdate="15JUN2008"d; first_btdate=.; first_htdate="01MAY2008"d; output;
run;
proc transpose data=have out=t_have;
by patient_id dxdate;
var first_rtdate first_surgdate first_btdate first_htdate;
run;
proc sort data=t_have;
by col1;
where col1 ne .;
run;
data want;
set t_have;
format dxdate col1 date9.;
retain dxdate;
if dxdate + 180 <= col1 < intnx('years',dxdate+180,3,'same') then row="Window1";
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.