Hi,
I have a tumor dataset with multiple observations per ID, and I am intending to use PROC PHREG. First I want to build a dataset with one observation per ID. Besides keeping baseline values. each observation should contain several variables coding whether a cancer recurrent or metastasis happened/ didn't happen, and if so on which date the event occurred/didn't occur.
My data looks like this: You can put all the codes I written immediately into your SAS and try out yourself.
Data have;
INFORMAT CaseType $12.;
FORMAT VisitDate DDMMYYP10.;
Input ID Sex $ VisitNr VisitDate CaseType $;
Cards;
1 m 1 14555 Primary
1 m 2 15000 Recurrent
1 m 3 15550 Metastasis
2 f 1 16520 Primary
2 f 2 17000 Recurrent
2 f 2 17060 Recurrent
3 f 1 14800 Primary
3 f 2 14900 Metastasis
4 m 1 17150 Primary
4 m 2 17750 Rec/Meta
;
RUN;
So as you can see case 2 didn't have any metastasis on the third visit, so I need the latest date on which the patient didn't have an metastasis. But I need the earliest date for an event that happened such as the first recurrence in case 2. My goal is to improve my analysis by using all available follow-up time.
My end dataset should look like this
Data want;
Input ID Sex $ PriDate Rec RecDate Metastasis MetaDate;
FORMAT ID 2. sex $1. Rec Meta 1. PriDate RecDate MetaDate DDMMYY10P.;
Cards;
1 m 14555 1 15000 1 15550
2 f 16520 1 17000 0 17060
3 f 14800 0 14900 1 14900
4 m 17150 1 177501 17750
;
RUN;
So the variables Rec and Metastasis work basically as censoring variables for the time to recurrence, to metastasis and they have the values 0 or 1.
Currently I am using the following code to do this.
DATA want;
FORMAT Rec Meta 1. PriDate RecDate MetaDate DDMMYYP10.;
RETAIN Rec0 Rec1 Meta0 Meta1 PriDate RecDate0 RecDate1 MetaDate0 MetaDate1;
SET have;
BY ID;
IF FIRST.ID THEN DO; Rec0=.; Rec1=.; Meta0=.; Meta1=.; RecDate0=.; RecDate1 =.; MetaDate0 =.; MetaDate1=.; END;
IF CaseType ='Primary' THEN PriDate = VisitDate;
ELSE IF CaseType ='Recurrent' THEN DO; Rec1=1; RecDate1 = min(RecDate1, VisitDate); Meta0=1; MetaDate0=max(MetaDate0, VisitDate); END;
ELSE IF CaseType ='Metastasis' THEN DO; Rec0=1; RecDate0 = max(RecDate0, VisitDate); Meta1=1; MetaDate1=min(MetaDate1, VisitDate); END;
ELSE IF CaseType ='Rec/Meta' THEN DO; Rec1=1; RecDate1 = min(RecDate1, VisitDate); Meta1=1; MetaDate1=min(MetaDate1, VisitDate); END;
IF LAST.ID THEN DO;
IF Rec1 = 1 THEN DO Rec = 1; RecDate=RecDate1; END;
ELSE IF Rec1 ne 1 THEN DO; IF Rec0 = 1 THEN DO; Rec = 0; RecDate=RecDate0; END; END;
IF Meta1 = 1 THEN DO Meta = 1; MetaDate=MetaDate1; END;
ELSE IF Meta1 ne 1 THEN DO; IF Meta0 = 1 THEN DO; Meta = 0; MetaDate=MetaDate0; END; END;
END;
IF LAST.ID THEN OUTPUT;
DROP rec0 rec1 meta0 meta1 RecDate1 RecDate0 MetaDate0 MetaDate1 VisitDate VisitNr CaseType;
RUN;
Is there a better way to do this? because it's very intensive and can have mistakes easily. It will be also very unpractical if I have too many possible end events.
Also is there a way may be to do the whole analysis including the PROC PHREG faster? Does The ID statement in the PROC PHREG help?
Finally I would appreciate it a lot if you can paste examples or links to advanced survival analysis codes.
Thanks in forward.
Ubai
Ok. I just though of a little better way, but I would love to hear from you. Here is a better code using the date of the last visit:
DATA want2;
FORMAT Rec Meta 1. PriDate LastDate RecDate MetaDate DDMMYYP10.;
RETAIN Rec Meta PriDate RecDate MetaDate;
SET have;
BY ID;
IF FIRST.ID THEN DO; PriDate = VisitDate; Rec=.; Meta=.; RecDate =.; MetaDate=.; END;
IF CaseType ='Recurrent' THEN DO; Rec=1; RecDate = min(RecDate, VisitDate); END;
ELSE IF CaseType ='Metastasis' THEN DO; Meta=1; MetaDate=min(MetaDate, VisitDate); END;
ELSE IF CaseType ='Rec/Meta' THEN DO; Rec=1; RecDate = min(RecDate, VisitDate); Meta=1; MetaDate=min(MetaDate, VisitDate); END;
IF LAST.ID THEN DO;
LastDate= Visitdate;
IF Rec ne 1 THEN DO; Rec=0; RecDate=VisitDate; END;
IF Meta ne 1 THEN DO; Meta=0; MetaDate=VisitDate; END;
END;
IF LAST.ID THEN OUTPUT;
DROP VisitDate VisitNr CaseType;
RUN;
Instead of retaining some variables, setting them missing at first.ID, and having an if last.ID code block, I would prefer putting things in a "doW" loop (named after Ian Whitlock, who has done a lot to popularize it). And all the if-then-else code on a single variable fits naturally in a SELECT statement.
Finally, you should take more care about how you format your code. Using line breaks and indents help a lot, and I think the code becomes a lot more readable if you do NOT capitalize all keywords - there are actually scientific studies showing that most people have an easier time reading lowcase text than capitalized text!
So, if I were to take over your code from here, it would very soon look something like this:
data want; format Rec Meta 1. PriDate LastDate RecDate MetaDate DDMMYYP10.;; do until(last.ID); set have; by ID; select(CaseType); when('Primary') PriDate = VisitDate; when('Recurrent') do; Rec=1; RecDate = min(RecDate, VisitDate); end; when('Metastasis') do; Meta=1; MetaDate=min(MetaDate, VisitDate); end; when('Rec/Meta') do; Rec=1; RecDate = min(RecDate, VisitDate); Meta=1; MetaDate=min(MetaDate, VisitDate); end; otherwise; end; end; LastDate= Visitdate; if Rec ne 1 then do; Rec=0; RecDate=VisitDate; end; if Meta ne 1 then do; Meta=0; MetaDate=VisitDate; end; drop VisitDate VisitNr CaseType; run;
Hi Lassen and thanks for replying. I will change my code and utilize the SELECT Statement. I never heard of the doW Loop before. I just found some literature on this topic and I will start reading. Thanks for the hint.
You are right with capitalization. I capitalize few stuff to make the beginning and the end of a step clear for me. This still Needs some improvement though. I also utilize indents and breaks in the original code.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.