Hi SAS community,
I'm doing a survival analysis where I need to calculate person-time. My dataset doesn't have a variable with specific end dates; instead, I have days of enrollment. I would like to calculate the person-time for each individual. Can you help with code that calculates an end date based on time to event, loss to follow-up, or end of study, whichever comes first? Below is a simple dataset that represents my data. Day1 is the start date.
data have;
input ID $ Start_Date $ Day1 Day2 Day3 Day4 Day5 Event_date;
datalines;
AA 01/01/2015 1 1 1 1 1 01/04/2015
BB 02/01/2015 1 . 1 1 1 02/20/2015
CC 01/04/2015 1 1 . . . .
DD 02/10/2015 1 1 1 . 1 02/15/2015
EE 02/07/2015 1 1 1 1 1 .
FF 01/30/2015 1 1 1 1 1 01/31/2015
;;
run;
Desired end_date values:
AA start_date=01/01/2015 and end_date=01/04/2015 *date of event;
BB start_date=02/01/2015 and end_date=02/01/2015 *date of last enrollment;
CC start_date=01/04/2015 and end_date=01/05/2015 *date of last enrollment;
DD start_date=02/10/2015 and end_date=02/12/2015 *date of last enrollment;
EE start_date=02/07/2015 and end_date=02/11/2015 *date of last enrollment;
FF start_date=01/30/2015 and end_date=01/31/2015 *date of event;
Hi,
A few questions. Why is the date a character variable? It's far easier to use numeric dates. Also, where does this record come from:
CC start_date=01/04/2015 and end_date=01/05/2015 *date of last enrollment;
I can't see how 01/05 is calculated, as it's not in the test data.
You can use simple if statements to get it:
data want;
  set have;
  end_date=start_date;
  stop=1;
  if day1 ne . then end_date=end_date+1;
  else stop=0;
  if day2 ne . and stop then end_date=end_date+1;
  if day3 ne . and stop then end_date=end_date+1;
  ...
run;
Hi RW9,
Thanks for your reply and suggested code.
The variable end_date does not exist in my original dataset. I need to create it from the variables that do exist (i.e., start_date, day1-day5, event_date). In the space after the sample dataset, I was giving examples of what the end_date value should be for those records. Regarding your question about CC: the end_date is 01/05/2015 because CC was enrolled for two days, starting on 01/04/2015. The ideal code should create the end_date variable with 01/05/2015 as the value for subject ID CC.
Thanks!
Hi,
Sorry, did you actually try my code? end_date is created in my data step, as shown below.
For the second point, why would CC have 01/05? This is not 2 days after 01/04. I am not seeing the logic of how you go from:
start_date=01/04/2015, then nothing in day 1-3, then a 1 in day4, and nothing in day 5, and arrive at an end_date=01/04/2015? Where does the date come from, and what is "day" in relation to the date? If it's not a single "day", then it's probably best not to name it that, but something more descriptive.
data want;
  set have;
  end_date=start_date;
  stop=1;
  if day1 ne . then end_date=end_date+1;
  else stop=0;
  if day2 ne . and stop then end_date=end_date+1;
  if day3 ne . and stop then end_date=end_date+1;
  ...
run;
Hi @sophia_SAS,
Try this:
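data want;
  set have;
  /* All variables whose names start with DAY (Day1-Day5 here) */
  array day day:;
  /* Advance m to the first day that is missing (or 0); m=dim(day)+1 if there is none */
  do m=1 to dim(day) while(day[m]);
  end;
  /* Last enrolled day is start_date+m-2; MIN ignores a missing event_date */
  if m >= 2 then end_date=min(start_date+m-2, event_date);
  else put 'WAR' 'NING: For ' id= 'Day1 is missing! End_date has been set to missing.';
  format end_date mmddyy10.;
  keep id start_date end_date;
run;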
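Once END_DATE exists, the person-time asked about in the original question could be derived along these lines. This is only a sketch. The data set name PERSONTIME and the variable PERSON_DAYS are made up for illustration, START_DATE is assumed to be a numeric SAS date, and counting both the start day and the end day (the +1) is an assumption that may need changing for your study definition.

```sas
/* Sketch only: PERSONTIME and PERSON_DAYS are hypothetical names */
data persontime;
  set want;
  /* Assumes numeric SAS dates; +1 counts both start and end day (assumption) */
  if not missing(end_date) then person_days = end_date - start_date + 1;
run;

/* Number of subjects, total and mean person-days */
proc means data=persontime n sum mean;
  var person_days;
run;
```

Subjects whose END_DATE was set to missing keep a missing PERSON_DAYS and are left out of the totals by PROC MEANS.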
Hi Freelance Reinhard!
Your code worked beautifully. Thank you for putting it together. However, when I went to apply it to my actual dataset, it didn't work. That's when I realized that the setup of my sample 'have' dataset differs from my actual one. Apologies for wasting your time. My actual dataset has continuous days counted from the beginning of the entire dataset rather than from each enrollee's start date (i.e., day1 always equals 01/01/2015, regardless of the enrollee's start date). Here is a revised version of the sample dataset. I've revised some of the dates, but the same rules still apply.
data have;
input ID $ Start_Date $ Day1 Day2 Day3 Day4 Day5 Event_date;
datalines;
AA 01/01/2015 1 1 1 1 1 01/04/2015
BB 01/03/2015 . . 1 1 1 .
CC 01/04/2015 . . . 1 . .
DD 01/01/2015 1 1 1 . 1 02/15/2015
EE 01/01/2015 1 1 1 1 1 .
FF 01/02/2015 . 1 1 1 1 01/03/2015
;;
run;
Desired end_date values:
AA start_date=01/01/2015 and end_date=01/04/2015 *date of event;
BB start_date=01/03/2015 and end_date=01/05/2015 *date of last enrollment;
CC start_date=01/04/2015 and end_date=01/04/2015 *date of last enrollment;
DD start_date=01/01/2015 and end_date=01/03/2015 *date of last enrollment;
EE start_date=01/01/2015 and end_date=01/01/2015 *date of last enrollment;
FF start_date=01/02/2015 and end_date=01/02/2015 *date of event;
No problem. Please find below an adapted version of my code.
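data want;
  /* Calendar date that Day1 stands for */
  retain studystart '01JAN2015'd;
  set have;
  array day day:;
  /* Start at the DAYx matching start_date; stop at the first missing (or 0) day */
  do m=start_date-studystart+1 to dim(day) while(day[m]);
  end;
  /* day[m-1] is the last enrolled day, i.e. date studystart+m-2; MIN ignores a missing event_date */
  if m >= start_date-studystart+2 then end_date=min(studystart+m-2, event_date);
  else put 'WAR' 'NING: DayX for start date of ' id= 'is missing! End_date has been set to missing.';
  format end_date mmddyy10.;
  keep id start_date end_date;
run;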
Please note (I forgot to mention this in my earlier post) that I use a numeric (SAS date) variable START_DATE, not a character variable as is suggested by your data step. Is START_DATE in your real data a character variable?
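For reference, the revised sample data could be read with numeric dates roughly as follows. This is a sketch, assuming the dates are in MM/DD/YYYY form as in the datalines. The MMDDYY10. informat with the colon modifier is one possible choice and is not taken from the original post.

```sas
data have;
  /* The colon modifier applies the informat within list input */
  input ID $ Start_Date :mmddyy10. Day1-Day5 Event_date :mmddyy10.;
  format Start_Date Event_date mmddyy10.;
  datalines;
AA 01/01/2015 1 1 1 1 1 01/04/2015
BB 01/03/2015 . . 1 1 1 .
CC 01/04/2015 . . . 1 . .
DD 01/01/2015 1 1 1 . 1 02/15/2015
EE 01/01/2015 1 1 1 1 1 .
FF 01/02/2015 . 1 1 1 1 01/03/2015
;
run;
```

If START_DATE is a character variable in the real data, an assignment such as start_num = input(start_date, mmddyy10.); would convert it, where START_NUM is a hypothetical new variable name.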
As in my earlier code, I let SAS write a warning to the log and set END_DATE to missing if the DAYx variable corresponding to a patient's start date is missing and not 1 as in your sample data. (Or should these cases, if any, be handled differently?)
Finally, I noticed two inconsistencies in your new sample output:
* if I understand the logic correctly
Thanks Freelance Reinhard! I really appreciate your help!