Hi all,
I have the data like this:
data have;
infile datalines delimiter=',';
attrib
id length=$1
visit_date format=date9. informat=date9.
death_date format=date9. informat=date9.
;
input id $ visit_date death death_date disease;
*format id best. visit_date death_date date9.;
datalines;
1,24oct2012,0,.,0
1,24oct2013,0,.,0
1,23oct2014,0,.,0
1,26oct2015,0,.,0
1,23oct2016,0,.,1
1,28oct2017,0,.,1
2,20dec2007,1,15jun2018,0
2,17sep2008,1,15jun2018,0
2,20dec2009,1,15jun2018,0
2,16sep2010,1,15jun2018,1
2,20dec2011,1,15jun2018,1
2,22sep2012,1,15jun2018,1
;
run;
From this longitudinal dataset, I want to generate the dataset to run Cox model with competing risk. More specifically, I want to create the dataset that satisfy the following:
- if from the first visit to the last visit, subject does not have the event (disease=0), then time_to_event= duration from the first visit to the last visit and create variable called censor_status=0.
- if from the first visit to the last visit, subject has the event in between (diease=1), then time_to_event =duration from first visit to the visit_date when even happen (aka disease from 0 to 1) and censor_status=1.
- if death_date occurs before the event happens then time_to_event = duration from first visit to death date, and censor_status=2.
Any insight is greatly appreciated.
Thanks
You have 3 conditions specified, but only provide two IDs. We can't be sure we have programmed all three conditions because we don't have data for the third condition. In the future, we need data that represents all reasonable situations that might happen. However, you can test it out on other conditions to see if it works.
Also, I did not compute the variable censor_status and I leave it as a homework problem for you to do so, given the logic of the program, I don't think it will be hard.
data want;
set have;
by id;
retain start_date end_date;
if first.id then do;
start_date=visit_date;
end_date=.;
end;
if disease=1 and missing(end_date) then end_date=visit_date;
if last.id and missing(end_date) then end_date=visit_date;
if last.id and not missing(death_date) then end_date=death_date;
if last.id then do;
time_to_event=end_date-start_date;
output;
end;
run;
Before you proceed, you should first learn the proper use of informats in INPUT statements.
Right now, your data step does not work:
69 data have; 70 input id visit_date death_date status; 71 cards; NOTE: Invalid data for visit_date in line 72 3-11. REGEL: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 72 1 01jan2000 . 0 id=1 visit_date=. death_date=. status=0 _ERROR_=1 _N_=1
(multiple NOTEs for invalid data removed).
Since you use List INPUT , you have to use the colon modifier when using informats. From the look of your data, you should use the DATE9. informat.
Please fix your data step, so we can proceed in helping you further.
You have 3 conditions specified, but only provide two IDs. We can't be sure we have programmed all three conditions because we don't have data for the third condition. In the future, we need data that represents all reasonable situations that might happen. However, you can test it out on other conditions to see if it works.
Also, I did not compute the variable censor_status and I leave it as a homework problem for you to do so, given the logic of the program, I don't think it will be hard.
data want;
set have;
by id;
retain start_date end_date;
if first.id then do;
start_date=visit_date;
end_date=.;
end;
if disease=1 and missing(end_date) then end_date=visit_date;
if last.id and missing(end_date) then end_date=visit_date;
if last.id and not missing(death_date) then end_date=death_date;
if last.id then do;
time_to_event=end_date-start_date;
output;
end;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.