id | year | deathy | deathm | deathd |
1 | 2010 | |||
1 | 2011 | |||
1 | 2012 | |||
1 | 2013 | |||
1 | 2014 | |||
1 | 2015 | 2015 | 3 | 17 |
2 | 2010 | |||
2 | 2011 | |||
2 | 2012 | |||
2 | 2013 | |||
2 | 2014 | |||
2 | 2015 | |||
2 | 2016 | |||
2 | 2017 | |||
3 | 2010 | |||
3 | 2011 | 2011 | 3 | 1 |
If I have data looking like this, to conduct cox analysis should I make a table looking like below?
Also, what codes should I use to make a table like this?
id | year | deathy | deathm | deathd | start | finish |
1 | 2010 | 0 | 365 | |||
1 | 2011 | 366 | 730 | |||
1 | 2012 | 731 | 1096 | |||
1 | 2013 | 1097 | 1462 | |||
1 | 2014 | 1463 | 1828 | |||
1 | 2015 | 2015 | 3 | 17 | 1829 | 77.83333 |
2 | 2010 | 0 | 365 | |||
2 | 2011 | 366 | 730 | |||
2 | 2012 | 731 | 1096 | |||
2 | 2013 | 1097 | 1462 | |||
2 | 2014 | 1463 | 1828 | |||
2 | 2015 | 1829 | 2194 | |||
2 | 2016 | 2195 | 2560 | |||
2 | 2017 | 2561 | 2926 | |||
3 | 2010 | 0 | 365 | |||
3 | 2011 | 2011 | 3 | 1 | 366 | 61.83333 |
Thanks in advance.
Here are some features about your plan that stand out as questionable.
Why should the start for each ID be at 0 and not at 1? It's inconsistent to go from 0 to 365 for year 1, but 366 to 730 for year 2.
Why use fractional days? SAS is perfectly capable of counting days from January 1 to March 17.
Why should the final FINISH value be lower than the final START value? Should we be adding those together to get the right FINISH?
I'm not claiming to have answers to these questions, just pointing out that these issues are red flags.
The programming isn't that difficult, but designing the target result takes some work.
I am sorry that I made some mistakes and your comment is correct.
I'm attaching the new table here. Is this correct now?
id | year | deathy | deathm | deathd | start | finish |
1 | 2010 | 1 | 365 | |||
1 | 2011 | 366 | 730 | |||
1 | 2012 | 731 | 1096 | |||
1 | 2013 | 1097 | 1462 | |||
1 | 2014 | 1463 | 1828 | |||
1 | 2015 | 2015 | 3 | 17 | 1829 | 1907 |
2 | 2010 | 1 | 365 | |||
2 | 2011 | 366 | 730 | |||
2 | 2012 | 731 | 1096 | |||
2 | 2013 | 1097 | 1462 | |||
2 | 2014 | 1463 | 1828 | |||
2 | 2015 | 1829 | 2194 | |||
2 | 2016 | 2195 | 2560 | |||
2 | 2017 | 2561 | 2926 | |||
3 | 2010 | 1 | 365 | |||
3 | 2011 | 2011 | 3 | 1 | 366 | 428 |
That looks better. I can't really advise you about the analysis, but I can give you some direction about creating the START and FINISH variables. Try it this way:
proc sort data=have;
by id year;
run;
data want;
set have;
by id year;
retain start finish;;
days_in_current_feb = day(mdy(3, 1, year) - 1);
if first.id then do;
start = 1;
finish = 365 - 28 + days_in_current_feb;
if deathy > . then finish = mdy(deathm, deathd, deathy) -
intnx('year', mdy(1, 1, year), 0) + 1;
end;
else do;
start + 365 - 28 + days_in_current_feb;
if deathy = . then finish + 365 - 28 + days_in_current_feb;
else finish + mdy(deathm, deathd, deathy) - start;
end;
run;
FOR THE MOMENT: Sorry, this is incomplete. Once my brain turned to mush, I needed a break. I will get back to it later to fix the rest of it.
OK, here's the retry:
proc sort data=have;
by id year;
run;
data want;
set have;
by id year;
retain start finish;
days_in_current_year = mdy(1, 1, year+1) - mdy(1, 1, year);
if first.id then do;
start = 1;
finish = days_in_current_year;
end;
else do;
start = finish + 1;
finish = finish + days_in_current_year;
end;
if deathy > . then do;
finish = finish - days_in_current_year;
finish = finish + mdy(deathm, deathd, deathy) - mdy(1, 1, year) + 1;
end;
run;
It looks about right, but test it to see if the final calculation for a death date needs to be adjusted by a day.
Edit: I posted this without first seeing Astounding's solution. Our approaches are virtually identical.
I thought this would be fun...this is what I came up with:
data have;
length id year deathy deathm deathd 8;
infile datalines truncover;
input id year deathy deathm deathd;
datalines;
1 2010
1 2011
1 2012
1 2013
1 2014
1 2015 2015 3 17
2 2010
2 2011
2 2012
2 2013
2 2014
2 2015
2 2016
2 2017
3 2010
3 2011 2011 3 1
;
run;
data want;
length id year deathy deathm deathd start finish 8;
infile datalines truncover;
input id year deathy deathm deathd start finish;
datalines;
1 2010 . . . 1 365
1 2011 . . . 366 730
1 2012 . . . 731 1096
1 2013 . . . 1097 1462
1 2014 . . . 1463 1828
1 2015 2015 3 17 1829 1907
2 2010 . . . 1 365
2 2011 . . . 366 730
2 2012 . . . 731 1096
2 2013 . . . 1097 1462
2 2014 . . . 1463 1828
2 2015 . . . 1829 2194
2 2016 . . . 2195 2560
2 2017 . . . 2561 2926
3 2010 . . . 1 365
3 2011 2011 3 1 366 428
;
run;
data test;
set have;
by id year;
* derive date from year ;
date=mdy(1,1,year);
* derive number of days in the year ;
num_days=intck('day',intnx('year',date,0,'B'),intnx('year',date,1,'B'));
* if death, derive number of days to death ;
if nmiss(deathy,deathm,deathd)=0 then
num_days=intck('day',intnx('year',date,0,'B'),mdy(deathm,deathd,deathy));
* reset if first ID ;
if first.id then do;
start=1;
finish=num_days;
end;
else do;
start=finish+1;
finish+num_days; * automatically retained ;
end;
drop date;
* drop num_days;
run;
proc compare base=want comp=test;
run;
However, it doesn't exactly match your desired output. Perhaps you can run with this and fix the code? Or perhaps your desired output is slightly off?
Hope this helps...
P.S.: In the future, can you post your problems with a self-contained data step containing datalines for your data? This will save us from having to cut-and-paste your data into a working data step.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.