I suspect that we would need to see actual input data (both versions) and the entire proc phreg code used for both to get enough details.
Also, which results differ, and by how much?
Thanks.
Here are some examples of my data structure.
Suppose the study period runs until 2013-12-31, as for ID 1.
If a time-dependent variable (treatment) changes for a subject, as for IDs 2 and 3,
I record the attributes before and after the date of the change, as in Table 1.
Table 1.
ID | startdate1 | enddate1 | treatment1 | outcome1 | startdate2 | enddate2 | treatment2 | outcome2 |
1 | 2013-05-01 | 2013-12-31 | 0 | 0 | . | . | . | . |
2 | 2013-06-01 | 2013-06-30 | 0 | 0 | 2013-06-30 | 2013-11-20 | 1 | 0 |
3 | 2013-07-01 | 2013-07-30 | 0 | 0 | 2013-07-30 | 2013-12-01 | 1 | 1 |
To run this through PROC PHREG, I converted the dataset to long form, as in Table 2
(i.e., subjects whose treatment changes have two rows).
Table 2.
ID | start | end | treatment | outcome |
1 | 2013-05-01 | 2013-12-31 | 0 | 0 |
2 | 2013-06-01 | 2013-06-30 | 0 | 0 |
2 | 2013-06-30 | 2013-11-20 | 1 | 0 |
3 | 2013-07-01 | 2013-07-30 | 0 | 0 |
3 | 2013-07-30 | 2013-12-01 | 1 | 1 |
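For reference, the reshaping from Table 1 to Table 2 can be sketched with a DATA step like the one below. This is only a sketch: the input dataset name wide, the temporary names t_start and t_end, and the name treatment2 for the second treatment column are assumptions for illustration, and the dates are assumed to be stored as numeric SAS dates.

```sas
/* Sketch only: WIDE is a hypothetical dataset laid out as in Table 1.   */
/* t_start / t_end are temporary names, renamed to start / end on output. */
data sample (rename=(t_start=start t_end=end));
  set wide;

  /* First interval: present for every subject. */
  t_start   = startdate1;
  t_end     = enddate1;
  treatment = treatment1;
  outcome   = outcome1;
  output;

  /* Second interval: only for subjects whose treatment changed. */
  if not missing(startdate2) then do;
    t_start   = startdate2;
    t_end     = enddate2;
    treatment = treatment2;   /* assumed name of the second treatment column */
    outcome   = outcome2;
    output;
  end;

  format t_start t_end yymmdd10.;
  keep ID t_start t_end treatment outcome;
run;
```

Each subject contributes one row, plus a second row only when startdate2 is not missing.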
The code is:
proc phreg data=sample;
class treatment (ref='0');
model (start, end)*outcome(0) = treatment / rl;
strata OOO; /* some other adjusting variables */
run;
When I run this code, it takes several hours to get results (the original dataset has about 10 million records).
So instead of using the (start, end) form, I manually calculated the time-to-event (tte) for each row, as in Table 3.
Table 3.
ID | tte | treatment | outcome |
1 | 244 | 0 | 0 |
2 | 29 | 0 | 0 |
2 | 143 | 1 | 0 |
3 | 29 | 0 | 0 |
3 | 124 | 1 | 1 |
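The tte values in Table 3 are each row's end date minus its start date, in days. A minimal sketch of that calculation follows; the output name sample_tte and the temporary names t_start and t_end are made up for illustration, and the dates are assumed to be numeric SAS dates.

```sas
/* Sketch only: SAMPLE is assumed to be the long dataset of Table 2. */
data sample_tte;
  /* Rename on input so the arithmetic uses unambiguous names. */
  set sample (rename=(start=t_start end=t_end));

  /* Interval length in days, e.g. 2013-05-01 to 2013-12-31 gives 244. */
  tte = t_end - t_start;

  keep ID tte treatment outcome;
run;
```

Note that tte computed this way is measured from the start of each row, so the second rows for IDs 2 and 3 restart at 0 instead of continuing from where the first row ended.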
I then changed only the (start, end) part to tte:
proc phreg data=sample;
class treatment (ref='0');
model tte*outcome(0) = treatment / rl;
strata OOO; /* some other adjusting variables */
run;
In this case the program takes only minutes, but the results differ from those of the (start, end) form.
(Sorry, I don't have the exact numbers, since I'm running another program right now.)
* I referred to this paper when preparing the dataset, though it doesn't say whether manually calculating the time-to-event gives the same results:
http://support.sas.com/resources/papers/proceedings12/168-2012.pdf
I hope this is enough detail for you to help.