I am trying to build a Cox regression model with time-varying predictors that are continuous.
Most of the examples I came across use binary dummy coding, e.g. the covariate is 1 if it is present in that year and 0 otherwise.
But my time-varying covariates are continuous, like cash value, assets, etc.
Any suggestions on how to include them in the model, and in what form the model needs them in order to model the probability of the event being 1/0?
Please see the example below-- it shows how to fit a Cox model by taking into account a continuous time dependent variable.
Funda
I have gone through this example. Say p is the continuous variable and p1-p15 are its observations over 15 time periods. I have 20 such variables. Can I directly put them in the form p1-p15 followed by, say, a1-a15, b1-b15 and so on? Will it allow such a combination?
Hi munitech4u,
SAS should allow the form you suggest, where the model line will be
model time*censur(0)=var1-var20;
and the 20 variables, which depend on the values of a1-a15, b1-b15, ..., have to be calculated inside phreg as in the example referred to above.
There is also an alternative solution: each record can be split into several records, each with an entry and exit time. Then, within each record, the covariates are constant. This makes the use of phreg much easier. The paper by Rostgaard shows how this dataset can be created ( http://link.springer.com/article/10.1186%2F1742-5573-5-7 ) by using a freely available macro.
(The paper mentions Poisson regression in the title, but with the "noagg" option it also works with Cox regression.)
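To make the entry/exit layout concrete, here is a minimal sketch of what the split dataset could look like and how it might be passed to phreg. The dataset name long_example and the variables start, stop, status, cash and asset are made up for illustration, and the toy rows are only there so the step runs; estimates from them mean nothing. The continuous covariates go in as they are, one value per interval, with no dummy coding.

```sas
/* Hypothetical long-format data: one row per subject per interval.   */
/* start/stop are the entry and exit times of the interval, status=1  */
/* only on the row where the subject's event occurs.                  */
data long_example;
   input id start stop status cash asset;
   datalines;
1 0 1   0 10.2 100
1 1 2   0 11.0 104
1 2 2.5 1  9.5  98
2 0 1   0  7.1  90
2 1 2   0  7.9  91
2 2 3   0  8.3  95
3 0 1   1 12.4  80
4 0 1   0  6.0 110
4 1 1.8 1  5.2 102
5 0 1   0  9.0  97
5 1 2   0  9.4  99
;
run;

/* Counting-process style of the MODEL statement: (entry, exit)*status(censor value). */
proc phreg data=long_example;
   model (start, stop)*status(0) = cash asset;
run;
```

Within each row the covariates are fixed, so no programming statements are needed inside phreg.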
proc phreg data=Tumor;
model Time*Dead(0)=Dose NPap;
array pp{*} P1-P14;
So in the above statement, you mean we can modify the statement like:
proc phreg data=Tumor;
model Time*Dead(0)=sector a b c d e;
array aa{*} a1-a21;
array bb{*} b1-b21;
array cc{*} c1-c21;
array dd{*} d1-d21;
I think you got it right.
Didn't you say that there were 15 intervals as in the examples? Then the length of the arrays aa, bb, ... should be 14. The code below is (intended to be) as in the example, just with 5 time-dependent variables instead of just one.
proc phreg data=Tumor;
   model Time*Dead(0)=Dose a b c d e;
   * one array per time-dependent covariate. The 15th values (a15 ... e15) are used by name below;
   array aa{*} a1-a14;
   array bb{*} b1-b14;
   array cc{*} c1-c14;
   array dd{*} d1-d14;
   array ee{*} e1-e14;
   array tt{*} t1-t15;
   t1=27; t2=34; t3=37; t4=41; t5=43; t6=45; t7=46; t8=47; t9=49; t10=50; t11=51; t12=53; t13=65; t14=67; t15=71;
   if Time < tt[1] then do;
      a=0; b=0; c=0; d=0; e=0;
      * it is important here that tt[1] is before the first event time. I think this can be done more elegantly, but it is how it is written in the example;
   end;
   else if Time >= tt[15] then do;
      a=a15; b=b15; c=c15; d=d15; e=e15;
   end;
   else do i=1 to dim(aa);
      if tt[i] <= Time < tt[i+1] then do; a=aa[i]; b=bb[i]; c=cc[i]; d=dd[i]; e=ee[i]; end;
   end;
run;
Maybe you can shorten the code considerably by using two-dimensional arrays. This is especially relevant if you need to extend the code to handle 20 time-dependent covariates.
Also, by the way, the second approach shown in the documentation is the same as the one I suggested above (where a longer dataset is created such that each record consists of an interval with time-independent covariates). I like that solution better.
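As a rough sketch of how the two ideas could be combined, a DATA step can use a two-dimensional array to turn the wide records into the longer dataset. Everything here is assumed for illustration: the dataset names wide and tv_long, 2 covariates over 3 yearly periods, the convention that period k covers the interval (k-1, k], and follow-up that never exceeds the last period. The two-dimensional array is used in the DATA step because I have not checked whether phreg's own programming statements accept one.

```sas
/* Toy wide data (hypothetical): a1-a3 and b1-b3 are the yearly values */
/* of two continuous covariates; missing after the subject has exited. */
data wide;
   input id time event a1-a3 b1-b3;
   datalines;
1 2.5 1 10.2 11.0  9.5 100 104  98
2 3.0 0  7.1  7.9  8.3  90  91  95
3 1.0 1 12.4   .    .   80   .   .
4 1.8 1  6.0  5.2   .  110 102   .
5 2.0 0  9.0  9.4   .   97  99   .
;
run;

data tv_long;
   set wide;
   /* Row j = covariate j, column k = period k. SAS fills the array   */
   /* row by row, so x{1,k} is a<k> and x{2,k} is b<k>.               */
   array x{2,3} a1-a3 b1-b3;
   array cur{2} a b;                 /* covariate values for the current interval */
   do k = 1 to 3;
      start = k - 1;
      if start >= time then leave;   /* subject already left the study */
      stop   = min(k, time);
      status = (event = 1 and stop = time);   /* event only on the last row */
      do j = 1 to 2;
         cur{j} = x{j,k};
      end;
      output;
   end;
   keep id start stop status a b;
run;

proc phreg data=tv_long;
   model (start, stop)*status(0) = a b;
run;
```

To go from 2 to 20 covariates only the array dimensions and variable lists change; the loops stay the same. Periods after a subject's exit are never written out, so the missing values stored there do not reach the model.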
How do I check multicollinearity in such a scenario? Or should I check multicollinearity in the transposed dataset first, before transposing it back to longitudinal data? Also, in the example there are missing values for the time periods before the start or after the finish. Can we feed those directly to the model,
as in the dataset created in the link: SAS/STAT(R) 9.22 User's Guide?