One of the most widely used models for analyzing survival data is the Cox proportional hazards model, where the hazard rate has the form λ(t|X)=exp(γX)λ0(t) and X is a set of covariates. However, this model is often not flexible enough. A typical problem is that the proportionality assumption is not fulfilled. This can, for instance, happen if an effect is additive, i.e. the effect of a covariate is the same regardless of the baseline hazard. To overcome this problem, the form of the hazard function can be generalized so that it can include additive effects as well. I will here demonstrate some SAS code that can estimate additive and multiplicative effects in a more general hazard regression model; the macro can be found in the attached files together with examples of its use.
Given two sets of covariates, W and X, of dimension q and p respectively, consider survival times generated from a hazard function of the form
λ(t|W,X) = g(βW) + h(γX)λ0,S(t)
as described in the paper by Lin and Ying (1). Here g and h can be arbitrary user-defined link functions that must be twice differentiable, β and γ are the parameters to be estimated, and λ0,S(t) is a non-parametric baseline hazard function. The subscript S refers to a strata variable, allowing the baseline hazard to differ between subgroups of the study cohort.
Note that when g ≡ 0 and h(x)=exp(x) we have an ordinary Cox regression model, and when g(x)=x and h ≡ 1 we have the Aalen additive model. Since g and h can be (almost) any link functions, and the model also allows stratification of the baseline hazard, the model is very flexible.
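To make the special cases concrete, here is a small Python sketch (not part of the SAS macro) that evaluates the general hazard for pluggable link functions; the constant baseline λ0(t)=0.5 is a hypothetical choice for illustration only.

```python
import math

def hazard(t, w_beta, x_gamma, g, h, lam0):
    """General additive-multiplicative hazard: g(beta'W) + h(gamma'X) * lambda0(t)."""
    return g(w_beta) + h(x_gamma) * lam0(t)

lam0 = lambda t: 0.5  # hypothetical constant baseline hazard

# Cox special case: g = 0, h = exp
cox = hazard(1.0, 0.7, 0.3, lambda u: 0.0, math.exp, lam0)
# Aalen special case: g = identity, h = 1
aalen = hazard(1.0, 0.7, 0.3, lambda u: u, lambda u: 1.0, lam0)
```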
As far as I am aware, no SAS code is available to fit the general hazard regression model specified above. As most readers will know, PROC PHREG handles the special case of Cox regression. Further, Schaubel and Wei (2) have shown how standard SAS procedures can be used to estimate the parameters of the additive model λ(t|W)=βW+λ0(t).
Martingale theory can be used to derive a set of estimating equations, as shown in the paper by Lin and Ying (1). The attached SAS macro "%lin_ying" solves this set of equations using a Newton-Raphson algorithm. One should note that there can be more than one solution (see (1) for details) and only one of the solutions is consistent (converges to the true value). In practice this has not been a problem in a large number of test runs of the macro.
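The macro's solver is written in SAS, but the idea can be sketched in a few lines of Python: Newton-Raphson iteration on a score function U(θ), shown here in one dimension with the hypothetical equation U(θ)=θ²-2. This toy equation also illustrates the multiple-root issue: the starting value decides which solution the iteration converges to.

```python
def newton_raphson(score, jacobian, theta0, epsilon=1e-8, maxiter=50):
    """Solve score(theta) = 0 by Newton-Raphson iteration."""
    theta = theta0
    for _ in range(maxiter):
        step = score(theta) / jacobian(theta)
        theta -= step
        if abs(step) < epsilon:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# U(theta) = theta**2 - 2 = 0 has two roots; the starting value
# determines which one the iteration finds.
root_pos = newton_raphson(lambda t: t * t - 2, lambda t: 2 * t, theta0=1.0)
root_neg = newton_raphson(lambda t: t * t - 2, lambda t: 2 * t, theta0=-1.0)
```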
Using PROC FCMP it is possible to specify the form of g and h. It is also necessary to specify the first and second derivatives, as these are used to solve the estimating equations described by Lin and Ying. An example of how to define g(x)=x and h(x)=exp(x) is
proc fcmp outlib=work.functions.gh;
  function g(x);
    y=x;
    return (y);
  endsub;
  function dg(x);
    *first derivative of g;
    y=1;
    return (y);
  endsub;
  function ddg(x);
    *second derivative of g;
    y=0;
    return (y);
  endsub;
  function h(x);
    y=exp(x);
    return (y);
  endsub;
  function dh(x);
    *first derivative of h;
    y=exp(x);
    return (y);
  endsub;
  function ddh(x);
    *second derivative of h;
    y=exp(x);
    return (y);
  endsub;
run;
If the user wants other link functions, the six functions above can simply be redefined.
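A common mistake when redefining g or h is forgetting to update the derivatives. A quick numerical check, sketched here in Python rather than SAS, compares hand-coded derivatives against central finite differences at an arbitrary test point:

```python
import math

# Hand-coded link functions and derivatives, mirroring the FCMP block above
g, dg, ddg = (lambda x: x), (lambda x: 1.0), (lambda x: 0.0)
h, dh, ddh = math.exp, math.exp, math.exp

def central_diff(f, x, eps=1e-5):
    """Numerical first derivative by central differences."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x0 = 0.3  # arbitrary test point
ok_g = abs(central_diff(g, x0) - dg(x0)) < 1e-6
ok_h = abs(central_diff(h, x0) - dh(x0)) < 1e-6
ok_dh = abs(central_diff(dh, x0) - ddh(x0)) < 1e-6  # second derivative of h
```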
To use the functions it is necessary to tell SAS where they can be found. This is done with the cmplib option:
option cmplib=(work.functions);
Before calling the macro it is necessary to get the survival data into the right form. Since I have not implemented a CLASS statement, the user has to recode each class variable into a set of indicator variables (as PROC PHREG required in the old days). The covariates belonging to W must be named w_1-w_q and the covariates belonging to X must be named x_1-x_p. Further, the dataset has to contain a time variable and an event indicator (1 if death, 0 if censored).
Note that the macro can handle neither missing values nor any kind of linear dependency between covariates (overparameterization).
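Because there is no CLASS statement, a class variable must be expanded into indicators by hand. The helper below is a hypothetical Python sketch of that recoding (in SAS one would create the indicator variables in a DATA step): one 0/1 variable per non-reference level.

```python
def to_indicators(values, levels):
    """Dummy-code a class variable: one 0/1 indicator per non-reference level.
    The first entry of `levels` is taken as the reference level."""
    return {"w_%d" % i: [1 if v == lev else 0 for v in values]
            for i, lev in enumerate(levels[1:], start=1)}

treatment = ["A", "B", "C", "A"]  # a hypothetical 3-level class variable
ind = to_indicators(treatment, ["A", "B", "C"])
```

Note that a k-level class variable yields k-1 indicators; including all k levels would introduce exactly the kind of linear dependency the macro cannot handle.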
A further requirement for running the macro is access to some matrix functions. These are defined in the attached file "matrix_algebra.sas", which puts them in the work.functions dataset, so the cmplib option above covers them as well (unless one puts the matrix functions in a permanent library, in which case the cmplib option must be changed accordingly).
When the link functions are defined and the cmplib option points to both the link functions and the matrix functions, the macro can be executed. The macro definition is
%lin_ying(data=,p=,q=,start=,time=,event=,strata=,epsilon=,maxiter=);
where
data= the input dataset,
p= the number of covariates in the multiplicative term (x_1-x_p),
q= the number of covariates in the additive term (w_1-w_q),
start= an optional left truncation time variable,
time= the survival/censoring time variable,
event= the event indicator (1=event, 0=censored),
strata= an optional stratification variable for the baseline hazard,
epsilon= the convergence criterion for the Newton-Raphson algorithm, and
maxiter= the maximum number of iterations.
Output from the macro is the parameter estimates with standard errors and 95% confidence limits, which will be shown in the output window. The macro also generates a dataset called "baseline" containing the estimated baseline function.
For use in the examples below the following simulated data will be used. The data are generated from the baseline hazard λ0,S(t)=(t+1)·NS, with NS lognormally distributed, h the exponential function and g the identity function. The true parameters are γ=(1/4) and β=(1/2,1/10,2/5). The covariates x_1 and w_1-w_3 are all Bernoulli(1/2)-distributed. The hazard then takes the form
λ(t|W,X)=βW+exp(γX)λ0,S(t)
Further, the censoring time is assumed to be exponentially distributed with mean 1.
data simulation;
  call streaminit(1234567);
  array strataval{0:9} _temporary_;
  do strata=0 to 9;
    strataval[strata]=exp(rand('normal',-2,1));
  end;
  do i=1 to 10000;
    x_1=rand('bernoulli', 1/2);
    w_1=rand('bernoulli', 1/2);
    w_2=rand('bernoulli', 1/2);
    w_3=rand('bernoulli', 1/2);
    strata=mod(i,10);
    wbeta=w_1*0.5+w_2*0.1+0.4*w_3;
    xgamma=x_1*0.25;
    censurtime=rand('exponential');
    start=0;
    eventtime=sqrt(-2*log(rand('uniform',0,1))/(h(xgamma)*strataval[strata])
                   +(g(wbeta)/(h(xgamma)*strataval[strata])+1)**2)
              -g(wbeta)/(h(xgamma)*strataval[strata])-1;
    event=(eventtime<censurtime);
    time=min(eventtime,censurtime);
    strata2=strata*10+x_1; *the variable strata2 will be used in example 2;
    output;
  end;
  keep start time x_1 w_1-w_3 event strata strata2;
run;
Here, "start" is a left truncation time, which is here chosen to be the constant 0, "time" is the observed time, "event" is an indicator of whether the observed time is an event time or a censoring time (1=event), and "strata" is the stratification variable of λ0,S(t). Notice that if one wants to generate data with some other link functions, the same program can be used, since it does not depend on the form of g and h.
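The eventtime expression in the DATA step is an inverse-transform sample: the cumulative hazard Λ(t)=g(βW)t + h(γX)NS(t²/2+t) is set equal to an exponential(1) variate E=-log(U) and the resulting quadratic is solved for t. A short Python sketch (with arbitrarily chosen parameter values) verifies the algebra:

```python
import math

def cum_hazard(t, wbeta, xgamma, n_s):
    """Lambda(t) = g(w'beta)*t + h(x'gamma)*N_S*(t**2/2 + t)
    for g(x)=x, h(x)=exp(x) and baseline lambda0(t) = (t+1)*N_S."""
    return wbeta * t + math.exp(xgamma) * n_s * (t * t / 2 + t)

def sim_eventtime(wbeta, xgamma, n_s, u):
    """Closed-form solution of Lambda(t) = -log(u), as used in the DATA step."""
    hn = math.exp(xgamma) * n_s
    a = wbeta / hn
    return math.sqrt(-2 * math.log(u) / hn + (a + 1) ** 2) - a - 1

# Arbitrary parameter values for the check
wbeta, xgamma, n_s, u = 0.9, 0.25, 0.1, 0.4
t_event = sim_eventtime(wbeta, xgamma, n_s, u)
```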
The dimension of β is 3 so we set q=3. The dimension of γ is 1 so we set p=1. The macro call therefore becomes
%lin_ying(data=simulation,p=1,q=3,time=time,event=event,strata=strata);
If everything is defined in the right manner the output window will now show the estimates with 95% confidence limits:
The model has converged
Parameter Estimate Std Err
Beta 1 0.5059(0.4554,0.5565) 0.025789
Beta 2 0.0965(0.0482,0.1448) 0.024654
Beta 3 0.3776(0.3272, 0.428) 0.025708
Gamma 1 0.2326(0.1082,0.3571) 0.063484
That the parameter estimates are very close to the true values is not surprising, as the fitted model is specified exactly as the model from which the data were generated.
If one sets the number of parameters in the multiplicative term to zero (so that h(γX)=h(0)=1) and lets g(x)=x, one has Aalen's additive model.
Note that the model from which the test data were generated can be expressed as Aalen's additive model if one replaces the strata variable with a cross product between x_1 and the old strata variable; this is how the variable "strata2" was created. We can thereby get consistent estimates of β. We get no estimate of γ, because the effect of x_1 is handled by stratifying on "strata2":
%lin_ying(data=simulation,p=0,q=3,time=time,event=event,strata=strata2);
gives the output:
The model has converged
Parameter Estimate Std Err
Beta 1 0.507(0.4565,0.5575) 0.025772
Beta 2 0.0998(0.0513,0.1483) 0.024749
Beta 3 0.3779(0.3277,0.4282) 0.025639
If one sets the number of parameters in the additive term to zero (g=0) and lets h(x)=exp(x), one has a Cox model. Therefore, the macro call
%lin_ying(data=simulation,p=1,q=0,time=time,event=event);
gives the same results as PROC PHREG:
PROC PHREG DATA=simulation nosummary;
MODEL time*event(0)=x_1;
RUN;
In the original paper by Lin and Ying the covariate vectors X and W are allowed to depend on time. This is not possible in the macro presented in this article. However, left truncation is supported. Therefore, if X and W are assumed piecewise constant, each record can be divided into pieces such that X and W are constant between the left truncation time and the survival/censoring time. The same principle is explained in detail in the documentation of PROC PHREG (http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_phreg_exampl...).
Here is an example where the effect of w_3 is allowed to change at time=1.
data simulation2;
  set simulation;
  if time>1 then do;
    temptime=time;
    tempevent=event;
    time=1;
    event=0;
    w_4=0;
    output;
    event=tempevent;
    time=temptime;
    start=1;
    w_4=w_3;
    w_3=0;
    output;
  end;
  else do;
    w_4=0;
    output;
  end;
  drop tempevent temptime;
run;
%lin_ying(data=simulation2,p=1,q=4,start=start,time=time,event=event,strata=strata);
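The splitting logic of the DATA step above can be sketched in Python for clarity (the field names follow the SAS example): a record ending after the cut point becomes a censored episode up to the cut plus a left-truncated episode after it, with w_4 carrying the post-cut value of w_3.

```python
def split_record(rec, cut=1.0):
    """Split one record at `cut` so the effect of w_3 can differ before and
    after that time; w_4 carries the post-cut value of w_3."""
    if rec["time"] <= cut:
        return [dict(rec, w_4=0)]
    before = dict(rec, time=cut, event=0, w_4=0)          # censored at the cut
    after = dict(rec, start=cut, w_4=rec["w_3"], w_3=0)   # left truncated at the cut
    return [before, after]

recs = split_record({"start": 0.0, "time": 1.8, "event": 1, "w_3": 1})
```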
The SAS files mentioned above are attached.
References:
1) Lin, D. Y. & Ying, Z. (1995). Semiparametric analysis of general additive-multiplicative hazard models for counting processes. The Annals of Statistics, 23, 1712-1734.
2) Schaubel, D. E. & Wei, G. (2007). Fitting semiparametric additive hazards models using standard statistical software. Biometrical Journal, 49(5), 719-730.