One of the most widely used models for analyzing survival data is the Cox proportional hazards model, where the hazard rate has the form λ(t|X)=exp(γX)λ0(t) and X is a set of covariates. However, this model is often not flexible enough. A typical problem is that the proportionality assumption is not fulfilled. This can, for instance, happen if an effect is additive, i.e. the effect of a covariate is the same regardless of the baseline hazard. To overcome this problem, the form of the hazard function can be generalized so that it can include additive effects as well. I will here demonstrate some SAS code that can estimate additive and multiplicative effects in a more general hazard regression model; the macro can be found in the attached files together with examples of its use.
Given two sets of covariates, W and X, of dimension q and p respectively, consider survival times generated from a hazard function of the form
λ(t|W,X) = g(βW) + h(γX)λ0,S(t)
as described in the paper by Lin and Ying (1). Here g and h can be arbitrary user-defined link functions that must be twice differentiable, β and γ are the parameters to be estimated, and λ0,S(t) is a non-parametric baseline hazard function. The subscript S refers to a strata variable, allowing the baseline hazard to differ between subgroups of the study cohort.
Note that when g ≡ 0 and h(x)=exp(x) we have an ordinary Cox regression model, and when g(x)=x and h ≡ 1 we have the Aalen additive model. Since g and h can be (almost) any link functions, and the model also allows stratification of the baseline hazard, the model is very flexible.
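To make the special cases concrete, here is a small Python sketch (not part of the SAS macro) that evaluates the general hazard for pluggable link functions; the constant baseline λ0(t)=0.5 is a hypothetical choice for illustration only.

```python
import math

def hazard(t, w_beta, x_gamma, g, h, lam0):
    """General additive-multiplicative hazard: g(beta'W) + h(gamma'X) * lambda0(t)."""
    return g(w_beta) + h(x_gamma) * lam0(t)

lam0 = lambda t: 0.5  # hypothetical constant baseline hazard

# Cox special case: g = 0, h = exp
cox = hazard(1.0, 0.7, 0.3, lambda u: 0.0, math.exp, lam0)
# Aalen special case: g = identity, h = 1
aalen = hazard(1.0, 0.7, 0.3, lambda u: u, lambda u: 1.0, lam0)
```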
As far as I am aware, no SAS code is available to fit the general hazard regression model specified above. As most readers will know, PROC PHREG handles the special case of Cox regression. Further, Schaubel and Wei (2) have shown how standard SAS procedures can be used to estimate the parameters of the additive model λ(t|W)=βW+λ0(t).
Martingale theory can be used to derive a set of estimating equations, as shown in the paper by Lin and Ying (1). The attached SAS macro "%lin_ying" solves this set of equations using a Newton-Raphson algorithm. One should note that there can be more than one solution (see (1) for details) and only one of the solutions is consistent (converges to the true value). In practice this has not been a problem in a large number of test runs of the macro.
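The macro's solver is written in SAS, but the idea can be sketched in a few lines of Python: Newton-Raphson iteration on a score function U(θ), shown here in one dimension with the hypothetical equation U(θ)=θ²-2. This toy equation also illustrates the multiple-root issue: the starting value decides which solution the iteration converges to.

```python
def newton_raphson(score, jacobian, theta0, epsilon=1e-8, maxiter=50):
    """Solve score(theta) = 0 by Newton-Raphson iteration."""
    theta = theta0
    for _ in range(maxiter):
        step = score(theta) / jacobian(theta)
        theta -= step
        if abs(step) < epsilon:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# U(theta) = theta**2 - 2 = 0 has two roots; the starting value
# determines which one the iteration finds.
root_pos = newton_raphson(lambda t: t * t - 2, lambda t: 2 * t, theta0=1.0)
root_neg = newton_raphson(lambda t: t * t - 2, lambda t: 2 * t, theta0=-1.0)
```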
Using PROC FCMP it is possible to specify the form of g and h. It is also necessary to specify the first and second derivatives, as these are used to solve the estimating equations described by Lin and Ying. An example of how to define g(x)=x and h(x)=exp(x) is
proc fcmp outlib=work.functions.gh;
  function g(x);
    y=x;
    return (y);
  endsub;
  function dg(x);
    *first derivative of g;
    y=1;
    return (y);
  endsub;
  function ddg(x);
    *second derivative of g;
    y=0;
    return (y);
  endsub;
  function h(x);
    y=exp(x);
    return (y);
  endsub;
  function dh(x);
    *first derivative of h;
    y=exp(x);
    return (y);
  endsub;
  function ddh(x);
    *second derivative of h;
    y=exp(x);
    return (y);
  endsub;
run;
If the user wants other link functions, the six functions above can simply be redefined.
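A common mistake when redefining g or h is forgetting to update the derivatives. A quick numerical check, sketched here in Python rather than SAS, compares hand-coded derivatives against central finite differences at an arbitrary test point:

```python
import math

# Hand-coded link functions and derivatives, mirroring the FCMP block above
g, dg, ddg = (lambda x: x), (lambda x: 1.0), (lambda x: 0.0)
h, dh, ddh = math.exp, math.exp, math.exp

def central_diff(f, x, eps=1e-5):
    """Numerical first derivative by central differences."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x0 = 0.3  # arbitrary test point
ok_g = abs(central_diff(g, x0) - dg(x0)) < 1e-6
ok_h = abs(central_diff(h, x0) - dh(x0)) < 1e-6
ok_dh = abs(central_diff(dh, x0) - ddh(x0)) < 1e-6  # second derivative of h
```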
To use the functions it is necessary to tell SAS where they can be found. This is done with the cmplib option:
option cmplib=(work.functions);
Before calling the macro it is necessary to get the survival data into the right form. Since I have not implemented a CLASS statement, the user has to recode each class variable into a set of indicator variables (as PROC PHREG required in the old days). The covariates belonging to W must be named w_1-w_q and the covariates belonging to X must be named x_1-x_p. Further, the dataset has to contain a time variable and an event indicator (1 if death, 0 if censored).
Note that the macro can handle neither missing values nor any kind of linear dependency between covariates (overparameterization).
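Because there is no CLASS statement, a class variable must be expanded into indicators by hand. The helper below is a hypothetical Python sketch of that recoding (in SAS one would create the indicator variables in a DATA step): one 0/1 variable per non-reference level.

```python
def to_indicators(values, levels):
    """Dummy-code a class variable: one 0/1 indicator per non-reference level.
    The first entry of `levels` is taken as the reference level."""
    return {"w_%d" % i: [1 if v == lev else 0 for v in values]
            for i, lev in enumerate(levels[1:], start=1)}

treatment = ["A", "B", "C", "A"]  # a hypothetical 3-level class variable
ind = to_indicators(treatment, ["A", "B", "C"])
```

Note that a k-level class variable yields k-1 indicators; including all k levels would introduce exactly the kind of linear dependency the macro cannot handle.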
A further requirement for running the macro is access to some matrix functions. These are defined in the attached file "matrix_algebra.sas", which puts them in the work.functions dataset, so the cmplib option above covers them as well (unless one puts the matrix functions in a permanent library, in which case the cmplib option must be changed accordingly).
When the link functions are defined and the cmplib option points to both the link functions and the matrix functions, the macro can be executed. The macro definition is
%lin_ying(data=,p=,q=,start=,time=,event=,strata=,epsilon=,maxiter=);
where
data= the input dataset,
p= the number of covariates in the multiplicative term (x_1-x_p),
q= the number of covariates in the additive term (w_1-w_q),
start= an optional left truncation time variable,
time= the survival/censoring time variable,
event= the event indicator (1=event, 0=censored),
strata= an optional stratification variable for the baseline hazard,
epsilon= the convergence criterion for the Newton-Raphson algorithm, and
maxiter= the maximum number of iterations.
Output from the macro is the parameter estimates with standard errors and 95% confidence limits, which will be shown in the output window. The macro also generates a dataset called "baseline" containing the estimated baseline function.
For use in the examples below the following simulated data will be used. The data are generated from the baseline hazard λ0,S(t)=(t+1)·NS, with NS lognormally distributed, h the exponential function and g the identity function. The true parameters are γ=(1/4) and β=(1/2,1/10,2/5). The covariates x_1 and w_1-w_3 are all Bernoulli(1/2)-distributed. The hazard then takes the form
λ(t|W,X)=βW+exp(γX)λ0,S(t)
Further, the censoring time is assumed to be exponentially distributed with mean 1.
data simulation;
  call streaminit(1234567);
  array strataval{0:9} _temporary_;
  do strata=0 to 9;
    strataval[strata]=exp(rand('normal',-2,1));
  end;
  do i=1 to 10000;
    x_1=rand('bernoulli', 1/2);
    w_1=rand('bernoulli', 1/2);
    w_2=rand('bernoulli', 1/2);
    w_3=rand('bernoulli', 1/2);
    strata=mod(i,10);
    wbeta=w_1*0.5+w_2*0.1+0.4*w_3;
    xgamma=x_1*0.25;
    censurtime=rand('exponential');
    start=0;
    eventtime=sqrt(-2*log(rand('uniform',0,1))/(h(xgamma)*strataval[strata])
                   +(g(wbeta)/(h(xgamma)*strataval[strata])+1)**2)
              -g(wbeta)/(h(xgamma)*strataval[strata])-1;
    event=(eventtime<censurtime);
    time=min(eventtime,censurtime);
    strata2=strata*10+x_1; *the variable strata2 will be used in example 2;
    output;
  end;
  keep start time x_1 w_1-w_3 event strata strata2;
run;
Here, "start" is a left truncation time, which is here chosen to be the constant 0, "time" is the observed time, "event" is an indicator of whether the observed time is an event time or a censoring time (1=event), and "strata" is the stratification variable of λ0,S(t). Notice that if one wants to generate data with some other link functions, the same program can be used, since it does not depend on the form of g and h.
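The eventtime expression in the DATA step is an inverse-transform sample: the cumulative hazard Λ(t)=g(βW)t + h(γX)NS(t²/2+t) is set equal to an exponential(1) variate E=-log(U) and the resulting quadratic is solved for t. A short Python sketch (with arbitrarily chosen parameter values) verifies the algebra:

```python
import math

def cum_hazard(t, wbeta, xgamma, n_s):
    """Lambda(t) = g(w'beta)*t + h(x'gamma)*N_S*(t**2/2 + t)
    for g(x)=x, h(x)=exp(x) and baseline lambda0(t) = (t+1)*N_S."""
    return wbeta * t + math.exp(xgamma) * n_s * (t * t / 2 + t)

def sim_eventtime(wbeta, xgamma, n_s, u):
    """Closed-form solution of Lambda(t) = -log(u), as used in the DATA step."""
    hn = math.exp(xgamma) * n_s
    a = wbeta / hn
    return math.sqrt(-2 * math.log(u) / hn + (a + 1) ** 2) - a - 1

# Arbitrary parameter values for the check
wbeta, xgamma, n_s, u = 0.9, 0.25, 0.1, 0.4
t_event = sim_eventtime(wbeta, xgamma, n_s, u)
```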
The dimension of β is 3 so we set q=3. The dimension of γ is 1 so we set p=1. The macro call therefore becomes
%lin_ying(data=simulation,p=1,q=3,time=time,event=event,strata=strata);
If everything is defined in the right manner the output window will now show the estimates with 95% confidence limits:
The model has converged
Parameter Estimate Std Err
Beta 1 0.5059(0.4554,0.5565) 0.025789
Beta 2 0.0965(0.0482,0.1448) 0.024654
Beta 3 0.3776(0.3272, 0.428) 0.025708
Gamma 1 0.2326(0.1082,0.3571) 0.063484
That the parameter estimates are very close to the true values is not surprising, as the fitted model is specified exactly as the model from which the data were generated.
If one sets the number of parameters in the multiplicative term to zero (so that h(γX)=h(0)=1) and lets g(x)=x, one has Aalen's additive model.
Note that the model from which the test data were generated can be expressed as Aalen's additive model if one replaces the strata variable with a cross product between x_1 and the old strata variable; this is how the variable "strata2" was created. We can thereby get consistent estimates of β. We get no estimate of γ, because the effect of x_1 is handled by stratifying on "strata2":
%lin_ying(data=simulation,p=0,q=3,time=time,event=event,strata=strata2);
gives the output:
The model has converged
Parameter Estimate Std Err
Beta 1 0.507(0.4565,0.5575) 0.025772
Beta 2 0.0998(0.0513,0.1483) 0.024749
Beta 3 0.3779(0.3277,0.4282) 0.025639
If one sets the number of parameters in the additive term to zero (g=0) and lets h(x)=exp(x), one has a Cox model. Therefore, the macro call
%lin_ying(data=simulation,p=1,q=0,time=time,event=event);
gives the same results as PROC PHREG:
PROC PHREG DATA=simulation nosummary;
MODEL time*event(0)=x_1;
RUN;
In the original paper by Lin and Ying the covariate vectors X and W are allowed to depend on time. This is not possible in the macro presented in this article. However, left truncation is supported. Therefore, if X and W are assumed piecewise constant, each record can be divided into pieces such that X and W are constant between the left truncation time and the survival/censoring time. The same principle is explained in detail in the documentation of PROC PHREG (http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_phreg_exampl...).
Here is an example where the effect of w_3 is allowed to change at time=1.
data simulation2;
  set simulation;
  if time>1 then do;
    temptime=time;
    tempevent=event;
    time=1;
    event=0;
    w_4=0;
    output;
    event=tempevent;
    time=temptime;
    start=1;
    w_4=w_3;
    w_3=0;
    output;
  end;
  else do;
    w_4=0;
    output;
  end;
  drop tempevent temptime;
run;
%lin_ying(data=simulation2,p=1,q=4,start=start,time=time,event=event,strata=strata);
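The splitting logic of the DATA step above can be sketched in Python for clarity (the field names follow the SAS example): a record ending after the cut point becomes a censored episode up to the cut plus a left-truncated episode after it, with w_4 carrying the post-cut value of w_3.

```python
def split_record(rec, cut=1.0):
    """Split one record at `cut` so the effect of w_3 can differ before and
    after that time; w_4 carries the post-cut value of w_3."""
    if rec["time"] <= cut:
        return [dict(rec, w_4=0)]
    before = dict(rec, time=cut, event=0, w_4=0)          # censored at the cut
    after = dict(rec, start=cut, w_4=rec["w_3"], w_3=0)   # left truncated at the cut
    return [before, after]

recs = split_record({"start": 0.0, "time": 1.8, "event": 1, "w_3": 1})
```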
The SAS files mentioned above are attached.
References:
1) Lin, D. Y. & Ying, Z. (1995). Semiparametric analysis of general additive-multiplicative hazard models for counting processes. The Annals of Statistics, 23, 1712-1734.
2) Schaubel, D. E. & Wei, G. (2007). Fitting semiparametric additive hazards models using standard statistical software. Biometrical Journal, 49(5), 719-730.