## Simulating survival data

Dear all,

I want to demonstrate discrete-time hazard models with competing risks to my students by simulating survival data with time-dependent covariates. I also want to experiment with different variable selection approaches.

Most of the literature on this topic seems to be based on continuous-time Cox proportional hazards (PH) assumptions.

Any SAS/IML or DATA STEP code will be much appreciated.

MFK

"One page of well written code is more valuable than hundred pages of explanation" Source: Unknown.


## Re: Simulating survival data

I'm not sure what you mean by a discrete time model. Reference or example? Do you want to use a (Markov) transition matrix to iterate from one time step to the next, or do you have something else in mind?

Chapter 12 of Simulating Data with SAS has a section on survival analysis models. As you say, most simulation studies focus on a Cox regression model where the survival times are exponential or Weibull distributed. Bender, Augustin, and Blettner (2005, p. 1715) discuss "how survival times can be generated to simulate ... Cox models ... with any non-zero baseline hazard rates," but they are still using continuous time.

## Re: Simulating survival data

Suppose you have survival data on a portfolio of loans where some loans prepaid, some defaulted, some are still current (censored), and a large proportion ran their full term. Suppose you also have data on time-dependent covariates. Then you can fit a discrete-time competing risks model (a multinomial logistic regression) to estimate the cause-specific hazard functions.
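One common way to write such a model is given below. The notation ($x_{it}$, $\alpha_j(t)$, $\beta_j$, and the event labels $j = 1$ for prepayment and $j = 2$ for default) is assumed here for illustration and is not taken from the thread or from a specific reference. For loan $i$ that is still current at the start of period $t$, the cause-specific hazards are

$$
h_j(t \mid x_{it}) = P(T_i = t,\ J_i = j \mid T_i \ge t,\ x_{it})
= \frac{\exp\{\alpha_j(t) + x_{it}^\top \beta_j\}}{1 + \sum_{k=1}^{2} \exp\{\alpha_k(t) + x_{it}^\top \beta_k\}}, \qquad j = 1, 2,
$$

and the probability of remaining current through period $t$ is

$$
1 - h_1(t \mid x_{it}) - h_2(t \mid x_{it}) = \frac{1}{1 + \sum_{k=1}^{2} \exp\{\alpha_k(t) + x_{it}^\top \beta_k\}}.
$$

Under this formulation, "remaining current" is the reference category of the multinomial logit, and each loan-period in which the loan is at risk contributes one multinomial observation.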

My problem is that I don't have such survival data. I would like to simulate the data and then fit the above model to show that the data-generating model can be recovered ("extracted").
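A minimal sketch of such a simulation follows, written in Python using only the standard library as a neutral illustration; the same loop could be ported to a DATA step. Everything specific in it is an assumption made for the example: the parameter values `ALPHA` and `BETA`, a single AR(1) time-dependent covariate `x`, constant baseline hazards, loan terms of 24, 36, or 60 months, and a 48-month observation window. Each loan produces one row per month at risk (person-period format), and maturity is treated as a deterministic event at the end of the term.

```python
import math
import random

# Hypothetical "true" parameters (assumptions for illustration only).
# Order: (prepay, default); staying current is the reference category.
ALPHA = (-3.5, -4.5)   # intercepts: constant baseline hazards on the logit scale
BETA = (0.5, -0.8)     # effects of the time-dependent covariate x

def event_probs(x, alpha=ALPHA, beta=BETA):
    """Return [P(current), P(prepay), P(default)] for one loan-month."""
    expo = [math.exp(a + b * x) for a, b in zip(alpha, beta)]
    denom = 1.0 + sum(expo)
    return [1.0 / denom] + [e / denom for e in expo]

def simulate_loan(loan_id, term, obs_window, rng):
    """Simulate one loan as person-period rows.

    Each row is (loan_id, month, term, x, status) with status
    0 = current, 1 = prepaid, 2 = defaulted, 3 = matured.
    A loan whose last row has status 0 is censored at obs_window.
    """
    rows = []
    x = rng.gauss(0.0, 1.0)                   # starting covariate value
    for month in range(1, min(term, obs_window) + 1):
        x = 0.8 * x + rng.gauss(0.0, 0.6)     # AR(1) time-dependent covariate
        if month == term:
            status = 3                        # maturity is not random
        else:
            status = rng.choices([0, 1, 2], weights=event_probs(x))[0]
        rows.append((loan_id, month, term, x, status))
        if status != 0:                       # loan leaves the risk set
            break
    return rows

rng = random.Random(2014)                     # fixed seed for reproducibility
data = []
for loan_id in range(1, 1001):
    term = rng.choice([24, 36, 60])
    data.extend(simulate_loan(loan_id, term, obs_window=48, rng=rng))

# Tabulate the final status of each loan (0 here means censored).
final_status = {}
for loan_id, month, term, x, status in data:
    final_status[loan_id] = status            # the last row per loan wins
outcome_counts = {s: 0 for s in range(4)}
for s in final_status.values():
    outcome_counts[s] += 1
print(outcome_counts)
```

Fitting a multinomial (generalized) logit model to the rows with status 0, 1, or 2 should then give estimates close to `ALPHA` and `BETA`. Rows with status 3 carry no random outcome, so one option is to drop them before fitting. In SAS, the generalized logit is available through the `LINK=GLOGIT` option of the MODEL statement in PROC LOGISTIC.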

## Re: Simulating survival data

The reason I suggested that you supply a reference is that a reference will often write down the theoretical assumptions that underlie the statistical model. When you simulate data, you are pulling a random sample from a specified population model.

Do you already know how to simulate the multinomial logistic regression model, which you mentioned in your response, or is that part of the problem?
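For reference, simulating from a multinomial logistic model comes down to two steps: convert the linear predictors into category probabilities, then draw a category from those probabilities. The sketch below shows both steps in Python (standard library only, used here as a neutral illustration). The coefficient values in `COEFS`, the covariate value `x0`, and the function names are hypothetical and chosen only for the example. The final loop compares simulated frequencies with the model probabilities as a sanity check.

```python
import math
import random

def multinomial_logit_probs(x, coefs):
    """Category probabilities for a multinomial logit with one covariate.

    coefs is a list of (intercept, slope) pairs, one per non-reference
    category; category 0 is the reference with linear predictor 0.
    """
    etas = [0.0] + [a + b * x for a, b in coefs]
    m = max(etas)                             # subtract max for numerical stability
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

def draw_category(probs, rng):
    """Draw one category index by inverting the cumulative distribution."""
    u = rng.random()
    cum = 0.0
    for k, p in enumerate(probs):
        cum += p
        if u < cum:
            return k
    return len(probs) - 1                     # guard against rounding error

rng = random.Random(12345)                    # fixed seed for reproducibility
COEFS = [(-1.0, 0.5), (-2.0, -0.8)]           # hypothetical parameters
x0 = 1.0                                      # hypothetical covariate value

probs = multinomial_logit_probs(x0, COEFS)
n = 100_000
counts = [0] * len(probs)
for _ in range(n):
    counts[draw_category(probs, rng)] += 1
freqs = [c / n for c in counts]

print("model probabilities:", [round(p, 4) for p in probs])
print("simulated frequencies:", [round(f, 4) for f in freqs])
```

In a DATA step, the drawing step should be available directly as `RAND("Table", p1, p2, p3)`, which returns a category index starting at 1, so only the probability calculation needs to be coded by hand.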

## Re: Simulating survival data

Yes, simulating the multinomial logistic regression model is part of the problem.

A reference: Watkins, Vasnev & Gerlach, Multiple Event Incidence and Duration Analysis for Credit Data Incorporating Non-Stochastic Loan Maturity, Journal of Applied Econometrics, 29: 627–648 (2014).