Survival analysis with time-varying data

Occasional Contributor
Posts: 17

Survival analysis with time-varying data

I have a longitudinal data set in long format and am doing survival analysis.

Individuals in the study were assessed for the outcome and a variety of covariates every 3 months (more or less).

Each row of data is an individual at a single time point, so each individual has multiple rows; the number of rows depends on how long they were followed up.

There are both constant baseline covariates and time-varying covariates.

To look at the bivariate descriptive statistics of the time-varying covariates, I need to look at the time interval IMMEDIATELY PRECEDING the event of interest. So basically I need a frequency of how many individuals had syphilis (1,0) in the time interval immediately preceding the event. It's an open cohort, so the start time and each 3-month follow-up time are different for each individual. The "startdate" and "enddate" of the 3-month intervals were derived from the original "visitdate" and simply mark the time interval between the previous visit and the current visit.


I think I can do this with lag(date) but have been trying and failing.


Data have:

ID       visitdate    manage   womanage   startdate    enddate      syphilis   genitalulcer   status
5982f    20AUG2005    52       40         .            20AUG2005    1          0              0
5982f    03NOV2005    52       40         20AUG2005    03NOV2005    0          1              0
5982f    10FEB2006    52       40         03NOV2005    10FEB2006    1          0              1





Data want:

Frequency of syphilis in the time interval right before the event takes place (status=1) for each ID


Super User
Posts: 13,507

Re: Survival analysis with time-varying data

It would help to have what your result would be for a given example input, something that you can show easily by hand.

There could be different interpretations of "time interval right before the event takes place" because we don't know what "event" may be or even "right before".

To make the example clearer only include the variables that concern the rules you need to implement.


It also might not hurt to show what you attempted and describe how it didn't yield the desired results.


You mention attempting to use lag. A very common issue, due to the complexity of how lag is implemented, is attempting to use code like this:


If lag(variable) = "some value" then ...

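For consistent results, Lag in all of its forms (and the related function Dif) should not be called inside conditionals. Each call to Lag keeps its own queue of values, and that queue is updated only when the call actually executes, so a conditional call returns the value from the last time it ran rather than the value from the previous row. The more complex the conditional, the more bizarre the behavior may appear.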

A better approach is:

L_var = lag(var);  /* unconditional assignment, not inside any conditional structure */


If L_var then do; /* ... */ end;
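
To make the difference concrete, here is a minimal sketch on toy data. The data set names (toy, demo), the variable names (x, prev_x, bad_prev), and the values are all made up for illustration; none of them come from the original data.

```sas
/* Toy data: hypothetical IDs and values, for illustration only */
data toy;
  input id $ x;
  datalines;
A 1
A 2
B 3
B 4
;
run;

data demo;
  set toy;          /* already sorted by id, as BY processing requires */
  by id;

  /* Unconditional call: the LAG queue is updated on every row */
  prev_x = lag(x);

  /* Blank out the lagged value on the first row of each ID so that
     values do not carry over from the previous individual */
  if first.id then prev_x = .;

  /* Conditional call (the pitfall): this queue is updated only on rows
     where x is even, so on the last row it holds the x from the last
     even row (2), not the x from the previous row (3) */
  if mod(x, 2) = 0 then bad_prev = lag(x);
run;

proc print data=demo;
run;
```

Comparing prev_x and bad_prev in the printed output shows why the lagged value is best computed once, unconditionally, and only then tested or reset.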

Occasional Contributor
Posts: 17

Re: Survival analysis with time-varying data

Thank you. Yes, I used the non-conditional lag previously to create the time intervals that start with "startdate" and end with "enddate". I had to read a lot about it because of the bizarre behavior I was getting at first! So you are right, lagging is probably not the answer...


I appreciate any help. Not sure if this helps clarify or not.


When the event occurs, the individual is censored.


Data have:

ID      visitdate    startdate    enddate      syphilis   event   eventdate
5982    20AUG2005    .            20AUG2005    1          0
5982    03NOV2005    20AUG2005    03NOV2005    1          0
5982    10FEB2006    03NOV2005    10FEB2006    0          1       03FEB2006
6401    20JUN2006    .            20JUN2006    0          0
6401    18SEP2006    20JUN2006    18SEP2006    1          0
6401    03DEC2006    18SEP2006    03DEC2006    0          0
6401    10MAR2007    03DEC2006    10MAR2007    0          1       01MAR2007



Data want:

Frequency of syphilis in the time interval right before the event takes place (event=1), so each ID can only contribute a 1 or 0 to this total.

"transmitting intervals" are the time intervals that begin with "startdate" and end with "enddate"


Transmitting intervals for people who got the event

syphilis    N    %
1           1    50
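
One possible way to get this table is sketched below. It assumes the data set is named have, is sorted by ID and visitdate, and uses the variable names from the example above. It also assumes that "the interval right before the event" means the row preceding the event row within the same ID, which is the reading that matches the hand-worked result of 1 (50%). The names prior and syph_prior are made up for the sketch.

```sas
/* Assumption: HAVE is sorted by ID and visitdate */
data prior;
  set have;
  by id;

  /* Unconditional lag of the time-varying covariate */
  syph_prior = lag(syphilis);

  /* The first row of each ID has no preceding interval */
  if first.id then syph_prior = .;

  /* Keep only the event row, so each ID contributes a single 1 or 0 */
  if event = 1;

  keep id visitdate syphilis syph_prior;
run;

/* Frequency of syphilis in the interval preceding the event;
   MISSING lists IDs whose event fell on their first row */
proc freq data=prior;
  tables syph_prior / missing;
run;
```

If the intended interval is instead the event row's own startdate-to-enddate interval, no lag is needed: the same PROC FREQ on syphilis with a `where event = 1;` statement would do.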