02-11-2016 05:05 PM
I have a longitudinal data set in long format and am doing survival analysis.
Individuals in the study were assessed for the outcome and a variety of covariates every 3 months (more or less).
Each row of data is an individual at a single time point, so each individual has multiple rows. The number of rows depends on how long the individual was followed up.
There are both constant baseline covariates and time-varying covariates.
In order to look at the bivariate descriptive statistics of the time-varying covariates, I need to look at the time interval IMMEDIATELY PRECEDING the event of interest. So basically I need a frequency of how many individuals had syphilis (1,0) in the time interval immediately preceding the event. It's an open cohort, so the start time and each 3-month follow-up time are different for each individual. The "startdate" and "enddate" of the 3-month intervals were derived from the original "visitdate" and simply define the time interval between the previous visit and the current visit.
I think I can do this with lag(date) but have been trying and failing.
ID     visitdate  manage  womanage  startdate  enddate    syphilis  genitalulcer  status
5982f  20AUG2005  52      40        .          20AUG2005  1         0             0
5982f  03NOV2005  52      40        20AUG2005  03NOV2005  0         1             0
5982f  10FEB2006  52      40        03NOV2005  10FEB2006  1         0             1
What I want: the frequency of syphilis in the time interval right before the event takes place (status=1) for each ID.
02-11-2016 06:26 PM
It would help to see what your result should be for a given example input, something that you can easily work out by hand.
There could be different interpretations of "time interval right before the event takes place" because we don't know what "event" may be or even "right before".
To make the example clearer, only include the variables that concern the rules you need to implement.
Also, it might not hurt to show what you attempted and describe how it didn't yield the desired results.
You mention attempting to use lag. A very common issue, due to the complexity of how lag is implemented, is attempting to use code like this:
If lag(variable) = "some value" then ...
For consistent results, Lag in all of its forms, and the related function Dif, should not be called inside conditionals. Each call only updates its queue when it actually executes, so the more complex the conditional, the more bizarre the behavior may appear.
It is better to do:
L_var = lag(var);  /* unconditional assignment, not inside any conditional structure */
if L_var = "some value" then do;
  /* ... */
end;
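For data with several rows per subject, the same idea can be sketched as follows. This is only a minimal illustration on made-up data: the data set names have and want, and the variables id, visit, syphilis and prev_syphilis, are assumptions for the example and not taken from the poster's data. The lag is taken on every row, and the value that leaks across from the previous ID is blanked out afterwards.

```sas
/* Hypothetical toy data, already sorted by id */
data have;
  input id $ visit syphilis;
  datalines;
A 1 1
A 2 0
B 1 0
B 2 1
;
run;

data want;
  set have;
  by id;
  /* Call LAG on every row so its queue is updated every time */
  prev_syphilis = lag(syphilis);
  /* The first row of each ID has no previous visit, so discard
     the value carried over from the previous ID */
  if first.id then prev_syphilis = .;
run;
```

Putting the lag inside "if not first.id then ..." instead would skip the call on the first row of each ID, which is exactly the kind of conditional use that gives surprising values.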
02-11-2016 07:05 PM
Thank you. Yes, I used the non-conditional lag previously to create the time intervals that start with "startdate" and end with "enddate". I had to read a lot about it because of the bizarre behavior I was getting at first! So you are right, lagging is probably not the answer...
I appreciate any help. Not sure if this helps clarify or not.
When the event occurs, the individual is censored.
ID    visitdate  startdate  enddate    syphilis  event  eventdate
5982  20AUG2005  .          20AUG2005  1         0
5982  03NOV2005  20AUG2005  03NOV2005  1         0
5982  10FEB2006  03NOV2005  10FEB2006  0         1      03FEB2006
6401  20JUN2006  .          20JUN2006  0         0
6401  18SEP2006  20JUN2006  18SEP2006  1         0
6401  03DEC2006  18SEP2006  03DEC2006  0         0
6401  10MAR2007  03DEC2006  10MAR2007  0         1      01MAR2007
Frequency of syphilis in the time interval right before the event takes place (event=1), so each ID can only contribute a 1 or 0 to this total.
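One possible way to get that count is sketched below. It is not a definitive answer, because "the interval right before the event" can be read two ways: the interval in which the event falls (the event=1 row itself) or the interval one visit earlier. The sketch keeps both so they can be compared. The data set names have and event_rows and the variable prior_syphilis are made up for the illustration, and only the columns needed for the rule are copied from the example above.

```sas
/* Example rows from above, reduced to the columns the rule needs */
data have;
  input id $ visitdate :date9. syphilis event;
  format visitdate date9.;
  datalines;
5982 20AUG2005 1 0
5982 03NOV2005 1 0
5982 10FEB2006 0 1
6401 20JUN2006 0 0
6401 18SEP2006 1 0
6401 03DEC2006 0 0
6401 10MAR2007 0 1
;
run;

proc sort data=have;
  by id visitdate;
run;

data event_rows;
  set have;
  by id;
  /* Unconditional lag: syphilis status at the previous visit */
  prior_syphilis = lag(syphilis);
  if first.id then prior_syphilis = .;
  /* Subsetting IF placed after the lag: keep one row per ID,
     the row on which the event is recorded */
  if event = 1;
run;

/* syphilis       = status in the interval containing the event
   prior_syphilis = status in the interval one visit earlier */
proc freq data=event_rows;
  tables syphilis prior_syphilis / missing;
run;
```

With the seven rows above, ID 5982 ends up with syphilis=0 and prior_syphilis=1, and ID 6401 with syphilis=0 and prior_syphilis=0, so the two readings give different frequencies and the choice between them matters.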
"Transmitting intervals" are the time intervals that begin with "startdate" and end with "enddate".
|Transmitting intervals for people who got the event|