DATA Step, Macro, Functions and more

Get time interval variables from long format of longitudinal data set

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Get time interval variables from long format of longitudinal data set

Hi Dear All,

I have a data set below:

data A;

input ID visit_date status Event;

datalines;

46 27-Oct-08 Normal 1

46 23-Apr-09 Normal 0

46 3-Dec-09 Normal 0

46 6-Jan-11 Normal 0

53 22-Dec-08 Normal 0

53 25-Jun-09 Normal 0

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;

I want to create two new variables: start, and exit. The new data set like below:

data B;

input ID visit_date status Event start exit;

datalines;

46 27-Oct-08 Normal 1 0 178

46 23-Apr-09 Normal 0 178 224

46 3-Dec-09 Normal 0 224 399

46 6-Jan-11 Normal 0

53 22-Dec-08 Normal 0 0 185

53 25-Jun-09 Normal 0

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;

 

is there a quick way to create variables 'start' and 'exit' based on my data set?  Then I can use proc phreg to evaluate the effect of status in each time interval on event.

Thanks so much for your help,

Xinjian

 

 

 

 


Accepted Solutions
Solution
‎01-25-2016 11:32 AM
Super Contributor
Posts: 490

Re: Get time interval variables from long format of longitudinal data set

[ Edited ]

 

One way to do it

 

data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit); 
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus; 
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;

View solution in original post


All Replies
Super Contributor
Posts: 490

Re: Get time interval variables from long format of longitudinal data set

How you calculate start and exit??

Frequent Contributor
Posts: 82

Re: Get time interval variables from long format of longitudinal data set

I did not. These values are from excel calculation.

Thanks

Super User
Posts: 17,840

Re: Get time interval variables from long format of longitudinal data set

What's the business logic, and possibly even, how did Excel calculate it?

Super Contributor
Posts: 490

Re: Get time interval variables from long format of longitudinal data set

[ Edited ]

Forget about excel now.... Even your example is not complete and now clear logic behind.

You should now what you need as you said you need the data for applying proc phreg.

So now what start shoud represent and what end should represent? What you logic behind?

 

If you just need the days interval between each consecutive observation for the same ID, you can do that easily like

data A;
input ID visit_date: DATE8. status $ Event;
format visit_date DATE8.;
datalines;
46 27-Oct-08 Normal 1
46 23-Apr-09 Normal 0
46 3-Dec-09 Normal 0
46 6-Jan-11 Normal 0
53 22-Dec-08 Normal 0
53 25-Jun-09 Normal 0
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
data want;
set A;
by ID;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval=intck('day',lagdate,visit_date);
drop lagdate;
run;

 And if you need thhe interval value to be accumulative from first observation to the last observation under the same ID. Just change the code to do that easily too

data want;
set A;
by ID;
retain interval;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval =interval+intck('day',lagdate,visit_date);
drop lagdate;
run;
Frequent Contributor
Posts: 82

Re: Get time interval variables from long format of longitudinal data set

Hi  and Reeza,

Thank you  so much for quick response. My data set is a longitudinaI set, I have used first 'status' of each subject to evaluate the effect of status on 'event', the follow up time is last 'visit_date' -first visit_date. Proc phreg was used for analysis. To answer some critics, I want to test if changes of 'status' ( from normal to low or  low-normal) in each interval have effects on 'event'. That is why I want to create two variables 'start' and 'exit' from data set A. Then I can run :

proc phreg;

model (start, exit) * event(0) =status;

run;

 

I may mislead you for data set B that is the data file I want to make.

Can you help me to make these two variables 'start' and 'exit'?

Thanks,

Xinjian

  

Super User
Posts: 10,500

Re: Get time interval variables from long format of longitudinal data set

Maybe we need to be more explicit in questions

data B;

input ID visit_date status Event start exit;

datalines;

46 27-Oct-08 Normal 1 0 178   <= What specific calculation gets a value of 178 on this line? It looks like it is the number of days to the next date, but we need confirmation.

46 23-Apr-09 Normal 0 178 224

46 3-Dec-09 Normal 0 224 399

46 6-Jan-11 Normal 0   <= Does this one not have either a start or end because the next record is for a different ID?

53 22-Dec-08 Normal 0 0 185

53 25-Jun-09 Normal 0  <= Why does this record NOT have a start or end? Did you just stop calculating them by hand for examples?

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;
Frequent Contributor
Posts: 82

Re: Get time interval variables from long format of longitudinal data set

Hi Ballardw,

The first value of 'start' with ID 46 is 0 (use first visit_date as baseline), value of 'exit' = 2nd visit_date - 1st visit_date;

2nd value of 'start'=2nd visit_date - 1st visit_date, value of 'exit' = 3rd visit_date - 2nd visit_date;

3rd value of 'start'=3rd visit_date - 2nd visit_date, value of 'exit' = 4th visit_date - 3rdd visit_date;

 

in my data  set, each subject has 4 measurement points ('status'). I just want to see if there is siginificant effect of 'status' on 'event' in each time interval. the fourth observation will be viewed as missing data.

 

yes, i only calculated 4 observations by using excel spreadsheet.

Thanks for your help,

Xinjian

 

Solution
‎01-25-2016 11:32 AM
Super Contributor
Posts: 490

Re: Get time interval variables from long format of longitudinal data set

[ Edited ]

 

One way to do it

 

data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit); 
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus; 
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;
Frequent Contributor
Posts: 82

Re: Get time interval variables from long format of longitudinal data set

Hi ,

I tried your syntax. it works very well. Thanks so much for your help.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 353 views
  • 0 likes
  • 4 in conversation