Hi Dear All,
I have a data set below:
data A;
input ID visit_date status Event;
datalines;
46 27-Oct-08 Normal 1
46 23-Apr-09 Normal 0
46 3-Dec-09 Normal 0
46 6-Jan-11 Normal 0
53 22-Dec-08 Normal 0
53 25-Jun-09 Normal 0
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
I want to create two new variables: start, and exit. The new data set like below:
data B;
input ID visit_date status Event start exit;
datalines;
46 27-Oct-08 Normal 1 0 178
46 23-Apr-09 Normal 0 178 224
46 3-Dec-09 Normal 0 224 399
46 6-Jan-11 Normal 0
53 22-Dec-08 Normal 0 0 185
53 25-Jun-09 Normal 0
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
is there a quick way to create variables 'start' and 'exit' based on my data set? Then I can use proc phreg to evaluate the effect of status in each time interval on event.
Thanks so much for your help,
Xinjian
One way to do it
data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit);
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus;
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;
How you calculate start and exit??
I did not. These values are from excel calculation.
Thanks
What's the business logic, and possibly even, how did Excel calculate it?
Forget about excel now.... Even your example is not complete and now clear logic behind.
You should now what you need as you said you need the data for applying proc phreg.
So now what start shoud represent and what end should represent? What you logic behind?
If you just need the days interval between each consecutive observation for the same ID, you can do that easily like
data A;
input ID visit_date: DATE8. status $ Event;
format visit_date DATE8.;
datalines;
46 27-Oct-08 Normal 1
46 23-Apr-09 Normal 0
46 3-Dec-09 Normal 0
46 6-Jan-11 Normal 0
53 22-Dec-08 Normal 0
53 25-Jun-09 Normal 0
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
data want;
set A;
by ID;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval=intck('day',lagdate,visit_date);
drop lagdate;
run;
And if you need thhe interval value to be accumulative from first observation to the last observation under the same ID. Just change the code to do that easily too
data want;
set A;
by ID;
retain interval;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval =interval+intck('day',lagdate,visit_date);
drop lagdate;
run;
Hi mohamed_zaki and Reeza,
Thank you so much for quick response. My data set is a longitudinaI set, I have used first 'status' of each subject to evaluate the effect of status on 'event', the follow up time is last 'visit_date' -first visit_date. Proc phreg was used for analysis. To answer some critics, I want to test if changes of 'status' ( from normal to low or low-normal) in each interval have effects on 'event'. That is why I want to create two variables 'start' and 'exit' from data set A. Then I can run :
proc phreg;
model (start, exit) * event(0) =status;
run;
I may mislead you for data set B that is the data file I want to make.
Can you help me to make these two variables 'start' and 'exit'?
Thanks,
Xinjian
Maybe we need to be more explicit in questions
data B;
input ID visit_date status Event start exit;
datalines;
46 27-Oct-08 Normal 1 0 178 <= What specific calculation gets a value of 178 on this line? It looks like it is the number of days to the next date, but we need confirmation.
46 23-Apr-09 Normal 0 178 224
46 3-Dec-09 Normal 0 224 399
46 6-Jan-11 Normal 0 <= Does this one not have either a start or end because the next record is for a different ID?
53 22-Dec-08 Normal 0 0 185
53 25-Jun-09 Normal 0 <= Why does this record NOT have a start or end? Did you just stop calculating them by hand for examples?
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
Hi Ballardw,
The first value of 'start' with ID 46 is 0 (use first visit_date as baseline), value of 'exit' = 2nd visit_date - 1st visit_date;
2nd value of 'start'=2nd visit_date - 1st visit_date, value of 'exit' = 3rd visit_date - 2nd visit_date;
3rd value of 'start'=3rd visit_date - 2nd visit_date, value of 'exit' = 4th visit_date - 3rdd visit_date;
in my data set, each subject has 4 measurement points ('status'). I just want to see if there is siginificant effect of 'status' on 'event' in each time interval. the fourth observation will be viewed as missing data.
yes, i only calculated 4 observations by using excel spreadsheet.
Thanks for your help,
Xinjian
One way to do it
data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit);
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus;
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;
Hi mohamed_zaki,
I tried your syntax. it works very well. Thanks so much for your help.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.