BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xinjian
Calcite | Level 5

Hi Dear All,

I have a data set below:

data A;

input ID visit_date status Event;

datalines;

46 27-Oct-08 Normal 1

46 23-Apr-09 Normal 0

46 3-Dec-09 Normal 0

46 6-Jan-11 Normal 0

53 22-Dec-08 Normal 0

53 25-Jun-09 Normal 0

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;

I want to create two new variables: start, and exit. The new data set like below:

data B;

input ID visit_date status Event start exit;

datalines;

46 27-Oct-08 Normal 1 0 178

46 23-Apr-09 Normal 0 178 224

46 3-Dec-09 Normal 0 224 399

46 6-Jan-11 Normal 0

53 22-Dec-08 Normal 0 0 185

53 25-Jun-09 Normal 0

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;

 

is there a quick way to create variables 'start' and 'exit' based on my data set?  Then I can use proc phreg to evaluate the effect of status in each time interval on event.

Thanks so much for your help,

Xinjian

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mohamed_zaki
Barite | Level 11

 

One way to do it

 

data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit); 
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus; 
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;

View solution in original post

9 REPLIES 9
mohamed_zaki
Barite | Level 11

How you calculate start and exit??

xinjian
Calcite | Level 5

I did not. These values are from excel calculation.

Thanks

Reeza
Super User

What's the business logic, and possibly even, how did Excel calculate it?

mohamed_zaki
Barite | Level 11

Forget about excel now.... Even your example is not complete and now clear logic behind.

You should now what you need as you said you need the data for applying proc phreg.

So now what start shoud represent and what end should represent? What you logic behind?

 

If you just need the days interval between each consecutive observation for the same ID, you can do that easily like

data A;
input ID visit_date: DATE8. status $ Event;
format visit_date DATE8.;
datalines;
46 27-Oct-08 Normal 1
46 23-Apr-09 Normal 0
46 3-Dec-09 Normal 0
46 6-Jan-11 Normal 0
53 22-Dec-08 Normal 0
53 25-Jun-09 Normal 0
53 22-Dec-09 Normal 1
54 17-Jan-09 Low 1
54 13-Aug-09 Normal 0
54 10-Jun-10 Normal 0
54 7-Apr-11 Normal 0
59 2-Feb-09 Normal 0
59 23-Jul-09 Normal 1
59 4-Feb-10 Normal 0
59 23-Jan-11 Normal 0
66 21-Apr-09 Normal 0
66 15-Oct-09 Normal 0
66 8-Apr-10 Normal 0
66 7-Apr-11 Normal 0
;
data want;
set A;
by ID;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval=intck('day',lagdate,visit_date);
drop lagdate;
run;

 And if you need thhe interval value to be accumulative from first observation to the last observation under the same ID. Just change the code to do that easily too

data want;
set A;
by ID;
retain interval;
lagdate=lag(visit_date);
if first.ID then interval=0;
else interval =interval+intck('day',lagdate,visit_date);
drop lagdate;
run;
xinjian
Calcite | Level 5

Hi  and Reeza,

Thank you  so much for quick response. My data set is a longitudinaI set, I have used first 'status' of each subject to evaluate the effect of status on 'event', the follow up time is last 'visit_date' -first visit_date. Proc phreg was used for analysis. To answer some critics, I want to test if changes of 'status' ( from normal to low or  low-normal) in each interval have effects on 'event'. That is why I want to create two variables 'start' and 'exit' from data set A. Then I can run :

proc phreg;

model (start, exit) * event(0) =status;

run;

 

I may mislead you for data set B that is the data file I want to make.

Can you help me to make these two variables 'start' and 'exit'?

Thanks,

Xinjian

  

ballardw
Super User

Maybe we need to be more explicit in questions

data B;

input ID visit_date status Event start exit;

datalines;

46 27-Oct-08 Normal 1 0 178   <= What specific calculation gets a value of 178 on this line? It looks like it is the number of days to the next date, but we need confirmation.

46 23-Apr-09 Normal 0 178 224

46 3-Dec-09 Normal 0 224 399

46 6-Jan-11 Normal 0   <= Does this one not have either a start or end because the next record is for a different ID?

53 22-Dec-08 Normal 0 0 185

53 25-Jun-09 Normal 0  <= Why does this record NOT have a start or end? Did you just stop calculating them by hand for examples?

53 22-Dec-09 Normal 1

54 17-Jan-09 Low 1

54 13-Aug-09 Normal 0

54 10-Jun-10 Normal 0

54 7-Apr-11 Normal 0

59 2-Feb-09 Normal 0

59 23-Jul-09 Normal 1

59 4-Feb-10 Normal 0

59 23-Jan-11 Normal 0

66 21-Apr-09 Normal 0

66 15-Oct-09 Normal 0

66 8-Apr-10 Normal 0

66 7-Apr-11 Normal 0

;
xinjian
Calcite | Level 5

Hi Ballardw,

The first value of 'start' with ID 46 is 0 (use first visit_date as baseline), value of 'exit' = 2nd visit_date - 1st visit_date;

2nd value of 'start'=2nd visit_date - 1st visit_date, value of 'exit' = 3rd visit_date - 2nd visit_date;

3rd value of 'start'=3rd visit_date - 2nd visit_date, value of 'exit' = 4th visit_date - 3rdd visit_date;

 

in my data  set, each subject has 4 measurement points ('status'). I just want to see if there is siginificant effect of 'status' on 'event' in each time interval. the fourth observation will be viewed as missing data.

 

yes, i only calculated 4 observations by using excel spreadsheet.

Thanks for your help,

Xinjian

 

mohamed_zaki
Barite | Level 11

 

One way to do it

 

data need ;
retain id visit_date Status Event Start;
format visit_date DATE8.;
set A (rename=(id = origid visit_date=next_date status=originstatus Event=originevent)) end=end ;
by origid;
if first.origid then call missing(Exit); 
else Exit=intck('day',visit_date,next_date);
Start=lag(Exit);
if Start =. then Start=0;
if Exit = . then call missing(Start);
if _n_ > 1 then output;
visit_date = next_date;
id = origid;
Status=originstatus; 
Event=originevent;
if not end then return;
call missing(next_date,Start,Exit);
output;
drop origid originstatus originevent next_date;
run;
xinjian
Calcite | Level 5

Hi ,

I tried your syntax. it works very well. Thanks so much for your help.

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2090 views
  • 0 likes
  • 4 in conversation