I have a long dataset that includes multiple records per participant. Each person has an overall date range (begin_date and end_date) for trips they have made to conduct business for their job. Additionally, each person has a date1 and a dot1 that indicated shorter duration trips that fall within their overall begin_date and end_date. What I need is a new variable that gives a trip number for each row of the data. So for example, in the small dataset below for participant 1, row 1 would be trip 1, so my new variable trip, would be equal to 1, rows 2-5 are trip 2 and there were 4 shorter trips associated with trip 2, then rows 6-9 are trip 3 and they were 4 shorter trips associated with the overall date range. You can tell if the date1 and dot1 dates are all within a larger trip if the dates fall within begin_date and end_date. So basically I want to add a variable called trip and for the example row 1 would be 1, rows 2-5 would be 2, and rows 6-9 would be 3.
I tried using the following code but it just gives me a count of all the rows essentially. I think the code doesn't work because I have repeated measures and it is hard to distinguish between dates but I was hoping to be able to do this without transforming the data. Any help would be greatly appreciated.
data want;
set have;
trip+1;
by begin_date;
if first.begin_date then trip=1;
else if date1>end_date and dot1<=end_date then trip=trip+1;
run;
ID | begin_date | end_date | date1 | dot1 |
---|---|---|---|---|
1 | 02/12/06 | 04/29/06 | 02/12/06 | 04/29/16 |
1 | 03/14/07 | 01/15/08 | 03/14/07 | 04/18/07 |
1 | 03/14/07 | 01/15/08 | 04/18/07 | 05/16/07 |
1 | 03/14/07 | 01/15/08 | 05/16/07 | 09/02/07 |
1 | 03/14/07 | 01/15/08 | 09/02/07 | 01//15/08 |
1 | 05/15/10 | 08/05/12 | 06/26/10 | 08/15/10 |
1 | 05/15/10 | 08/05/12 | 01/20/11 | 01/25/11 |
1 | 05/15/10 | 08/05/12 | 02/01/11 | 08/01/11 |
1 | 05/15/10 | 08/05/12 | 04/01/12 | 04/03/12 |
Here you go, let me know if I missed a step:
data have;
infile cards dsd;
informat ID $1. begin_date end_date date1 dot1 mmddyy8.;
format ID $1. begin_date end_date date1 dot1 mmddyy8.;
input ID begin_date end_date date1 dot1;
cards;
1,02/12/06,04/29/06,02/12/06,04/29/16
1,03/14/07,01/15/08,03/14/07,04/18/07
1,03/14/07,01/15/08,04/18/07,05/16/07
1,03/14/07,01/15/08,05/16/07,09/02/07
1,03/14/07,01/15/08,09/02/07,01/15/08
1,05/15/10,08/05/12,06/26/10,08/15/10
1,05/15/10,08/05/12,01/20/11,01/25/11
1,05/15/10,08/05/12,02/01/11,08/01/11
1,05/15/10,08/05/12,04/01/12,04/03/12
;
data want;
set have;
by id begin_date;
if first.begin_date then trip + 1;
run;
Here you go, let me know if I missed a step:
data have;
infile cards dsd;
informat ID $1. begin_date end_date date1 dot1 mmddyy8.;
format ID $1. begin_date end_date date1 dot1 mmddyy8.;
input ID begin_date end_date date1 dot1;
cards;
1,02/12/06,04/29/06,02/12/06,04/29/16
1,03/14/07,01/15/08,03/14/07,04/18/07
1,03/14/07,01/15/08,04/18/07,05/16/07
1,03/14/07,01/15/08,05/16/07,09/02/07
1,03/14/07,01/15/08,09/02/07,01/15/08
1,05/15/10,08/05/12,06/26/10,08/15/10
1,05/15/10,08/05/12,01/20/11,01/25/11
1,05/15/10,08/05/12,02/01/11,08/01/11
1,05/15/10,08/05/12,04/01/12,04/03/12
;
data want;
set have;
by id begin_date;
if first.begin_date then trip + 1;
run;
May want to modify Mark's code with a reset for each ID if the idea is to have trips per Id.
by id begin_date;
if first.id then trip=0; /* reset for each id*/
if first.begin_date then trip + 1;
run;
That was much simpler than I thought, thank you both for your input, the code worked.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.