I have a long dataset that includes multiple records per participant. Each person has an overall date range (begin_date and end_date) for trips they have made to conduct business for their job. Additionally, each person has a date1 and a dot1 that indicated shorter duration trips that fall within their overall begin_date and end_date. What I need is a new variable that gives a trip number for each row of the data. So for example, in the small dataset below for participant 1, row 1 would be trip 1, so my new variable trip, would be equal to 1, rows 2-5 are trip 2 and there were 4 shorter trips associated with trip 2, then rows 6-9 are trip 3 and they were 4 shorter trips associated with the overall date range. You can tell if the date1 and dot1 dates are all within a larger trip if the dates fall within begin_date and end_date. So basically I want to add a variable called trip and for the example row 1 would be 1, rows 2-5 would be 2, and rows 6-9 would be 3.
I tried using the following code but it just gives me a count of all the rows essentially. I think the code doesn't work because I have repeated measures and it is hard to distinguish between dates but I was hoping to be able to do this without transforming the data. Any help would be greatly appreciated.
data want;
set have;
trip+1;
by begin_date;
if first.begin_date then trip=1;
else if date1>end_date and dot1<=end_date then trip=trip+1;
run;
ID | begin_date | end_date | date1 | dot1 |
---|---|---|---|---|
1 | 02/12/06 | 04/29/06 | 02/12/06 | 04/29/16 |
1 | 03/14/07 | 01/15/08 | 03/14/07 | 04/18/07 |
1 | 03/14/07 | 01/15/08 | 04/18/07 | 05/16/07 |
1 | 03/14/07 | 01/15/08 | 05/16/07 | 09/02/07 |
1 | 03/14/07 | 01/15/08 | 09/02/07 | 01//15/08 |
1 | 05/15/10 | 08/05/12 | 06/26/10 | 08/15/10 |
1 | 05/15/10 | 08/05/12 | 01/20/11 | 01/25/11 |
1 | 05/15/10 | 08/05/12 | 02/01/11 | 08/01/11 |
1 | 05/15/10 | 08/05/12 | 04/01/12 | 04/03/12 |
Here you go, let me know if I missed a step:
data have;
infile cards dsd;
informat ID $1. begin_date end_date date1 dot1 mmddyy8.;
format ID $1. begin_date end_date date1 dot1 mmddyy8.;
input ID begin_date end_date date1 dot1;
cards;
1,02/12/06,04/29/06,02/12/06,04/29/16
1,03/14/07,01/15/08,03/14/07,04/18/07
1,03/14/07,01/15/08,04/18/07,05/16/07
1,03/14/07,01/15/08,05/16/07,09/02/07
1,03/14/07,01/15/08,09/02/07,01/15/08
1,05/15/10,08/05/12,06/26/10,08/15/10
1,05/15/10,08/05/12,01/20/11,01/25/11
1,05/15/10,08/05/12,02/01/11,08/01/11
1,05/15/10,08/05/12,04/01/12,04/03/12
;
data want;
set have;
by id begin_date;
if first.begin_date then trip + 1;
run;
Here you go, let me know if I missed a step:
data have;
infile cards dsd;
informat ID $1. begin_date end_date date1 dot1 mmddyy8.;
format ID $1. begin_date end_date date1 dot1 mmddyy8.;
input ID begin_date end_date date1 dot1;
cards;
1,02/12/06,04/29/06,02/12/06,04/29/16
1,03/14/07,01/15/08,03/14/07,04/18/07
1,03/14/07,01/15/08,04/18/07,05/16/07
1,03/14/07,01/15/08,05/16/07,09/02/07
1,03/14/07,01/15/08,09/02/07,01/15/08
1,05/15/10,08/05/12,06/26/10,08/15/10
1,05/15/10,08/05/12,01/20/11,01/25/11
1,05/15/10,08/05/12,02/01/11,08/01/11
1,05/15/10,08/05/12,04/01/12,04/03/12
;
data want;
set have;
by id begin_date;
if first.begin_date then trip + 1;
run;
May want to modify Mark's code with a reset for each ID if the idea is to have trips per Id.
by id begin_date;
if first.id then trip=0; /* reset for each id*/
if first.begin_date then trip + 1;
run;
That was much simpler than I thought, thank you both for your input, the code worked.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.