Hi everyone,
I have a problem assigning proper dates for different ids.
Please help me out. I really appreciate it.
For different ids, if the enter_date is the same as the previous leave_date or within the previous leave_date + 30 days, consider it as re-enter, then use the first enter_date as the enter_date, and the last leave_date as leave_date.
The following is the original data and the data I would like it to look like. The data has been sorted.
have | |||
id | enter_date | leave_date | service_date |
1 | 1/21/19 | 1/27/19 | 1/21/19 |
1 | 2/25/19 | 3/9/19 | 2/25/19 |
1 | 4/1/19 | 4/2/19 | 4/1/19 |
1 | 5/1/19 | 5/5/19 | 5/1/19 |
2 | 3/1/18 | 3/4/18 | 3/1/18 |
2 | 3/10/18 | 6/10/18 | 3/7/18 |
3 | 6/7/18 | 7/5/18 | 6/2/18 |
3 | 7/3/19 | 7/9/19 | 7/3/19 |
3 | 8/11/19 | 8/30/19 | 8/11/19 |
3 | 9/1/19 | 9/5/19 | 9/1/19 |
4 | 1/31/18 | 2/14/18 | 1/31/18 |
4 | 2/14/18 | 5/14/18 | 2/14/18 |
4 | 6/1/18 | 6/7/18 | 6/1/18 |
5 | 11/12/18 | 12/12/18 | 11/11/18 |
5 | 12/28/18 | 2/1/19 | 12/28/18 |
6 | 1/1/19 | 2/1/19 | 1/1/19 |
7 | 2/21/18 | 2/24/18 | 2/21/18 |
7 | 2/27/18 | 3/3/18 | 2/26/18 |
7 | 10/10/18 | 10/23/18 | 10/10/18 |
7 | 11/1/18 | 12/1/18 | 11/1/18 |
want | |||
id | enter_date | leave_date | service_date |
1 | 1/21/19 | 5/5/19 | 1/21/19 |
1 | 1/21/19 | 5/5/19 | 2/25/19 |
1 | 1/21/19 | 5/5/19 | 4/1/19 |
1 | 1/21/19 | 5/5/19 | 5/1/19 |
2 | 3/1/18 | 6/10/18 | 3/1/18 |
2 | 3/1/18 | 6/10/18 | 3/7/18 |
3 | 6/7/18 | 7/5/18 | 6/2/18 |
3 | 7/3/19 | 9/5/19 | 7/3/19 |
3 | 7/3/19 | 9/5/19 | 8/11/19 |
3 | 7/3/19 | 9/5/19 | 9/1/19 |
4 | 1/31/18 | 6/7/18 | 1/31/18 |
4 | 1/31/18 | 6/7/18 | 2/14/18 |
4 | 1/31/18 | 6/7/18 | 6/1/18 |
5 | 11/12/18 | 2/1/19 | 11/11/18 |
5 | 11/12/18 | 2/1/19 | 12/28/18 |
6 | 1/1/19 | 2/1/19 | 1/1/19 |
7 | 2/21/18 | 3/3/18 | 2/21/18 |
7 | 2/21/18 | 3/3/18 | 2/26/18 |
7 | 10/10/18 | 12/1/18 | 10/10/18 |
7 | 10/10/18 | 12/1/18 | 11/1/18 |
Thank you very much for taking the time to help me out!
Amber
data have;
infile datalines ;
input id (enter_date leave_date service_date) ( :mmddyy9.);
format enter_date -- service_date date9.;
datalines;
1 1/21/19 1/27/19 1/21/19
1 2/25/19 3/9/19 2/25/19
1 4/1/19 4/2/19 4/1/19
1 5/1/19 5/5/19 5/1/19
2 3/1/18 3/4/18 3/1/18
2 3/10/18 6/10/18 3/7/18
3 6/7/18 7/5/18 6/2/18
3 7/3/19 7/9/19 7/3/19
3 8/11/19 8/30/19 8/11/19
3 9/1/19 9/5/19 9/1/19
4 1/31/18 2/14/18 1/31/18
4 2/14/18 5/14/18 2/14/18
4 6/1/18 6/7/18 6/1/18
5 11/12/18 12/12/18 11/11/18
5 12/28/18 2/1/19 12/28/18
6 1/1/19 2/1/19 1/1/19
7 2/21/18 2/24/18 2/21/18
7 2/27/18 3/3/18 2/26/18
7 10/10/18 10/23/18 10/10/18
7 11/1/18 12/1/18 11/1/18
;
run;
data temp;
set have;
by id;
if first.id or
lag(leave_date)>enter_date or
enter_date>lag(leave_date)+30 then group+1;
run;
proc sql;
create table want as
select *,min(enter_date) as new_enter_date format=date9.,
max(leave_date) as new_leave_date format=date9.
from temp
group by group;
quit;
The first row of your sample data has leave_date=1/21/2019, and the second row has enter_date=2/27/2019. That's more than a 30-day difference - it's 31 days. But you kept the record with the remaining ID=1 records. Was that intentional?
Thanks for pointing it out. It’s a typo I have corrected it.
As you read a set of records, you need to compare the current-record LEAVE_DATE to the next record's ENTER_DATE. You can preserve the first ENTER_DATE in an extra variable (_ENTER1). If the gap is greater than 30 days, or you are at the end of an ID group, then you can re-read the same records (without re-reading LEAVE_DATE or ENTER_DATE), and output those records.
You don't need to re-read ENTER_DATE because you can replace it with _ENTER1. And you don't need to re-read LEAVE_DATE because the LEAVE_DATE at the end of the first pass is the one to keep:
data have;
infile datalines ;
input id (enter_date leave_date service_date) ( :mmddyy9.);
format enter_date -- service_date date9.;
datalines;
1 1/21/19 1/27/19 1/21/19
1 2/25/19 3/9/19 2/25/19
1 4/1/19 4/2/19 4/1/19
1 5/1/19 5/5/19 5/1/19
2 3/1/18 3/4/18 3/1/18
2 3/10/18 6/10/18 3/7/18
3 6/7/18 7/5/18 6/2/18
3 7/3/19 7/9/19 7/3/19
3 8/11/19 8/30/19 8/11/19
3 9/1/19 9/5/19 9/1/19
4 1/31/18 2/14/18 1/31/18
4 2/14/18 5/14/18 2/14/18
4 6/1/18 6/7/18 6/1/18
5 11/12/18 12/12/18 11/11/18
5 12/28/18 2/1/19 12/28/18
6 1/1/19 2/1/19 1/1/19
7 2/21/18 2/24/18 2/21/18
7 2/27/18 3/3/18 2/26/18
7 10/10/18 10/23/18 10/10/18
7 11/1/18 12/1/18 11/1/18
run;
data want (drop=_:);
do _n=1 by 1 until (last.id=1 or _nxt_enter > leave_date + 30);
set have (keep=id);
by id;
merge have
have (firstobs=2 keep=enter_date rename=(enter_date=_nxt_enter));
if _n=1 then _enter1=enter_date;
end;
enter_date=_enter1;
do _i=1 to _n;
set have (drop=enter_date leave_date);
output;
end;
run;
Thank you so much for such a detailed answer! The logic is absolutely correct, but I don't know why it also change the dates of the records which are not qualified as re-enter. Basically it changes the records to the same enter_date and leave_date for different ids .
data have;
infile datalines ;
input id (enter_date leave_date service_date) ( :mmddyy9.);
format enter_date -- service_date date9.;
datalines;
1 1/21/19 1/27/19 1/21/19
1 2/25/19 3/9/19 2/25/19
1 4/1/19 4/2/19 4/1/19
1 5/1/19 5/5/19 5/1/19
2 3/1/18 3/4/18 3/1/18
2 3/10/18 6/10/18 3/7/18
3 6/7/18 7/5/18 6/2/18
3 7/3/19 7/9/19 7/3/19
3 8/11/19 8/30/19 8/11/19
3 9/1/19 9/5/19 9/1/19
4 1/31/18 2/14/18 1/31/18
4 2/14/18 5/14/18 2/14/18
4 6/1/18 6/7/18 6/1/18
5 11/12/18 12/12/18 11/11/18
5 12/28/18 2/1/19 12/28/18
6 1/1/19 2/1/19 1/1/19
7 2/21/18 2/24/18 2/21/18
7 2/27/18 3/3/18 2/26/18
7 10/10/18 10/23/18 10/10/18
7 11/1/18 12/1/18 11/1/18
;
run;
data temp;
set have;
by id;
if first.id or
lag(leave_date)>enter_date or
enter_date>lag(leave_date)+30 then group+1;
run;
proc sql;
create table want as
select *,min(enter_date) as new_enter_date format=date9.,
max(leave_date) as new_leave_date format=date9.
from temp
group by group;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.