Solved: How to assign dates for rows

amber7391 · Posted 10-31-2019 08:44 PM

Hi everyone,

I have a problem assigning proper dates for different ids.

Please help me out. I really appreciate it.

For different ids, if the enter_date is the same as the previous leave_date or within the previous leave_date + 30 days, consider it as re-enter, then use the first enter_date as the enter_date, and the last leave_date as leave_date.

The following is the original data and the data I would like it to look like. The data has been sorted.

have
id	enter_date	leave_date	service_date
1	1/21/19	1/27/19	1/21/19
1	2/25/19	3/9/19	2/25/19
1	4/1/19	4/2/19	4/1/19
1	5/1/19	5/5/19	5/1/19
2	3/1/18	3/4/18	3/1/18
2	3/10/18	6/10/18	3/7/18
3	6/7/18	7/5/18	6/2/18
3	7/3/19	7/9/19	7/3/19
3	8/11/19	8/30/19	8/11/19
3	9/1/19	9/5/19	9/1/19
4	1/31/18	2/14/18	1/31/18
4	2/14/18	5/14/18	2/14/18
4	6/1/18	6/7/18	6/1/18
5	11/12/18	12/12/18	11/11/18
5	12/28/18	2/1/19	12/28/18
6	1/1/19	2/1/19	1/1/19
7	2/21/18	2/24/18	2/21/18
7	2/27/18	3/3/18	2/26/18
7	10/10/18	10/23/18	10/10/18
7	11/1/18	12/1/18	11/1/18

want
id	enter_date	leave_date	service_date
1	1/21/19	5/5/19	1/21/19
1	1/21/19	5/5/19	2/25/19
1	1/21/19	5/5/19	4/1/19
1	1/21/19	5/5/19	5/1/19
2	3/1/18	6/10/18	3/1/18
2	3/1/18	6/10/18	3/7/18
3	6/7/18	7/5/18	6/2/18
3	7/3/19	9/5/19	7/3/19
3	7/3/19	9/5/19	8/11/19
3	7/3/19	9/5/19	9/1/19
4	1/31/18	6/7/18	1/31/18
4	1/31/18	6/7/18	2/14/18
4	1/31/18	6/7/18	6/1/18
5	11/12/18	2/1/19	11/11/18
5	11/12/18	2/1/19	12/28/18
6	1/1/19	2/1/19	1/1/19
7	2/21/18	3/3/18	2/21/18
7	2/21/18	3/3/18	2/26/18
7	10/10/18	12/1/18	10/10/18
7	10/10/18	12/1/18	11/1/18

Thank you very much for taking the time to help me out!

Amber

Ksharp · Posted 11-01-2019 08:00 AM

data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
;
run;
data temp;
 set have;
 by id;
 if first.id or 
 lag(leave_date)>enter_date or
 enter_date>lag(leave_date)+30 then group+1;
 run;

 proc sql;
 create table want as
 select *,min(enter_date) as new_enter_date format=date9.,
 max(leave_date) as new_leave_date format=date9.
  from temp 
   group by group;
quit;

View solution in original post

mkeintz · Posted 10-31-2019 11:35 PM

The first row of your sample data has leave_date=1/21/2019, and the second row has enter_date=2/27/2019. That's more than a 30-day difference - it's 31 days. But you kept the record with the remaining ID=1 records. Was that intentional?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

amber7391 · Posted 10-31-2019 11:52 PM

Thanks for pointing it out. It’s a typo I have corrected it.

mkeintz · Posted 11-01-2019 12:15 AM

As you read a set of records, you need to compare the current-record LEAVE_DATE to the next record's ENTER_DATE. You can preserve the first ENTER_DATE in an extra variable (_ENTER1). If the gap is greater than 30 days, or you are at the end of an ID group, then you can re-read the same records (without re-reading LEAVE_DATE or ENTER_DATE), and output those records.

You don't need to re-read ENTER_DATE because you can replace it with _ENTER1. And you don't need to re-read LEAVE_DATE because the LEAVE_DATE at the end of the first pass is the one to keep:

data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
run;

data want (drop=_:);
  do _n=1 by 1 until (last.id=1 or _nxt_enter > leave_date + 30);
    set have (keep=id);
    by id;
    merge have 
          have (firstobs=2 keep=enter_date rename=(enter_date=_nxt_enter));
    if _n=1 then _enter1=enter_date;
  end;
  enter_date=_enter1;
  do _i=1 to _n;
    set have (drop=enter_date leave_date);
    output;
  end;
run;

If the first do group there is both a SET statement and a MERGE statement. These are parallel streams of data. The SET statement keeps only the ID variable and is accompanied by a "BY ID" statement, which produces the dummy variable LAST.ID (indicating that the record in hand is the last one for the current ID.
The MERGE statement reads the identical record as the SET statement and simulaneously one variable (ENTER_DATE renamed to _NXT_ENTER) from the next record (due to the "FIRSTOBS=2" parameter). The _NXT_ENTER is compared to the current LEAVE_DATE+30 at the end of every do-loop iteration, because of the "UNTIL …" condition.
Note the "KEEP=ID" in the first SET statement is not absolutely necessary (all the variables are re-read in the MERGE statement). But I use it as a technique to remind me of why I use the SET statement (just to implement the "BY ID" statement).
Once the UNTIL condition is satisfied copy _ENTER1 to ENTER_DATE,
The re-read exactly _N records, keeping all the variables except ENTER_DATE and LEAVE_DATE, and output each of those re-read records.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

amber7391 · Posted 11-01-2019 09:55 AM

Thank you so much for such a detailed answer! The logic is absolutely correct, but I don't know why it also change the dates of the records which are not qualified as re-enter. Basically it changes the records to the same enter_date and leave_date for different ids .

Ksharp · Posted 11-01-2019 08:00 AM

data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
;
run;
data temp;
 set have;
 by id;
 if first.id or 
 lag(leave_date)>enter_date or
 enter_date>lag(leave_date)+30 then group+1;
 run;

 proc sql;
 create table want as
 select *,min(enter_date) as new_enter_date format=date9.,
 max(leave_date) as new_leave_date format=date9.
  from temp 
   group by group;
quit;

amber7391 · Posted 11-01-2019 11:19 AM

It works perfectly! Thank you very much!

How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

Re: How to assign dates for rows

SAS Innovate 2025: Call for Content

Classroom Training Available!