BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
amber7391
Fluorite | Level 6

Hi everyone, 

I have a problem assigning proper dates for different ids.

Please help me out. I really appreciate it. 

 

For different ids, if the enter_date is the same as the previous leave_date or within the previous leave_date + 30 days, consider it as re-enter, then use the first enter_date as the enter_date, and the last leave_date as leave_date. 

 

The following is the original data and the data I would like it to look like. The data has been sorted.

 

have   
identer_dateleave_dateservice_date
11/21/191/27/191/21/19
12/25/193/9/192/25/19
14/1/194/2/194/1/19
15/1/195/5/195/1/19
23/1/183/4/183/1/18
23/10/186/10/183/7/18
36/7/187/5/186/2/18
37/3/197/9/197/3/19
38/11/198/30/198/11/19
39/1/199/5/199/1/19
41/31/182/14/181/31/18
42/14/185/14/182/14/18
46/1/186/7/186/1/18
511/12/1812/12/1811/11/18
512/28/182/1/1912/28/18
61/1/192/1/191/1/19
72/21/182/24/182/21/18
72/27/183/3/182/26/18
710/10/1810/23/1810/10/18
711/1/1812/1/1811/1/18

 

want   
identer_dateleave_dateservice_date
11/21/195/5/191/21/19
11/21/195/5/192/25/19
11/21/195/5/194/1/19
11/21/195/5/195/1/19
23/1/186/10/183/1/18
23/1/186/10/183/7/18
36/7/187/5/186/2/18
37/3/199/5/197/3/19
37/3/199/5/198/11/19
37/3/199/5/199/1/19
41/31/186/7/181/31/18
41/31/186/7/182/14/18
41/31/186/7/186/1/18
511/12/182/1/1911/11/18
511/12/182/1/1912/28/18
61/1/192/1/191/1/19
72/21/183/3/182/21/18
72/21/183/3/182/26/18
710/10/1812/1/1810/10/18
710/10/1812/1/1811/1/18

 

Thank you very much for taking the time to help me out!

 

Amber

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
;
run;
data temp;
 set have;
 by id;
 if first.id or 
 lag(leave_date)>enter_date or
 enter_date>lag(leave_date)+30 then group+1;
 run;

 proc sql;
 create table want as
 select *,min(enter_date) as new_enter_date format=date9.,
 max(leave_date) as new_leave_date format=date9.
  from temp 
   group by group;
quit;

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

The first row of your sample data has leave_date=1/21/2019, and the second row has enter_date=2/27/2019.  That's more than a 30-day difference - it's 31 days.   But you kept the record with the remaining ID=1 records.  Was that intentional?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
amber7391
Fluorite | Level 6

Thanks for pointing it out. It’s a typo I have corrected it. 

mkeintz
PROC Star

As you read a set of records, you need to compare the current-record LEAVE_DATE to the next record's ENTER_DATE.  You can preserve the first ENTER_DATE in an extra variable (_ENTER1).  If the gap is greater than 30 days, or you are at the end of an ID group, then you can re-read the same records (without re-reading LEAVE_DATE or ENTER_DATE), and output those records.

 

You don't need to re-read ENTER_DATE because you can replace it with _ENTER1.  And you don't need to re-read LEAVE_DATE because the LEAVE_DATE at the end of the first pass is the one to keep:

 

data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
run;

data want (drop=_:);
  do _n=1 by 1 until (last.id=1 or _nxt_enter > leave_date + 30);
    set have (keep=id);
    by id;
    merge have 
          have (firstobs=2 keep=enter_date rename=(enter_date=_nxt_enter));
    if _n=1 then _enter1=enter_date;
  end;
  enter_date=_enter1;
  do _i=1 to _n;
    set have (drop=enter_date leave_date);
    output;
  end;
run;
  1.  If the first do group there is both a SET statement and a MERGE statement.  These are parallel streams of data.  The SET statement keeps only the ID variable and is accompanied by a "BY ID" statement, which produces the dummy variable LAST.ID (indicating that the record in hand is the last one for the current ID.
  2. The MERGE statement reads the identical record as the SET statement and simulaneously one variable (ENTER_DATE renamed to _NXT_ENTER) from the next record (due to the "FIRSTOBS=2" parameter).  The _NXT_ENTER is compared to the current LEAVE_DATE+30 at the end of every do-loop iteration, because of the "UNTIL …" condition.
  3. Note the "KEEP=ID" in the first SET statement is not absolutely necessary (all the variables are re-read in the MERGE statement).  But I use it as a technique to remind me of why I use the SET statement (just to implement the "BY ID" statement).
  4. Once the UNTIL condition is satisfied copy _ENTER1 to ENTER_DATE, 
  5.  The re-read exactly _N records, keeping all the variables except ENTER_DATE and LEAVE_DATE, and output each of those re-read records.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
amber7391
Fluorite | Level 6

Thank you so much for such a detailed answer! The logic is absolutely correct, but I don't know why it also change the dates of the records which are not qualified as re-enter. Basically it changes the records to the same enter_date and leave_date for different ids .

Ksharp
Super User
data have;
  infile datalines ;
  input id  (enter_date leave_date service_date) ( :mmddyy9.);
  format  enter_date -- service_date date9.;
datalines;
1 1/21/19  1/27/19  1/21/19 
1 2/25/19  3/9/19   2/25/19 
1 4/1/19   4/2/19   4/1/19 
1 5/1/19   5/5/19   5/1/19 
2 3/1/18   3/4/18   3/1/18 
2 3/10/18  6/10/18  3/7/18 
3 6/7/18   7/5/18   6/2/18 
3 7/3/19   7/9/19   7/3/19 
3 8/11/19  8/30/19  8/11/19 
3 9/1/19   9/5/19   9/1/19 
4 1/31/18  2/14/18  1/31/18 
4 2/14/18  5/14/18  2/14/18 
4 6/1/18   6/7/18   6/1/18 
5 11/12/18 12/12/18 11/11/18 
5 12/28/18 2/1/19   12/28/18 
6 1/1/19   2/1/19   1/1/19 
7 2/21/18  2/24/18  2/21/18 
7 2/27/18  3/3/18   2/26/18 
7 10/10/18 10/23/18 10/10/18 
7 11/1/18  12/1/18  11/1/18 
;
run;
data temp;
 set have;
 by id;
 if first.id or 
 lag(leave_date)>enter_date or
 enter_date>lag(leave_date)+30 then group+1;
 run;

 proc sql;
 create table want as
 select *,min(enter_date) as new_enter_date format=date9.,
 max(leave_date) as new_leave_date format=date9.
  from temp 
   group by group;
quit;
amber7391
Fluorite | Level 6
It works perfectly! Thank you very much!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 925 views
  • 5 likes
  • 3 in conversation