BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Van2
Fluorite | Level 6
Hello everyone,
I was fortunate enough to get some help yesterday here and again thanks to the two members. My dataset has a number of faulty registrations that I need to address and I hope you guys can help out.
I’m trying to calculate the total number of days each per_id stays at a given inst_id. I have tried to illustrate the four main issues.
Issue 1 : A stay for the same person is registered at an institution B during a stay at institution A (row 1 and 2). Row 2 should be removed entirely.
Issue 2. A stay at an institution B starts during a stay at inst A and ends after moveout date for inst A. The movein date at inst B should be rectified to moveout date at inst A.
Issue 3: sort of the reverse to issue 2. The two separate stays at inst c should be put together to one long stay.
Issue 4: this is a special case where the person moves in and out on the same date. I would like to keep these, unless they are during the middle of a different stay (like issue 1)
It’s quite a mouthful but the end goal is a tidy data set with one row per person per institution per period. Any help is greatly appreciated.
Inst_id Per_id bbMovein Moveout
A A. 01JAN2022 10JAN2022
B. A. 03JAN2022 07JAN2022
A. B. 04JAN2022 08JAN2022
B. B. 07JAN2022. 11JAN2022
C. C. 05JAN2022. 08JAN2022
C. C. 08JAN2022. 09JAN200
E. D. 02JAN2022 02JAN2022
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I added some notes to the sample data, and generated the following:

 

data have;
  input Inst_id :$1. Per_id $1. Movein :date9. Moveout :date9. ;
  format move: date9.;
datalines;
A A 01JAN2022 10JAN2022
B A 03JAN2022 07JAN2022    /* To be Dropped                                         */
A B 04JAN2022 08JAN2022
B B 07JAN2022 11JAN2022    /* Change movein to 08JAN2022, based on preceding moveout*/
C C 05JAN2022 08JAN2022    /* Combine this one and ...                              */
C C 08JAN2022 09JAN2022    /*  ... this one into a single stay 05JAN2022-09Jan2022  */
E D 02JAN2022 02JAN2022
run;

Assuming the data are sorted by PER_ID/MOVEIN, then this produces what I think you want:

 

data want (drop=_:);
  set have;
  by per_id inst_id notsorted;

  retain _min_movein  _max_moveout;

  if first.inst_id then call missing(_min_movein,_max_movout);

  /* Delete totally "contained" reservations */
  if first.per_id=0 and _min_movein<=movein  and moveout<=_max_moveout then delete;

  if first.per_id=1 then do;
    _min_movein=movein;
    _max_moveout=moveout;
  end;
  if first.per_id=0 and inst_id^=lag(inst_id) then do; /*First non-deleted record for this inst_id */
    _min_movein=max(movein,_max_moveout);              /*Remove possible overlap with preceding rec*/
    _max_moveout=moveout;
  end;
  else _max_moveout=max(_max_moveout,moveout);

  if last.inst_id;
  movein=_min_movein;
  moveout=_max_moveout;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

First, please provide sample data in the form of a working DATA step, so that suggested code can be tested by your respondents.  And expected output to confirm the description of your objective.   Help us help you.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

I added some notes to the sample data, and generated the following:

 

data have;
  input Inst_id :$1. Per_id $1. Movein :date9. Moveout :date9. ;
  format move: date9.;
datalines;
A A 01JAN2022 10JAN2022
B A 03JAN2022 07JAN2022    /* To be Dropped                                         */
A B 04JAN2022 08JAN2022
B B 07JAN2022 11JAN2022    /* Change movein to 08JAN2022, based on preceding moveout*/
C C 05JAN2022 08JAN2022    /* Combine this one and ...                              */
C C 08JAN2022 09JAN2022    /*  ... this one into a single stay 05JAN2022-09Jan2022  */
E D 02JAN2022 02JAN2022
run;

Assuming the data are sorted by PER_ID/MOVEIN, then this produces what I think you want:

 

data want (drop=_:);
  set have;
  by per_id inst_id notsorted;

  retain _min_movein  _max_moveout;

  if first.inst_id then call missing(_min_movein,_max_movout);

  /* Delete totally "contained" reservations */
  if first.per_id=0 and _min_movein<=movein  and moveout<=_max_moveout then delete;

  if first.per_id=1 then do;
    _min_movein=movein;
    _max_moveout=moveout;
  end;
  if first.per_id=0 and inst_id^=lag(inst_id) then do; /*First non-deleted record for this inst_id */
    _min_movein=max(movein,_max_moveout);              /*Remove possible overlap with preceding rec*/
    _max_moveout=moveout;
  end;
  else _max_moveout=max(_max_moveout,moveout);

  if last.inst_id;
  movein=_min_movein;
  moveout=_max_moveout;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 570 views
  • 0 likes
  • 2 in conversation