BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Van2
Fluorite | Level 6
Hello everyone,
I was fortunate enough to get some help yesterday here and again thanks to the two members. My dataset has a number of faulty registrations that I need to address and I hope you guys can help out.
I’m trying to calculate the total number of days each per_id stays at a given inst_id. I have tried to illustrate the four main issues.
Issue 1 : A stay for the same person is registered at an institution B during a stay at institution A (row 1 and 2). Row 2 should be removed entirely.
Issue 2. A stay at an institution B starts during a stay at inst A and ends after moveout date for inst A. The movein date at inst B should be rectified to moveout date at inst A.
Issue 3: sort of the reverse to issue 2. The two separate stays at inst c should be put together to one long stay.
Issue 4: this is a special case where the person moves in and out on the same date. I would like to keep these, unless they are during the middle of a different stay (like issue 1)
It’s quite a mouthful but the end goal is a tidy data set with one row per person per institution per period. Any help is greatly appreciated.
Inst_id Per_id bbMovein Moveout
A A. 01JAN2022 10JAN2022
B. A. 03JAN2022 07JAN2022
A. B. 04JAN2022 08JAN2022
B. B. 07JAN2022. 11JAN2022
C. C. 05JAN2022. 08JAN2022
C. C. 08JAN2022. 09JAN200
E. D. 02JAN2022 02JAN2022
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I added some notes to the sample data, and generated the following:

 

data have;
  input Inst_id :$1. Per_id $1. Movein :date9. Moveout :date9. ;
  format move: date9.;
datalines;
A A 01JAN2022 10JAN2022
B A 03JAN2022 07JAN2022    /* To be Dropped                                         */
A B 04JAN2022 08JAN2022
B B 07JAN2022 11JAN2022    /* Change movein to 08JAN2022, based on preceding moveout*/
C C 05JAN2022 08JAN2022    /* Combine this one and ...                              */
C C 08JAN2022 09JAN2022    /*  ... this one into a single stay 05JAN2022-09Jan2022  */
E D 02JAN2022 02JAN2022
run;

Assuming the data are sorted by PER_ID/MOVEIN, then this produces what I think you want:

 

data want (drop=_:);
  set have;
  by per_id inst_id notsorted;

  retain _min_movein  _max_moveout;

  if first.inst_id then call missing(_min_movein,_max_movout);

  /* Delete totally "contained" reservations */
  if first.per_id=0 and _min_movein<=movein  and moveout<=_max_moveout then delete;

  if first.per_id=1 then do;
    _min_movein=movein;
    _max_moveout=moveout;
  end;
  if first.per_id=0 and inst_id^=lag(inst_id) then do; /*First non-deleted record for this inst_id */
    _min_movein=max(movein,_max_moveout);              /*Remove possible overlap with preceding rec*/
    _max_moveout=moveout;
  end;
  else _max_moveout=max(_max_moveout,moveout);

  if last.inst_id;
  movein=_min_movein;
  moveout=_max_moveout;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

First, please provide sample data in the form of a working DATA step, so that suggested code can be tested by your respondents.  And expected output to confirm the description of your objective.   Help us help you.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

I added some notes to the sample data, and generated the following:

 

data have;
  input Inst_id :$1. Per_id $1. Movein :date9. Moveout :date9. ;
  format move: date9.;
datalines;
A A 01JAN2022 10JAN2022
B A 03JAN2022 07JAN2022    /* To be Dropped                                         */
A B 04JAN2022 08JAN2022
B B 07JAN2022 11JAN2022    /* Change movein to 08JAN2022, based on preceding moveout*/
C C 05JAN2022 08JAN2022    /* Combine this one and ...                              */
C C 08JAN2022 09JAN2022    /*  ... this one into a single stay 05JAN2022-09Jan2022  */
E D 02JAN2022 02JAN2022
run;

Assuming the data are sorted by PER_ID/MOVEIN, then this produces what I think you want:

 

data want (drop=_:);
  set have;
  by per_id inst_id notsorted;

  retain _min_movein  _max_moveout;

  if first.inst_id then call missing(_min_movein,_max_movout);

  /* Delete totally "contained" reservations */
  if first.per_id=0 and _min_movein<=movein  and moveout<=_max_moveout then delete;

  if first.per_id=1 then do;
    _min_movein=movein;
    _max_moveout=moveout;
  end;
  if first.per_id=0 and inst_id^=lag(inst_id) then do; /*First non-deleted record for this inst_id */
    _min_movein=max(movein,_max_moveout);              /*Remove possible overlap with preceding rec*/
    _max_moveout=moveout;
  end;
  else _max_moveout=max(_max_moveout,moveout);

  if last.inst_id;
  movein=_min_movein;
  moveout=_max_moveout;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 382 views
  • 0 likes
  • 2 in conversation