Below is a snap shot of the table, obviously it’s bigger than this, but these are the fields I need to use. The issue is that lines 3 and 4 are all 1 inpatient stay, with just an interim bill submitted with a 1 day "gap". Lines 1, 2, 5 are all discrete IP stays.
I need to group lines 3 and 4 and assign 06/15/2017 as the IP_START_DATE to both so that a “visit”: would be (123456||06/15/2017||987654) for all lines 3 and 4. The other part of this is that it is not always just two lines, it could be 3 or more that all combine to a > 90 day stay with interim bills so the lag function doesn't seem to be working for me.
Index | Member ID | DOS Start | DOS End | Provider ID |
OBS | FK_PTY_ID | MEDI_START_DATE | MEDI_END_DATE | HOSP_CODE |
1 | 12345 | 02/01/2017 | 02/01/2017 | 987654 |
2 | 12345 | 03/01/2017 | 03/10/2017 | 987654 |
3 | 12345 | 06/15/2017 | 06/30/2017 | 987654 |
4 | 12345 | 07/01/2017 | 07/10/2017 | 987654 |
5 | 12345 | 08/20/2017 | 08/24/2017 | 987654 |
My idea was to make a “sub_visit” of (FK_PTY_ID||HOSP_CODE), sort by that, then list all the MEDI_START_DATE by ascending and get the “gap” from MEDI_END_DATE to MEDI_START_DATE2 (lagged). If that “gap” is <= 1 then consider it all one visit. If not, it’s a new visit. Something like this:
Index | Member ID | DOS Start | DOS End | Provider ID | Created | Created |
OBS | FK_PTY_ID | MEDI_START_DATE | MEDI_END_DATE | HOSP_CODE | SUB_VISIT | GAP |
1 | 12345 | 02/01/2017 | 02/01/2017 | 987654 | 12345987654 | . |
2 | 12345 | 03/01/2017 | 03/10/2017 | 987654 | 12345987654 | 30 |
3 | 12345 | 06/15/2017 | 06/30/2017 | 987654 | 12345987654 | 97 |
4 | 12345 | 07/01/2017 | 07/10/2017 | 987654 | 12345987654 | 1 |
5 | 12345 | 08/20/2017 | 08/24/2017 | 987654 | 12345987654 | 41 |
Then, reading back it will look for a gap <= to 1 and assign the first MEDI_START_DATE within that visit only. (without reading all the way back to line 1, b/c is was the same member at the same provider but a very different visit months before.)
Index | Member ID | DOS Start | DOS End | Provider ID | Created | Created | Created |
OBS | FK_PTY_ID | MEDI_START_DATE | MEDI_END_DATE | HOSP_CODE | SUB_VISIT | GAP | IP_START |
1 | 12345 | 02/01/2017 | 02/01/2017 | 987654 | 12345987654 | . | 02/01/2017 |
2 | 12345 | 03/01/2017 | 03/10/2017 | 987654 | 12345987654 | 30 | 03/01/2017 |
3 | 12345 | 06/15/2017 | 06/30/2017 | 987654 | 12345987654 | 97 | 06/15/2017 |
4 | 12345 | 07/01/2017 | 07/10/2017 | 987654 | 12345987654 | 1 | 06/15/2015 |
5 | 12345 | 08/20/2017 | 08/24/2017 | 987654 | 12345987654 | 41 | 08/20/2017 |
I had been trying to use code like this, but it will pnly look back one line.
PROC SORT DATA=ALLORES.SUB_VISIT OUT=ALLORES.SUB_VISIT;
BY SUB_VISIT;
BY MEDI_START_DATE;
RUN;
DATA SUB_VISIT;
SET WORK.SUB_VISIT;
MEDI_S1= MEDI_START_DATE;
MEDI_S2 = LAG(MEDI_START_DATE);
MEDI_E1 = MEDI_END_DATE;
MEDI_E2 = LAG(MEDI_END_DATE);
GAP = MEDI_E1 - MEDI_S2;
IF FK_PTY_ID = LAG(FK_PTY_ID)
AND HOSP_CODE = LAG(HOSP_CODE)
AND GAP <= 1
THEN IP_START_DATE = put(MEDI_START_DATE,yymmdd10.);
ELSE IP_START_DATE = put(lag(MEDI_START_DATE),yymmdd10.);
RUN;
Thanks in adance!
A few things stand out about your question. Areas to learn more:
First, become more comfortable with dates. When creating IP_START_DATE, there is no need to create it as a character variable. Keep it as a numeric SAS date, and apply a format to it.
Second, as you have seen the LAG function won't handle everything. Sometimes it won't even give the right answer. If you want to retrieve the value on the previous observation, LAG has to execute on every observation (outside of any IF/THEN conditions).
Third, learning the use of a BY statement in a DATA step is vital. There's more to learn than can be posted here, but it is vital.
All that being said, let's take a look at the problem you posted. Here is a variation on what you posted that works with the original data set (without adding SUB_VISIT):
proc sort data=original_data_set;
by fk_pty_id hosp_code;
run;
The following approach will get you part-way there by assigning observations to a visit, based on the gap. Other calculations (such as when does the visit begin, when does it end) are not included here:
data want;
set original_data_set;
by fk_pty_id hosp_code;
gap = medi_end_date - lag(medi_start_date);
if first.fk_pty_id then visit = 1;
else if first.hosp_code or gap > 1 then visit + 1;
run;
Take a look at just that much, and see how close it gets you to where you need to go with this.
A few things stand out about your question. Areas to learn more:
First, become more comfortable with dates. When creating IP_START_DATE, there is no need to create it as a character variable. Keep it as a numeric SAS date, and apply a format to it.
Second, as you have seen the LAG function won't handle everything. Sometimes it won't even give the right answer. If you want to retrieve the value on the previous observation, LAG has to execute on every observation (outside of any IF/THEN conditions).
Third, learning the use of a BY statement in a DATA step is vital. There's more to learn than can be posted here, but it is vital.
All that being said, let's take a look at the problem you posted. Here is a variation on what you posted that works with the original data set (without adding SUB_VISIT):
proc sort data=original_data_set;
by fk_pty_id hosp_code;
run;
The following approach will get you part-way there by assigning observations to a visit, based on the gap. Other calculations (such as when does the visit begin, when does it end) are not included here:
data want;
set original_data_set;
by fk_pty_id hosp_code;
gap = medi_end_date - lag(medi_start_date);
if first.fk_pty_id then visit = 1;
else if first.hosp_code or gap > 1 then visit + 1;
run;
Take a look at just that much, and see how close it gets you to where you need to go with this.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.