I previously asked how to construct inpatient stays and length of stay here: https://communities.sas.com/t5/SAS-Programming/Count-number-of-inpatient-stays-and-length-of-stay/m-....
On top of that, I'd like to obtain a bit more admission and discharge information per stay. Say, each encounter has a individual claim_id. For each inpatient stay, I wanted to keep the admission claim_id (claim_id when a person was admitted) and the discharge claim_id (claim_id when a person was discharged). The example data look like below (only column claim_id is added, the rest is the same as the old post).
PERSON_ID | provider_id | admission_dt | discharge_dt | claim_id | Note |
1 | A | 8/1/2014 | 1/7/2018 | 12345 | a stay starts (admission_dt) from previous years, 1/1/2018 will be used as initial admission_dt |
1 | A | 8/1/2014 | 1/10/2018 | 12346 | Overlapped stay, so it's considered as the same stay as the stay in last claim (last observation) |
1 | A | 1/11/2018 | 1/31/2018 | 12347 | Consecutive date (date difference<=1) from last observation, so the same inpatient stay |
1 | A | 1/11/2018 | 2/28/2018 | 12348 | |
1 | A | 1/11/2018 | 2/28/2018 | 12349 | |
1 | A | 1/11/2018 | 3/31/2018 | 12350 | |
1 | A | 1/11/2018 | 4/30/2018 | 12351 | |
1 | A | 5/16/2018 | 5/31/2018 | 12352 | Non-consecutive date from last admission, so a separate inpatient stay |
1 | A | 5/16/2018 | 6/30/2018 | 12353 | |
1 | A | 5/16/2018 | 7/31/2018 | 12354 | |
1 | A | 5/16/2018 | 8/31/2018 | 12355 | |
1 | A | 5/16/2018 | 9/30/2018 | 12356 | |
1 | A | 5/16/2018 | 10/31/2018 | 12357 | |
1 | A | 5/16/2018 | 11/30/2018 | 12358 | |
1 | A | 5/16/2018 | 12/31/2018 | 12358 | |
2 | B | 11/1/2017 | 1/31/2018 | 21234 | |
2 | B | 12/24/2017 | 1/1/2018 | 21235 | |
2 | B | 11/1/2017 | 2/28/2018 | 21236 | |
2 | B | 11/1/2017 | 3/31/2018 | 21237 | |
2 | B | 11/1/2017 | 4/30/2018 | 21238 | |
2 | B | 11/1/2017 | 5/31/2018 | 21239 | |
2 | B | 11/1/2017 | 6/30/2018 | 21240 | |
2 | B | 11/1/2017 | 7/31/2018 | 21241 | |
2 | B | 11/1/2017 | 8/31/2018 | 21242 | |
2 | B | 11/1/2017 | 9/30/2018 | 21243 | |
2 | B | 11/1/2017 | 10/5/2018 | 21244 | |
2 | C | 10/6/2018 | 10/31/2018 | 21245 | Consecutive date from last admission but with different provider, so a separate inpatient stay |
2 | C | 10/5/2018 | 11/30/2018 | 21246 | |
2 | C | 10/5/2018 | 12/31/2018 | 21247 | |
3 | D | 1/27/2018 | 1/30/2018 | 31234 | |
3 | D | 1/27/2018 | 2/28/2018 | 31235 | |
3 | E | 3/15/2018 | 31236 | A new inpatient stay at a different provider 15 days after the last stay | |
3 | E | 3/15/2018 | 31237 |
Missing discharge date means the person is not discharged per claim, so 12/31/2018 will be used as the discharge date for the stay |
|
3 |
E | 3/15/2018 | 31238 | ||
3 | E | 3/15/2018 | 31239 | ||
3 | E | 3/15/2018 | 31240 | ||
3 | E | 3/15/2018 | 31241 | ||
3 | E | 4/1/2018 | 31242 | ||
3 | E | 5/1/2018 | 31243 | ||
3 | E | 6/1/2018 | 31244 | ||
3 | E | 7/1/2018 | 31245 | ||
3 | E | 8/1/2018 | 31246 | ||
3 | E | 9/1/2018 | 31247 | ||
3 | E | 10/1/2018 | 31248 | ||
3 | E | 11/1/2018 | 31249 | ||
3 | E | 12/1/2018 | 31250 |
The target output would be
PERSON_ID | provider_id | admission_dt | discharge_dt | length_of_stay | claim_id_adm | claim_id_discharge |
1 | A | 8/1/2014 | 4/30/2018 | 1368 | 12345 | 12351 |
1 | A | 5/16/2018 | 12/31/2018 | 229 | 12352 | 12358 |
2 | B | 11/1/2017 | 10/5/2018 | 338 | 21234 | 21244 |
2 | C | 10/5/2018 | 12/31/2018 | 87 | 21246 | 21247 |
3 | D | 1/27/2018 | 2/28/2018 | 32 | 31234 | 31235 |
3 | E | 3/15/2018 | 12/31/2018 | 291 | 31236 | 31250 |
where the last two columns are newly targeted column compared to the old post.
Much appreciated!
Please do yourself and us a BIG favor and always include your data as a data step with datalines, as I have already shown you. Not having to write that speed up the process of finding a solution, and it prevents any questions about variable attributes (lengths, formats) and real content.
You need only a slight expansion of the original logic, as the "cutover point" for the new variables is the same.
data have;
infile datalines dlm="09"x dsd truncover;
input
person_id $
provider_id $
admission_dt :mmddyy10.
discharge_dt :mmddyy10.
claim_id :$5.
;
format
admission_dt
discharge_dt yymmdd10.
;
datalines;
1 A 8/1/2014 1/7/2018 12345
1 A 8/1/2014 1/10/2018 12346
1 A 1/11/2018 1/31/2018 12347
1 A 1/11/2018 2/28/2018 12348
1 A 1/11/2018 2/28/2018 12349
1 A 1/11/2018 3/31/2018 12350
1 A 1/11/2018 4/30/2018 12351
1 A 5/16/2018 5/31/2018 12352
1 A 5/16/2018 6/30/2018 12353
1 A 5/16/2018 7/31/2018 12354
1 A 5/16/2018 8/31/2018 12355
1 A 5/16/2018 9/30/2018 12356
1 A 5/16/2018 10/31/2018 12357
1 A 5/16/2018 11/30/2018 12358
1 A 5/16/2018 12/31/2018 12358
2 B 11/1/2017 1/31/2018 21234
2 B 12/24/2017 1/1/2018 21235
2 B 11/1/2017 2/28/2018 21236
2 B 11/1/2017 3/31/2018 21237
2 B 11/1/2017 4/30/2018 21238
2 B 11/1/2017 5/31/2018 21239
2 B 11/1/2017 6/30/2018 21240
2 B 11/1/2017 7/31/2018 21241
2 B 11/1/2017 8/31/2018 21242
2 B 11/1/2017 9/30/2018 21243
2 B 11/1/2017 10/5/2018 21244
2 C 10/6/2018 10/31/2018 21245
2 C 10/5/2018 11/30/2018 21246
2 C 10/5/2018 12/31/2018 21247
3 D 1/27/2018 1/30/2018 31234
3 D 1/27/2018 2/28/2018 31235
3 E 3/15/2018 31236
3 E 3/15/2018 31237
3 E 3/15/2018 31238
3 E 3/15/2018 31239
3 E 3/15/2018 31240
3 E 3/15/2018 31241
3 E 4/1/2018 31242
3 E 5/1/2018 31243
3 E 6/1/2018 31244
3 E 7/1/2018 31245
3 E 8/1/2018 31246
3 E 9/1/2018 31247
3 E 10/1/2018 31248
3 E 11/1/2018 31249
3 E 12/1/2018 31250
;
%let def_discharge = %sysfunc(inputn(2018-12-31,yymmdd10.));
data want;
set have;
by person_id provider_id;
retain _start length_of_stay claim_id_adm;
/* length_of stay is included here solely for the purpose of variable order, purely cosmetic! */
_dis = lag(discharge_dt);
_claim_id = lag(claim_id);
if first.provider_id
then do;
_start = admission_dt;
claim_id_adm = claim_id;
end;
if not first.provider_id and _dis ne . and admission_dt gt sum(_dis,1)
then do;
_tmp = admission_dt;
admission_dt = _start;
_start = _tmp;
_tmp = discharge_dt;
discharge_dt = _dis;
length_of_stay = discharge_dt - admission_dt;
claim_id_discharge = _claim_id;
output;
discharge_dt = _tmp;
claim_id_adm = claim_id;
end;
if last.provider_id
then do;
admission_dt = _start;
discharge_dt = coalesce(discharge_dt,&def_discharge.);
length_of_stay = discharge_dt - admission_dt;
claim_id_discharge = claim_id;
output;
end;
drop _: claim_id;
run;
Please do yourself and us a BIG favor and always include your data as a data step with datalines, as I have already shown you. Not having to write that speed up the process of finding a solution, and it prevents any questions about variable attributes (lengths, formats) and real content.
You need only a slight expansion of the original logic, as the "cutover point" for the new variables is the same.
data have;
infile datalines dlm="09"x dsd truncover;
input
person_id $
provider_id $
admission_dt :mmddyy10.
discharge_dt :mmddyy10.
claim_id :$5.
;
format
admission_dt
discharge_dt yymmdd10.
;
datalines;
1 A 8/1/2014 1/7/2018 12345
1 A 8/1/2014 1/10/2018 12346
1 A 1/11/2018 1/31/2018 12347
1 A 1/11/2018 2/28/2018 12348
1 A 1/11/2018 2/28/2018 12349
1 A 1/11/2018 3/31/2018 12350
1 A 1/11/2018 4/30/2018 12351
1 A 5/16/2018 5/31/2018 12352
1 A 5/16/2018 6/30/2018 12353
1 A 5/16/2018 7/31/2018 12354
1 A 5/16/2018 8/31/2018 12355
1 A 5/16/2018 9/30/2018 12356
1 A 5/16/2018 10/31/2018 12357
1 A 5/16/2018 11/30/2018 12358
1 A 5/16/2018 12/31/2018 12358
2 B 11/1/2017 1/31/2018 21234
2 B 12/24/2017 1/1/2018 21235
2 B 11/1/2017 2/28/2018 21236
2 B 11/1/2017 3/31/2018 21237
2 B 11/1/2017 4/30/2018 21238
2 B 11/1/2017 5/31/2018 21239
2 B 11/1/2017 6/30/2018 21240
2 B 11/1/2017 7/31/2018 21241
2 B 11/1/2017 8/31/2018 21242
2 B 11/1/2017 9/30/2018 21243
2 B 11/1/2017 10/5/2018 21244
2 C 10/6/2018 10/31/2018 21245
2 C 10/5/2018 11/30/2018 21246
2 C 10/5/2018 12/31/2018 21247
3 D 1/27/2018 1/30/2018 31234
3 D 1/27/2018 2/28/2018 31235
3 E 3/15/2018 31236
3 E 3/15/2018 31237
3 E 3/15/2018 31238
3 E 3/15/2018 31239
3 E 3/15/2018 31240
3 E 3/15/2018 31241
3 E 4/1/2018 31242
3 E 5/1/2018 31243
3 E 6/1/2018 31244
3 E 7/1/2018 31245
3 E 8/1/2018 31246
3 E 9/1/2018 31247
3 E 10/1/2018 31248
3 E 11/1/2018 31249
3 E 12/1/2018 31250
;
%let def_discharge = %sysfunc(inputn(2018-12-31,yymmdd10.));
data want;
set have;
by person_id provider_id;
retain _start length_of_stay claim_id_adm;
/* length_of stay is included here solely for the purpose of variable order, purely cosmetic! */
_dis = lag(discharge_dt);
_claim_id = lag(claim_id);
if first.provider_id
then do;
_start = admission_dt;
claim_id_adm = claim_id;
end;
if not first.provider_id and _dis ne . and admission_dt gt sum(_dis,1)
then do;
_tmp = admission_dt;
admission_dt = _start;
_start = _tmp;
_tmp = discharge_dt;
discharge_dt = _dis;
length_of_stay = discharge_dt - admission_dt;
claim_id_discharge = _claim_id;
output;
discharge_dt = _tmp;
claim_id_adm = claim_id;
end;
if last.provider_id
then do;
admission_dt = _start;
discharge_dt = coalesce(discharge_dt,&def_discharge.);
length_of_stay = discharge_dt - admission_dt;
claim_id_discharge = claim_id;
output;
end;
drop _: claim_id;
run;
Good day
Please see below for an approach. You should be able to leverage the below code. The approach has three steps:
proc sql;
create table want_base_file as
select distinct
PERSON_ID
,provider_id
,admission_dt
,max(discharge_dt) format=date9. as final_discharge_dt
,intck('day', admission_dt, calculated final_discharge_dt) as length_of_stay
from Data_Import_20220802
group by PERSON_ID
,provider_id
,admission_dt;
quit;
proc sql;
create table want_claim_id_flag as
select distinct
PERSON_ID
,provider_id
,admission_dt
,claim_id
,(case
when claim_id=12345 then 'A'
when claim_id=12346 then 'D'
else '' end) as claim_flag
from Data_Import_20220802;
quit;
proc sql;
create table want_final as
select distinct
a.*
,b.claim_id as claim_id_adm
,c.claim_id as claim_id_discharge
from want_base_file as a
left join want_claim_id_flag as b
on (a.PERSON_ID=b.PERSON_ID
and a.provider_id=b.provider_id
and a.admission_dt=b.admission_dt
and b.claim_flag='A')
left join want_claim_id_flag as C
on (a.PERSON_ID=c.PERSON_ID
and a.provider_id=c.provider_id
and a.admission_dt=c.admission_dt
and c.claim_flag='D');
quit;
How do you plan to make this
,(case
when claim_id=12345 then 'A'
when claim_id=12346 then 'D'
else '' end) as claim_flag
dynamic to work for all stays?
Good day Kurt
Hope you're doing well.
From a dynamic perspective, the approach there would be to focus on the context of the work. Ideally, it would be more sustainable to write it slightly different, based on data inputs from users. Below is simply an example.
proc sql;
create table want_claim_id_flag_dynamic as
select distinct
PERSON_ID
,provider_id
,admission_dt
,claim_id
,(case
when upcase(note) like '%ADMISSION%' then 'A'
when upcase(note) like '%DISCHARGE%' then 'D'
else '' end) as claim_flag_dynamic
from Data_Import_20220802;
quit;
if not first.provider_id and _dis ne . and admission_dt gt sum(_dis,1)
/* this condition becomes true when
- we do not have the first observation of a group (so the previous observation must have been from the same group
- the previous discharge date is not missing (which would indicate a stay without end)
- and the difference between the previous discharge date and the current admission date is greater than 1
the SUM() function is used to prevent "missing value" NOTEs */
then do;
/* since we recognize the end of a stay after the fact, we need to do some juggling with values */
_tmp = admission_dt; /* save the current admission date */
admission_dt = _start; set admission date from the RETAINed value */
_start = _tmp; /* set the new start from the current admission date */
_tmp = discharge_dt; /* save the current discharge date */
discharge_dt = _dis; /* set the discharge date from the previous observation which marks the end of a stay */
length_of_stay = discharge_dt - admission_dt; /* calculate */
claim_id_discharge = _claim_id; /* set the claim from the previous observation */
output; /* write to output */
discharge_dt = _tmp; /* restore the current value (is necessary if the current observation is the last of a group */
claim_id_adm = claim_id; /* set the claim for the start of a stay */
end;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.