I would like to find out the hospital transitions of each patient during year 2018. The claims data look like below
(1) The data I have looks like:
PatID FacID clm_from_dt clm_thru_dt
001 40001 1/3/2018 1/6/2018
001 40001 1/14/2018 1/19/2018
001 40001 3/7/2018 3/10/2018
The data I want looks like (need to insert rows to make up the date gap between two claims in order to capture the transition):
PatID FacID clm_from_dt clm_thru_dt
001 40001 1/3/2018 1/6/2018
001 1/6/2018 1/14/2018
001 40001 1/14/2018 1/19/2018
001 1/19/2018 3/7/2018
001 40001 3/7/2018 3/10/2018
001 3/10/2018 12/31/2018
(2) For some patients, they visited hospital multiple times during their stay in a long term care facility. So there is overlapping between the previous claim through date and next claim from date. The data looks like below:
PatID FacID clm_from_dt clm_thru_dt
002 43517 2/1/2018 3/31/2018
002 40002 2/10/2018 2/11/2018
002 42168 2/12/2018 2/22/2018
002 43517 4/1/2018 5/31/2018
002 40002 4/5/2018 4/7/2018
002 43517 6/1/2018 7/31/2018
For this situation, I would like the data looks like below:
PatID FacID clm_from_dt clm_thru_dt
002 43517 2/1/2018 2/10/2018
002 40002 2/10/2018 2/11/2018
002 42168 2/12/2018 2/22/2018
002 43517 2/22/2018 3/31/2018
002 43517 4/1/2018 4/5/2018
002 40002 4/5/2018 4/7/2018
002 43517 4/7/2018 5/31/2018
002 43517 6/1/2018 7/31/2018
002 7/31/2018 12/31/2018
In general, I would like to insert rows to make sure there is no date gap between claims.
Hi @Bobo3
I don't understand why do you still expect overlapping dates in your output table:
e.g. for patient 001:
001 40001 1/3/2018 1/6/2018
001 1/6/2018 1/14/2018 -> why don't you expect 1/7/2018 and 1/13/2018 ?
001 40001 1/14/2018 1/19/2018
001 1/19/2018 3/7/2018 -> why don't you expect 1/20/2018 and 3/6/2018 ?
001 40001 3/7/2018 3/10/2018
001 3/10/2018 12/31/2018 -> why don't you expect 3/11/2018 and 12/31/2018 ?
If this is a mistake, you could try the following program. When FacID is missing, it corresponds to date ranges without hospitalisations.
Hope this help.
data have;
input PatID $ FacID $ clm_from_dt:mmddyy8. clm_thru_dt:mmddyy8.;
format clm_from_dt clm_thru_dt date9.;
cards;
001 40001 1/3/2018 1/6/2018
001 40001 1/14/2018 1/19/2018
001 40001 3/7/2018 3/10/2018
002 43517 2/1/2018 3/31/2018
002 40002 2/10/2018 2/11/2018
002 42168 2/12/2018 2/22/2018
002 43517 4/1/2018 5/31/2018
002 40002 4/5/2018 4/7/2018
002 43517 6/1/2018 7/31/2018
;
run;
/* STEP1 : creation of 1 dataset with 1 row per day of hospitalization */
data day_hosp;
set have;
/* Each hospital transition is identified by the variable source */
source + 1;
/* 1 row per day at hospital */
format day date9.;
day=clm_from_dt - 1;
do i=clm_from_dt to clm_thru_dt;
day + 1;
output;
end;
drop i clm_from_dt clm_thru_dt;
run;
proc sort data=day_hosp;
by patID day;
run;
data day_hosp2;
set day_hosp;
by patID day source;
/* remove overlapping days*/
if last.day then
output;
run;
/* STEP2 : creation of a dataset with 1 row per day until 12/31/2018 */
proc sql;
create table day_all as select patID, min(clm_from_dt) as start_date
format=date9., '31DEC2018'd as end_date format=date9.
from have group by patID order by patID;
quit;
data day_all2;
set day_all;
format day date9.;
by patID;
do day=(start_date) to (end_date);
day+1;
day=day-1;
output;
end;
drop start_date end_date;
run;
/* STEP3 : merge */
data day_all;
format day date9.;
merge day_hosp2 day_all2;
by patID day;
run;
data want;
set day_all;
format clm_from_dt clm_thru_dt date9.;
by patID source notsorted;
retain clm_from_dt;
if first.source then
clm_from_dt=day;
if last.source then
clm_thru_dt=day;
if last.source then
output;
keep PatID FacID clm_from_dt clm_thru_dt;
run;
proc print data=want;
id patID;
run;
Hi @Bobo3
I don't understand why do you still expect overlapping dates in your output table:
e.g. for patient 001:
001 40001 1/3/2018 1/6/2018
001 1/6/2018 1/14/2018 -> why don't you expect 1/7/2018 and 1/13/2018 ?
001 40001 1/14/2018 1/19/2018
001 1/19/2018 3/7/2018 -> why don't you expect 1/20/2018 and 3/6/2018 ?
001 40001 3/7/2018 3/10/2018
001 3/10/2018 12/31/2018 -> why don't you expect 3/11/2018 and 12/31/2018 ?
If this is a mistake, you could try the following program. When FacID is missing, it corresponds to date ranges without hospitalisations.
Hope this help.
data have;
input PatID $ FacID $ clm_from_dt:mmddyy8. clm_thru_dt:mmddyy8.;
format clm_from_dt clm_thru_dt date9.;
cards;
001 40001 1/3/2018 1/6/2018
001 40001 1/14/2018 1/19/2018
001 40001 3/7/2018 3/10/2018
002 43517 2/1/2018 3/31/2018
002 40002 2/10/2018 2/11/2018
002 42168 2/12/2018 2/22/2018
002 43517 4/1/2018 5/31/2018
002 40002 4/5/2018 4/7/2018
002 43517 6/1/2018 7/31/2018
;
run;
/* STEP1 : creation of 1 dataset with 1 row per day of hospitalization */
data day_hosp;
set have;
/* Each hospital transition is identified by the variable source */
source + 1;
/* 1 row per day at hospital */
format day date9.;
day=clm_from_dt - 1;
do i=clm_from_dt to clm_thru_dt;
day + 1;
output;
end;
drop i clm_from_dt clm_thru_dt;
run;
proc sort data=day_hosp;
by patID day;
run;
data day_hosp2;
set day_hosp;
by patID day source;
/* remove overlapping days*/
if last.day then
output;
run;
/* STEP2 : creation of a dataset with 1 row per day until 12/31/2018 */
proc sql;
create table day_all as select patID, min(clm_from_dt) as start_date
format=date9., '31DEC2018'd as end_date format=date9.
from have group by patID order by patID;
quit;
data day_all2;
set day_all;
format day date9.;
by patID;
do day=(start_date) to (end_date);
day+1;
day=day-1;
output;
end;
drop start_date end_date;
run;
/* STEP3 : merge */
data day_all;
format day date9.;
merge day_hosp2 day_all2;
by patID day;
run;
data want;
set day_all;
format clm_from_dt clm_thru_dt date9.;
by patID source notsorted;
retain clm_from_dt;
if first.source then
clm_from_dt=day;
if last.source then
clm_thru_dt=day;
if last.source then
output;
keep PatID FacID clm_from_dt clm_thru_dt;
run;
proc print data=want;
id patID;
run;
To eliminate overlaps and fill in the gaps (if that's what you want) :
data have;
input PatID $ FacID $ clm_from_dt :mmddyy. clm_thru_dt :mmddyy.;
format clm: yymmdd10.;
datalines;
002 43517 2/1/2018 3/31/2018
002 40002 2/10/2018 2/11/2018
002 42168 2/12/2018 2/22/2018
002 43517 4/1/2018 5/31/2018
002 40002 4/5/2018 4/7/2018
002 43517 6/1/2018 7/31/2018
;
data want;
array f {%sysevalf("31dec2017"d, integer) : %sysevalf("01jan2019"d,integer) } $8;
f{lbound(f)} = "xxxx";
f{hbound(f)} = "xxxx";
do until(last.patId);
set have; by patId;
do d = clm_from_dt to clm_thru_dt;
f{d} = FacId;
end;
end;
do d = "01jan2018"d to "31dec2018"d;
if f{d} ne f{d-1} then do;
facId = f{d};
clm_from_dt = d;
end;
if f{d} ne f{d+1} then do;
clm_thru_dt = d;
output;
end;
end;
keep patId facId clm_from_dt clm_thru_dt;
run;
proc print; run;
Obs. PatID FacID clm_from_dt clm_thru_dt 1 002 2018-01-01 2018-01-31 2 002 43517 2018-02-01 2018-02-09 3 002 40002 2018-02-10 2018-02-11 4 002 42168 2018-02-12 2018-02-22 5 002 43517 2018-02-23 2018-04-04 6 002 40002 2018-04-05 2018-04-07 7 002 43517 2018-04-08 2018-07-31 8 002 2018-08-01 2018-12-31
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.