Greetings SAS Experts,
I really need your help. I have the sample data for year 2021 below. For each patient ID, a missing discharge date means the patient is still hospitalized. My final aim is to calculate the number of days each patient spends outside the hospital. Thus, I would like to employ the following for missing discharge dates.
1) If distinct patient ID with only one claim, (e.g; patient id 3 and 10, I would like to recode missing discahrge dates as end of the year (31DEC2021).
2) If admit date for last claim is missing (e.g for patient ID 4, claim F, discharge date should be recoded as end of the year (31DEC2021) and patient ID 9 claim AD)
3) If consecutive admit dates are the same and discharge date is missing, missing discharge date should be coded as the first non-missing discharge date with same admit date (e.g patient ID 6 claim L, recoded as 24DEC2021 (because of same admit date as claim M) , Patient ID 8, claims U,V,W discharge date recoded as 22OCT2021 (because of same admit date as claim X), Patient ID 9, claims AA, AB (because of same admit date as claim AC).
All the codes I tried didn't work correctly. I have attached what the dataset should look like (want dataset) Thank you very much
data HAVE_DATA(label='Patient Data');
infile datalines dsd truncover;
input Pt_ID:$8. Claim_ID:$1. Admit_Date:$10. Discharge_Date:$10.;
datalines;
1 A 29-Aug-21 09-Sep-21
2 B 18-Jun-21 29-Jun-21
3 C 19-Nov-17 .
4 D 08-Nov-20 22-Sep-21
4 E 20-Oct-21 18-Nov-21
4 F 19-Nov-21 .
5 G 19-Nov-21 19-Dec-21
6 H 18-Jan-21 28-Feb-21
6 I 02-Mar-21 10-Mar-21
6 J 16-Mar-21 15-May-21
6 K 21-Apr-21 22-Apr-21
6 L 05-May-21 .
6 M 05-May-21 24-Dec-21
6 N 24-Dec-21 28-Dec-21
7 O 08-Jul-20 30AAPR2021
7 P 30-Apr-21 03-May-21
7 Q 03-May-21 23-May-21
7 R 23-May-21 01-Jun-21
7 S 01-Jun-21 05-Jul-21
7 T 05-Jul-21 12-Jul-21
8 U 08-Jun-20 .
8 V 08-Jun-20 .
8 W 08-Jun-20 .
8 X 08-Jun-20 22-Oct-21
8 Y 08SE2021 10-Sep-21
8 Z 21-Oct-21 25-Oct-21
9 AA 13-Mar-19 .
9 AB 13-Mar-19 .
9 AC 13-Mar-19 18-May-21
9 AD 18-May-21 .
10 AE 01-Feb-21 .
;;;;
Want data set
Pt_ID | Claim_no | Admit_Date | Discharge_Date |
1 | A | 29AUG2021 | 09SEP2021 |
2 | B | 18JUN2021 | 29JUN2021 |
3 | C | 19NOV2017 | 31DEC2021 |
4 | D | 08NOV2020 | 22SEPT2021 |
4 | E | 20OCT2021 | 18NOV2021 |
4 | F | 19NOV2021 | 19DEC2021 |
5 | G | 19NOV2021 | 19DEC2021 |
6 | H | 18JAN2021 | 28FEB2021 |
6 | I | 02MAR2021 | 10MAR2021 |
6 | J | 16MAR2021 | 15MAY2021 |
6 | K | 21APR2021 | 22APR2021 |
6 | L | 05MAY2021 | 24DEC2021 |
6 | M | 05MAY2021 | 24DEC2021 |
6 | N | 24DEC2021 | 28DEC2021 |
7 | O | 08JUL2020 | 30AAPR2021 |
7 | P | 30APR2021 | 03MAY2021 |
7 | Q | 03MAY2021 | 23MAY2021 |
7 | R | 23MAY2021 | 01JUN2021 |
7 | S | 01JUN2021 | 05JUL2021 |
7 | T | 05JUL2021 | 12JUL2021 |
8 | U | 08JUN2020 | 22OCT2021 |
8 | V | 08JUN2020 | 22OCT2021 |
8 | W | 08JUN2020 | 22OCT2021 |
8 | X | 08JUN2020 | 22OCT2021 |
8 | Y | 08SE2021 | 10SEP2021 |
8 | Z | 21OCT2021 | 25OCT2021 |
9 | AA | 13MAR2019 | 18MAY2021 |
9 | AB | 13MAR2019 | 18MAY2021 |
9 | AC | 13MAR2019 | 18MAY2021 |
9 | AD | 18MAY2021 | 31DEC2021 |
10 | AE | 01FEB2021 | 31DEC2021 |
See if this gives you what you want. I edited the have data a bit.
data HAVE_DATA(label='Patient Data');
infile datalines missover;
input Pt_ID Claim_ID:$2. (Admit_Date Discharge_Date)(:date9.);
format Admit_Date Discharge_Date date9.;
datalines;
1 A 29-Aug-21 09-Sep-21
2 B 18-Jun-21 29-Jun-21
3 C 19-Nov-17 .
4 D 08-Nov-20 22-Sep-21
4 E 20-Oct-21 18-Nov-21
4 F 19-Nov-21 .
5 G 19-Nov-21 19-Dec-21
6 H 18-Jan-21 28-Feb-21
6 I 02-Mar-21 10-Mar-21
6 J 16-Mar-21 15-May-21
6 K 21-Apr-21 22-Apr-21
6 L 05-May-21 .
6 M 05-May-21 24-Dec-21
6 N 24-Dec-21 28-Dec-21
7 O 08-Jul-20 30AAPR2021
7 P 30-Apr-21 03-May-21
7 Q 03-May-21 23-May-21
7 R 23-May-21 01-Jun-21
7 S 01-Jun-21 05-Jul-21
7 T 05-Jul-21 12-Jul-21
8 U 08-Jun-20 .
8 V 08-Jun-20 .
8 W 08-Jun-20 .
8 X 08-Jun-20 22-Oct-21
8 Y 08-Sep-21 10-Sep-21
8 Z 21-Oct-21 25-Oct-21
9 AA 13-Mar-19 .
9 AB 13-Mar-19 .
9 AC 13-Mar-19 18-May-21
9 AD 18-May-21 .
10 AE 01-Feb-21 .
;;;;
data want(drop = d);
if 0 then set HAVE_DATA;
do _N_ = 1 by 1 until (last.Pt_ID | Discharge_Date);
set HAVE_DATA;
by Pt_ID;
d = Discharge_Date;
end;
do _N_ = 1 to _N_;
set HAVE_DATA;
by Pt_ID;
if last.Pt_ID and Discharge_Date = . then Discharge_Date = intnx('year', Admit_Date, 0, 'e');
if Discharge_Date = . then Discharge_Date = d;
output;
end;
run;
Result:
Pt_ID Claim_ID Admit_Date Discharge_Date 1 A 29AUG2021 09SEP2021 2 B 18JUN2021 29JUN2021 3 C 19NOV2017 31DEC2017 4 D 08NOV2020 22SEP2021 4 E 20OCT2021 18NOV2021 4 F 19NOV2021 31DEC2021 5 G 19NOV2021 19DEC2021 6 H 18JAN2021 28FEB2021 6 I 02MAR2021 10MAR2021 6 J 16MAR2021 15MAY2021 6 K 21APR2021 22APR2021 6 L 05MAY2021 24DEC2021 6 M 05MAY2021 24DEC2021 6 N 24DEC2021 28DEC2021 7 O 08JUL2020 03MAY2021 7 P 30APR2021 03MAY2021 7 Q 03MAY2021 23MAY2021 7 R 23MAY2021 01JUN2021 7 S 01JUN2021 05JUL2021 7 T 05JUL2021 12JUL2021 8 U 08JUN2020 22OCT2021 8 V 08JUN2020 22OCT2021 8 W 08JUN2020 22OCT2021 8 X 08JUN2020 22OCT2021 8 Y 08SEP2021 10SEP2021 8 Z 21OCT2021 25OCT2021 9 AA 13MAR2019 18MAY2021 9 AB 13MAR2019 18MAY2021 9 AC 13MAR2019 18MAY2021 9 AD 18MAY2021 31DEC2021 10 AE 01FEB2021 31DEC2021
See if this gives you what you want. I edited the have data a bit.
data HAVE_DATA(label='Patient Data');
infile datalines missover;
input Pt_ID Claim_ID:$2. (Admit_Date Discharge_Date)(:date9.);
format Admit_Date Discharge_Date date9.;
datalines;
1 A 29-Aug-21 09-Sep-21
2 B 18-Jun-21 29-Jun-21
3 C 19-Nov-17 .
4 D 08-Nov-20 22-Sep-21
4 E 20-Oct-21 18-Nov-21
4 F 19-Nov-21 .
5 G 19-Nov-21 19-Dec-21
6 H 18-Jan-21 28-Feb-21
6 I 02-Mar-21 10-Mar-21
6 J 16-Mar-21 15-May-21
6 K 21-Apr-21 22-Apr-21
6 L 05-May-21 .
6 M 05-May-21 24-Dec-21
6 N 24-Dec-21 28-Dec-21
7 O 08-Jul-20 30AAPR2021
7 P 30-Apr-21 03-May-21
7 Q 03-May-21 23-May-21
7 R 23-May-21 01-Jun-21
7 S 01-Jun-21 05-Jul-21
7 T 05-Jul-21 12-Jul-21
8 U 08-Jun-20 .
8 V 08-Jun-20 .
8 W 08-Jun-20 .
8 X 08-Jun-20 22-Oct-21
8 Y 08-Sep-21 10-Sep-21
8 Z 21-Oct-21 25-Oct-21
9 AA 13-Mar-19 .
9 AB 13-Mar-19 .
9 AC 13-Mar-19 18-May-21
9 AD 18-May-21 .
10 AE 01-Feb-21 .
;;;;
data want(drop = d);
if 0 then set HAVE_DATA;
do _N_ = 1 by 1 until (last.Pt_ID | Discharge_Date);
set HAVE_DATA;
by Pt_ID;
d = Discharge_Date;
end;
do _N_ = 1 to _N_;
set HAVE_DATA;
by Pt_ID;
if last.Pt_ID and Discharge_Date = . then Discharge_Date = intnx('year', Admit_Date, 0, 'e');
if Discharge_Date = . then Discharge_Date = d;
output;
end;
run;
Result:
Pt_ID Claim_ID Admit_Date Discharge_Date 1 A 29AUG2021 09SEP2021 2 B 18JUN2021 29JUN2021 3 C 19NOV2017 31DEC2017 4 D 08NOV2020 22SEP2021 4 E 20OCT2021 18NOV2021 4 F 19NOV2021 31DEC2021 5 G 19NOV2021 19DEC2021 6 H 18JAN2021 28FEB2021 6 I 02MAR2021 10MAR2021 6 J 16MAR2021 15MAY2021 6 K 21APR2021 22APR2021 6 L 05MAY2021 24DEC2021 6 M 05MAY2021 24DEC2021 6 N 24DEC2021 28DEC2021 7 O 08JUL2020 03MAY2021 7 P 30APR2021 03MAY2021 7 Q 03MAY2021 23MAY2021 7 R 23MAY2021 01JUN2021 7 S 01JUN2021 05JUL2021 7 T 05JUL2021 12JUL2021 8 U 08JUN2020 22OCT2021 8 V 08JUN2020 22OCT2021 8 W 08JUN2020 22OCT2021 8 X 08JUN2020 22OCT2021 8 Y 08SEP2021 10SEP2021 8 Z 21OCT2021 25OCT2021 9 AA 13MAR2019 18MAY2021 9 AB 13MAR2019 18MAY2021 9 AC 13MAR2019 18MAY2021 9 AD 18MAY2021 31DEC2021 10 AE 01FEB2021 31DEC2021
Thank you very much!! That worked.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.