Hello everyone, I would love some guidance. I have a dataset of ER visits and hospital admissions. I have transposed the data from long to wide. Here is an example below:
patient_id | encounter_type1 | encounter_type2 | admtdate1 | admtdate2 | dschdate1 | dschdate2 | status1 | status2 |
1 | er |
| 1/1/2020 |
|
| 1/1/2020 | died |
|
2 | er | inpatient | 3/2/2020 | 3/6/2020 | 3/2/2020 | 5/3/2020 | alive | died |
3 | er | er | 4/4/2020 | 4/5/202 | 4/4/2020 | 4/5/2020 | alive | died |
4 | inpatient | er | 5/2/2020 | 5/12/2020 | 5/7/2020 | 5/12/2020 | alive | alive |
encounter_type = ER visit or inpatient (i.e. hospital admission)
admtdate = date of admission (inpatient) or service (ER)
dschdate = date of discharge
status = alive or not
I now want to: 1) Create a new variable that sums how many ER visits someone had within the month prior to a hospital admission (if they were admitted); and 2) Create a new variable that specifies where a person died (ER or inpatient)
I am imagining the output looking something like this:
patient_id | encounter_type1 | encounter_type2 | admtdate1 | admtdate2 | dschdate1 | dschdate2 | status1 | status2 | newvar1 | newvar2 |
1 | er |
| 1/1/2020 |
|
| 1/1/2020 | died |
| . | er |
2 | er | inpatient | 3/2/2020 | 3/6/2020 | 3/2/2020 | 5/3/2020 | alive | died | 1 | inpatient |
3 | er | er | 4/4/2020 | 4/5/202 | 4/4/2020 | 4/5/2020 | alive | died | . | er |
4 | inpatient | er | 5/2/2020 | 5/12/2020 | 5/7/2020 | 5/12/2020 | alive | alive | 0 |
|
I thank you in advance for your time and expertise.
One way to approach this is with "double DoW loop". You read the data twice in one data step, one patient at a time.
data have;
input
patient_id $
encounter_type :$10.
admtdate :mmddyy10.
dschdate :mmddyy10.
status $
;
format
admtdate
dschdate yymmdd10.
;
datalines;
1 er 1/1/2020 1/1/2020 died
2 er 3/2/2020 3/2/2020 alive
2 inpatient 3/6/2020 5/3/2020 died
3 er 4/4/2020 4/4/2020 alive
3 er 4/5/2020 4/5/2020 died
4 inpatient 5/2/2020 5/7/2020 alive
4 er 5/12/2020 5/12/2020 alive
;
data want (keep=patient_id inpatientdate ercount deathlocation);
*find the inpatient admission date for the patient (will be null if no inpatient admission) ;
do until (last.patient_id) ;
set have ;
by patient_id ;
if encounter_type='inpatient' then inpatientdate=admtdate ;
end ;
ercount=0 ;
*look through all the records for the patient again, count the er addmisions within 30 days of inpatient ;
*and assign the death location ;
do until (last.patient_id) ;
set have ;
by patient_id ;
if encounter_type='er' and (not missing(inpatientdate)) then do ;
if (0<=(inpatientdate-admtdate)<=30) then ercount=ercount+1 ;
end ;
if status='died' then deathlocation=encounter_type ;
end ;
output ;
format inpatientdate yymmdd10. ;
run ;
proc print data=want ;
run ;
Returns:
patient_ Obs id inpatientdate ercount deathlocation 1 1 . 0 er 2 2 2020-03-06 1 inpatient 3 3 . 0 er 4 4 2020-05-02 0
Please post an example of your untransposed (long) data.
Please see below:
patient_id | encounter_type | admtdate | dschdate | status |
1 | er | 1/1/2020 | 1/1/2020 | died |
2 | er | 3/2/2020 | 3/2/2020 | alive |
2 | inpatient | 3/6/2020 | 5/3/2020 | died |
3 | er | 4/4/2020 | 4/4/2020 | alive |
3 | er | 4/5/2020 | 4/5/2020 | died |
4 | inpatient | 5/2/2020 | 5/7/2020 | alive |
4 | er | 5/12/2020 | 5/12/2020 | alive |
See this code:
data have;
input
patient_id $
encounter_type :$10.
admtdate :mmddyy10.
dschdate :mmddyy10.
status $
;
format
admtdate
dschdate yymmdd10.
;
datalines;
1 er 1/1/2020 1/1/2020 died
2 er 3/2/2020 3/2/2020 alive
2 inpatient 3/6/2020 5/3/2020 died
3 er 4/4/2020 4/4/2020 alive
3 er 4/5/2020 4/5/2020 died
4 inpatient 5/2/2020 5/7/2020 alive
4 er 5/12/2020 5/12/2020 alive
;
%let start = %sysfunc(inputn(2000-01-01,yymmdd10.));
%let end = %sysfunc(today());
data want;
array visits {&start.:&end.} _temporary_;
set have;
by patient_id;
retain
newvar1
newvar2
;
if first.patient_id
then do;
call missing(of visits{*});
newvar1 = .;
newvar2 = " ";
end;
visits{admtdate} = (encounter_type = "er");
if encounter_type = "inpatient" then newvar1 = sum(of visits{*});
if status = "died" then newvar2 = encounter_type;
if last.patient_id
then do;
output;
end;
keep patient_id newvar1 newvar2;
run;
proc print data=want noobs;
run;
Result:
patient_id newvar1 newvar2 1 . er 2 1 inpatient 3 . er 4 0
If you need to deal with multiple "inpatient" encounters, the code needs to be modified for this (resetting the array, OUTPUTting).
Thank you so much for taking the time to help me with this code. The one outstanding question I have is how to ensure that only ER visits within the 30 days prior to the inpatient visit are counted? Thus no ER visits after the inpatient visit are counted nor are ER visits 30+ days prior to the inpatient visit counted.
I knew there was something 😉
I will dig into this tomorrow.
Can a patient have two in-patient visits?
If a patient has an ER visit on Jan 1, then in-patient on Jan 10, then ER visit January 15 and and in-patient on January 25, how would you want to count the ER visits? Would you want 1 ER visit prior to the JAN 10 in-patient visit and 2 ER visits prior to the January 25 in-patient visit (so essentially double- count the Jan 1 ER visit)? Or would you want the January 25 in-patient visit to have just 1 prior ER visit?
This is a great question. Sorry I didn't address it in my original post. For right now we are focusing on how many ER visits a given patient has in the 30 days prior to their initial inpatient admission. Thus, this dataset only includes one inpatient visit per rln (patient).
One way to approach this is with "double DoW loop". You read the data twice in one data step, one patient at a time.
data have;
input
patient_id $
encounter_type :$10.
admtdate :mmddyy10.
dschdate :mmddyy10.
status $
;
format
admtdate
dschdate yymmdd10.
;
datalines;
1 er 1/1/2020 1/1/2020 died
2 er 3/2/2020 3/2/2020 alive
2 inpatient 3/6/2020 5/3/2020 died
3 er 4/4/2020 4/4/2020 alive
3 er 4/5/2020 4/5/2020 died
4 inpatient 5/2/2020 5/7/2020 alive
4 er 5/12/2020 5/12/2020 alive
;
data want (keep=patient_id inpatientdate ercount deathlocation);
*find the inpatient admission date for the patient (will be null if no inpatient admission) ;
do until (last.patient_id) ;
set have ;
by patient_id ;
if encounter_type='inpatient' then inpatientdate=admtdate ;
end ;
ercount=0 ;
*look through all the records for the patient again, count the er addmisions within 30 days of inpatient ;
*and assign the death location ;
do until (last.patient_id) ;
set have ;
by patient_id ;
if encounter_type='er' and (not missing(inpatientdate)) then do ;
if (0<=(inpatientdate-admtdate)<=30) then ercount=ercount+1 ;
end ;
if status='died' then deathlocation=encounter_type ;
end ;
output ;
format inpatientdate yymmdd10. ;
run ;
proc print data=want ;
run ;
Returns:
patient_ Obs id inpatientdate ercount deathlocation 1 1 . 0 er 2 2 2020-03-06 1 inpatient 3 3 . 0 er 4 4 2020-05-02 0
Thank you for the beautiful solution and for explaining the code. On to the next part of the analysis!
This is the array code, adapted to deal with the 30 days window and multiple "inpatient" encounters:
data want;
array visits {&start.:&end.} _temporary_;
set have;
by patient_id;
if first.patient_id
then do;
call missing(of visits{*});
newvar1 = .;
newvar2 = " ";
end;
visits{admtdate} = (encounter_type = "er");
if status = "died" then newvar2 = encounter_type;
if encounter_type = "inpatient"
then do;
do i = admtdate - 30 to admtdate - 1;
newvar1 + sum(0,visits{i});
end;
output;
call missing(of visits{*});
newvar1 = .;
newvar2 = "";
end;
if last.patient_id and (newvar1 ne . or newvar2 ne "")
then output;
retain
newvar2
;
keep patient_id newvar1 newvar2;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.