I have a rather complex merge operation I am trying to perform on inpatient and outpatient medical claims.
The basic idea is once a patient has been discharged from an inpatient hospital stay, identify the first consecutive outpatient PCP visit, and the first consecutive outpatient Specialist visit. There may be many but I only want the first for each. If there are none, then it should be tracked as a missing value. There may be many discharges for the same patient. If the next consecutive inpatient row for the patient has a readmission_flag = Y, then I'd like to track that too.
I am wondering if advice can be offered on how to do this, either by setting up the initial data steps differently, or by creating a merge with several considerations. Thank you for your help.
Have - Inpatient | |||||||||
Patient_ID | Diagnostic_Category | Type_of_Service | Admit_Date | Discharge_Date | Readmission_Flag | ||||
A | Nervous System | Medical | 21-Jan-16 | 15-Feb-16 | |||||
A | Circulatory System | Surgical | 15-Jul-16 | 21-Jul-16 | |||||
A | Infection | Medical | 31-Jul-16 | 2-Aug-16 | Y | ||||
Have - Outpatient | |||||||||
Patient_ID | Type_of_Service | OP_Visit_DT | PROVIDER | ||||||
A | Lab | 1-Jan-16 | Specialist | ||||||
A | Office Visit | 15-Jan-16 | PCP | ||||||
A | Office Visit | 18-Feb-16 | Specialist | ||||||
A | Office Visit | 20-Feb-16 | PCP | ||||||
A | Office Visit | 22-Feb-16 | PCP | ||||||
A | Office Visit | 15-Mar-16 | Specialist | ||||||
A | Office Visit | 12-Sep-16 | Specialist | ||||||
A | Office Visit | 1-Oct-16 | Specialist | ||||||
Merged- Want | |||||||||
Patient_ID | Diagnostic_Category | Type_of_Service | Admit_Date | Discharge_Date | First_OP_Visit_Specailist_Dt | First_OP_Visit_Specailist_TOS | First_OP_Visit_PCP_Dt | First_OP_Visit_PCP_TOS | Resulted _Readmission |
A | Nervous System | Medical | 21-Jan-16 | 15-Feb-16 | 18-Feb-16 | Office Visit | 20-Feb-16 | Office Visit | |
A | Circulatory System | Surgical | 15-Jul-16 | 21-Jul-16 | 12-Sep-16 | Office Visit | na | na | Y |
A | Infection | Medical | 31-Jul-16 | 2-Aug-16 | 12-Sep-16 | Office Visit | na | na |
I hope next tested code will enable you to addapt it finnaly to your needs,
especially addapt the macro variables on top of the program.
You may need addapt also some output dates if you want them as character (including 'na') instead numeric.
libname pat '/folders/myshortcuts/My_Folders/patient_data';
%let max_array = 10; /* max number of Specialist, PCP occurences per patient_id */
%let max_srv = 3; /* number of unique values of Type_Of_Service */
/* preparing dataset inpatient by adding seq no per patient_id */
proc sort data=pat.ip; by Patient_ID Admit_Date; run;
/* preparing outpatient dataset */
proc sort data=pat.op; by Patient_ID provider OP_Visit_DT; run;
data op_arrays (keep=Patient_ID opspdt1-opspdt&max_array opcpdt1-opcpdt&max_array
spsrv1-spsrv&max_array pcpsrv1-pcpsrv&max_array )
srv_array (keep=last_srv serv1-serv&max_srv);
;
set pat.op end=eof;
by Patient_ID notsorted provider;
length serv1-serv&max_srv $12;
retain opspdt1-opspdt&max_array
opcpdt1-opcpdt&max_array
spsrv1 -spsrv&max_array
pcpsrv1-pcpsrv&max_array
serv1-serv&max_srv last_srv;
array spdt {&max_array} opspdt1-opspdt&max_array;
array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
array spsrv {&max_array} spsrv1-spsrv&max_array;
array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
array srv serv1-serv&max_srv; /* accumulated list of type_of_service */
if _N_ = 1 then do;
i_srv = 1;
last_srv = 1;
srv(i_srv) = Type_of_Service;
end;
i_srv = whichc(Type_of_Service , of srv(*));
if i_srv = 0
then do;
last_srv+1;
i_srv = last_srv;
srv(i_srv) = Type_of_Service;
end;
if first.patient_id then do;
do i=1 to &max_array;
spdt(i)=.; spsrv(i)=.;
pcpdt(i)=.; pcpsrv(i)=.;
end;
i=1; j=1; /* i, j - pointers to arrays */
end;
if provider = 'Specialist' then do;
spsrv(i) = i_srv;
spdt(i) = op_visit_DT;
i+1;
end; else
if provider = 'PCP' then do;
pcpsrv(j) = i_srv;
pcpdt(j) = op_visit_DT; j+1;
end;
else put '>>>Warning: ' provider=;
if last.patient_id then output op_arrays;
if eof then do;
output srv_array;
put '>>> There are ' last_srv= ' Type_of_service.';
end;
RUN;
data ip_op;
/* dummy format statement to define order of variables */
format patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
First_op_Visit_Specialist_DT First_op_Visit_Specialist_TOS
First_op_Visit_PCP_DT First_op_Visit_PCP_TOS Readmission_Flag
;
format Admit_Date Discharge_Date
First_op_Visit_Specialist_DT
First_op_Visit_PCP_DT date9.;
retain phase 0 serv1-serv&max_srv last_srv;
if phase=0 then do;
set srv_array(obs=1);
phase=1;
end;
merge pat.ip (in=inip)
op_arrays;
by Patient_ID;
if inip;
length First_op_Visit_Specialist_DT First_op_Visit_PCP_DT 8
First_op_Visit_Specialist_TOS First_op_Visit_PCP_TOS $12
;
FORMAT First_op_Visit_Specialist_DT First_op_Visit_PCP_DT DATE9.;
array spdt {&max_array} opspdt1-opspdt&max_array;
array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
array spsrv {&max_array} spsrv1-spsrv&max_array;
array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
array srv serv1-serv&max_srv; /* accumulated list of type_of_service */
i=1;
do until(spdt(i) > Discharge_date);
i+1;
if i > &max_array then leave;
end;
if i > &max_array then do;
First_op_Visit_Specialist_TOS = 'na';
First_op_Visit_Specialist_DT = .;
end;
else do;
i_srv = spsrv(i);
First_op_Visit_Specialist_TOS = srv(i_srv);
First_op_Visit_Specialist_DT = spdt(i);
end;
i=1;
do until(pcpdt(i) > Discharge_date);
i+1;
if i > &max_array then leave;
end;
if i > &max_array then do;
First_op_Visit_PCP_TOS = 'na';
First_op_Visit_PCP_DT = .;
end;
else do;
i_srv = pcpsrv(i);
First_op_Visit_PCP_TOS = srv(i_srv);
First_op_Visit_PCP_DT = pcpdt(i);
end;
KEEP patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
First_op_Visit_Specialist_DT First_op_Visit_Specialist_TOS
First_op_Visit_PCP_DT First_op_Visit_PCP_TOS Readmission_Flag
;
run;
I have tried to merge the two datasets and came to the conclusion that
it is absolutely not clear what are the rules - which outpatient row to merge
to a which inpatient row.
Besides, it is difficult to convert the two datasets, given as tables, into test data.
Please post it as text/csv files.
Thank you for taking the time to look into it. Sorry if my description wasn't clear. The merge is pretty complicated and I still am not able to solve it myself - so maybe I need a new approach. Anyway - I attached 3 SAS data sets.
At a high level, I work for a Health Plan, and we would like to determine the rate at which patients follow up with either a PCP or Specailst following a hospital admission.
Ip - ipatient discharge information
Op - outpatient visit information. Visits can be with either a PCP or Specialist.
What I would like to do is take the IP data set, and for a given patient and discharge date, scan the Op data set for the first outpatient visit that occured after the discharge. I would like to do this twice: for the first PCP vist, and the first Specialist vist. So for every patient/discharge date combination, we identify the first subsequent outpatient visit with a PCP, and the first visit with a specialist. If no visit occured in either category, the field should be blank.
The final step is to determine whether that hospital discharge was followed by a readmission. This way we can track how follow up visits and readmissions are related. So in this case in the IP table, if a patient/discharge date combination is followed by a row containing (Readmission_Flag = Y), then on the final data set I would like to populate a field identifying this discharge as having a readmission. So for example if a patient was discharged Jan 1 and the row following it has a Readmission_Flag = Y then I want to flag the Jan 1 discharge as "resulting in a readmission).
Hope that clarifies, and again thank you for helping with this project.
Few more questions:
1) Would you say that I can filter or neglect LAB rows from OP ?
2) It seems to me that it will be helpfull to transpose OP dataset with two arrays of visit dates,
one of SPECIALIST and the other of PCP, than merge OP data with IP by patient_ID.
Searching for 1st date ge IP.discharge_date is then very easy.
The question is - what is the upper limit of each kind of visits? would 100 be enough ?
1) It is just a sample fake data set I have created, so while all rows but one say Office Visit, it can really take on many other values like "behavioral health visit" or "radiology". So the detail in the row is important. I will need to know what kind of visit that first outpatient visit is.
2) There really is no upper limit, if the time period i am looking at is 1 year, a patient could essentially have more than 100 outpatient office visits. Is it just a matter of computer resources? So if your solution bounds the array at 100 values, could I increase it if needed?
I hope next tested code will enable you to addapt it finnaly to your needs,
especially addapt the macro variables on top of the program.
You may need addapt also some output dates if you want them as character (including 'na') instead numeric.
libname pat '/folders/myshortcuts/My_Folders/patient_data';
%let max_array = 10; /* max number of Specialist, PCP occurences per patient_id */
%let max_srv = 3; /* number of unique values of Type_Of_Service */
/* preparing dataset inpatient by adding seq no per patient_id */
proc sort data=pat.ip; by Patient_ID Admit_Date; run;
/* preparing outpatient dataset */
proc sort data=pat.op; by Patient_ID provider OP_Visit_DT; run;
data op_arrays (keep=Patient_ID opspdt1-opspdt&max_array opcpdt1-opcpdt&max_array
spsrv1-spsrv&max_array pcpsrv1-pcpsrv&max_array )
srv_array (keep=last_srv serv1-serv&max_srv);
;
set pat.op end=eof;
by Patient_ID notsorted provider;
length serv1-serv&max_srv $12;
retain opspdt1-opspdt&max_array
opcpdt1-opcpdt&max_array
spsrv1 -spsrv&max_array
pcpsrv1-pcpsrv&max_array
serv1-serv&max_srv last_srv;
array spdt {&max_array} opspdt1-opspdt&max_array;
array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
array spsrv {&max_array} spsrv1-spsrv&max_array;
array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
array srv serv1-serv&max_srv; /* accumulated list of type_of_service */
if _N_ = 1 then do;
i_srv = 1;
last_srv = 1;
srv(i_srv) = Type_of_Service;
end;
i_srv = whichc(Type_of_Service , of srv(*));
if i_srv = 0
then do;
last_srv+1;
i_srv = last_srv;
srv(i_srv) = Type_of_Service;
end;
if first.patient_id then do;
do i=1 to &max_array;
spdt(i)=.; spsrv(i)=.;
pcpdt(i)=.; pcpsrv(i)=.;
end;
i=1; j=1; /* i, j - pointers to arrays */
end;
if provider = 'Specialist' then do;
spsrv(i) = i_srv;
spdt(i) = op_visit_DT;
i+1;
end; else
if provider = 'PCP' then do;
pcpsrv(j) = i_srv;
pcpdt(j) = op_visit_DT; j+1;
end;
else put '>>>Warning: ' provider=;
if last.patient_id then output op_arrays;
if eof then do;
output srv_array;
put '>>> There are ' last_srv= ' Type_of_service.';
end;
RUN;
data ip_op;
/* dummy format statement to define order of variables */
format patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
First_op_Visit_Specialist_DT First_op_Visit_Specialist_TOS
First_op_Visit_PCP_DT First_op_Visit_PCP_TOS Readmission_Flag
;
format Admit_Date Discharge_Date
First_op_Visit_Specialist_DT
First_op_Visit_PCP_DT date9.;
retain phase 0 serv1-serv&max_srv last_srv;
if phase=0 then do;
set srv_array(obs=1);
phase=1;
end;
merge pat.ip (in=inip)
op_arrays;
by Patient_ID;
if inip;
length First_op_Visit_Specialist_DT First_op_Visit_PCP_DT 8
First_op_Visit_Specialist_TOS First_op_Visit_PCP_TOS $12
;
FORMAT First_op_Visit_Specialist_DT First_op_Visit_PCP_DT DATE9.;
array spdt {&max_array} opspdt1-opspdt&max_array;
array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
array spsrv {&max_array} spsrv1-spsrv&max_array;
array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
array srv serv1-serv&max_srv; /* accumulated list of type_of_service */
i=1;
do until(spdt(i) > Discharge_date);
i+1;
if i > &max_array then leave;
end;
if i > &max_array then do;
First_op_Visit_Specialist_TOS = 'na';
First_op_Visit_Specialist_DT = .;
end;
else do;
i_srv = spsrv(i);
First_op_Visit_Specialist_TOS = srv(i_srv);
First_op_Visit_Specialist_DT = spdt(i);
end;
i=1;
do until(pcpdt(i) > Discharge_date);
i+1;
if i > &max_array then leave;
end;
if i > &max_array then do;
First_op_Visit_PCP_TOS = 'na';
First_op_Visit_PCP_DT = .;
end;
else do;
i_srv = pcpsrv(i);
First_op_Visit_PCP_TOS = srv(i_srv);
First_op_Visit_PCP_DT = pcpdt(i);
end;
KEEP patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
First_op_Visit_Specialist_DT First_op_Visit_Specialist_TOS
First_op_Visit_PCP_DT First_op_Visit_PCP_TOS Readmission_Flag
;
run;
Thank you so much for this solution. I spent today testing the logic with my actual data set with several hundred thousand rows of data and it worked sucessfully! I appreciate the time and effort you put into this.
I am curious, can you recommend any trainings or resources to learn this type of advanced SAS logic? This solution is much more sophisticated than anything I would have been able to come up with on my own.
Best,
Mike
@mikemangini, you asked for:
can you recommend any trainings or resources to learn this type of advanced SAS logic?
You are probably an analyst while I'm a programmer who retired from work.
I'm glad that I can contibute from my knowledge and time and at same time fulfill my free time.
Try to understand the program, the structure and order of steps, functions and mainly the use of RETAIN.
If you realy want to learn it then break the program into pieces and be sure you understand each part of it.
Programming is a kind of art based on logic and programming language (any, not only SAS) and
a lot of experience.
I'm not an analyst.
Wish you success.
Shmuel
Good morning. I am doing more testing today and found a minor issue with the code. I attached three new sample data sets ip2.sas7bdat,op2.sas7bdat, op3.sas7bdat. If you run your code on the data sets it will highlight the issue.
Considering the new sample data I provided: the patient in this case was discharged from the hospital on Mar 5 2015. In the Op2 data set, there is a PCP visit on Mar 31 2015. However, the final data set, Ip_op, shows the first PCP visit post discharge is Nov 23 2015. So the program overlooked the Mar 31 visit, which was actually the first OP visit post admission.
What seems to be happening is the code is missing the first obersvation by PCP and Specialist. In the final sample data set, Op3, i removed all specialist rows except for one. The encounter occurs after the inpatient discharge, but the program overlooks it and reports back no specialist visit occurred.
I am working on trying to modify the code to fix the issue but haven't nailed it down yet. I am wondering if you could take a look.
Best,
Mike
I don't know is it a SAS bug or my bug, anyway I found how to fix it:
I changed two lines, instead
do until(spdt(i) > Discharge_date);
it should be: do while (spdt(i) le Discharge_Date);
similar change, instaed:
do until(pcpdt(i) > Discharge_date);
it should be: do while (pcpdt(i) le Discharge_Date);
THANK YOU. I tested it out on Op3 and it worked as expected. Thank you again for finding the solution so quickly.
Hi - I am wondering if you can help again modifying this code. I'd like to add another condition to the specialist look up that matches a behavioral health hospital discharge with a behavioral health specialist visit. So if the hospital discharge has a behavioral health description, as you are scanning the array for specalist visits, ignore anything that doesn't have service type that is behavioral health related.
Here is an example of the Srv_array using my actualy data:
serv1 |
serv2 |
serv3 |
serv4 |
serv5 |
serv6 |
serv7 |
last_srv |
VSSPC |
MH |
MH/SA |
VSPCP |
SURG |
SA |
CONSUL |
7 |
At the portion of your code that starts here:
i=1;
do until(spdt(i) > Discharge_date); ......
I would like to add the logic: IF Inpatient Type_of_Service in (‘Behavioral’, ‘Substance Abuse’), then take the first visit with the service type array value:
spsrv(i) = 'MH' or 'MH/SA' or 'SA'
If non of these values are found, then no follow up occurred and an 'NA' is placed.
I tried adding this in and had trouble with arrays going out of bounds. Is it something you might be able to help with?
Please disregard new request. I believe I came up with a solution after a few days. Not the most elegant but gets the job done - I split the two data sets into two, behavioral health and non, and pass them seperately through the aglorithm then union them at the end.
Thanks again for your support.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.