BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mikemangini
Obsidian | Level 7

I have a rather complex merge operation I am trying to perform on inpatient and outpatient medical claims.

 

The basic idea is once a patient has been discharged from an inpatient hospital stay, identify the first consecutive outpatient PCP visit, and the first consecutive outpatient Specialist visit. There may be many but I only want the first for each. If there are none, then it should be tracked as a missing value. There may be many discharges for the same patient. If the next consecutive inpatient row for the patient has a readmission_flag = Y, then I'd like to track that too.

 

I am wondering if advice can be offered on how to do this, either by setting up the initial data steps differently, or by creating a merge with several considerations. Thank you for your help.

 

 

Have - Inpatient                  
Patient_ID Diagnostic_Category Type_of_Service Admit_Date Discharge_Date Readmission_Flag        
A Nervous System Medical 21-Jan-16 15-Feb-16          
A Circulatory System Surgical 15-Jul-16 21-Jul-16          
A Infection Medical 31-Jul-16 2-Aug-16 Y        
                   
Have - Outpatient                
Patient_ID Type_of_Service OP_Visit_DT PROVIDER            
A Lab 1-Jan-16 Specialist            
A Office Visit 15-Jan-16 PCP            
A Office Visit 18-Feb-16 Specialist            
A Office Visit 20-Feb-16 PCP            
A Office Visit 22-Feb-16 PCP            
A Office Visit 15-Mar-16 Specialist            
A Office Visit 12-Sep-16 Specialist            
A Office Visit 1-Oct-16 Specialist            
                   
                   
Merged- Want                  
Patient_ID Diagnostic_Category Type_of_Service Admit_Date Discharge_Date First_OP_Visit_Specailist_Dt First_OP_Visit_Specailist_TOS First_OP_Visit_PCP_Dt First_OP_Visit_PCP_TOS Resulted _Readmission
A Nervous System Medical 21-Jan-16 15-Feb-16 18-Feb-16 Office Visit 20-Feb-16 Office Visit  
A Circulatory System Surgical 15-Jul-16 21-Jul-16 12-Sep-16 Office Visit na na Y
A Infection Medical 31-Jul-16 2-Aug-16 12-Sep-16 Office Visit na na  
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

I hope next tested code will enable you to addapt it finnaly to your needs,

especially addapt the macro variables on top of the program.

You may need addapt also some output dates if you want them as character (including 'na') instead numeric.

 

libname pat '/folders/myshortcuts/My_Folders/patient_data';
%let max_array = 10; /* max number of Specialist, PCP occurences per patient_id */
%let max_srv = 3;    /* number of unique values of Type_Of_Service */

/* preparing dataset inpatient by adding seq no per patient_id */
proc sort data=pat.ip; by Patient_ID Admit_Date; run;

/* preparing outpatient dataset */
proc sort data=pat.op; by Patient_ID provider OP_Visit_DT; run;

data op_arrays (keep=Patient_ID  opspdt1-opspdt&max_array  opcpdt1-opcpdt&max_array
                     spsrv1-spsrv&max_array  pcpsrv1-pcpsrv&max_array )  
     srv_array (keep=last_srv serv1-serv&max_srv);
   ; 
 set pat.op  end=eof;
  by Patient_ID notsorted provider;
  
     length serv1-serv&max_srv $12; 
     retain opspdt1-opspdt&max_array  
            opcpdt1-opcpdt&max_array
            spsrv1 -spsrv&max_array
            pcpsrv1-pcpsrv&max_array
            serv1-serv&max_srv last_srv;
     array spdt  {&max_array} opspdt1-opspdt&max_array;
     array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
     array spsrv {&max_array} spsrv1-spsrv&max_array;
     array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
     
     array srv serv1-serv&max_srv;  /* accumulated list of type_of_service */
     
     if _N_ = 1 then do;
        i_srv = 1;
        last_srv = 1;
        srv(i_srv) = Type_of_Service;
     end;
        
     i_srv = whichc(Type_of_Service , of srv(*));
     if i_srv = 0 
        then do;
           last_srv+1;
           i_srv = last_srv;
           srv(i_srv) = Type_of_Service;  
        end;
      
     if first.patient_id then do;
        do i=1 to &max_array; 
           spdt(i)=.;   spsrv(i)=.;
           pcpdt(i)=.;  pcpsrv(i)=.;
        end;
        i=1; j=1;  /* i, j - pointers to arrays */
     end;
     
     if provider = 'Specialist' then do;
        spsrv(i) = i_srv;
        spdt(i)  = op_visit_DT; 
        i+1;
     end; else
     if provider = 'PCP' then do;
        pcpsrv(j) = i_srv;
        pcpdt(j) = op_visit_DT; j+1;
     end; 
     else put '>>>Warning: ' provider=;
     
     if last.patient_id then output op_arrays;
     if eof then do;
        output srv_array;
        put '>>> There are ' last_srv= ' Type_of_service.';
     end;
 RUN;
 
 
 data ip_op;
      /* dummy format statement to define order of variables */
     format patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
          First_op_Visit_Specialist_DT  First_op_Visit_Specialist_TOS
          First_op_Visit_PCP_DT  First_op_Visit_PCP_TOS  Readmission_Flag
       ;
     format Admit_Date Discharge_Date 
            First_op_Visit_Specialist_DT 
            First_op_Visit_PCP_DT   date9.;
      
      retain phase 0 serv1-serv&max_srv last_srv;
      if phase=0 then do;
         set srv_array(obs=1);
         phase=1;
      end;
      
 merge pat.ip (in=inip)
       op_arrays;
  by Patient_ID;
     if inip;
     
     length First_op_Visit_Specialist_DT   First_op_Visit_PCP_DT 8
            First_op_Visit_Specialist_TOS  First_op_Visit_PCP_TOS $12
         ;
     FORMAT First_op_Visit_Specialist_DT   First_op_Visit_PCP_DT DATE9.;
     
     array spdt  {&max_array} opspdt1-opspdt&max_array;
     array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
     array spsrv {&max_array} spsrv1-spsrv&max_array;
     array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
     
     array srv serv1-serv&max_srv;  /* accumulated list of type_of_service */

     i=1;
     do until(spdt(i) > Discharge_date); 
        i+1; 
        if i > &max_array then leave; 
     end;
     if i > &max_array then do;
         First_op_Visit_Specialist_TOS = 'na';
         First_op_Visit_Specialist_DT  = .;       
     end; 
     else do;
         i_srv =  spsrv(i);
         First_op_Visit_Specialist_TOS = srv(i_srv);
         First_op_Visit_Specialist_DT  = spdt(i);
     end;
     
     i=1;
     do until(pcpdt(i) > Discharge_date); 
        i+1; 
        if i > &max_array then leave;         
     end; 
     if i > &max_array then do;
        First_op_Visit_PCP_TOS = 'na';
        First_op_Visit_PCP_DT  = .;     
     end;
     else do;
         i_srv = pcpsrv(i);
         First_op_Visit_PCP_TOS = srv(i_srv);
         First_op_Visit_PCP_DT  = pcpdt(i);
     end;
     
     KEEP patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
          First_op_Visit_Specialist_DT  First_op_Visit_Specialist_TOS
          First_op_Visit_PCP_DT  First_op_Visit_PCP_TOS  Readmission_Flag
       ;

run;
 
 
      

View solution in original post

13 REPLIES 13
Shmuel
Garnet | Level 18

I have tried to merge the two datasets and came to the conclusion that

it is absolutely not clear what are the rules - which outpatient row to merge

to a which inpatient row.

 

Besides, it is difficult to convert the two datasets, given as tables, into test data.

Please post it as text/csv files.

mikemangini
Obsidian | Level 7

Thank you for taking the time to look into it. Sorry if my description wasn't clear. The merge is pretty complicated and I still am not able to solve it myself - so maybe I need a new approach. Anyway - I attached 3 SAS data sets.

 

At a high level, I work for a Health Plan, and we would like to determine the rate at which patients follow up with either a PCP or Specailst following a hospital admission.

 

Ip - ipatient discharge information

Op - outpatient visit information. Visits can be with either a PCP or Specialist.

 

What I would like to do is take the IP data set, and for a given patient and discharge date, scan the Op data set for the first outpatient visit that occured after the discharge. I would like to do this twice: for the first PCP vist, and the first Specialist vist. So for every patient/discharge date combination, we identify the first subsequent outpatient visit with a PCP, and the first visit with a specialist. If no visit occured in either category, the field should be blank.

 

The final step is to determine whether that hospital discharge was followed by a readmission. This way we can track how follow up visits and readmissions are related. So in this case in the IP table, if a patient/discharge date combination is followed by a row containing (Readmission_Flag = Y), then on the final data set I would like to populate a field identifying this discharge as having a readmission. So for example if a patient was discharged Jan 1 and the row following it has a Readmission_Flag = Y then I want to flag the Jan 1 discharge as "resulting in a readmission).

 

Hope that clarifies, and again thank you for helping with this project.

 

Shmuel
Garnet | Level 18

Few more questions:

1) Would you say that I can filter or neglect LAB rows from OP ?

2) It seems to me that it will be helpfull to transpose OP dataset with two arrays of visit dates,

    one of SPECIALIST and the other of PCP, than merge OP data with IP by patient_ID.

    Searching for 1st date ge IP.discharge_date is then very easy.

    The question is - what is the upper limit of each kind of visits? would 100 be enough ?

mikemangini
Obsidian | Level 7

1) It is just a sample fake data set I have created, so while all rows but one say Office Visit, it can really take on many other values like "behavioral health visit" or "radiology". So the detail in the row is important. I will need to know what kind of visit that first outpatient visit is.

2) There really is no upper limit, if the time period i am looking at is 1 year, a patient could essentially have more than 100 outpatient office visits. Is it just a matter of computer resources? So if your solution bounds the array at 100 values, could I increase it if needed?

Shmuel
Garnet | Level 18

I hope next tested code will enable you to addapt it finnaly to your needs,

especially addapt the macro variables on top of the program.

You may need addapt also some output dates if you want them as character (including 'na') instead numeric.

 

libname pat '/folders/myshortcuts/My_Folders/patient_data';
%let max_array = 10; /* max number of Specialist, PCP occurences per patient_id */
%let max_srv = 3;    /* number of unique values of Type_Of_Service */

/* preparing dataset inpatient by adding seq no per patient_id */
proc sort data=pat.ip; by Patient_ID Admit_Date; run;

/* preparing outpatient dataset */
proc sort data=pat.op; by Patient_ID provider OP_Visit_DT; run;

data op_arrays (keep=Patient_ID  opspdt1-opspdt&max_array  opcpdt1-opcpdt&max_array
                     spsrv1-spsrv&max_array  pcpsrv1-pcpsrv&max_array )  
     srv_array (keep=last_srv serv1-serv&max_srv);
   ; 
 set pat.op  end=eof;
  by Patient_ID notsorted provider;
  
     length serv1-serv&max_srv $12; 
     retain opspdt1-opspdt&max_array  
            opcpdt1-opcpdt&max_array
            spsrv1 -spsrv&max_array
            pcpsrv1-pcpsrv&max_array
            serv1-serv&max_srv last_srv;
     array spdt  {&max_array} opspdt1-opspdt&max_array;
     array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
     array spsrv {&max_array} spsrv1-spsrv&max_array;
     array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
     
     array srv serv1-serv&max_srv;  /* accumulated list of type_of_service */
     
     if _N_ = 1 then do;
        i_srv = 1;
        last_srv = 1;
        srv(i_srv) = Type_of_Service;
     end;
        
     i_srv = whichc(Type_of_Service , of srv(*));
     if i_srv = 0 
        then do;
           last_srv+1;
           i_srv = last_srv;
           srv(i_srv) = Type_of_Service;  
        end;
      
     if first.patient_id then do;
        do i=1 to &max_array; 
           spdt(i)=.;   spsrv(i)=.;
           pcpdt(i)=.;  pcpsrv(i)=.;
        end;
        i=1; j=1;  /* i, j - pointers to arrays */
     end;
     
     if provider = 'Specialist' then do;
        spsrv(i) = i_srv;
        spdt(i)  = op_visit_DT; 
        i+1;
     end; else
     if provider = 'PCP' then do;
        pcpsrv(j) = i_srv;
        pcpdt(j) = op_visit_DT; j+1;
     end; 
     else put '>>>Warning: ' provider=;
     
     if last.patient_id then output op_arrays;
     if eof then do;
        output srv_array;
        put '>>> There are ' last_srv= ' Type_of_service.';
     end;
 RUN;
 
 
 data ip_op;
      /* dummy format statement to define order of variables */
     format patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
          First_op_Visit_Specialist_DT  First_op_Visit_Specialist_TOS
          First_op_Visit_PCP_DT  First_op_Visit_PCP_TOS  Readmission_Flag
       ;
     format Admit_Date Discharge_Date 
            First_op_Visit_Specialist_DT 
            First_op_Visit_PCP_DT   date9.;
      
      retain phase 0 serv1-serv&max_srv last_srv;
      if phase=0 then do;
         set srv_array(obs=1);
         phase=1;
      end;
      
 merge pat.ip (in=inip)
       op_arrays;
  by Patient_ID;
     if inip;
     
     length First_op_Visit_Specialist_DT   First_op_Visit_PCP_DT 8
            First_op_Visit_Specialist_TOS  First_op_Visit_PCP_TOS $12
         ;
     FORMAT First_op_Visit_Specialist_DT   First_op_Visit_PCP_DT DATE9.;
     
     array spdt  {&max_array} opspdt1-opspdt&max_array;
     array pcpdt {&max_array} opcpdt1-opcpdt&max_array;
     array spsrv {&max_array} spsrv1-spsrv&max_array;
     array pcpsrv{&max_array} pcpsrv1-pcpsrv&max_array;
     
     array srv serv1-serv&max_srv;  /* accumulated list of type_of_service */

     i=1;
     do until(spdt(i) > Discharge_date); 
        i+1; 
        if i > &max_array then leave; 
     end;
     if i > &max_array then do;
         First_op_Visit_Specialist_TOS = 'na';
         First_op_Visit_Specialist_DT  = .;       
     end; 
     else do;
         i_srv =  spsrv(i);
         First_op_Visit_Specialist_TOS = srv(i_srv);
         First_op_Visit_Specialist_DT  = spdt(i);
     end;
     
     i=1;
     do until(pcpdt(i) > Discharge_date); 
        i+1; 
        if i > &max_array then leave;         
     end; 
     if i > &max_array then do;
        First_op_Visit_PCP_TOS = 'na';
        First_op_Visit_PCP_DT  = .;     
     end;
     else do;
         i_srv = pcpsrv(i);
         First_op_Visit_PCP_TOS = srv(i_srv);
         First_op_Visit_PCP_DT  = pcpdt(i);
     end;
     
     KEEP patient_id Diagnostic_Category Type_of_Service Admit_Date Discharge_Date
          First_op_Visit_Specialist_DT  First_op_Visit_Specialist_TOS
          First_op_Visit_PCP_DT  First_op_Visit_PCP_TOS  Readmission_Flag
       ;

run;
 
 
      
mikemangini
Obsidian | Level 7

Thank you so much for this solution. I spent today testing the logic with my actual data set with several hundred thousand rows of data and it worked sucessfully! I appreciate the time and effort you put into this.

 

I am curious, can you recommend any trainings or resources to learn this type of advanced SAS logic? This solution is much more sophisticated than anything I would have been able to come up with on my own.

 

Best,

Mike

Shmuel
Garnet | Level 18

@mikemangini, you asked for:

can you recommend any trainings or resources to learn this type of advanced SAS logic?

 

You are probably an analyst while I'm a programmer who retired from work.

I'm glad that I can contibute from my knowledge and time and at same time fulfill my free time.

 

Try to understand the program, the structure and order of steps, functions and mainly the use of RETAIN.

If you realy want to learn it then break the program into pieces and be sure you understand each part of it.

 

Programming is a kind of art based on logic and programming language (any, not only SAS) and

a lot of experience.

 

I'm not an analyst. 

Wish you success.

 

Shmuel

mikemangini
Obsidian | Level 7

Good morning. I am doing more testing today and found a minor issue with the code. I attached three new sample data sets ip2.sas7bdat,op2.sas7bdat, op3.sas7bdat. If you run your code on the data sets it will highlight the issue.

 

Considering the new sample data I provided: the patient in this case was discharged from the hospital on Mar 5 2015. In the Op2 data set, there is a PCP visit on Mar 31 2015. However, the final data set, Ip_op, shows the first PCP visit post discharge is Nov 23 2015. So the program overlooked the Mar 31 visit, which was actually the first OP visit post admission.

 

What seems to be happening is the code is missing the first obersvation by PCP and Specialist. In the final sample data set, Op3, i removed all specialist rows except for one. The encounter occurs after the inpatient discharge, but the program overlooks it and reports back no specialist visit occurred.

 

I am working on trying to modify the code to fix the issue but haven't nailed it down yet. I am wondering if you could take a look.

 

Best,

Mike

Shmuel
Garnet | Level 18

I don't know is it a SAS bug or my bug, anyway I found how to fix it:

 

I changed two lines, instead 

do until(spdt(i) > Discharge_date);

it should be:  do while (spdt(i) le Discharge_Date);

 

similar change, instaed:

do until(pcpdt(i) > Discharge_date); 

 it should be:  do while (pcpdt(i) le Discharge_Date);

Shmuel
Garnet | Level 18
I couldn't download OP3 dataset -
I got 3 times net error.
mikemangini
Obsidian | Level 7

THANK YOU. I tested it out on Op3 and it worked as expected. Thank you again for finding the solution so quickly.

mikemangini
Obsidian | Level 7

Hi - I am wondering if you can help again modifying this code. I'd like to add another condition to the specialist look up that matches a behavioral health hospital discharge with a behavioral health specialist visit. So if the hospital discharge has a behavioral health description, as you are scanning the array for specalist visits, ignore anything that doesn't have service type that is behavioral health related. 

 

Here is an example of the Srv_array using my actualy data:

serv1

serv2

serv3

serv4

serv5

serv6

serv7

last_srv

VSSPC

MH

MH/SA

VSPCP

SURG

SA

CONSUL

7

 

At the portion of your code that starts here: 

 i=1;
     do until(spdt(i) > Discharge_date); ......

 

I would like to add the logic: IF Inpatient Type_of_Service in (‘Behavioral’, ‘Substance Abuse’), then take the first visit with the service type array value:

spsrv(i) = 'MH' or 'MH/SA' or 'SA'

If non of these values are found, then no follow up occurred and an 'NA' is placed. 

 

I tried adding this in and had trouble with arrays going out of bounds. Is it something you might be able to help with? 

mikemangini
Obsidian | Level 7

Please disregard new request. I believe I came up with a solution after a few days. Not the most elegant but gets the job done - I split the two data sets into two, behavioral health and non, and pass them seperately through the aglorithm then union them at the end. 

 

Thanks again for your support.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2044 views
  • 5 likes
  • 2 in conversation