Here are two sample datasets (they are all fake datasets) outpatient https://drive.google.com/open?id=179_L_qnZdKY5-EZnwy4BEzyEZP-VT-in inpatient https://drive.google.com/open?id=1vhUa_yTflLEXSR6xdZG_hOYxCPRnXVkw I did not distinguish between filtered/master here, but it's easy to create filters (any random filter that restrict the dataset to a smaller one will serve as an example, it will work even if you don't distinguish, just duplicate and rename one as master and another as filtered)
I have two datasets that look like this, study inter-hospital transfer My dataset is enormous. I am able to do the whole thing in SAS, but is very very very slow :((( I will show my code here, but I am seeking ways to improve running time.
  master_inpatient
   ID   admsn_dt    thru_dt      prvdr_num       
    341   2013-04-01  2013-04-02    G
    230   2013-06-01  2013-06-03    I
    232   2013-07-31  2013-07-31    F
    124   2013-04-29  2013-04-29    C
    232   2013-07-31  2013-08-20    Q
  filtered_inpatient
   ID   admsn_dt    thru_dt      prvdr_num       
    341   2013-04-01  2013-04-02    G
    232   2013-07-31  2013-07-31    F
    232   2013-07-31  2013-08-20    Q
   master_outpatient
   ID     thru_dt     prvdr_num
    348   2013-09-23   Z
    124   2013-04-29   A
    331   2013-06-14   G
    439   2013-02-01   B
    331   2013-06-14   D
   filtered_outpatient
   ID     thru_dt     prvdr_num
    124   2013-04-29   A
    331   2013-06-14   G
    439   2013-02-01   B
    331   2013-06-14   DI have two master datasets: an inpatient dataset and an outpatient dataset, and two filtered datasets: some filter on diganosis (e.g., including only patients with diagnosis of TB) is applied to the master dataset, make the dataset shorter than master dataset ID is patient ID, and admsn_dt is the day you are admitted to a hospital, thru_dt is the day you are discharged/transferred. Outpatient only has a thru_dt because in outpatient setting you don't need to be admitted into the hospital to be treated. Imagine that you can be transferred from an outpatient setting (ER) to an inpatient setting, an inpatient setting to an outpatient setting (ER), an outpatient setting (ER) to an outpatient setting (ER), and an inpatient setting to an inpatient setting (ER). As a result, there are four types of transfer happens in the two dataset.
I want the filtered dataset (filtered_inpatient or filtered_outpatient) to be the origin and master datasets (master_inpatient and master_outpatient) to be the destination because a patient need to be satisfied with some diagnosis, and then what we care is where he/she transferred (the patient don't need to have that diagnosis at the destination) In Sum: The four transfer type is if outpatient --> inpatient: filtered_outpatient(ID, thru_dt)--> master_inpatient(ID, admsn_dt) if outpatient --> outpatient: filtered_outpatient(ID, thru_dt)-->master_outpatient(ID,thru_dt) if inpatient --> inpatient: filtered_inpatient(ID, thru_dt)-->master_inpatient(ID,admsn_dt) if inpatient --> outpatient: filtered_inpatient(ID, thru_dt)-->master_inpatient(ID,thru_dt)
What I'd like to do is to obtain this third dataset, if the prvdr_num (provider number) are different, and the difference in date is less than 1 day (0 or 1). transtype indicate type of transfer: from inpatient to outpatient is inpout, for example.
The final dataset should look something like this:
   df3
   ID   fromdate     todate     from_prvdr  to_prvdr    d     transtype
    124   2013-04-29   2013-04-29  C           A          0      inpout
    232   2013-07-31   2013-07-31  F           Q          0      inpinp
    331   2013-06-14   2013-06-14  G           D          0      outoutAnother thing is that, when matching within file, it's highly likely that you get something like this:
ID   fromdate     todate       from_prvdr    to_prvdr
1    3/30/2011    3/31/2011    43291         48329
1    3/31/2011    3/30/2011    48329         43291
OR 
ID   fromdate     todate       from_prvdr    to_prvdr
1    3/31/2011    3/31/2011    43291         48329
1    3/31/2011    3/31/2011    48329         43291
(In this latter case I can just exclude duplicate by date later in R, but I need to get rid of the first case)Here is what I tried (and succeeded).
#this is an example of outpatient--> inpatient
#all variables in master datasets have an i prefix
proc sort data= etl.master_inpatient;
    by iID iadmsn_dt; 
run;
proc sort data= etl.filtered_outpatient;
    by ID thru_dt; 
run;
data fnl.matchdate_inpinp;
   set etl.master_inpatient end = eof;
      do p = 1 to num;
         set etl.filtered_outpatient nobs = num point = p;
         if iID = ID then do;
            d = abs(iadmsn_dt-thru_dt);
            put iID = ID = iadmsn_dt = thru_dt= d =;
         if d <= 1 then output;
         end;
         else continue;
      end;
      put '===========================';
   if eof then stop;
run;There is no error in the code, but I have to do this seperately for four types of transfer, and merge them together in R later. I took me more than two days to finish running one year's data, I really want something more efficient as I have 8 year data.
Also, as I said, when matching within file, it is likely that we get some repetitive results (like described above), i really hope this can be solved.
 
                             
                     
                             
                     
                             
                             
                     
                            