Here are two sample datasets (they are all fake datasets) outpatient https://drive.google.com/open?id=179_L_qnZdKY5-EZnwy4BEzyEZP-VT-in inpatient https://drive.google.com/open?id=1vhUa_yTflLEXSR6xdZG_hOYxCPRnXVkw I did not distinguish between filtered/master here, but it's easy to create filters (any random filter that restrict the dataset to a smaller one will serve as an example, it will work even if you don't distinguish, just duplicate and rename one as master and another as filtered)
I have two datasets that look like this, study inter-hospital transfer My dataset is enormous. I am able to do the whole thing in SAS, but is very very very slow :((( I will show my code here, but I am seeking ways to improve running time.
master_inpatient
ID admsn_dt thru_dt prvdr_num
341 2013-04-01 2013-04-02 G
230 2013-06-01 2013-06-03 I
232 2013-07-31 2013-07-31 F
124 2013-04-29 2013-04-29 C
232 2013-07-31 2013-08-20 Q
filtered_inpatient
ID admsn_dt thru_dt prvdr_num
341 2013-04-01 2013-04-02 G
232 2013-07-31 2013-07-31 F
232 2013-07-31 2013-08-20 Q
master_outpatient
ID thru_dt prvdr_num
348 2013-09-23 Z
124 2013-04-29 A
331 2013-06-14 G
439 2013-02-01 B
331 2013-06-14 D
filtered_outpatient
ID thru_dt prvdr_num
124 2013-04-29 A
331 2013-06-14 G
439 2013-02-01 B
331 2013-06-14 D
I have two master datasets: an inpatient dataset and an outpatient dataset, and two filtered datasets: some filter on diganosis (e.g., including only patients with diagnosis of TB) is applied to the master dataset, make the dataset shorter than master dataset ID is patient ID, and admsn_dt is the day you are admitted to a hospital, thru_dt is the day you are discharged/transferred. Outpatient only has a thru_dt because in outpatient setting you don't need to be admitted into the hospital to be treated. Imagine that you can be transferred from an outpatient setting (ER) to an inpatient setting, an inpatient setting to an outpatient setting (ER), an outpatient setting (ER) to an outpatient setting (ER), and an inpatient setting to an inpatient setting (ER). As a result, there are four types of transfer happens in the two dataset.
I want the filtered dataset (filtered_inpatient or filtered_outpatient) to be the origin and master datasets (master_inpatient and master_outpatient) to be the destination because a patient need to be satisfied with some diagnosis, and then what we care is where he/she transferred (the patient don't need to have that diagnosis at the destination) In Sum: The four transfer type is if outpatient --> inpatient: filtered_outpatient(ID, thru_dt)--> master_inpatient(ID, admsn_dt) if outpatient --> outpatient: filtered_outpatient(ID, thru_dt)-->master_outpatient(ID,thru_dt) if inpatient --> inpatient: filtered_inpatient(ID, thru_dt)-->master_inpatient(ID,admsn_dt) if inpatient --> outpatient: filtered_inpatient(ID, thru_dt)-->master_inpatient(ID,thru_dt)
What I'd like to do is to obtain this third dataset, if the prvdr_num (provider number) are different, and the difference in date is less than 1 day (0 or 1). transtype indicate type of transfer: from inpatient to outpatient is inpout, for example.
The final dataset should look something like this:
df3
ID fromdate todate from_prvdr to_prvdr d transtype
124 2013-04-29 2013-04-29 C A 0 inpout
232 2013-07-31 2013-07-31 F Q 0 inpinp
331 2013-06-14 2013-06-14 G D 0 outout
Another thing is that, when matching within file, it's highly likely that you get something like this:
ID fromdate todate from_prvdr to_prvdr
1 3/30/2011 3/31/2011 43291 48329
1 3/31/2011 3/30/2011 48329 43291
OR
ID fromdate todate from_prvdr to_prvdr
1 3/31/2011 3/31/2011 43291 48329
1 3/31/2011 3/31/2011 48329 43291
(In this latter case I can just exclude duplicate by date later in R, but I need to get rid of the first case)
Here is what I tried (and succeeded).
#this is an example of outpatient--> inpatient
#all variables in master datasets have an i prefix
proc sort data= etl.master_inpatient;
by iID iadmsn_dt;
run;
proc sort data= etl.filtered_outpatient;
by ID thru_dt;
run;
data fnl.matchdate_inpinp;
set etl.master_inpatient end = eof;
do p = 1 to num;
set etl.filtered_outpatient nobs = num point = p;
if iID = ID then do;
d = abs(iadmsn_dt-thru_dt);
put iID = ID = iadmsn_dt = thru_dt= d =;
if d <= 1 then output;
end;
else continue;
end;
put '===========================';
if eof then stop;
run;
There is no error in the code, but I have to do this seperately for four types of transfer, and merge them together in R later. I took me more than two days to finish running one year's data, I really want something more efficient as I have 8 year data.
Also, as I said, when matching within file, it is likely that we get some repetitive results (like described above), i really hope this can be solved.