Hello everyone, I have two data files that include patients' information on (1) the dose and duration of medications they take, and (2) whether or not they have had hospital readmissions, respectively. Here are parts of these data files (extracted for just one patient):

data drug;
input dose_drug1 dose_drug2 date_start :mmddyy10. date_stop :mmddyy10.;
format date_start date_stop mmddyy10.;
datalines;
7.5 5 01/17/2008 03/16/2008
22.5 3 03/20/2008 04/01/2008
30 8.5 08/07/2008 08/22/2008
15 5 11/23/2008 12/22/2008
;

data readmission;
input readm_number date :mmddyy10.;
format date mmddyy10.;
datalines;
1 02/20/2008
2 05/03/2008
3 07/10/2008
4 09/28/2008
5 11/09/2008
6 12/06/2008
;

Now, I want to create another data file with 4 variables: (1) dose of drug 1, (2) dose of drug 2, (3) duration of supply, and (4) a binary (0/1) indicator variable, where 1 means that the drug use is associated with a hospital readmission within a time window (e.g., 30 days, 60 days, etc.). For example, for a 30-day time window, here is what I want:

data want30;
input dose_drug1 dose_drug2 duration indicator;
datalines;
7.5 5 30 1 * note a (see below for notes)
7.5 5 26 0 * note b
22.5 3 11 0 * note c
30 8.5 15 0 * similar to note c
15 5 13 1 * similar to note a
15 5 16 0 * similar to note b
;

Regarding the code above, please take the following points into account:

note a: the date_start is 01/17, and readmission #1 occurs on 02/20. The difference is more than 30 days, so the duration becomes 30.
note b: since readmission #1 occurs on 02/20, the duration is equal to the remaining period until date_stop (03/16 - 02/20 = 26 days). Also, the date_stop is 03/16, but the next readmission (#2) is on 05/03, so the difference is beyond 30 days. That's why the indicator is 0.
note c: there is no readmission within 30 days of the date_stop.

To further clarify, here is the case for the 60-day time window:

data want60;
input dose_drug1 dose_drug2 duration indicator;
datalines;
7.5 5 33 1 * note a (see below for notes)
15 4 23 1 * note b
30 8.5 15 1 * note c
15 5 13 1 * note d
15 5 16 0 * note e
;

note a: indicator 1 is because of readmission #1 on 02/20.
note b: indicator 1 is because of readmission #2 on 05/03. Because of the 60-day time window, there is an overlap with the first drug supply in the "drug" data of 03/16 - (05/03 - 60 days) = 11 days. The second drug supply also contributes 11 days. So, I obtain 15 as (11*7.5 + 11*22.5)/(11 + 11). Similarly, I obtain 4 as (11*5 + 11*3)/(11 + 11).
note c: indicator 1 is because of readmission #4 on 09/28.
note d: indicator 1 is because of readmission #6 on 12/06.
note e: indicator 0 is because there is no other readmission within 60 days of the date_stop for the last drug supply.

I know that the procedure explained above is not very straightforward. That's why I have provided as much explanation as I could. Nevertheless, please let me know if you need more information. Thank you very much for any thoughts/ideas.
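
In case it helps, here is the arithmetic from note b of the 60-day case written out. The doses are averaged using the number of overlapping days from each drug supply as weights. The symbols t_i (overlapping days from supply i) and d_i (dose in supply i) are notation introduced here only for illustration; they are not variable names from the data files.

```latex
% Day-weighted average dose over the supplies that overlap the window
\bar{d} = \frac{\sum_i t_i \, d_i}{\sum_i t_i}

% Drug 1: 11 days at 7.5 and 11 days at 22.5
\bar{d}_{\text{drug1}} = \frac{11 \times 7.5 + 11 \times 22.5}{11 + 11} = \frac{330}{22} = 15

% Drug 2: 11 days at 5 and 11 days at 3
\bar{d}_{\text{drug2}} = \frac{11 \times 5 + 11 \times 3}{11 + 11} = \frac{88}{22} = 4
```

Because both supplies contribute the same number of days here, the result is simply the midpoint of the two doses; with unequal overlaps the longer overlap would count for more.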