I'm working on dispensed medication data. The variables in this data set are sold date, daily supply, daily dose, calculated medication end date, and cluster means the dispense was continuous without gap being >=15 days. If the last medication ending date was > 15 days apart from another opioids medication dispense date, the new cluster started.
solddt | rxsup | dose | med_end_dt | cluster |
09/10/2009 | 16 | 200 | 9/26/2009 | 1 |
10/02/2009 | 2 | 45 | 10/4/2009 | 1 |
10/06/2009 | 66 | 90.91 | 12/11/2009 | 1 |
10/12/2009 | 16 | 200 | 10/28/2009 | 1 |
10/21/2009 | 12 | 62.5 | 11/2/2009 | 1 |
11/03/2009 | 33 | 181.82 | 12/6/2009 | 1 |
11/12/2009 | 2 | 45 | 11/14/2009 | 1 |
11/12/2009 | 16 | 200 | 11/28/2009 | 1 |
12/04/2009 | 33 | 181.82 | 1/6/2010 | 1 |
12/14/2009 | 25 | 128 | 1/8/2010 | 1 |
12/17/2009 | 2 | 45 | 12/19/2009 | 1 |
01/15/2010 | 5 | 45 | 1/20/2010 | 2 |
02//18/2010 | 25 | 90 | 3/15/2010 | 3 |
I'd like to separate the overlapping records into new observations to get a data set that looks like
t0 | dose | t1 | New daily dose | Cluster |
10-Sep-09 | 200 | 9/26/2009 | 200 | 1 |
2-Oct-09 | 45 | 10/4/2009 | 45 | 1 |
6-Oct-09 | 90.91 | 10/11/2009 | 90.91 | 1 |
12-Oct-09 | 200 | 10/20/2009 | 290.91 | 1 |
21-Oct-09 | 62.5 | 10/28/2009 | 353.41 | 1 |
10/29/2009 | 11/2/2009 | 153.41 | 1 | |
3-Nov-09 | 181.82 | 11/11/2009 | 272.73 | 1 |
12-Nov-09 | 200 | 11/14/2009 | 335.91 | 1 |
11/15/2009 | 16 | 11/28/2009 | 290.91 | 1 |
11/29/2020 | 12/3/2009 | 272.73 | 1 | |
4-Dec-09 | 181.82 | 12/6/2009 | 454.55 | 1 |
12/7/2009 | 12/11/2009 | 272.73 | 1 | |
12/12/2009 | 12/13/2009 | 181.82 | 1 | |
14-Dec-09 | 25 | 12/16/2009 | 309.82 | 1 |
17-Dec-09 | 45 | 12/19/2009 | 354.82 | 1 |
20-Dec-09 | 1/6/2010 | 309.82 | 1 | |
1/7/2010 | 1/8/2010 | 128 | 1 | |
15-Jan-10 | 1/20/2010 | 45 | 2 | |
18-Feb-10 | 3/12/2010 | 90 | 3 |
I was able to identify overlapping records using the lag function, but am not sure how to proceed.
Please provide your input data as a SAS data step, and not as a screen capture. See here for instructions.
data test;
infile datalines delimiter = ',';
input solddt: date9. rxsup dose med_end_dt: date9. cluster ;
format solddt yymmdd10. med_end_dt yymmdd10.;
datalines;
10sep2009, 16, 200, 26sep2009, 1
02oct2009, 2, 45, 04oct2009, 1
06oct2009, 66, 90.91, 11dec2009, 1
12oct2009, 16, 200, 28oct2009, 1
21oct2009, 12, 62.5, 02nov2009, 1
03nov2009, 33, 181.82, 06dec2009, 1
12nov2009, 2, 45, 14nov2009, 1
12nov2009, 16, 200, 28nov2009, 1
04dec2009, 33, 181.82, 06jan2010, 1
14dec2009, 25, 128, 08jan2010, 1
17dec2009, 2, 45, 09dec2009, 1
15jan2010, 5, 45, 20jan2010, 2
18feb2010, 25, 90, 15mar2010, 3
;
run;
The input data is provided above. Thank you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.