Hi
Data set:
I have a data set with variables as
ID agent_id drug_code beg_date end_date effective_run_date
I want to combine the date ranges if rows have same id, agent_id,drug_code without any gaps between beg_date and end_dates of next rows with same data variables with bringing the max effective run date.
Scenario 1:
ID agent_id drug_code beg_date end_date effective_run_date 1 A1 100 1/1/2018 12/31/2018 12/1/2018
1 A1 100 1/10/2018 3/31/2019 3/1/2019
1 A2 100 1/1/2018 5/31/2018 4/1/2018
1 A2 100 2/1/2018 3/31/2018 3/1/2018
1 A2 200 4/1/2018 12/31/2018 12/1/2018
1 A2 200 1/2/2019 5/31/2019 5/1/2019
Expected data for scenario 1:
ID agent_id drug_code beg_date end_date effective_run_date 1 A1 100 1/1/2018 3/31/2019 3/1/2019
1 A2 100 1/1/2018 5/31/2018 4/1/2018
1 A2 200 4/1/2018 12/31/2018 12/1/2018
1 A2 200 1/2/2019 5/31/2019 5/1/2019
which I achieved successfully for scenario 1
Scenario 2:
From here the scenarios are tricky since we have same ID,Agent_ID can have enrolled for different drug_codes and we want to keep all the unique records based on the latest effective run dates:
Scenarios 2.1:
ID agent_id drug_code beg_date end_date effective_run_date 1 A3 100 1/1/2018 3/31/2019 3/1/2019
1 A3 200 1/1/2018 3/31/2019 3/31/2019
In the above case we have two different drug_codes but with same beg_date and end_date but different effective date. In this case I want to remove the first record since the effective date is not the latest.
Expected data for scenario 2.1:
ID agent_id drug_code beg_date end_date effective_run_date 1 A3 200 1/1/2018 3/31/2019 3/31/2019
Scenarios 2.2:
ID agent_id drug_code beg_date end_date effective_run_date 1 A4 301 1/1/2018 3/31/2019 3/1/2019
1 A4 302 2/1/2018 3/31/2019 3/31/2019
In the above case we have same id and different drug_code and also the beg_dates are not same, in this scenario I do not want to delete first record since it has one month which is unique. I want to change the end_date of the first record (next record beg_date -1 ) when the effective_run_date is not latest.
Expected data for scenario 2.2:
ID agent_id drug_code beg_date end_date effective_run_date 1 A4 301 1/1/2018 1/31/2018 3/1/2019
1 A4 302 2/1/2018 3/31/2019 3/31/2019
Scenarios 2.3:
ID agent_id drug_code beg_date end_date effective_run_date 1 A5 450 2/1/2018 5/31/2019 2/1/2019
1 A5 459 11/1/2018 5/31/2019 2/1/2019
In the above case, as we have two drug_codes with different date ranges but same effective_run_date then I want to have my date range of first record to split based on the overlap date range.
Expected data for scenario 2.3:
ID agent_id drug_code beg_date end_date effective_run_date 1 A5 459 2/1/2018 10/31/2018 3/31/2019
1 A5 459 11/1/2018 3/31/2019 3/31/2019
1 A5 450 11/1/2018 3/31/2019 3/31/2019
Actual data have more than two drug_codes and millions of records in the source data set. I couldn't achieve the scenario 2.1,2.2,2.3. Any help will be greatly appreciable.
Hi @sri21592
here is my attempt to handle the different scenarios:
could you please provide more sample data for scenarios 2.2 and 2.3 as the ones you provided do not reflect the use cases. Thank you!
/* Scenario 1 */
data have;
input ID agent_id $ drug_code beg_date:MMDDYY10. end_date:MMDDYY10. effective_run_date:MMDDYY10.;
format beg_date end_date effective_run_date MMDDYY10.;
datalines;
1 A1 100 1/1/2018 12/31/2018 12/1/2018
1 A1 100 1/10/2018 3/31/2019 3/1/2019
1 A2 100 1/1/2018 5/31/2018 4/1/2018
1 A2 100 2/1/2018 3/31/2018 3/1/2018
1 A2 200 4/1/2018 12/31/2018 12/1/2018
1 A2 200 1/2/2019 5/31/2019 5/1/2019
1 A3 100 1/1/2018 3/31/2019 3/14/2019
1 A3 200 1/1/2018 3/31/2019 3/15/2019
;
run;
proc sort data=have;
by ID agent_id drug_code beg_date;
run;
data have_case1;
set have;
by ID agent_id drug_code;
format _lag MMDDYY10.;
_lag = lag(end_date);
if first.drug_code then _lag=0;
if _lag < beg_date then flag + 1; /* put _lag + 1 if you want to consider 2 consecutive days as no discontinuation */
drop _lag;
run;
proc sql;
create table handle_case1 as
select ID, agent_id, drug_code,
min(beg_date) as beg_date format = MMDDYY10.,
max(end_date) as end_date format = MMDDYY10.,
max(effective_run_date) as effective_run_date format = MMDDYY10.
from have_case1
group by ID, agent_id, drug_code, flag;
quit;
/* Scenario 2 */
/* Scenario 2.1 */
proc sql;
create table handle_case2_1 as
select *
from handle_case1
group by ID, agent_id, beg_date, end_date
having effective_run_date = max(effective_run_date);
quit;
/* Scenario 2.2 */
/* same id */
/* different drug_code */
/* different beg_dates */
/* Need sample data + desired output*/
/* Scenario 2.3 */
/* same id */
/* different drug_code */
/* different beg_dates and end_dates */
/* same effective_run_date */
/* Need sample data + desired output*/
Hi @ed_sas_member,
Thanks for the quick response and trying to help me. Greatly appreciated. Here is the actual data set for combined scenarios 2.1,2.2 and 2.3 together.
Data set for scenarios - 2.1,2.2 and 2.3:
ID | agent_id | drug_code | beg_date | end_date | effective_run_date |
11 | 450 | BC | 11/1/2016 | 12/31/2016 | 11/2/2016 |
11 | 450 | BD | 11/1/2016 | 12/31/2016 | 12/2/2016 |
1 | 580 | A1 | 1/1/2014 | 5/31/2018 | 8/29/2014 |
1 | 580 | 586 | 11/1/2017 | 11/30/2017 | 5/7/2018 |
1 | 580 | 567 | 12/1/2017 | 10/31/2018 | 5/7/2018 |
3 | 757 | O1 | 8/1/2017 | 1/31/2019 | 6/26/2017 |
3 | 757 | P1 | 8/1/2018 | 8/31/2018 | 8/3/2018 |
6 | 999 | P1 | 9/1/2015 | 11/30/2016 | 10/20/2015 |
6 | 999 | E1 | 10/1/2015 | 11/30/2015 | 10/20/2015 |
8 | 686 | AL | 1/1/2016 | 8/31/2016 | 1/4/2016 |
8 | 686 | BL | 6/1/2016 | 6/30/2016 | 1/23/2017 |
8 | 686 | 779 | 8/1/2016 | 8/31/2016 | 1/26/2017 |
8 | 686 | 775 | 8/1/2016 | 1/31/2017 | 1/12/2017 |
8 | 686 | 779 | 10/1/2016 | 1/31/2017 | 1/12/2017 |
9 | 99 | BA | 11/1/2015 | 12/31/2015 | 11/30/2015 |
9 | 99 | CA | 12/1/2015 | 12/31/2015 | 11/30/2015 |
10 | 699 | F1 | 3/1/2015 | 12/31/2015 | 11/29/2016 |
10 | 699 | BB | 11/1/2015 | 3/31/2016 | 11/7/2016 |
Expected results: (I included comments and scenario type)
ID | agent_id | drug_code | beg_date | end_date | effective_run_date | comments | Scenario |
11 | 450 | BD | 11/1/2016 | 12/31/2016 | 12/2/2016 | I don’t want the first record since both records date ranges are same but effective date should be latest | Scenario 2.1 |
1 | 580 | A1 | 1/1/2014 | 10/31/2017 | 8/29/2014 | (Changed the end date since the effective date is not latest compared to the other dates and I am not missing any unique months by changing the date). | Scenario 2.2 |
1 | 580 | 586 | 11/1/2017 | 11/30/2017 | 5/7/2018 | ||
1 | 580 | 567 | 12/1/2017 | 10/31/2018 | 5/7/2018 | ||
3 | 757 | O1 | 8/1/2017 | 7/31/2018 | 6/26/2017 | I split the first record into two records since there is other reecord with one month with latest effective date. I am not losing any unique months again. | Scenario 2.2 |
3 | 757 | O1 | 9/1/2018 | 1/31/2019 | 6/26/2017 | ||
3 | 757 | P1 | 8/1/2018 | 8/31/2018 | 8/3/2018 | ||
6 | 999 | P1 | 9/1/2015 | 9/30/2015 | 10/20/2015 | Here, when effective dates are same, I want to keep the overlap date ranges and split the non overlap months into three different records. | Scenario 2.3 |
6 | 999 | P1 | 10/1/2015 | 11/30/2015 | 10/20/2015 | ||
6 | 999 | P1 | 12/1/2015 | 11/30/2016 | 10/20/2015 | ||
6 | 999 | E1 | 10/1/2015 | 11/30/2015 | 10/20/2015 | ||
8 | 686 | AL | 1/1/2016 | 5/31/2016 | 1/4/2016 | you can this case is three scenarios combines, first looking if they have same date ranges and removing the non latest effective date, changing the end dates based on the effective run dates keeping all the month date ranges and finally scenario 2.3 splitting the date ranges when they have same effective date ranges. | combined scenarios 2.1,2.2 and 2.3 |
8 | 686 | AL | 7/1/2016 | 7/31/2016 | 1/4/2016 | ||
8 | 686 | BL | 6/1/2016 | 6/30/2016 | 1/23/2017 | ||
8 | 686 | 779 | 8/1/2016 | 8/31/2016 | 1/26/2017 | ||
8 | 686 | 775 | 9/1/2016 | 9/30/2017 | 1/12/2017 | ||
8 | 686 | 775 | 10/1/2016 | 1/31/2017 | 1/12/2017 | ||
8 | 686 | 779 | 10/1/2016 | 1/31/2017 | 1/12/2017 | ||
9 | 99 | BA | 11/1/2015 | 11/30/2015 | 11/30/2015 | I am just splitting the first record into two date ranges since there is overlap and splitting non overlap date ranges since both have same effective run dates. | Scenario 2.3 |
9 | 99 | BA | 12/1/2015 | 12/31/2015 | 11/30/2015 | ||
9 | 99 | CA | 12/1/2015 | 12/31/2015 | 11/30/2015 | ||
10 | 699 | F1 | 3/1/2015 | 12/31/2015 | 11/29/2016 | Here, second record date range is overlapped but since the effective run date is not latest, I changed it after the end date of the first record. | Scenario 2.2 |
10 | 699 | BB | 1/1/2016 | 3/31/2016 | 11/7/2016 |
Hi @sri21592
Thank you very much for providing those example. Very helpful +++
Just a question: in case of overlapping dates (scenario 2.2 and 2.3), I don't understand why you want to loose information concerning drug intake :
e.g. for patient 699 -> according to your logic, you will lose the drug intake of drug BB between nov 2015 and dec 2015.
why don't you split the dates but keep information such a F1, F1 + BB, F1 with the associated period for each combination.
Input
10 | 699 | F1 | 3/1/2015 | 12/31/2015 | 11/29/2016 |
10 | 699 | BB | 11/1/2015 | 3/31/2016 | 11/7/2016 |
Output:
10 | 699 | F1 | 3/1/2015 | 12/31/2015 | 11/29/2016 | Here, second record date range is overlapped but since the effective run date is not latest, I changed it after the end date of the first record. | Scenario 2.2 |
10 | 699 | BB | 1/1/2016 | 3/31/2016 | 11/7/2016 |
Hi @ed_sas_member ,
I am not actually losing any information since that is a duplicate entry. They might have entered multiple entries for the different drug codes. They initially might have a drug code 'A' program but in between they might have shifted to new drug_code 'B' for few months and they might come back again to Drug code 'A' program. Effective run date tells us which is the latest entry in the system since those are updated after 6 months to 1 year of actual program enrollment dates.
e.g. for patient 699 -> I will lose the drug intake of drug BB between nov 2015 and dec 2015, because same person cannot have more than more than one drug at same span. Nov and Dec 2015 are captured by the first record with latest entry date which means that the first record is true for nov and dec 15.
you might have question on how can two drug codes can be at same effective date. (Current system captures the date but not the timestamp for the entries, it might change in future but we have very less cases of same effective dates compared to millions of overlap date ranges).
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.