BookmarkSubscribeRSS Feed
sri21592
Fluorite | Level 6

Hi

Data set:
I have a data set with variables as
ID agent_id drug_code beg_date end_date effective_run_date

I want to combine the date ranges if rows have same id, agent_id,drug_code without any gaps between beg_date and end_dates of next rows with same data variables with bringing the max effective run date.

Scenario 1:                                                 
ID   agent_id drug_code beg_date    end_date    effective_run_date                                      1   A1  100 1/1/2018    12/31/2018  12/1/2018                               
1   A1  100 1/10/2018   3/31/2019   3/1/2019                                
1   A2  100 1/1/2018    5/31/2018   4/1/2018                                
1   A2  100 2/1/2018    3/31/2018   3/1/2018                                
1   A2  200 4/1/2018    12/31/2018  12/1/2018                               
1   A2  200 1/2/2019    5/31/2019   5/1/2019  

Expected data for scenario 1:

ID  agent_id  drug_code beg_date    end_date    effective_run_date                                      1   A1  100 1/1/2018    3/31/2019   3/1/2019                                
1   A2  100 1/1/2018    5/31/2018   4/1/2018                                
1   A2  200 4/1/2018    12/31/2018  12/1/2018                               
1   A2  200 1/2/2019    5/31/2019   5/1/2019    

which I achieved successfully for scenario 1

Scenario 2:
From here the scenarios are tricky since we have same ID,Agent_ID can have enrolled for different drug_codes and we want to keep all the unique records based on the latest effective run dates:

Scenarios 2.1:

ID  agent_id drug_code  beg_date    end_date    effective_run_date                                      1   A3  100 1/1/2018    3/31/2019   3/1/2019                                
1   A3  200 1/1/2018    3/31/2019   3/31/2019   

In the above case we have two different drug_codes but with same beg_date and end_date but different effective date. In this case I want to remove the first record since the effective date is not the latest.

Expected data for scenario 2.1:

ID  agent_id drug_code  beg_date    end_date    effective_run_date                                      1   A3  200 1/1/2018    3/31/2019   3/31/2019   

Scenarios 2.2:

ID agent_id  drug_code  beg_date    end_date    effective_run_date                                      1   A4  301 1/1/2018    3/31/2019   3/1/2019                                
1   A4  302 2/1/2018    3/31/2019   3/31/2019   

In the above case we have same id and different drug_code and also the beg_dates are not same, in this scenario I do not want to delete first record since it has one month which is unique. I want to change the end_date of the first record (next record beg_date -1 ) when the effective_run_date is not latest.

Expected data for scenario 2.2:

ID agent_id  drug_code  beg_date    end_date    effective_run_date                                      1   A4  301 1/1/2018    1/31/2018   3/1/2019                                
1   A4  302 2/1/2018    3/31/2019   3/31/2019   

Scenarios 2.3:

ID agent_id  drug_code  beg_date    end_date    effective_run_date                                      1   A5  450 2/1/2018    5/31/2019   2/1/2019                                
1   A5  459 11/1/2018   5/31/2019   2/1/2019    

In the above case, as we have two drug_codes with different date ranges but same effective_run_date then I want to have my date range of first record to split based on the overlap date range.

Expected data for scenario 2.3:

ID agent_id  drug_code  beg_date    end_date    effective_run_date                                      1   A5  459 2/1/2018    10/31/2018  3/31/2019                               
1   A5  459 11/1/2018   3/31/2019   3/31/2019                               
1   A5  450 11/1/2018   3/31/2019   3/31/2019   

Actual data have more than two drug_codes and millions of records in the source data set. I couldn't achieve the scenario 2.1,2.2,2.3.  Any help will be greatly appreciable.

5 REPLIES 5
sri21592
Fluorite | Level 6
code for Scenario 1:

Code for scenario 1: (which I achieved exactly as I wanted)
data WORK.new_end_dates_v1 (drop=id2 drug_code2
rename=(beg2=beg_dos
end2=end_dos
edbc2=edbc_dos));
set WORK.sort;
retain beg2 end2 edbc2;
agent_id2=lag1(agent_id);
id2=lag1(id);
drug_code2=lag1(drug_code);

if agent_id2=agent_id and id2=id and drug_code2=drug_code and beg_date le(end2+1) then
do;
beg2=min(beg_date,beg2);
end2=max(end_date,end2);
edbc2=max(edbc_run_date,edbc2);
end;
else
do;
seg+1;
beg2=beg_date;
end2=end_date;
edbc2=edbc_run_date;
end;

format beg2 end2 edbc_run_date mmddyy10.;
run;

data WORK.new_end_dates_v2(drop=agent_id2 seg beg_dos end_dos edbc_dos);
retain beg_date end_date edbc_run_date;
set WORK.new_end_dates_v1;
by agent_id id seg;
format beg_date end_date mmddyy10.;

if first.seg then
do;
end;
if last.seg then
do;
beg_date = beg_dos;
end_date = end_dos;
edbc_run_date = edbc_dos;
output;
end;
run;
ed_sas_member
Meteorite | Level 14

Hi @sri21592 

 

here is my attempt to handle the different scenarios:

could you please provide more sample data for scenarios 2.2 and 2.3 as the ones you provided do not reflect the use cases. Thank you!

 

/* Scenario 1 */

data have;
	input ID agent_id $ drug_code beg_date:MMDDYY10. end_date:MMDDYY10. effective_run_date:MMDDYY10.;
	format beg_date end_date effective_run_date MMDDYY10.;
	datalines;
1 A1 100 1/1/2018 12/31/2018 12/1/2018
1 A1 100 1/10/2018 3/31/2019 3/1/2019
1 A2 100 1/1/2018 5/31/2018 4/1/2018
1 A2 100 2/1/2018 3/31/2018 3/1/2018
1 A2 200 4/1/2018 12/31/2018 12/1/2018
1 A2 200 1/2/2019 5/31/2019 5/1/2019
1 A3 100 1/1/2018 3/31/2019 3/14/2019
1 A3 200 1/1/2018 3/31/2019 3/15/2019
;
run;

proc sort data=have;
	by ID agent_id drug_code beg_date;
run;

data have_case1;
	set have;
	by ID agent_id drug_code;
	format _lag MMDDYY10.;
	_lag = lag(end_date);
	if first.drug_code then _lag=0;
	if _lag < beg_date then flag + 1; /* put _lag + 1 if you want to consider 2 consecutive days as no discontinuation */
	drop _lag;
run;

proc sql;
	create table handle_case1 as
	select ID, agent_id, drug_code,
		   min(beg_date) as beg_date format = MMDDYY10.,
		   max(end_date) as end_date format = MMDDYY10.,
		   max(effective_run_date) as effective_run_date format = MMDDYY10.
	from have_case1
	group by ID, agent_id, drug_code, flag;
quit;

/* Scenario 2 */

	/* Scenario 2.1 */

proc sql;
	create table handle_case2_1 as
	select *
	from handle_case1
	group by ID, agent_id, beg_date, end_date
	having effective_run_date = max(effective_run_date);
quit;

	/* Scenario 2.2 */
	/* same id */
	/* different drug_code */
	/* different beg_dates */

		/* Need sample data + desired output*/
		
	
	/* Scenario 2.3 */
	/* same id */
	/* different drug_code */
	/* different beg_dates and end_dates */
	/* same effective_run_date */

		/* Need sample data + desired output*/

 

 

sri21592
Fluorite | Level 6

Hi @ed_sas_member,

 

Thanks for the quick response and trying to help me. Greatly appreciated. Here is the actual data set for combined scenarios 2.1,2.2 and 2.3 together.

 

Data set for scenarios - 2.1,2.2 and 2.3:

 

IDagent_iddrug_codebeg_dateend_dateeffective_run_date
11450BC11/1/201612/31/201611/2/2016
11450BD11/1/201612/31/201612/2/2016
1580A11/1/20145/31/20188/29/2014
158058611/1/201711/30/20175/7/2018
158056712/1/201710/31/20185/7/2018
3757O18/1/20171/31/20196/26/2017
3757P18/1/20188/31/20188/3/2018
6999P19/1/201511/30/201610/20/2015
6999E110/1/201511/30/201510/20/2015
8686AL1/1/20168/31/20161/4/2016
8686BL6/1/20166/30/20161/23/2017
86867798/1/20168/31/20161/26/2017
86867758/1/20161/31/20171/12/2017
868677910/1/20161/31/20171/12/2017
999BA11/1/201512/31/201511/30/2015
999CA12/1/201512/31/201511/30/2015
10699F13/1/201512/31/201511/29/2016
10699BB11/1/20153/31/201611/7/2016

 

 

Expected results: (I included comments and scenario type)

IDagent_iddrug_codebeg_dateend_dateeffective_run_datecommentsScenario
11450BD11/1/201612/31/201612/2/2016I don’t want the first record since both records date ranges are same but effective date should be latestScenario 2.1
1580A11/1/201410/31/20178/29/2014(Changed the end date since the effective date is not latest compared to the other dates and I am not missing any unique months by changing the date).Scenario 2.2
158058611/1/201711/30/20175/7/2018
158056712/1/201710/31/20185/7/2018
3757O18/1/20177/31/20186/26/2017I split the first record into two records since there is other reecord with one month with latest effective date. I am not losing any unique months again.Scenario 2.2
3757O19/1/20181/31/20196/26/2017
3757P18/1/20188/31/20188/3/2018
6999P19/1/20159/30/201510/20/2015Here, when effective dates are same, I want to keep the overlap date ranges and split the non overlap months into three different records.Scenario 2.3
6999P110/1/201511/30/201510/20/2015
6999P112/1/201511/30/201610/20/2015
6999E110/1/201511/30/201510/20/2015
8686AL1/1/20165/31/20161/4/2016you can this case is three scenarios combines, first looking if they have same date ranges and removing the non latest effective date, changing the end dates based on the effective run dates keeping all the month date ranges and finally scenario 2.3 splitting the date ranges when they have same effective date ranges.combined scenarios 2.1,2.2 and 2.3
8686AL7/1/20167/31/20161/4/2016
8686BL6/1/20166/30/20161/23/2017
86867798/1/20168/31/20161/26/2017
86867759/1/20169/30/20171/12/2017
868677510/1/20161/31/20171/12/2017
868677910/1/20161/31/20171/12/2017
999BA11/1/201511/30/201511/30/2015I am just splitting the first record into two date ranges since there is overlap and splitting non overlap date ranges since both have same effective run dates.Scenario 2.3
999BA12/1/201512/31/201511/30/2015
999CA12/1/201512/31/201511/30/2015
10699F13/1/201512/31/201511/29/2016Here, second record date range is overlapped but since the effective run date is not latest, I changed it after the end date of the first record.Scenario 2.2
10699BB1/1/20163/31/201611/7/2016
ed_sas_member
Meteorite | Level 14

Hi @sri21592 

 

Thank you very much for providing those example. Very helpful +++

Just a question: in case of overlapping dates (scenario 2.2 and 2.3), I don't understand why you want to loose information concerning drug intake :

e.g. for patient 699 -> according to your logic, you will lose the drug intake of drug BB between  nov 2015 and dec 2015.

why don't you split the dates but keep information such a F1, F1 + BB, F1 with the associated period for each combination.

 

Input

10 699 F1 3/1/2015 12/31/2015 11/29/2016
10 699 BB 11/1/2015 3/31/2016 11/7/2016

 

Output:

10 699 F1 3/1/2015 12/31/2015 11/29/2016 Here, second record date range is overlapped but since the effective run date is not latest, I changed it after the end date of the first record. Scenario 2.2
10 699 BB 1/1/2016 3/31/2016 11/7/2016

 

sri21592
Fluorite | Level 6

Hi @ed_sas_member ,

 

I am not actually losing any information since that is a duplicate entry. They might have entered multiple entries for the different drug codes. They initially might have a drug code 'A' program but in between they might have shifted to new drug_code 'B' for few months and they might come back again to Drug code 'A' program. Effective run date tells us which is the latest entry in the system since those are updated after 6 months to 1 year of actual program enrollment dates.

 

e.g. for patient 699 -> I will lose the drug intake of drug BB between  nov 2015 and dec 2015, because same person cannot have more than more than one drug at same span. Nov and Dec 2015 are captured by the first record with latest entry date which means that the first record is true for nov and dec 15.

 

you might have question on how can two drug codes can be at same effective date. (Current system captures the  date but not the timestamp for the entries, it might change in future but we have very less cases of same effective dates compared to millions of overlap date ranges).

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1060 views
  • 0 likes
  • 2 in conversation