Re: Manage Date Conflicts - Overlap

radhikaa4 · Posted 03-03-2020 10:25 AM

I have the following dataset tthat I need to pull out or break overlap date/time

HAVE:

id	start_date_time	stop_date_time
1	3/5/2020 0:00	3/5/2020 23:12
1	3/5/2020 9:20	3/5/2020 9:45

WANT:

id	start_date_time	stop_date_time
1	3/5/2020 0:00	3/5/2020 9:20
1	3/5/2020 9:20	3/5/2020 9:45
1	3/5/2020 9:45	3/5/2020 23:12

I am not sure how to do this and I have been using this as a source but after writing the code, it didn't work. Any help would be greately appreciated! http://support.sas.com/resources/papers/proceedings09/079-2009.pdf

yabwon · Posted 03-03-2020 10:49 AM

Hi @radhikaa4 ,

maybe like that (mind the example is on dates not timestamps, but you can adjust it):

data have;
input Start : date9. End : date9.;
format Start mmddyy10. End mmddyy10.;
cards;
30JUN2017	30JUN2018
01JAN2018	01JAN2019
30JUN2018	30JUN2019
01JAN2019	01JAN2020
;
run;

data have2;
set
  have(keep =Start rename=(Start=date))
  have(keep =End   rename=(End  =date))
;
run;

proc sort data = have2;
  by Date;
run;

data want;
  set have2;
  lag_date = lag(date);
  if _N_ > 1;
  start = date;
  end = lag_date;
  format Start mmddyy10. End mmddyy10.;
  keep start end;
run;

proc print;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

radhikaa4 · Posted 03-03-2020 11:08 AM

Hi @yabwon . Thanks. How do I include ID in it? I tried keep= subjectId

yabwon · Posted 03-03-2020 12:11 PM

Hi @radhikaa4 ,

Sorry, I missed the ID, try this:

data have;
input ID Start : date9. End : date9.;
format Start mmddyy10. End mmddyy10.;
cards;
1 30JUN2017	30JUN2018
1 01JAN2018	01JAN2019
1 30JUN2018	30JUN2019
1 01JAN2019	01JAN2020
2 30JUN2017	30JUN2018
2 01JAN2018	01JAN2019
3 30JUN2018	30JUN2019
3 01JAN2019	01JAN2020
;
run;

data have2;
set
  have(keep =Start ID rename=(Start=date))
  have(keep =End   ID rename=(End  =date))
;
run;

proc sort data = have2;
  by ID Date;
run;

data want;
  set have2;
  by ID;
  lag_date = lag(date);
  if not first.ID ;
  start = date;
  end = lag_date;
  format Start mmddyy10. End mmddyy10.;
  keep id start end;
run;

proc print;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ksharp · Posted 03-05-2020 09:20 AM

data have;
infile cards expandtabs;
input id	start_date_time	& anydtdtm. stop_date_time & anydtdtm.;
format start_date_time stop_date_time mdyampm.;
cards;
1	3/5/2020 0:00	 3/5/2020 23:12
1	3/5/2020 9:20	 3/5/2020 9:45
;
data temp;
 set have;
 n+1;
 do temp=start_date_time to stop_date_time;
  output;
 end;
 drop start_date_time  stop_date_time;
run;
proc sort data=temp;
by id temp;
run;
data temp1;
do until(last.temp);
 set temp;
 by  id temp;
 length x $ 40;
 x=cats(x,n);
end;
run;
data want;
 set temp1;
 by id x notsorted;
 retain start;
 if first.x then start=temp;
 if last.x then do;end=temp;output;end;
 format start end mdyampm.;
 drop n x temp;
run;

joreoh · Posted 04-22-2020 06:33 PM

I'm working on a medication data set with overlapped period and daily dose for each record

data test;
	infile datalines delimiter = ',';
	input solddt: date9. rxsup  dose  med_end_dt: date9. cluster ;
	format solddt yymmdd10. med_end_dt yymmdd10.;
	datalines;
	10sep2009, 16, 200, 26sep2009, 1
	02oct2009, 2, 45, 04oct2009, 1
	06oct2009, 66, 90.91, 11dec2009, 1
	12oct2009, 16, 200, 28oct2009, 1
	21oct2009, 12, 62.5, 02nov2009, 1
	03nov2009, 33, 181.82, 06dec2009, 1
	12nov2009, 2, 45, 14nov2009, 1
	12nov2009, 16, 200, 28nov2009, 1
	04dec2009, 33, 181.82, 06jan2010, 1
	14dec2009, 25, 128, 08jan2010, 1
	17dec2009, 2, 45, 09dec2009, 1
	15jan2010, 5, 45, 20jan2010, 2
	18feb2010, 25, 90, 15mar2010, 3
	;
run;

How would you implement a new variable newdailydose which is the sum of daily dose during the overlapped periods after the records are separated?

Also, please explain how x is concatenated with n and x in the do loop in your code

yabwon · Posted 04-23-2020 07:22 AM

Hi,

do you mean something like this:

data test2;
  set test;
  do date = solddt to med_end_dt;
    output;
  end;
  keep date dose cluster;
run;
proc sql;
  create table test3 as
  select cluster, date format yymmdd10., sum(dose) as sumDose
  from test2
  group by cluster, date
  order by cluster, date
  ;
run;
data test4;
  do until(last.sumDose);
    set test3;
    by cluster sumDose notsorted;
    if first.sumDose then solddt = date;
    if last.sumDose then
      do;
        med_end_dt = date;
        dose = sumDose; 
      end;
  end;
  drop sumDose date;
  format solddt yymmdd10. med_end_dt yymmdd10.;
run;
proc print;
run;

?

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ksharp · Posted 04-23-2020 07:28 AM

OK. Plz post the output you are looking for in order to better understand your question.

Where there was overlapped date , one date or obs correspond to many obs after separated it one by one

,like

cluster n temp

1 1 21oct2020

1 2 21oct2020

here N represent the row number(1 is the first row ,2 is the second row)

" please explain how x is concatenated with n and x in the do loop in your code"

do until(last.temp);
set temp;
by cluster temp;
length x $ 40;
x=cats(x,n);

end;

It process one group (cluster and temp are group variables) data. (e.g. do until(last.temp) )

x concated the row number within the same group.

firstly, x= the first n (1),since x is missing value.

secondly, x= the first and second n ( 12 ),since x is RETAIN in the same data step loop.

.........

until the last.temp ,and output the last group value .if data like above (only two obs), the output is

cluster n temp x

1 2 21oct2020 12

Here x identity which obs has 21oct2020 , (the first obs and second one)

Here x also identity the during of overlap date.

cluster n temp x

1 2 21oct2020 12

1 2 22oct2020 12

1 2 23oct2020 12

The last data step could get this:

cluster n temp x start end

1 2 23oct2020 12 21oct2020 23oct2020

data test;
	infile datalines delimiter = ',';
	input solddt: date9. rxsup  dose  med_end_dt: date9. cluster ;
	format solddt yymmdd10. med_end_dt yymmdd10.;
	datalines;
	10sep2009, 16, 200, 26sep2009, 1
	02oct2009, 2, 45, 04oct2009, 1
	06oct2009, 66, 90.91, 11dec2009, 1
	12oct2009, 16, 200, 28oct2009, 1
	21oct2009, 12, 62.5, 02nov2009, 1
	03nov2009, 33, 181.82, 06dec2009, 1
	12nov2009, 2, 45, 14nov2009, 1
	12nov2009, 16, 200, 28nov2009, 1
	04dec2009, 33, 181.82, 06jan2010, 1
	14dec2009, 25, 128, 08jan2010, 1
	17dec2009, 2, 45, 09dec2009, 1
	15jan2010, 5, 45, 20jan2010, 2
	18feb2010, 25, 90, 15mar2010, 3
	;
run;
data temp;
 set test;
 n+1;
 do temp=solddt to med_end_dt;
  output;
 end;
 drop solddt  med_end_dt rxsup;
 format temp date9.;
run;
proc sort data=temp;
by cluster temp;
run;
data temp1;
sum_dose=0;
do until(last.temp);
 set temp;
 by  cluster temp;
 length x $ 40;
 x=cats(x,n);
 sum_dose+dose;
end;
drop dose;
run;
data want;
 set temp1;
 by cluster x notsorted;
 retain start;
 if first.x then start=temp;
 if last.x then do;end=temp;output;end;
 format start end date9.;
 drop n x temp;
run;

joreoh · Posted 04-24-2020 11:23 AM

Hello! The output I want is

t0	dose	t1	New daily dose	Cluster
10-Sep-09	200	9/26/2009	200	1
2-Oct-09	45	10/4/2009	45	1
6-Oct-09	90.91	10/11/2009	90.91	1
12-Oct-09	200	10/20/2009	290.91	1
21-Oct-09	62.5	10/28/2009	353.41	1
10/29/2009		11/2/2009	153.41	1
3-Nov-09	181.82	11/11/2009	272.73	1
12-Nov-09	200	11/14/2009	335.91	1
11/15/2009	16	11/28/2009	290.91	1
11/29/2020		12/3/2009	272.73	1
4-Dec-09	181.82	12/6/2009	454.55	1
12/7/2009		12/11/2009	272.73	1
12/12/2009		12/13/2009	181.82	1
14-Dec-09	25	12/16/2009	309.82	1
17-Dec-09	45	12/19/2009	354.82	1
20-Dec-09		1/6/2010	309.82	1
1/7/2010		1/8/2010	128	1
15-Jan-10		1/20/2010	45	2
18-Feb-10		3/12/2010	90	3

Ksharp · Posted 04-25-2020 06:46 AM

So did you run my code?
if want include 'dose' column, remove statement 'drop dose;'

joreoh · Posted 04-27-2020 11:39 AM

@Ksharp Yes! Thank you for the explanation. It was very helpful in understanding how the program works. My concern my concern now is that this program is not robust for large data sets. I am possibly looking at millions of records with potentially long dispense duration periods for different subjects. Any ideas on how I can make the program robust?

Ksharp · Posted 04-28-2020 07:05 AM

I don't know what do you mean by 'Robust' .
Do you mean it is very slowly for big table ?
You could split the big table into many small tables and write a macro to go through all these small tables.

Or perhaps you want start a new session to explain it more details and let others see it too .

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away