BookmarkSubscribeRSS Feed
radhikaa4
Calcite | Level 5

I have the following dataset tthat I need to pull out or break overlap date/time 

 

HAVE:

idstart_date_timestop_date_time
13/5/2020 0:003/5/2020 23:12
13/5/2020 9:203/5/2020 9:45

 

WANT:

idstart_date_timestop_date_time
13/5/2020 0:003/5/2020 9:20
13/5/2020 9:203/5/2020 9:45
13/5/2020 9:453/5/2020 23:12

 

I am not sure how to do this and I have been using this as a source but after writing the code, it didn't work. Any help would be greately appreciated! http://support.sas.com/resources/papers/proceedings09/079-2009.pdf

11 REPLIES 11
yabwon
Onyx | Level 15

Hi @radhikaa4 ,

 

maybe like that (mind the example is on dates not timestamps, but you can adjust it):

 

data have;
input Start : date9. End : date9.;
format Start mmddyy10. End mmddyy10.;
cards;
30JUN2017	30JUN2018
01JAN2018	01JAN2019
30JUN2018	30JUN2019
01JAN2019	01JAN2020
;
run;

data have2;
set
  have(keep =Start rename=(Start=date))
  have(keep =End   rename=(End  =date))
;
run;

proc sort data = have2;
  by Date;
run;

data want;
  set have2;
  lag_date = lag(date);
  if _N_ > 1;
  start = date;
  end = lag_date;
  format Start mmddyy10. End mmddyy10.;
  keep start end;
run;

proc print;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



radhikaa4
Calcite | Level 5

Hi @yabwon . Thanks. How do I include ID in it? I tried keep= subjectId

yabwon
Onyx | Level 15

Hi @radhikaa4 ,

 

Sorry, I missed the ID, try this:

 

data have;
input ID Start : date9. End : date9.;
format Start mmddyy10. End mmddyy10.;
cards;
1 30JUN2017	30JUN2018
1 01JAN2018	01JAN2019
1 30JUN2018	30JUN2019
1 01JAN2019	01JAN2020
2 30JUN2017	30JUN2018
2 01JAN2018	01JAN2019
3 30JUN2018	30JUN2019
3 01JAN2019	01JAN2020
;
run;

data have2;
set
  have(keep =Start ID rename=(Start=date))
  have(keep =End   ID rename=(End  =date))
;
run;

proc sort data = have2;
  by ID Date;
run;

data want;
  set have2;
  by ID;
  lag_date = lag(date);
  if not first.ID ;
  start = date;
  end = lag_date;
  format Start mmddyy10. End mmddyy10.;
  keep id start end;
run;

proc print;
run;

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User
data have;
infile cards expandtabs;
input id	start_date_time	& anydtdtm. stop_date_time & anydtdtm.;
format start_date_time stop_date_time mdyampm.;
cards;
1	3/5/2020 0:00	 3/5/2020 23:12
1	3/5/2020 9:20	 3/5/2020 9:45
;
data temp;
 set have;
 n+1;
 do temp=start_date_time to stop_date_time;
  output;
 end;
 drop start_date_time  stop_date_time;
run;
proc sort data=temp;
by id temp;
run;
data temp1;
do until(last.temp);
 set temp;
 by  id temp;
 length x $ 40;
 x=cats(x,n);
end;
run;
data want;
 set temp1;
 by id x notsorted;
 retain start;
 if first.x then start=temp;
 if last.x then do;end=temp;output;end;
 format start end mdyampm.;
 drop n x temp;
run;

 

joreoh
Calcite | Level 5

I'm working on a medication data set with overlapped period and daily dose for each record

data test;
	infile datalines delimiter = ',';
	input solddt: date9. rxsup  dose  med_end_dt: date9. cluster ;
	format solddt yymmdd10. med_end_dt yymmdd10.;
	datalines;
	10sep2009, 16, 200, 26sep2009, 1
	02oct2009, 2, 45, 04oct2009, 1
	06oct2009, 66, 90.91, 11dec2009, 1
	12oct2009, 16, 200, 28oct2009, 1
	21oct2009, 12, 62.5, 02nov2009, 1
	03nov2009, 33, 181.82, 06dec2009, 1
	12nov2009, 2, 45, 14nov2009, 1
	12nov2009, 16, 200, 28nov2009, 1
	04dec2009, 33, 181.82, 06jan2010, 1
	14dec2009, 25, 128, 08jan2010, 1
	17dec2009, 2, 45, 09dec2009, 1
	15jan2010, 5, 45, 20jan2010, 2
	18feb2010, 25, 90, 15mar2010, 3
	;
run;

 

How would you implement a new variable newdailydose which is the sum of daily dose during the overlapped periods after the records are separated?

 

Also, please explain how x is concatenated with n and x in the do loop in your code

yabwon
Onyx | Level 15

Hi,

 

do you mean something like this:

data test2;
  set test;
  do date = solddt to med_end_dt;
    output;
  end;
  keep date dose cluster;
run;
proc sql;
  create table test3 as
  select cluster, date format yymmdd10., sum(dose) as sumDose
  from test2
  group by cluster, date
  order by cluster, date
  ;
run;
data test4;
  do until(last.sumDose);
    set test3;
    by cluster sumDose notsorted;
    if first.sumDose then solddt = date;
    if last.sumDose then
      do;
        med_end_dt = date;
        dose = sumDose; 
      end;
  end;
  drop sumDose date;
  format solddt yymmdd10. med_end_dt yymmdd10.;
run;
proc print;
run;

?

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User

OK. Plz post the output you are looking for in order to better understand your question.

Where there was overlapped date , one date or obs correspond to many obs after separated it one by one

,like

 

 cluster n temp

1   1   21oct2020

1   2   21oct2020

 

here N represent the row number(1 is the first row ,2 is the second row)

" please explain how x is concatenated with n and x in the do loop in your code"

 

do until(last.temp);
set temp;
by cluster temp;
length x $ 40;
x=cats(x,n);

end;

 

It process one group (cluster and temp are group variables) data. (e.g. do until(last.temp)  )

x concated the row number within the same group.

firstly, x= the first n (1),since x is missing value.

secondly,  x= the first and second n ( 12 ),since x is RETAIN in the same data step loop.

.........

until the last.temp  ,and output the last group value .if data like above (only two obs), the output is 

cluster  n    temp            x

1          2   21oct2020    12

Here x identity which obs has 21oct2020 , (the first obs and second one)

 

Here x also identity the during of overlap date.

cluster  n    temp            x

1          2   21oct2020    12

1          2   22oct2020    12

1          2   23oct2020    12

The last data step could get this:

cluster n     temp           x     start            end

1          2    23oct2020   12   21oct2020  23oct2020

 

 

 

 

 

data test;
	infile datalines delimiter = ',';
	input solddt: date9. rxsup  dose  med_end_dt: date9. cluster ;
	format solddt yymmdd10. med_end_dt yymmdd10.;
	datalines;
	10sep2009, 16, 200, 26sep2009, 1
	02oct2009, 2, 45, 04oct2009, 1
	06oct2009, 66, 90.91, 11dec2009, 1
	12oct2009, 16, 200, 28oct2009, 1
	21oct2009, 12, 62.5, 02nov2009, 1
	03nov2009, 33, 181.82, 06dec2009, 1
	12nov2009, 2, 45, 14nov2009, 1
	12nov2009, 16, 200, 28nov2009, 1
	04dec2009, 33, 181.82, 06jan2010, 1
	14dec2009, 25, 128, 08jan2010, 1
	17dec2009, 2, 45, 09dec2009, 1
	15jan2010, 5, 45, 20jan2010, 2
	18feb2010, 25, 90, 15mar2010, 3
	;
run;
data temp;
 set test;
 n+1;
 do temp=solddt to med_end_dt;
  output;
 end;
 drop solddt  med_end_dt rxsup;
 format temp date9.;
run;
proc sort data=temp;
by cluster temp;
run;
data temp1;
sum_dose=0;
do until(last.temp);
 set temp;
 by  cluster temp;
 length x $ 40;
 x=cats(x,n);
 sum_dose+dose;
end;
drop dose;
run;
data want;
 set temp1;
 by cluster x notsorted;
 retain start;
 if first.x then start=temp;
 if last.x then do;end=temp;output;end;
 format start end date9.;
 drop n x temp;
run;

 

joreoh
Calcite | Level 5

Hello! The output I want is

 

t0doset1New daily doseCluster
10-Sep-092009/26/20092001
2-Oct-094510/4/2009451
6-Oct-0990.9110/11/200990.911
12-Oct-0920010/20/2009290.911
21-Oct-0962.510/28/2009353.411
10/29/2009 11/2/2009153.411
3-Nov-09181.8211/11/2009272.731
12-Nov-0920011/14/2009335.911
11/15/20091611/28/2009290.911
11/29/2020 12/3/2009272.731
4-Dec-09181.8212/6/2009454.551
12/7/2009 12/11/2009272.731
12/12/2009 12/13/2009181.821
14-Dec-092512/16/2009309.821
17-Dec-094512/19/2009354.821
20-Dec-09 1/6/2010309.821
1/7/2010 1/8/20101281
15-Jan-10 1/20/2010452
18-Feb-10 3/12/2010903
Ksharp
Super User
So did you run my code?
if want include 'dose' column, remove statement 'drop dose;'
joreoh
Calcite | Level 5

 @Ksharp Yes! Thank you for the explanation. It was very helpful in understanding how the program works. My concern my concern now is that this program is not robust for large data sets. I am possibly looking at millions of records with potentially long dispense duration periods for different subjects. Any ideas on how I can make the program robust?

 

 

Ksharp
Super User
I don't know what do you mean by 'Robust' .
Do you mean it is very slowly for big table ?
You could split the big table into many small tables and write a macro to go through all these small tables.

Or perhaps you want start a new session to explain it more details and let others see it too .

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1236 views
  • 0 likes
  • 4 in conversation