Hello,
I am new to this board so apologies if I don't yet know the proper etiquette for posting. I have found a couple questions posed that are similar to mine in nature, but none is quite what I am looking for. I don't want to waste anyone's time so if I have missed another solution please point me in the right direction and I will be happy to try those out!
I am looking to find the total days overlapped between two drug classes throughout 1 year. There may be multiple periods of overlap throughout the year, however, I want to sum the total days of overlap into one final number. I would also like to include the first day of overlap.
I have already broken my data down into years so each file is one complete year. Below is a simplified version of my data with what I currently 'have' and the output I 'want'.
HAVE | HAVE | HAVE | HAVE | HAVE | WANT | WANT |
ID | Year | RxStart | RxEnd | DrugClass | Overlap (days) | 1st Day of Overlap |
1 | 2009 | 17898 | 17928 | 0 | 30 | 17898 |
1 | 2009 | 17898 | 17928 | 1 | 30 | 17898 |
1 | 2009 | 18141 | 18171 | 0 | 30 | 17898 |
1 | 2009 | 18171 | 18201 | 0 | 30 | 17898 |
2 | 2009 | 18079 | 18115 | 1 | 15 | 18110 |
2 | 2009 | 18110 | 18140 | 0 | 15 | 18110 |
2 | 2009 | 18215 | 18230 | 1 | 15 | 18110 |
2 | 2009 | 18220 | 18234 | 0 | 15 | 18110 |
3 | 2009 | 18089 | 18125 | 0 | 0 | . |
3 | 2009 | 18126 | 18146 | 0 | 0 | . |
3 | 2009 | 18156 | 18186 | 1 | 0 | . |
3 | 2009 | 18186 | 18196 | 1 | 0 | . |
Any and all help is appreciated! Thank you for your time.
Using 1-year arrays:
data have;
input ID Year RxStart RxEnd DrugClass;
format RxStart RxEnd yymmdd10.;
datalines;
1 2009 17898 17928 0
1 2009 17898 17928 1
1 2009 18141 18171 0
1 2009 18171 18201 0
2 2009 18079 18115 1
2 2009 18110 18140 0
2 2009 18215 18230 1
2 2009 18220 18234 0
3 2009 18089 18125 0
3 2009 18126 18146 0
3 2009 18156 18186 1
3 2009 18186 18196 1
;
proc sort data=have; by year ID; run;
data want;
array c0{0:366};
array c1{0:366};
do until(last.ID);
set have; by year id;
do date = rxStart to rxEnd;
i = intck("day", mdy(1, 1, year), date);
if drugClass = 0
then c0{i} = 1;
else c1{i} = 1;
end;
end;
overlap = 0;
do i = 0 to 366;
if c0{i} and c1{i} then do;
overlap = overlap + 1;
if missing(firstDay)
then firstDay = intnx("day", mdy(1, 1, year), i);
end;
end;
do until(last.ID);
set have; by year id;
output;
end;
format firstDay yymmdd10.;
drop c0: c1: date i;
run;
proc print data=want noobs; run;
Using 1-year arrays:
data have;
input ID Year RxStart RxEnd DrugClass;
format RxStart RxEnd yymmdd10.;
datalines;
1 2009 17898 17928 0
1 2009 17898 17928 1
1 2009 18141 18171 0
1 2009 18171 18201 0
2 2009 18079 18115 1
2 2009 18110 18140 0
2 2009 18215 18230 1
2 2009 18220 18234 0
3 2009 18089 18125 0
3 2009 18126 18146 0
3 2009 18156 18186 1
3 2009 18186 18196 1
;
proc sort data=have; by year ID; run;
data want;
array c0{0:366};
array c1{0:366};
do until(last.ID);
set have; by year id;
do date = rxStart to rxEnd;
i = intck("day", mdy(1, 1, year), date);
if drugClass = 0
then c0{i} = 1;
else c1{i} = 1;
end;
end;
overlap = 0;
do i = 0 to 366;
if c0{i} and c1{i} then do;
overlap = overlap + 1;
if missing(firstDay)
then firstDay = intnx("day", mdy(1, 1, year), i);
end;
end;
do until(last.ID);
set have; by year id;
output;
end;
format firstDay yymmdd10.;
drop c0: c1: date i;
run;
proc print data=want noobs; run;
Hello @ssulli11,
Your overlap counts (30, 15) suggest that RxEnd is the day after the last drug administration (in the respective period). If this is the case, you should adapt the relevant DO loop in PG's solution accordingly:
do date = rxStart to rxEnd-1;
data have;
input ID Year RxStart RxEnd DrugClass;
format RxStart RxEnd yymmdd10.;
datalines;
1 2009 17898 17928 0
1 2009 17898 17928 1
1 2009 18141 18171 0
1 2009 18171 18201 0
2 2009 18079 18115 1
2 2009 18110 18140 0
2 2009 18215 18230 1
2 2009 18220 18234 0
3 2009 18089 18125 0
3 2009 18126 18146 0
3 2009 18156 18186 1
3 2009 18186 18196 1
;
data want;
do until(last.id);
set have;
by id;
k=lag(rxend);
if first.id then do; flag=0;Day1stofOverlap=.;Overlap=0;end;
if id=lag(id) and lag(drugclass) ne drugclass and flag eq 0 and rxstart<k then do;
dif=k-rxstart;flag=1;
if not Day1stofOverlap then Day1stofOverlap=rxstart;
end;
else flag=0;
Overlap+dif;
call missing(dif);
end;
do until(last.id);
set have;
by id;
output;
end;
format Day1stofOverlap yymmdd10.;
drop k dif;
run;
A request plz:
If the above works or does not work let me know a feedback be it positive or negative. The logic was given to me by a Python expert college mate of mine by the name Mr. Jim Scully at DePaul university, Chicago. I merely translated his logic to SAS syntax. We look forward to your feedback for our continuous learning and improvement. Thank you
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.