Hi SAS Pros,
I have a drug administration dataset like this:
ID | No_of_Prescription | Start_taking_date | Day_supply | Last_taking_date | drug |
1 | 1 | 1/1/2020 | 21 | 1/21/2020 | A |
1 | 2 | 1/5/2020 | 10 | 1/14/2020 | B |
1 | 3 | 1/18/2020 | 4 | 1/21/2020 | B |
1 | 4 | 1/21/2020 | 10 | 1/30/2020 | B |
2 | 1 | 1/2/2020 | 15 | 1/16/2020 | A |
2 | 2 | 1/9/2020 | 10 | 1/18/2020 | B |
2 | 3 | 1/12/2020 | 15 | 1/26/2020 | B |
2 | 4 | 2/2/2020 | 15 | 2/16/2020 | B |
3 | 1 | 1/3/2020 | 10 | 1/12/2020 | B |
3 | 2 | 1/5/2020 | 10 | 1/14/2020 | A |
3 | 3 | 1/9/2020 | 10 | 1/18/2020 | B |
3 | 4 | 1/11/2020 | 20 | 1/30/2020 | A |
3 | 5 | 1/1/2020 | 20 | 1/20/2020 | B |
4 | 1 | 1/4/2020 | 6 | 1/9/2020 | B |
4 | 2 | 1/4/2020 | 10 | 1/13/2020 | A |
4 | 3 | 1/7/2020 | 15 | 1/21/2020 | B |
4 | 4 | 1/13/2020 | 11 | 1/23/2020 | B |
There are two types of drug that I am interested in, drug A and B. The time period I am interested in is from 1/1/2020 to 12/31/2020 (I know there is no date later than 1/31 in the table, please just assume there are. My final data will eventually include the data from 1/1 to 12/31). I want to create
1. a series of dummy variables for every single date during this time period to represent whether drug A or B was administered on every single date. For example, for ID=1, drug A was taking from 1/1/2020 to/1/13/2020, and drug B was taking from 1/5 to 1/14, 1/18 to 1/21, and 1/21 to 1/30. So, what I want is to create a separate table like below to know the exact dates where drug A and B were taking (I only list the dummies for ID=1 here)
ID | drug | 1/1/2020 | 1/2/2020 | 1/3/2020 | 1/4/2020 | 1/5/2020 | 1/6/2020 | 1/7/2020 | 1/8/2020 | 1/9/2020 | 1/10/2020 | 1/11/2020 | 1/12/2020 | 1/13/2020 | 1/14/2020 | 1/15/2020 | 1/16/2020 | 1/17/2020 | 1/18/2020 | 1/19/2020 | 1/20/2020 | 1/21/2020 | 1/22/2020 | ######## | ######## | ######## | ######## | ######## | ######## | ######## | 1/30/2020 | 1/31/2020 |
1 | A | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ||||||||||
B | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2. My next step is to know every time the first date that both drug A and B were both taking (1/5/2020 and 1/18/2020), how many days the overlapping use lasted for each time (10 days and 4days), and the last date that both drugs were taking (1/14/2020 and 1/21/2020). Wanted table like this:
ID | overlap_first_date | length | overlap_end_date |
1 | 1/5/2020 | 10 | 1/14/2020 |
1/18/2020 | 4 | 1/21/2020 |
Please have as many rows as whenever overlapping taking drug A and B happened for each ID.
I know I had posted a similar question before and some SAS masters have provided their valuable solutions for me. This question is somehow different than the previous one, so if you happened to see my early post, just FYI why I ask it here in a new thread.
I would really appreciate any help with this question!
Best regards,
C
Please post data from now on as a SAS data step, and not as screen captures. We cannot test our code from screen captures.
I think, as I almost always do, that this problem is more easily solved by keeping the data set long instead of turning it into a wide data set.
UNTESTED CODE (I would be glad to test my code before I provide it if the data is provided as described above)
data long;
set have;
do date=start_taking_date to last_taking_date by 1;
output;
end;
run;
proc freq data=long;
tables id*date/noprint out=frequencies;
run;
Now, the dates of overlap have a COUNT of 2 in the data set FREQUENCIES.
The final step creates a new record when the variable COUNT 2 changes to 1 (patient stops taking both drugs), or when the count changes from 1 to 2 (patient starts taking both drugs).
data want;
set frequencies;
by id;
prev_freq=lag(count);
prev_date=lag(date);
if prev_freq=1 and count>=2 or (first.id and count=2) then
overlap_first_date=date;
if prev_freq>=2 and count=1 and not first.id then do;
overlap_last_date=prev_date;
length=overlap_last_date-overlap_first_date;
output;
end;
keep id overlap: length;
run;
Hi,
Thank you so much for the code! I am posting the dataset in a SAS data step here FYI.
I have tested your code and found that the count 2 are not only generated when drug A and B overlaps, but also when two prescription of the same drug overlaps. For example, ID=1 had two types of drug B taken on 1/21, ID=2 had two types of drug B taken from 1/12 to 1/16. The variable Date representing these dates also has a value of 2. I think I need to make value of 2 generated only when different drugs overlaps not for same drug.
I really appreciate your instruction!
Best regards,
data have;
input ID No_of_Prescription Start_taking_date Day_supply Last_taking_date drug $;
datalines;
1 1 1/1/2020 21 1/21/2020 A
1 2 1/5/2020 10 1/14/2020 B
1 3 1/18/2020 4 1/21/2020 B
1 4 1/21/2020 10 1/30/2020 B
2 1 1/2/2020 15 1/16/2020 A
2 2 1/9/2020 10 1/18/2020 B
2 3 1/12/2020 15 1/26/2020 B
2 4 2/2/2020 15 2/16/2020 B
3 1 1/3/2020 10 1/12/2020 B
3 2 1/5/2020 10 1/14/2020 A
3 3 1/9/2020 10 1/18/2020 B
3 4 1/11/2020 20 1/30/2020 A
3 5 1/1/2020 20 1/20/2020 B
4 1 1/4/2020 6 1/9/2020 B
4 2 1/4/2020 10 1/13/2020 A
4 3 1/7/2020 15 1/21/2020 B
4 4 1/13/2020 11 1/23/2020 B
;
run;
Post your code using the Insert SAS Code icon, as @ErikLund_Jensen has done. Unformatted SAS code, esp. when using datalines, sometimes fails when pasted into SAS.
I have modified my code so that it handles properly the case where A is prescribed twice on a day, or B is prescribed twice on a day (or both! 😊 )
data long;
set have;
do date=start_taking_date to last_taking_date by 1;
output;
end;
run;
proc freq data=long;
tables id*drug*date/noprint out=frequencies;
run;
data long1;
merge frequencies(where=(drug='A') rename=(count=countA))
frequencies(where=(drug='B') rename=(count=countB));
by id date;
retain overlap_first_date;
both_drugs=(countA>=1 and countB>=1);
prev_both_drugs=lag(both_drugs);
prev_id=lag(id);
if first.id then prev_both_drugs=0;
if prev_both_drugs=0 and both_drugs=1 and (first.id or id=prev_id)
then overlap_first_date=date;
if prev_both_drugs=1 and both_drugs=0 and id=prev_id then do;
overlap_end_date=date-1;
delta=overlap_end_date-overlap_first_date;
output;
end;
format overlap: yymmdd10.;
drop drug percent prev: countA countB date;
run;
Hi @CynthiaWei
I strongly agree with @PaigeMiller . A wide table is difficult to use. It is much easier to work with long and narrow tables.
The following code uses Proc SQL to identify dates wirh more than one drug instead of Proc Freq, but that is a matter of taste. The real difference is in the data step, where I use an extra SET statement to "look ahead" instead of using lag functions. It is easier to understand and code in my opinion.
* Get test data;
data have;
informat ID No_of_Prescription 8. Start_taking_date mmddyy10. Day_supply 8. Last_taking_date mmddyy10. drug $1.;
format Start_taking_date Last_taking_date mmddyy10.;
input ID No_of_Prescription Start_taking_date Day_supply Last_taking_date drug;
datalines;
1 1 1/1/2020 21 1/21/2020 A
1 2 1/5/2020 10 1/14/2020 B
1 3 1/18/2020 4 1/21/2020 B
1 4 1/21/2020 10 1/30/2020 B
2 1 1/2/2020 15 1/16/2020 A
2 2 1/9/2020 10 1/18/2020 B
2 3 1/12/2020 15 1/26/2020 B
2 4 2/2/2020 15 2/16/2020 B
3 1 1/3/2020 10 1/12/2020 B
3 2 1/5/2020 10 1/14/2020 A
3 3 1/9/2020 10 1/18/2020 B
3 4 1/11/2020 20 1/30/2020 A
3 5 1/1/2020 20 1/20/2020 B
4 1 1/4/2020 6 1/9/2020 B
4 2 1/4/2020 10 1/13/2020 A
4 3 1/7/2020 15 1/21/2020 B
4 4 1/13/2020 11 1/23/2020 B
;
run;
* Roll out on all days between Start_taking_date and Last_taking_date;
data w1 (keep=ID Date Drug);
set have;
format Date mmddyy10.;
do Date = Start_taking_date to Last_taking_date;
output;
end;
run;
* Get rid of double prescriptions (ex. 2 x B at same Date for same ID);
proc sql;
create table w2 as
select distinct ID, Date, Drug
from w1;
quit;
* Keep only dates with more than one drug pr. ID;
proc sql;
create table w3 as
select distinct
ID, Date, count(*) as DrugCount
from w2
group by ID, Date
having DrugCount > 1;
quit;
* Compress to date intervals;
data want (drop=Date NextID NextDate);
set w3 (drop=DrugCount) end=eof;
by ID;
retain Overlap_first_date Length;
format Overlap_first_date mmddyy10. Length 8. Overlap_last_date mmddyy10.;
* Read ID and Date from next record to determine last row in a sequence;
if not eof then set w3 (keep = ID Date rename=(ID=NextID Date=NextDate) firstobs=2);
* Initiate Overlap_first_date and Length in first record;
if _N_ = 1 then do;
Overlap_first_date = Date;
Length = 0;
end;
Length = Length + 1;
* Output end-of-sequence and reset Overlap_first_date and Length;
if last.ID or (NextDate - Date) > 1 then do;
Overlap_last_date = Date;
output;
Overlap_first_date = nextDate;
Length = 0;
end;
run;
Result:
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.