Re: How to create dummy variable for every single date during a period

CynthiaWei · Posted 10-18-2020 04:53 AM

Hi SAS Pros,

I have a drug administration dataset like this:

ID	No_of_Prescription	Start_taking_date	Day_supply	Last_taking_date	drug
1	1	1/1/2020	21	1/21/2020	A
1	2	1/5/2020	10	1/14/2020	B
1	3	1/18/2020	4	1/21/2020	B
1	4	1/21/2020	10	1/30/2020	B
2	1	1/2/2020	15	1/16/2020	A
2	2	1/9/2020	10	1/18/2020	B
2	3	1/12/2020	15	1/26/2020	B
2	4	2/2/2020	15	2/16/2020	B
3	1	1/3/2020	10	1/12/2020	B
3	2	1/5/2020	10	1/14/2020	A
3	3	1/9/2020	10	1/18/2020	B
3	4	1/11/2020	20	1/30/2020	A
3	5	1/1/2020	20	1/20/2020	B
4	1	1/4/2020	6	1/9/2020	B
4	2	1/4/2020	10	1/13/2020	A
4	3	1/7/2020	15	1/21/2020	B
4	4	1/13/2020	11	1/23/2020	B

There are two types of drug that I am interested in, drug A and B. The time period I am interested in is from 1/1/2020 to 12/31/2020 (I know there is no date later than 1/31 in the table, please just assume there are. My final data will eventually include the data from 1/1 to 12/31). I want to create

1. a series of dummy variables for every single date during this time period to represent whether drug A or B was administered on every single date. For example, for ID=1, drug A was taking from 1/1/2020 to/1/13/2020, and drug B was taking from 1/5 to 1/14, 1/18 to 1/21, and 1/21 to 1/30. So, what I want is to create a separate table like below to know the exact dates where drug A and B were taking (I only list the dummies for ID=1 here)

ID	drug	1/1/2020	1/2/2020	1/3/2020	1/4/2020	1/5/2020	1/6/2020	1/7/2020	1/8/2020	1/9/2020	1/10/2020	1/11/2020	1/12/2020	1/13/2020	1/14/2020	1/15/2020	1/16/2020	1/17/2020	1/18/2020	1/19/2020	1/20/2020	1/21/2020	1/22/2020	########	########	########	########	########	########	########	1/30/2020	1/31/2020
1	A	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
1	B					1	1	1	1	1	1	1	1	1	1				1	1	1	1	1	1	1	1	1	1	1	1	1

2. My next step is to know every time the first date that both drug A and B were both taking (1/5/2020 and 1/18/2020), how many days the overlapping use lasted for each time (10 days and 4days), and the last date that both drugs were taking (1/14/2020 and 1/21/2020). Wanted table like this:

ID	overlap_first_date	length	overlap_end_date
1	1/5/2020	10	1/14/2020
	1/18/2020	4	1/21/2020

Please have as many rows as whenever overlapping taking drug A and B happened for each ID.

I know I had posted a similar question before and some SAS masters have provided their valuable solutions for me. This question is somehow different than the previous one, so if you happened to see my early post, just FYI why I ask it here in a new thread.

I would really appreciate any help with this question!

Best regards,

C

PaigeMiller · Posted 10-18-2020 07:30 AM

Please post data from now on as a SAS data step, and not as screen captures. We cannot test our code from screen captures.

I think, as I almost always do, that this problem is more easily solved by keeping the data set long instead of turning it into a wide data set.

UNTESTED CODE (I would be glad to test my code before I provide it if the data is provided as described above)

data long;
    set have;
    do date=start_taking_date to last_taking_date by 1;
        output;
    end;
run;
proc freq data=long;
    tables id*date/noprint out=frequencies;
run;

Now, the dates of overlap have a COUNT of 2 in the data set FREQUENCIES.

The final step creates a new record when the variable COUNT 2 changes to 1 (patient stops taking both drugs), or when the count changes from 1 to 2 (patient starts taking both drugs).

data want;
    set frequencies;
    by id;
    prev_freq=lag(count);
    prev_date=lag(date);
    if prev_freq=1 and count>=2 or (first.id and count=2) then
        overlap_first_date=date;
    if prev_freq>=2 and count=1 and not first.id then do;
        overlap_last_date=prev_date;
        length=overlap_last_date-overlap_first_date;
        output;
    end;
    keep id overlap: length;
run;

--
Paige Miller

CynthiaWei · Posted 10-19-2020 04:06 AM

Hi,

Thank you so much for the code! I am posting the dataset in a SAS data step here FYI.

I have tested your code and found that the count 2 are not only generated when drug A and B overlaps, but also when two prescription of the same drug overlaps. For example, ID=1 had two types of drug B taken on 1/21, ID=2 had two types of drug B taken from 1/12 to 1/16. The variable Date representing these dates also has a value of 2. I think I need to make value of 2 generated only when different drugs overlaps not for same drug.

I really appreciate your instruction!

Best regards,

data have;

input ID No_of_Prescription Start_taking_date Day_supply Last_taking_date drug $;

datalines;

1 1 1/1/2020 21 1/21/2020 A
1 2 1/5/2020 10 1/14/2020 B
1 3 1/18/2020 4 1/21/2020 B
1 4 1/21/2020 10 1/30/2020 B
2 1 1/2/2020 15 1/16/2020 A
2 2 1/9/2020 10 1/18/2020 B
2 3 1/12/2020 15 1/26/2020 B
2 4 2/2/2020 15 2/16/2020 B
3 1 1/3/2020 10 1/12/2020 B
3 2 1/5/2020 10 1/14/2020 A
3 3 1/9/2020 10 1/18/2020 B
3 4 1/11/2020 20 1/30/2020 A
3 5 1/1/2020 20 1/20/2020 B
4 1 1/4/2020 6 1/9/2020 B
4 2 1/4/2020 10 1/13/2020 A
4 3 1/7/2020 15 1/21/2020 B
4 4 1/13/2020 11 1/23/2020 B

;

run;

ScottBass · Posted 10-19-2020 04:50 AM

Post your code using the Insert SAS Code icon, as @ErikLund_Jensen has done. Unformatted SAS code, esp. when using datalines, sometimes fails when pasted into SAS.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

PaigeMiller · Posted 10-19-2020 07:24 AM

I have modified my code so that it handles properly the case where A is prescribed twice on a day, or B is prescribed twice on a day (or both! 😊 )

data long;
    set have;
    do date=start_taking_date to last_taking_date by 1;
        output;
    end;
run;
proc freq data=long;
    tables id*drug*date/noprint out=frequencies;
run;
data long1;
	merge frequencies(where=(drug='A') rename=(count=countA)) 
            frequencies(where=(drug='B') rename=(count=countB));
	by id date;
	retain overlap_first_date;
	both_drugs=(countA>=1 and countB>=1);
	prev_both_drugs=lag(both_drugs);
    prev_id=lag(id);
	if first.id then prev_both_drugs=0;
	if prev_both_drugs=0 and both_drugs=1 and (first.id or id=prev_id)
         then overlap_first_date=date;
	if prev_both_drugs=1 and both_drugs=0 and id=prev_id then do; 
        overlap_end_date=date-1; 
        delta=overlap_end_date-overlap_first_date; 
        output; 
    end;
	format overlap: yymmdd10.;
	drop drug percent prev: countA countB date;
run;

--
Paige Miller

ErikLund_Jensen · Posted 10-19-2020 04:12 AM

Hi @CynthiaWei

I strongly agree with @PaigeMiller . A wide table is difficult to use. It is much easier to work with long and narrow tables.

The following code uses Proc SQL to identify dates wirh more than one drug instead of Proc Freq, but that is a matter of taste. The real difference is in the data step, where I use an extra SET statement to "look ahead" instead of using lag functions. It is easier to understand and code in my opinion.

* Get test data;
data have;
	informat ID	No_of_Prescription 8. Start_taking_date mmddyy10. Day_supply 8. Last_taking_date mmddyy10. drug $1.;
	format Start_taking_date Last_taking_date mmddyy10.;
	input ID No_of_Prescription Start_taking_date Day_supply Last_taking_date drug;
datalines;
1	1	1/1/2020	21	1/21/2020	A
1	2	1/5/2020	10	1/14/2020	B
1	3	1/18/2020	4	1/21/2020	B
1	4	1/21/2020	10	1/30/2020	B
2	1	1/2/2020	15	1/16/2020	A
2	2	1/9/2020	10	1/18/2020	B
2	3	1/12/2020	15	1/26/2020	B
2	4	2/2/2020	15	2/16/2020	B
3	1	1/3/2020	10	1/12/2020	B
3	2	1/5/2020	10	1/14/2020	A
3	3	1/9/2020	10	1/18/2020	B
3	4	1/11/2020	20	1/30/2020	A
3	5	1/1/2020	20	1/20/2020	B
4	1	1/4/2020	6	1/9/2020	B
4	2	1/4/2020	10	1/13/2020	A
4	3	1/7/2020	15	1/21/2020	B
4	4	1/13/2020	11	1/23/2020	B
;
run;

* Roll out on all days between Start_taking_date and Last_taking_date;
data w1 (keep=ID Date Drug);  
	set have;
	format Date mmddyy10.;
	do Date = Start_taking_date to Last_taking_date;
		output;
	end;
run;

* Get rid of double prescriptions (ex. 2 x B at same Date for same ID);
proc sql;
	create table w2 as 
		select distinct ID, Date, Drug
		from w1;
quit;

* Keep only dates with more than one drug pr. ID;
proc sql;
	create table w3 as
		select distinct 
			ID, Date, count(*) as DrugCount
		from w2
		group by ID, Date
		having DrugCount > 1;
quit;
	
* Compress to date intervals;
data want (drop=Date  NextID NextDate); 
	set w3 (drop=DrugCount) end=eof;
	by ID;
	retain Overlap_first_date Length;
	format Overlap_first_date mmddyy10. Length 8. Overlap_last_date mmddyy10.;

	* Read ID and Date from next record to determine last row in a sequence;
	if not eof then set w3 (keep = ID Date rename=(ID=NextID Date=NextDate) firstobs=2);

	* Initiate Overlap_first_date and Length in first record;
	if _N_ = 1 then do;
		Overlap_first_date = Date;
		Length = 0;
	end;
	Length = Length + 1;

	* Output end-of-sequence and reset Overlap_first_date and Length;
	if last.ID or (NextDate - Date) > 1 then do;
		Overlap_last_date = Date;
		output; 
		Overlap_first_date = nextDate;
		Length = 0;
	end;
run;

Result:

How to create dummy variable for every single date during a period