BookmarkSubscribeRSS Feed
yongheelee1212
Calcite | Level 5

 

Dear SAS users, 

 

I am new to longitudinal data analysis and especially when it comes to manipulating date related issue. 

 

I have fairly complicated issue, and any tip along the way is much appreciated. 

I have a data that contains ID, date, and therapy, and my ultimate goal is to create course of treatment. 

Course of treatment inclement with certain condition:

1. if drug that is not part of previous COT is observed and

2. the drug is observed within 30 days

 

So for instance, drug2 is observed at 10/16/2012 and observed twice within 30 days. So, this is the new course of treatment. 

Then all_drug within that course_of_treatment would be all drugs observed during 30 days from first day of course_of_treatment, 

which would be drug 2 and 3. 

Course_of_treatment continues until drug4 is observed.

 

Any comment would be great and thank you! 

The output I want looks as below

 

Untitled picture.png

20 REPLIES 20
Reeza
Super User

Which variables there do you start with and which do you need to calculate?

Can you please provide sample data as text, especially if want example code.

 

You would use a SQL merge here because you can merge on a range then to find occurrences within 30 days. You would first merge the dataset with itself to determine that. Then once you find the counts, you could create episodes/groups. Then use CATX() to combine them to find the course of treatment over time. You'll also want to think about how exactly you want that output data set to look like. 

 


@yongheelee1212 wrote:

 

Dear SAS users, 

 

I am new to longitudinal data analysis and especially when it comes to manipulating date related issue. 

 

I have fairly complicated issue, and any tip along the way is much appreciated. 

I have a data that contains ID, date, and therapy, and my ultimate goal is to create course of treatment. 

Course of treatment inclement with certain condition:

1. if drug that is not part of previous COT is observed and

2. the drug is observed within 30 days

 

So for instance, drug2 is observed at 10/16/2012 and observed twice within 30 days. So, this is the new course of treatment. 

Then all_drug within that course_of_treatment would be all drugs observed during 30 days from first day of course_of_treatment, 

which would be drug 2 and 3. 

Course_of_treatment continues until drug4 is observed.

 

Any comment would be great and thank you! 

The output I want looks as below

 

Untitled picture.png


 

yongheelee1212
Calcite | Level 5

Yes, thank you. 

Attached is the date I have. 

Ultimately, I would like to create variable: Course_of_treatment and if possible, drugs_within_COT. 

 

ID Date Therapy
1 10/1/2011   drug1
1 10/16/2012 drug1
1 10/30/2012 drug2
1 10/30/2012 drug2
1 11/13/2012 drug3
1 11/27/2012 drug2
1 12/11/2012 drug2
1 12/11/2012 drug3
1 12/18/2012 drug2
1 12/24/2012 drug3
1 12/31/2012 drug2
1 1/8/2013 drug3
1 1/15/2013 drug2
1 1/22/2013 drug2
1 1/29/2013 drug3
1 2/5/2013 drug3
1 2/12/2013 drug1
1 2/19/2013 drug4
1 2/26/2013 drug4
1 3/5/2013 drug4

Reeza
Super User

Why isn't drug1 the first COT for ID1?

 

Unfortunately I likely don't have enough time today to wade into a problem this complex but this should get you (or anyone else started at least 🙂

 

data have;
informat ID $1. date mmddyy10. therapy $8.;
format date date9.;
input ID $ Date Therapy $;
cards;
1 10/1/2011   drug1
1 10/16/2012 drug1
1 10/30/2012 drug2
1 10/30/2012 drug2
1 11/13/2012 drug3
1 11/27/2012 drug2
1 12/11/2012 drug2
1 12/11/2012 drug3
1 12/18/2012 drug2
1 12/24/2012 drug3
1 12/31/2012 drug2
1 1/8/2013 drug3
1 1/15/2013 drug2
1 1/22/2013 drug2
1 1/29/2013 drug3
1 2/5/2013 drug3
1 2/12/2013 drug1
1 2/19/2013 drug4
1 2/26/2013 drug4
1 3/5/2013 drug4
2 10/1/2011   drug1
2 10/16/2012 drug1
2 10/30/2012 drug2
2 10/30/2012 drug2
2 11/13/2012 drug3
2 11/27/2012 drug2
2 12/11/2012 drug2
2 12/11/2012 drug3
2 12/18/2012 drug2
2 12/24/2012 drug3
3 12/31/2012 drug2
3 1/8/2013 drug3
3 1/15/2013 drug2
3 1/22/2013 drug2
3 1/29/2013 drug3
3 2/5/2013 drug3
3 2/12/2013 drug1
3 2/19/2013 drug4
3 2/26/2013 drug4
3 3/5/2013 drug4
;;;;

proc sql;
create table counts as
select *, 
	(select count(*) as num 
	from have as h1_x 
	where h1_x.id=h1.id 
	and h1.therapy = h1_x.therapy 
	and h1_x.date between h1.date and (h1.date+30))
		as count
from have as h1
order by 1, 3, 2;
quit;


@yongheelee1212 wrote:

Yes, thank you. 

Attached is the date I have. 

Ultimately, I would like to create variable: Course_of_treatment and if possible, drugs_within_COT. 

 

ID Date Therapy
1 10/1/2011   drug1
1 10/16/2012 drug1
1 10/30/2012 drug2
1 10/30/2012 drug2
1 11/13/2012 drug3
1 11/27/2012 drug2
1 12/11/2012 drug2
1 12/11/2012 drug3
1 12/18/2012 drug2
1 12/24/2012 drug3
1 12/31/2012 drug2
1 1/8/2013 drug3
1 1/15/2013 drug2
1 1/22/2013 drug2
1 1/29/2013 drug3
1 2/5/2013 drug3
1 2/12/2013 drug1
1 2/19/2013 drug4
1 2/26/2013 drug4
1 3/5/2013 drug4


 

yongheelee1212
Calcite | Level 5

thank you so much!

drug1, the first observation is not COT1 because one of the criteria is that it should've been given at least twice within 30 days. 

First one was given at 10/1/2011 and the next one is at 10/16/2012. So it doesn't meet the criteria.

 

much appreciated for your help with start of the approach!

time to do some research and dig into the problem!

Reeza
Super User
I know that this question has been asked and answered on here many times. If you do some searching you should find better answers that are much closer to what you need than this.
yongheelee1212
Calcite | Level 5

Thank you for the response!

 

I actually tried spending many hours looking up SAS communities and other help websites. I was able to find similar looking questions but not as same as mine. 

 

So my intended strategy is to do:

*** identify whether the therapy given on specific date is part of COT***

part_of_therapy = yes 

if (therapy in(therapy) BETWEEN first.date of COT) and first.date of COT+30ays

*** then based on that do something like ***

course of treatment +1 

if 

course_30_days>=2 (happens at least twice in 30 days) and part_of_therapy=yes (if the therapy is part of the treatment). 

 

To do this, I think I need to know how to identify observations in a variable within specific date range and perhaps create variable based on observations from other variable in certain timeframe, but that's something I just couldn't figure out no matter what. 

 

 

 

Reeza
Super User

@yongheelee1212 wrote:

 

 

To do this, I think I need to know how to identify observations in a variable within specific date range and perhaps create variable based on observations from other variable in certain timeframe, but that's something I just couldn't figure out no matter what. 


The in line sql above shows how to look up data based on a range.

novinosrin
Tourmaline | Level 20

Hello @yongheelee1212  Looks like you are new to SAS communities. So hearty welcome!

 

One request is, you could explain your requirement in a more comprehensible detail for anyone to understand and attempt. 

 

I have a data that contains ID, date, and therapy,

 

Requirement goal is to create course of treatment.  so that is COT variable, so what's in with count_30days and  part_of_previous_COT variable???

 

Objective : To increment Course of treatment values with certain condition

1. if drug that is not part of previous COT/*what is this first? How is this computed or derived?*/ is observed and

2. the drug is observed within 30 days  /*within 30 days of what????from when/where? */

 

Please explain the logic relating to sample OUTPUT(variables) how is it derived? The flow should be convenient to grasp even for some dumb folks like me. Thank you!

 

 

 

 

 

 

 

yongheelee1212
Calcite | Level 5

Dear novinosrin

 

Objective : To increment Course of treatment values with certain condition

1. if drug that is not part of previous COT/*what is this first? How is this computed or derived?*/ is observed and

2. the drug is observed within 30 days  /*within 30 days of what????from when/where? */

"

Course of Treatment is defined if the therapy given is not part of the previous course of treatment and if the drug is given at least twice in 30 days. So for instance, at the beginning, course of treatment is none. so first requirement is already satisfied but the drug has to be given at least twice in 30 days. Drug 1 from first and second observation would not be first course of treatment because it does not meet the second requirement. 

So as you can see from the top picture, drug 2(10/30/2012) would be beginning of course of treatment, hence cot1. The drugs observed from 10/30/2012 to 11/30/2012 is drug 2 and drug 3, hence drugs associated with cot1 would be drug2 and drug3). 

cot1 continues until new drug is observed consecutively in 30 days, which is drug4 on 3/5/2013. hence, cot2 starts on 3/5/2013. 

 

I hope that explains better, let me know if you have more question. 

 

So far, I tried to do this, 

1. create Count_30_days (which shows number of drug given within 30 day of the date that drug is given.

 

So my intended strategy is to do:

*** identify whether the therapy given on specific date is part of COT***

part_of_therapy = yes 

if (therapy in(therapy) BETWEEN first.date of COT) and first.date of COT+30ays

*** then based on that do something like ***

course of treatment +1 

if 

course_30_days>=2 (happens at least twice in 30 days) and part_of_therapy=yes (if the therapy is part of the treatment). 

 

To do this, I think I need to know how to identify observations in a variable within specific date range, but that's something I just couldn't figure out no matter what. 

 

 

 

novinosrin
Tourmaline | Level 20

Hi @yongheelee1212   Nothing to apologize. You didn't know. Sorry, I'm about to run to catch my Bus to get home. I'll see if I can get online from home and attempt from home. 

yongheelee1212
Calcite | Level 5

Thank you! 

If you have any other suggestions on how I should approach this, I am open to all ideas! 

have a wonderful Friday!

 

KachiM
Rhodochrosite | Level 12

Hi @yongheelee1212 

 

Something I am missing in ....

 

Course of Treatment is defined if the therapy given is not part of the previous course of treatment and if the drug is given at least twice in 30 days. So for instance, at the beginning, course of treatment is none. so first requirement is already satisfied but the drug has to be given at least twice in 30 days. Drug 1 from first and second observation would not be first course of treatment because it does not meet the second requirement. 

So as you can see from the top picture, drug 2(10/30/2012) would be beginning of course of treatment, hence cot1. The drugs observed from 10/30/2012 to 11/30/2012 is drug 2 and drug 3, hence drugs associated with cot1 would be drug2 and drug3). 

cot1 continues until new drug is observed consecutively in 30 days, which is drug4 on 3/5/2013. hence, cot2 starts on 3/5/2013. 

 

COT1 starts with drug1 on 10/16/2012 continues with drug2 and drug3 with 30-day gaps through 2/5/2013. On 2/12/2013 drug1 comes back. Drug4 joins with drug1 on 2/19/2013.

Questions are:

[1] Hence, COT1 joining with drug4 is deemed to be a new sequence. Is this right?

[2] why do you say Cot2 starts on 3/5/2013 when Drug4 was started earlier on 2/19/2013?

 

yongheelee1212
Calcite | Level 5

Dear datasp. 

 

1, you are correct. 

2. my apology. you are right, cot2 should start on 2/19/2013,, there was an mismatch in original picture and data in text I presented

novinosrin
Tourmaline | Level 20

Hello @yongheelee1212 

 

Let me try to pen my understanding in points and see the flow. 

 

My observation

 

Let's start by  reading  record(obs)  1.

 

1. First off, Date column has the dates when a drug or sometimes more than a 1 drug administered?

2. Point 1 would ideally make sense for the Course_of_treatment sequence to start right from record 1 or in other words from the date of the start of treatment, which you have blank.

3. Drug1 of obs 1 will of course not be part of previous treatment because that's the start.

4. Look up(Look ahead in this case) 30 days from 10/1/2011(inclusive) to (10/1/2011 +30 days) i.e date to  intnx('days',date,30)   precisely. This should be done for each record within a patient ID i.e BY GROUP

a.id =b.id and a.date<=b.date<=intnx('days',a.date,30) in a would be SQL syntax and get all the drugs.

5. Since <= makes it inclusive, if  a.date<=b.date<=intnx('days',a.date,30) is found true 2 times(twice) or more then conditional check 1 is satisfied. Then if any of the Drug found true happens to be part of the previous treatment, club all of the found unique drugs to the previous treatment to which it is found.

6. Repeat the same process for every record until last.patient_id i.e last record of the BY GROUP patient id. 

7. Follow same suit for all Patients or Patient ID aka exclusive BY GROUP.

 

Does this make sense closer to your  need?.

 

Part Logic for count_30 days of a drug for each observation -->

Lets do some initial findings using SQL applying the LOOK UP to get the dates and see if the Drug was found twice or more. This will help us understand whether our thought process in applying the logic for LOOK UP is right or not

 

data have;
input ID Date :mmddyy10. Therapy $;
format date mmddyy10.;
cards;
1 10/1/2011   drug1
1 10/16/2012 drug1
1 10/30/2012 drug2
1 10/30/2012 drug2
1 11/13/2012 drug3
1 11/27/2012 drug2
1 12/11/2012 drug2
1 12/11/2012 drug3
1 12/18/2012 drug2
1 12/24/2012 drug3
1 12/31/2012 drug2
1 1/8/2013 drug3
1 1/15/2013 drug2
1 1/22/2013 drug2
1 1/29/2013 drug3
1 2/5/2013 drug3
1 2/12/2013 drug1
1 2/19/2013 drug4
1 2/26/2013 drug4
1 3/5/2013 drug4
;
/*To Test the look up by remvoe the duplicate dates for the same therapy to 
know if our expected count is right. of course later in our solution code, 
we can include the full data.
Example 1 10/30/2012 drug2
1 10/30/2012 drug2  occurs twice in have 
*/

proc sort data=have out=_have nodupkey;
by id therapy date;
run;

/*please notice the below in the code*/
/*on a.id =b.id and a.date<=b.date<=intnx('days',a.date,30)*/
/*a.therapy=b.therapy as c_to_count.This is boolean expression resulting in 1,0's*/
/*If you sum(c_to_count), you would get count */

proc sql;
create table Count_30 as
select a.*,a.date as start_date format=mmddyy10.,intnx('days',a.date,30) as end_date format=mmddyy10., a.therapy=b.therapy as c_to_count
from _have a left join _have b
on a.id =b.id and a.date<=b.date<=intnx('days',a.date,30)
group by a.id,a.date,a.Therapy
order by a.id,a.date,b.date;
quit;

Now, run the above, look at the results. If this study gets us the right direction, we can apply whether the found Drugs within the 30 period was part of the existing course_of_treatment. That will involve a a look up in all_drug_30 and club these if true or increment a new one. 

 

All these can be done neatly using Hash for a final complete solution. But the above  initial study is just  to see if our understanding of the logic is in the same page if that  helps. In essence, I get the feeling that while I am not understanding well or taking time, the coding part is merely gonna take 5 or 10 mins. 

 

 

 

 

 

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 3400 views
  • 1 like
  • 4 in conversation