BookmarkSubscribeRSS Feed
Alireza_Boloori
Fluorite | Level 6

Hello everyone,

 

I have a dataset where each patient (represented by ID) is prescribed a medication (represented by dose) over different time periods. Here is an example of my data, where for particular purposes each patient does not have a fixed dose on each day:

iddatedose
11/17/20087.5
12/5/20088
15/7/20085
16/15/20087
22/2/200810
23/17/200810

I also have another dataset of patients experiencing events (i.e., hospital readmission) at different times. Here is an example:

iddate1date2date3date4date5date20
12/20/20084/17/2008
24/27/20086/7/20089/11/2008
       

where

  • each patient might experience different number of events, and
  • the maximum number of events occurred for any patient in my database is 20 (that's why I have "data20" as my last variable in the second table).

Now, for each patient, I want to measure the average dose used between each two consecutive events. Here are a couple of points:

  • The average is measured by taking the cumulative dose in that period and dividing it by the total number of days in that period.
  • If an event occurs for the first time, then two conditions may happen: whether the event occurs before or after the time of the first dose. For the former, the average dose in that period will be 0. For the latter, the average will be measured from the time of the first dose until that event.
  • It's possible that, for a patient, no event occurs. In this case, I take the average for the period between the first and last dose.

Finally, I'd be very thankful if the code can accommodate reading two separate data files (like the tables above).

 

Thank you very much in advance!

12 REPLIES 12
Reeza
Super User

Please post data that we can use and the expected output that lines up with that data. 

Basically an replicable example that we can use to write the code. I'm not making up data AND answering your question 😉

Alireza_Boloori
Fluorite | Level 6
@Reeza Thanks! The dataset is very large (50+ million observations). So, I was wondering how much of it would be required for you? Please let me know.
Reeza
Super User

The bare minimum needed to replicate your issue. You know your data, I don't. 

 

It might be a single record for one person - make sure you de-identify it and don't include variables that aren't relevant to the problem. 


See this post on how to post sample data as well:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Alireza_Boloori
Fluorite | Level 6

Here is a part of the data (I changed it for more clarification). The data that includes the time along with dose for each patient looks like this:

 

data have1;
input id date drug_dose;
datalines;
1 2008-01-17 7.5
1 2008-01-18 14.7
1 2008-01-19 21.5
1 2008-01-20 27.3
1 2008-01-21 32.1
1 2008-01-22 35.9
1 2008-01-23 38.6
1 2008-01-24 40.6
1 2008-01-25 41.8
1 2008-01-26 42.6
1 2008-03-01 129.4
1 2008-03-02 107.7
1 2008-03-03 86.2
1 2008-03-04 34.4
1 2008-03-05 9.1
2 2008-07-07 30
2 2008-07-08 59.1
2 2008-07-09 86
2 2008-07-10 109.4
2 2008-07-11 128.6
2 2008-07-12 143.6
2 2008-10-23 15
2 2008-10-24 29.5
2 2008-10-25 43
2 2008-10-26 54.7
2 2008-10-27 64.3
;

Also, the second dataset which contains the information about time of different events looks like this:

 

data have2;
input id date1 date2 date3 date3 date4 date5;
datalines;
1 2008-2-6 2008-3-19
2 2008-5-9 2008-8-1 2008-9-5
3 2008-4-3 2008-6-5
;

 

Now, the data I want to have looks like this:

 

data want;
input id avg_dose time_start time_stop;
datalines;
1 15.13 2008-1-17 2008-2-6 
1 13.585 2008-2-7 2008-3-5
1 0 2008-3-6 2008-3-19
2 0 2008-5-9 2008-7-6
2 12.268 2008-7-7 2008-8-1
2 0 2008-8-2 2008-9-5
2 4.05 2008-9-6 2008-10-27
;

For example, 15.13 in the first row is obtained as 302.6/20, where

  • 302.6 is the sum of dose values for patient #1 from 2008-1-17 until 2008-1-26, and
  • 20 is the total number of days from 2008-1-17 until 2008-2-6.

I hope that this would be helpful. Once again, thank you for your help!

 

 

Alireza_Boloori
Fluorite | Level 6

Since there hasn't been any solution for this question, can I repost it?

Reeza
Super User

Feel free. Your question isn't clear, which is why I didn't try and answer it. 

 

  • 302.6 is the sum of dose values for patient #1 from 2008-1-17 until 2008-1-26, <- where do the dates come from?

What are the rules for determining the dates?

Alireza_Boloori
Fluorite | Level 6

For patient #1, the first event happened on 2008-2-6. Also, for this patient, the medication was prescribed for the first time on 2008-1-17 and lasted until 2008-1-26. So, the cumulative dose in these 10 days was 302.6. However, as I said in my original post, I want to return the average dose used between each two consecutive event dates. That's why I divide 302.6 by 20 (which is the time period from 2008-1-17 until 2008-2-6).

Reeza
Super User

Ok, that clarifies things. The explanation made it seem more complex. 

 

1. Create an episode variable that defines an episode for each ID. For example, 2008-1-17 to 2008-1-26 would be a single episode. 

You can use the DIF() function for this to calculate the EPISODE. 

 

2. Use PROC MEANS with ID and EPISODE (from #1)

 

3. Merge it with your second data set

 

If you can't figure out a step, post your code and log and we can help from there. 

Alireza_Boloori
Fluorite | Level 6

Thanks Reeza! I came up with this approach (which may be slightly different with what you recommended).

 

I merged two data sets like this (I'm showing the results for only patient #1):

data have;
input id date drug_dose indicator;
datalines;
1 2008-01-17 7.5 . 1 2008-01-18 14.7 . 1 2008-01-19 21.5 . 1 2008-01-20 27.3 . 1 2008-01-21 32.1 . 1 2008-01-22 35.9 . 1 2008-01-23 38.6 . 1 2008-01-24 40.6 . 1 2008-01-25 41.8 . 1 2008-01-26 42.6 .
1 2008-02-06 . event 1 2008-03-01 129.4 . 1 2008-03-02 107.7 . 1 2008-03-03 86.2 . 1 2008-03-04 34.4 . 1 2008-03-05 9.1 .
1 2008-03-19 . event
;

where I added a new variable (i.e., indicator) that puts "event" whenever an event occurs. 

 

Now, my questions is that how can I get an output like this:

data want;
input id avg_dose time_start time_stop;
datalines;
1 15.13 2008-1-17 2008-2-6 1 13.585 2008-2-7 2008-3-5 1 0 2008-3-6 2008-3-19
;

 where "avg_dose" is obtained as the ratio of total cumulative dose between each two events over the total number of days spanned between these two events.

Reeza
Super User

Your data structure doesn't lend itself very well to that type of summary, ergo my recommendation. 

 

With your approach you'll need a data step and to calculate the averages and such manually - this is prone to error and not very flexible for adding further statistics. If you decide you want the standard deviation and max/min dose that's all a manual calculation. 

 

Here's an example of this with the sample data you provided above.

 

delete_sample_summary.JPG

 

 

Alireza_Boloori
Fluorite | Level 6

I don't understand. Why should I merge "episodes" with the second data? What is the purpose of "proc means"?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 4234 views
  • 0 likes
  • 2 in conversation