Hello,
I'm looking for some advice on how to go about coding this. I have a dataset with longitudinal measurements of heart rate (HR) and respirations (resp). Each individual (ID) has multiple measurements. The three main things I need to do are:
I've been able to write code for steps 2 and 3a, but having issues where step 1 doesn't run completely correct when there are >2 measurements within a 5 minute interval. And that also affects step 3b. Code for an example dataset, as well as the code I wrote, are provided below. I've also provided a description of "what's going on" for each individual below the code (sorry I can't get the datetime22.3 variable to import into SAS correctly using this mock dataset):
data have;
input id datetime hr resp;
format datetime datetime22.3;
cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13.22:00.000 100 22
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:36:00.000 121 13
2 29OCT2009:10:38:00.000 83 14
2 08JUL2011:13:04:00.000 131 18
3 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:04:00.000 98 17
3 08JUL2017:13:08:00.000 96 18
3 24OCT2017:15:11:00.000 80 19
4 18APR2012:08:42:00.000 80 17
4 31OCT2012:10:36:00.000 96 19
4 31OCT2012:10:37:00.000 103 21
4 31OCT2012:10:38:00.000 105 25
4 31OCT2012:10:39:00.000 110 28
5 06JAN2008:05:37:00.000 98 17
5 15FEB2010:10:24:00.000 110 18
;
/*Take avg of measurements taken within 5 minutes of each other*/
DATA have2;
SET have;
_n_ = lag (datetime);
IF nmiss (_n_) THEN result = _n_;
ELSE diff_datetime = _n_ - datetime;
hr2 = lag(hr);
resp2 = lag(resp);
IF -300 <= diff_datetime <=0 THEN hr = ((hr + hr2)/2);
ELSE IF diff_datetime <= -300 THEN hr_final = hr;
IF -300 <= diff_datetime <=0 THEN resp_final = ((resp + resp2)/2);
ELSE IF diff_datetime <= -300 THEN resp_final = resp;
DROP result hr2 resp2;
RUN;
data have3; set have2(keep=id diff_datetime); run;
/*Delete the preceding value after taking the average*/
data have2;
set have3;
by id;
if not first.id and -300 <= diff_datetime <=0 then do;
point = _n_-1;
modify have2 point=point;
remove;
end;
run;
Applying these rules, this is what I envision the final dataset looking like:
ID | Datetime (format=datetime22.3) | HR_final | Resp_final |
1 | 19MAR2009:13:17:30.000 | 75 | 17.5 |
1 | 19MAR2009:13.19:00.000 | 83.3 | 19 |
1 | 23MAR2009:10:36:00.000 | 83 | 21 |
2 | 29OCT2009:10:37:00.000 | 102 | 13.5 |
2 | 08JUL2011:13:04:00.000 | 131 | 18 |
3 | 07SEP2016:14:26:59.000 | 50 | 12 |
3 | 06APR2017:13:39:00.000 | 76 | 18 |
3 | 08JUL2017:13:06:00.000 | 97 | 17.5 |
3 | 24OCT2017:15:11:00.000 | 80 | 19 |
4 | 18APR2012:08:42:00.000 | 80 | 17 |
4 | 31OCT2012:10:37:30.000 | 103.5 | 23.25 |
5 | 06JAN2008:05:37:00.000 | 98 | 17 |
5 | 15FEB2010:10:24:00.000 | 110 |
18 |
Any advice on how to code this would be greatly appreciated! If that means taking my code in a different direction, that's totally okay. Thank you in advance!
First thing I would be tempted to do if this were my data would be to remove showing the artificial precision of the time portion of that value. You are recording to 1000ths of a second. Since your data is heart rate and respiration neither of those measures is started and completed within an interval that short. Basically that would mean changing the format to a datetime with no decimal portion at all. Using INFORMATS with decimals indicated may be a poor idea as well.
Your data step doesn't run as posted as you did not provide and Informat to read the data and the 4th record has a . instead of : after the hour component.
I would suggest that since the data step wants to use _n_ as an automatic variable it can lead to odd things when you create a variable by that name.
See if this creates the basic 5 minute intervals. All I am doing is incrementing a counter at the start of each new interval when the time is greater than 5 minutes from the previous measure for the same ID.
data have; input id datetime : datetime23. hr resp; format datetime datetime22.; cards; 1 19MAR2009:13:15:00.000 90 16 1 19MAR2009:13:17:00.000 70 18 1 19MAR2009:13:18:00.000 80 17 1 19MAR2009:13:22:00.000 100 22 1 23MAR2009:10:36:00.000 83 21 2 29OCT2009:10:36:00.000 121 13 2 29OCT2009:10:38:00.000 83 14 2 08JUL2011:13:04:00.000 131 18 3 07SEP2016:14:26:59.000 50 12 3 06APR2017:13:39:00.000 76 18 3 08JUL2017:13:04:00.000 98 17 3 08JUL2017:13:08:00.000 96 18 3 24OCT2017:15:11:00.000 80 19 4 18APR2012:08:42:00.000 80 17 4 31OCT2012:10:36:00.000 96 19 4 31OCT2012:10:37:00.000 103 21 4 31OCT2012:10:38:00.000 105 25 4 31OCT2012:10:39:00.000 110 28 5 06JAN2008:05:37:00.000 98 17 5 15FEB2010:10:24:00.000 110 18 ; data want; set have; by id ; retain timegroup; interval = intck('minute',lag(datetime),datetime,'C'); if first.id then timegroup=1; else if interval>5 then timegroup+1; run;
If you agree this is marking the intervals correctly then we can get to the next bits.
Note the use of the Function INTCK to deal with calculating the interval between successive records.
A similar counter could be used to count number of values within a timegroup. You apparently want a "look ahead counter" but I am not sure that you have actually adequately defined your "5 minute interval" to implement such.
Consider this data example and then tell us where the "5 minute interval boundaries" actually are.
data have2; input id datetime : datetime23. hr resp; format datetime datetime22.; cards; 1 19MAR2009:13:15:00.000 90 16 1 19MAR2009:13:17:00.000 70 18 1 19MAR2009:13:18:00.000 80 17 1 19MAR2009:13:22:00.000 100 22 1 19MAR2009:13:23:00.000 100 22 1 19MAR2009:13:24:00.000 100 22 1 19MAR2009:13:25:00.000 100 22 1 19MAR2009:13:26:00.000 100 22 1 19MAR2009:13:27:00.000 100 22 1 19MAR2009:10:28:00.000 83 21 ;
I suspect once you get the timegroup values identified it would likely be a tad easier to us Proc Means to the actual averages instead of mucking about with data step code. Use the ID and Timegroup as CLASS variables and get the means of the measures.
Hi @ballardw , thank you so much for the help! Yes, the code works to identify the intervals, but I think your data example shows why it's a bit more complicated. In your data example, there are actually multiple 5-minute windows. Observations 1, 2, 3 make up a single interval (times 13:15, 13:17, 13:18) and could be thought of as a timegroup=1. And then after that, another 5-minute interval begins (since the first started with time 13:15). Observations 4-8 make up another time interval (timegroup=2). And then observations 9-10 make up a timegroup=3 interval. Totally agree that after the timegroup variable is created, I can count the number of observations within a unique timegroup variable and then take the mean.
(As a note: I initially had a bit more complex situation in my first post where the time intervals could overlap, but I think this method would help simplify things. So my initial example won't directly align with this.)
Thank you again!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.