BookmarkSubscribeRSS Feed
luch25
Obsidian | Level 7

Hello,

 

I'm looking for some advice on how to go about coding this. I have a dataset with longitudinal measurements of heart rate (HR) and respirations (resp). Each individual (ID) has multiple measurements. The three main things I need to do are:

  1. By unique ID, identify measurements that are taken within 5 minutes of each other
  2. For measurements that are NOT taken within 5 minutes of another measurement, retain the original value of HR and resp.
  3. For measurements that are taken within 5 minutes of each other, apply these rules to produce the final dataset
    1. If there are 2 measurements within the 5-minute interval, take the average of the measurements, as well as the datetime value to produce a new value
    2. If there are >2 measurements within the 5-minute interval, delete the first value, and take the average fo the remaining measurements. 

I've been able to write code for steps 2 and 3a, but having issues where step 1 doesn't run completely correct when there are >2 measurements within a 5 minute interval. And that also affects step 3b. Code for an example dataset, as well as the code I wrote, are provided below. I've also provided a description of "what's going on" for each individual below the code (sorry I can't get the datetime22.3 variable to import into SAS correctly using this mock dataset):

 

data have;
      input id datetime hr resp;
      format datetime datetime22.3;
      cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13.22:00.000 100 22
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:36:00.000 121 13
2 29OCT2009:10:38:00.000 83 14
2 08JUL2011:13:04:00.000 131 18
3 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:04:00.000 98 17
3 08JUL2017:13:08:00.000 96 18
3 24OCT2017:15:11:00.000 80 19
4 18APR2012:08:42:00.000 80 17
4 31OCT2012:10:36:00.000 96 19
4 31OCT2012:10:37:00.000 103 21
4 31OCT2012:10:38:00.000 105 25
4 31OCT2012:10:39:00.000 110 28
5 06JAN2008:05:37:00.000 98 17
5 15FEB2010:10:24:00.000 110 18
;
 
/*Take avg of measurements taken within 5 minutes of each other*/
DATA have2;                
SET have;                       
  _n_ = lag (datetime);               
  IF nmiss (_n_) THEN result = _n_;
    ELSE diff_datetime = _n_ - datetime;
  hr2 = lag(hr);
  resp2 = lag(resp);
  IF -300 <= diff_datetime <=0 THEN hr = ((hr + hr2)/2);
    ELSE IF diff_datetime <= -300 THEN hr_final = hr;
  IF -300 <= diff_datetime <=0 THEN resp_final = ((resp + resp2)/2);
    ELSE IF diff_datetime <= -300 THEN resp_final = resp;
  DROP result hr2 resp2;
RUN;  
 
data have3; set have2(keep=id diff_datetime); run;
 
/*Delete the preceding value after taking the average*/
data have2;
set have3;
  by id;
  if not first.id and -300 <= diff_datetime <=0 then do;
  point = _n_-1;
  modify have2 point=point;
  remove;
end;
run;

 

  • ID=1 has two "5 minute intervals". The first 3 measurements make up one interval (delete the one at 13:15 and take the average of 13:17 and 13:18. Then the 2nd, 3rd, and 4th measurements make up a second interval (delete the one at 13:17, and take the average of 13:18 and 13:22). 
  • ID=2 has one "5 minute interval" with 2 measurements, so take the average of these two
  • ID=3 has two measurements within a "5 minute interval"
  • ID=4 has 4 measurements within a "5 minute interval". Delete the one at 10:36 and take the average of the remaining ones
  • ID=5 does not have any measurements within a 5 minute interval
  • And as you can see, I want to retain measurements even if they are not in the 5 minute interval!

Applying these rules, this is what I envision the final dataset looking like:

ID Datetime (format=datetime22.3) HR_final Resp_final
1 19MAR2009:13:17:30.000 75 17.5
1 19MAR2009:13.19:00.000 83.3 19
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:37:00.000 102 13.5
2 08JUL2011:13:04:00.000 131 18
3 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:06:00.000 97 17.5
3 24OCT2017:15:11:00.000 80 19
4 18APR2012:08:42:00.000 80 17
4 31OCT2012:10:37:30.000 103.5 23.25
5 06JAN2008:05:37:00.000 98 17
5 15FEB2010:10:24:00.000 110

18

 

Any advice on how to code this would be greatly appreciated! If that means taking my code in a different direction, that's totally okay. Thank you in advance!

2 REPLIES 2
ballardw
Super User

First thing I would be tempted to do if this were my data would be to remove showing the artificial precision of the time portion of that value. You are recording to 1000ths of a second. Since your data is heart rate and respiration neither of those measures is started and completed within an interval that short. Basically that would mean changing the  format to a datetime with no decimal portion at all. Using INFORMATS with decimals indicated may be a poor idea as well.

Your data step doesn't run as posted as you did not provide and Informat to read the data and the 4th record has a . instead of : after the hour component.

 

I would suggest that since the data step wants to use _n_ as an automatic variable it can lead to odd things when you create a variable by that name.

 

See if this creates the basic 5 minute intervals. All I am doing is incrementing a counter at the start of each new interval when the time is greater than 5 minutes from the previous measure for the same ID.

data have;
      input id datetime : datetime23. hr resp;
      format datetime datetime22.;
      cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13:22:00.000 100 22
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:36:00.000 121 13
2 29OCT2009:10:38:00.000 83 14
2 08JUL2011:13:04:00.000 131 18
3 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:04:00.000 98 17
3 08JUL2017:13:08:00.000 96 18
3 24OCT2017:15:11:00.000 80 19
4 18APR2012:08:42:00.000 80 17
4 31OCT2012:10:36:00.000 96 19
4 31OCT2012:10:37:00.000 103 21
4 31OCT2012:10:38:00.000 105 25
4 31OCT2012:10:39:00.000 110 28
5 06JAN2008:05:37:00.000 98 17
5 15FEB2010:10:24:00.000 110 18
;

data want;
   set have;
   by id ;
   retain timegroup;
   interval = intck('minute',lag(datetime),datetime,'C');
   if first.id then timegroup=1;
   else if interval>5 then timegroup+1;
run;
   

If you agree this is marking the intervals correctly then we can get to the next bits.

Note the use of the Function INTCK to deal with calculating the interval between successive records.

 

A similar counter could be used to count number of values within a timegroup. You apparently want a "look ahead counter" but I am not sure that you have actually adequately defined your "5 minute interval" to implement such.

 

Consider this data example and then tell us where the "5 minute interval boundaries" actually are.

data have2;
      input id datetime : datetime23. hr resp;
      format datetime datetime22.;
      cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13:22:00.000 100 22
1 19MAR2009:13:23:00.000 100 22
1 19MAR2009:13:24:00.000 100 22
1 19MAR2009:13:25:00.000 100 22
1 19MAR2009:13:26:00.000 100 22
1 19MAR2009:13:27:00.000 100 22
1 19MAR2009:10:28:00.000 83 21
;

I suspect once you get the timegroup values identified it would likely be a tad easier to us Proc Means to the actual averages instead of mucking about with data step code. Use the ID and Timegroup as CLASS variables and get the means of the measures.

 

 

luch25
Obsidian | Level 7

Hi @ballardw , thank you so much for the help! Yes, the code works to identify the intervals, but I think your data example shows why it's a bit more complicated. In your data example, there are actually multiple 5-minute windows. Observations 1, 2, 3 make up a single interval (times 13:15, 13:17, 13:18) and could be thought of as a timegroup=1. And then after that, another 5-minute interval begins (since the first started with time 13:15). Observations 4-8 make up another time interval (timegroup=2). And then observations 9-10 make up a timegroup=3 interval. Totally agree that after the timegroup variable is created, I can count the number of observations within a unique timegroup variable and then take the mean. 

 

(As a note: I initially had a bit more complex situation in my first post where the time intervals could overlap, but I think this method would help simplify things. So my initial example won't directly align with this.)

 

Thank you again!!

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 517 views
  • 1 like
  • 2 in conversation