Hello all,
I'm looking for some advice on how to go about coding this. I have a dataset with longitudinal measurements of heart rate (HR) and respirations (resp). Each individual (ID) has multiple measurements. The three main things I need to do are:
By unique ID, flag measurements that are taken within 5 minutes of each other (flag=1)
For measurements that are NOT taken within 5 minutes of another measurement, retain the original value of HR and resp.
For measurements that are taken within 5 minutes of each other, apply these rules to produce the final dataset
If there are 2 measurements within the 5-minute interval, take the average of the measurements, as well as the datetime value to produce a new value
If there are >2 measurements within the 5-minute interval, delete the first value, and take the average fo the remaining measurements.
Code for an example dataset are provided below. I've also provided a description of "what's going on" for each individual below the code.
data have;
input id datetime hr resp;
format datetime datetime22.3.;
cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13.22:00.000 100 22
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:36:00.000 121 13
2 29OCT2009:10:38:00.000 83 14
2 08JUL2011:13:04:00.000 131 18
3 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:04:00.000 98 17
3 08JUL2017:13:08:00.000 96 18
3 24OCT2017:15:11:00.000 80 19
4 18APR2012:08:42:00.000 80 17
4 31OCT2012:10:36:00.000 96 19
4 31OCT2012:10:37:00.000 103 21
4 31OCT2012:10:38:00.000 105 25
4 31OCT2012:10:39:00.000 110 28
5 06JAN2008:05:37:00.000 98 17
5 15FEB2010:10:24:00.000 110 18
;
ID=1 has two "5 minute intervals". The first 3 measurements make up one interval (delete the one at 13:15 and take the average of 13:17 and 13:18. Then the 2nd, 3rd, and 4th measurements make up a second interval (delete the one at 13:17, and take the average of 13:18 and 13:22).
ID=2 has one "5 minute interval" with 2 measurements, so take the average of these two
ID=3 has two measurements within a "5 minute interval"
ID=4 has 4 measurements within a "5 minute interval". Delete the one at 10:36 and take the average of the remaining ones
ID=5 does not have any measurements within a 5 minute interval
And as you can see, I want to retain measurements even if they are not in the 5 minute interval (flag=0)!
Applying these rules, this is what I envision the final dataset looking like:
ID
Datetime (format=datetime22.3)
HR2
Resp2
Flag
1
19MAR2009:13:17:30.000
75
17.5
1
1
19MAR2009:13.19:00.000
83.3
19
1
1
23MAR2009:10:36:00.000
83
21
0
2
29OCT2009:10:37:00.000
102
13.5
1
2
08JUL2011:13:04:00.000
131
18
0
3
07SEP2016:14:26:59.000
50
12
0
3
06APR2017:13:39:00.000
76
18
0
3
08JUL2017:13:06:00.000
97
17.5
1
3
24OCT2017:15:11:00.000
80
19
0
4
18APR2012:08:42:00.000
80
17
0
4
31OCT2012:10:37:30.000
103.5
23.25
1
5
06JAN2008:05:37:00.000
98
17
0
5
15FEB2010:10:24:00.000
110
18
0
Any advice on how to handle this would be hugely appreciated! Thank you in advance!
... View more