Hi all,
I'm working on a dataset where some IDs have repeated measurements over time. A template example is shown below. The data is displayed in chronological order (i.e. time 1, time 2, time 3, etc...):
ID | Data |
AAA | 14.00 |
AAA | 13.00 |
BB | 18.00 |
BB | 21.00 |
CC | 19.00 |
CC | 14.00 |
CC | 10.00 |
DDD | 9.00 |
DDD | 12.00 |
EE | 11.00 |
EE | 17.00 |
EE | 10.00 |
Each data obtained per ID was collected longitudinally, meaning that the data is in chronological order. This is important since I want to only obtain those ID's where the data was < 15.00 between two-points back-to-back.
As you can see, "AAA" would count since between time 1 and time 2, the data taken indicates that its < 15.00.
The ID "CC" on the other hand has 3 data points (19.00, 14.00, and 10.00). In this case, the cut-off (< 15.00) would apply for this subject since at Time 2 (data = 14.00) and Time 3 (10.00), were obtained right after the other.
On the other hand, the ID "EE" also has 3 data points, but the cut-off (<15.00) applies for Time 1 (data = 11.00) and Time 3 (data = 10.00), therefore, this ID would not meet the requirement of (< 15.00) AND that they have to be right after each other.
I would like to subset the subjects so that I end up with something like this:
ID | Data |
AAA | 14.00 |
AAA | 13.00 |
CC | 19.00 |
CC | 14.00 |
CC | 10.00 |
DDD | 9.00 |
DDD | 12.00 |
As you can see, only those ID's where the cut-off was <15.00 and where two observations were identified as <15.00 right after the other (in the case of "CC") were drawn. Important that in the case for "CC", the first time point where data = 19.00 was still drawn since I'm less concerned about the raw numbers per se, but more so that this ID at any point met the criteria of <15.00 in two subsequent time points.
I've attempted to run the code below, but it seems that this doesn't achieve the restriction that I wanted to do:
data test;
set master;
if ID ne (0 < data < 15.0) then delete;
run;
Any help would be very much appreciated!
Thanks,
AG