Hi all,
I have some repeated measures data where each ID could potentially have 2 rows of data. A sample is shown below.
ID
Time
Data
1234
T1
1
1234
T2
1
5555
T1
0
6777
T2
.
9876
T1
0
9876
T2
0
1000
T1
1
2000
T2
1
8888
T1
1
8888
T2
.
9000
T1
0
9000
T2
.
1010
T1
.
1010
T2
.
I'm trying to reduce each ID down to 1 observation each. The criteria is that I'm prioritizing their data on Time 2 (Time=T2). So if they have non-missing data on T2, I pick that row as that ID's observation.
I used the code below to re-sort the data in descending "Time", so that the row where Time=T2 appears first. After that, it's easy for me to just take the first row (which should be the T2 row if they have 2 rows of data) as the observation for each ID.
proc sort data=test;
by ID descending time;
run;
data want;
set test;
by ID;
if first.ID;
run;
My issue now are the data where they have missing data on T2, but not on T1. Looking at the sample data, if I proceeded with my code above ^, the ID "9000" (as well as ID 8888), would have their T2 observation picked where "Data" is missing. However, when you look at their T1 observation, ID 9000 has existing Data in T1 that's "0".
Non-missing data would mean like any value that's not ".". Therefore, Data=0 would count as non-missing data.
Basically, if the ID has 2 rows, prioritize picking their T2 row - if there's data. If there isn't, look at their T1 row - if there is data, pick that row instead. If all rows for Data are missing for that ID, then stick with picking their T2 row.
I'd like to hopefully produce a table like the one below:
ID
Time
Data
1234
T2
1
5555
T1
0
6777
T2
.
9876
T2
0
1000
T1
1
2000
T2
1
8888
T1
1
9000
T1
0
1010
T2
.
Any help would be very much appreciated, thanks.
... View more