I have a messy dataset where I have up to 16 iterations of a variable Result1-Result16. Some people in the dataset only have a few iterations, some have 16 and everywhere in between. I'm trying to create a result summary variable that takes into account each of these versions. The Result variable is in number form and I have a format for it that interprets what the numbers mean, I'm not sure if that matters for the coding. When interpreting the result options into the SummaryResult variable I'd like to create, some only match up with one interpretation but others should be grouped together since they mean essentially the same thing for the SummaryResult variable. For example:
A result of 100 means UNKNOWN
A result of 200 means NO
But then a result of 300, 400, 500, etc means YES
Many are also missing with . as the result, so the ResultSummary should be . also
To complicate matters, I can't just look at the last/highest iteration of Result because the correct/best ResultSummary may not be there, it could be the first or middle or last. The ResultSummary should be determined by a hierarchy:
If there are any results that mean YES, the ResultSummary variable should be YES.
If there are no YES results, but there is at least one NO, the ResultSummary variable should be NO.
If there are no YES or NO results, but there is at least one UNKNOWN, the ResultSummary variable should be UNKNOWN.
If . is the only result for all iterations of Result for that person, the ResultSummary should be . (missing).
Here is a fake example:
Person
Result1
Result2
Result3
Result4
Result5
SummaryResult
111
.
.
100
.
100
UNKNOWN
112
200
200
.
.
500
YES
113
.
.
100
200
200
NO
114
.
.
.
.
.
.
115
.
300
100
100
400
YES
I know only basics about assays from a class about 6 years ago and I just can't wrap my head around how I could get it to check each iteration while not keeping the last version instead of the best summary based on the hierarchy.
Thank you for any help!!
... View more