Hello, I'm hoping someone can point me in the right direction. I have a dataset containing up to 10 test types (in no particular order) and the corresponding test dates and test results. Unfortunately, something went wrong when our participants were uploading the data, and many tests were uploaded multiple times. I need to identify the records with duplicate entries (where the same test/date/result was entered more than once), and it would be good if I could identify how many duplicates (a count) for each record. As an example, I have: patID testID_1 date_1 result_1 testID_2 date_2 result_2 testID_3 date_3 result_3 testID_4 date_4 result_4 1243 1 01/01/20 P 2 01/12/20 N 1 01/01/20 P 1 01/01/20 P 6495 4 05/23/20 N 1 03/04/20 P 1 03/13/20 N 6 10/12/20 P 4712 3 07/23/20 P 3 07/23/20 P 1 06/15/20 P 6 05/17/20 N 3021 2 06/09/20 N 1 06/12/20 N 4 12/01/20 P 1 06/12/20 N 8537 5 04/01/20 P 5 04/01/20 P 1 07/18/20 N 3 06/05/20 P I want: patID testID_1 date_1 result_1 testID_2 date_2 result_2 testID_3 date_3 result_3 testID_4 date_4 result_4 dup_cnt 1243 1 01/01/20 P 2 01/12/20 N 1 01/01/20 P 1 01/01/20 P 3 6495 4 05/23/20 N 1 03/04/20 P 1 03/13/20 N 6 10/12/20 P 0 4712 3 07/23/20 P 3 07/23/20 P 1 06/15/20 P 6 05/17/20 N 2 3021 2 06/09/20 N 1 06/12/20 N 4 12/01/20 P 1 06/12/20 N 2 8537 5 04/01/20 P 5 04/01/20 P 1 07/18/20 N 3 06/05/20 P 2 I've created concatenated strings for each test (e.g., 1_01/01/20_P or 1_01/12/20_N) for comparison; however, that's still 10 variables that have to be compared in combination. I started to try proc compare, but I would have to list out 45 combinations (I think) and I'm still not sure how to get the count for each record. Are there better, more efficient, approaches to do this? There are lots of posts on looking for duplicates across rows, but I'm struggling to find information on identifying duplicates across columns, but within the row. Can anyone point me in the right direction? Thank you, -EM Code for example "have" table above if anyone wants it. data test;
input patID testID_1 date_1 $ result_1 $
testID_2 date_2 $ result_2 $
testID_3 date_3 $ result_3 $
testID_4 date_4 $ result_4 $;
datelines;
1243 1 01/01/2022 P 2 01/12/2022 N 1 01/01/2022 P 1 01/01/2022 P
6495 4 05/23/2022 N 1 03/04/2022 P 1 03/13/2022 N 6 10/12/2022 P
4712 3 07/23/2022 P 3 07/23/2022 P 1 06/15/2022 P 6 05/17/2022 N
3021 2 06/09/2022 N 1 06/12/2022 N 4 12/01/2022 P 1 06/12/2022 N
8537 5 04/01/2022 P 5 04/01/2022 P 1 07/18/2022 N 3 06/05/2022 P
;
run;
data test_comp; set test;
test_str_1=catx('_', OF testID_1--result_1);
test_str_2=catx('_', OF testID_2--result_2);
test_str_3=catx('_', OF testID_3--result_3);
test_str_4=catx('_', OF testID_4--result_4);
run;
... View more