Is there an easy way to create a variable called match which will list all the IDs that are the same? The data is in the format below. Any help would be greatly appreciated.
ID_A ID_B ID_C ID_D MATCH
1 1 1 1 A,B,C,D
2 2 2 3 A,B,C
3 4 4 4 B,C,D
1 1 3 5 A.B
What if you have a combination of
1 1 3 3
?
Steps to follow:
Create a variable to uniquely identify the observations (e.g. derived from _N_).
Transpose the ID: variables, use the above variable as BY.
Sort by this unique id, COL1 and _NAME_;
Then
data want;
set long;
by unique_id col1;
length match $10;
retain match count;
if first.col1
then do;
match = scan(_name_,2,"_");
count = 1;
end;
else do;
match = catx(",",match,scan(_name_,2,"_"));
count + 1;
end;
if last.col1 and count > 1;
drop count _name_ /* col1 */;
run;
Like this?
data SAMPLE;
A=1; B=1; C=1; D=3; output;
A=1; B=2; C=2; D=2; output;
A=1; B=2; C=3; D=4; output;
run;
data WANT;
set SAMPLE;
_POS = prxmatch('/(.)\1/', cats(A,B,C,D)); * Find the position of a repeated value;
if _POS then do; * If found then analyse;
_CHAR = char(cats(A,B,C,D), _POS); * Extract repeated value;
MATCH = catx(',', ifc(put(A,1.)=_CHAR,'A','')
, ifc(put(B,1.)=_CHAR,'B','')
, ifc(put(C,1.)=_CHAR,'C','')
, ifc(put(D,1.)=_CHAR,'D','')
); * Extract occurrences;
end;
drop _:;
run;
This creates this table:
A | B | C | D | MATCH |
---|---|---|---|---|
1 | 1 | 1 | 3 | A,B,C |
1 | 2 | 2 | 2 | B,C,D |
1 | 2 | 3 | 4 |
If the values are longer than a digit, or not contiguous:
data SAMPLE;
A=11; B=11; C=11; D= 3; output;
A= 1; B=22; C= 4; D=22; output;
A= 1; B= 2; C= 3; D= 4; output;
run;
data WANT;
set SAMPLE;
call sort (of A--D);
_STR = catx('|', of A--D, .);
_VALUE = prxchange('s/.*((\d+)\|)\1.*/\2/',1, _STR);
set SAMPLE;
if _VALUE ne _STR then do;
MATCH = catx(',', ifc(put(A,8. -l)=_VALUE, 'A', '')
, ifc(put(B,8. -l)=_VALUE, 'B', '')
, ifc(put(C,8. -l)=_VALUE, 'C', '')
, ifc(put(D,8. -l)=_VALUE, 'D', '')
);
end;
drop _: ;
run;
A | B | C | D | MATCH |
---|---|---|---|---|
11 | 11 | 11 | 3 | A,B,C |
1 | 22 | 4 | 22 | B,D |
1 | 2 | 3 | 4 |
The regular expression
.*((\d+)\|)\1.*/\2/
means:
/ start match pattern
.* any character(s), then
( start a group, then
(\d+) another ,group with one digit or more, then
\| a pipe character, then
) close group, then
/1 repeat the group just matched (so we want the same value twice), then
.* any character(s)
/ end match pattern, start replace pattern
\1 replace matched characters with 2nd (inner) group: (\d+)
/ end of replace pattern
You can shorten to something more compact if you want:
data WANT;
set SAMPLE;
call sort (of A--D);
_VALUE = prxchange('s/.*((\d+)\|)\1.*/\2/',1, catx('|', of A--D, .));
set SAMPLE;
MATCH = catx(',', ifc(put(A, 8. -l)=_VALUE, 'A', '')
, ifc(put(B, 8. -l)=_VALUE, 'B', '')
, ifc(put(C, 8. -l)=_VALUE, 'C', '')
, ifc(put(D, 8. -l)=_VALUE, 'D', '')
);
drop _: ;
run;
or
data WANT;
set SAMPLE;
call sort(of A--D);
_VALUE = input(prxchange('s/.*((\d+)\|)\1.*/\2/', 1, catx('|', of A--D, .)), ?? 8.);
set SAMPLE;
MATCH = catx(',', ifc(A=_VALUE, 'A', '')
, ifc(B=_VALUE, 'B', '')
, ifc(C=_VALUE, 'C', '')
, ifc(D=_VALUE, 'D', '')
);
drop _: ;
run;
Assuming data like this (I added the ROW_ID variable to identify rows):
data have;
row_id=_N_;
input ID_A ID_B ID_C ID_D;
cards;
1 1 1 1
2 2 2 3
3 4 4 4
1 1 3 5
2 2 6 6
1 2 3 4
;run;
I would first convert to a long format and sort:
proc transpose data=have out=long;
var ID_:;
by row_id;
run;
proc sort data=long;
by row_id col1;
run;
Then it is easy to find the matches:
data match;
do until(last.col1);
set long;
by row_id col1;
length match $10;
call catx(',',match,substr(_name_,4));
end;
if index(match,','); /* not single values */
keep row_id match;
run;
And finally merge with the original data:
data want;
merge have match;
by row_id;
run;
The answer to @Kurt_Bremser 's question is in this case that if there are two matching pairs, we will create an output row for each.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.