data have;
input (var1-var4) ($);
cards;
hi hi hi ho
add add hu hi
h j l o
su su su su
;
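data want;
/* Create the hash object and its iterator once, on the first iteration */
if _n_=1 then do;
length k $ 100 count 8;
declare hash h();
declare hiter hi('h');
h.definekey('k');
h.definedata('count');
h.definedone();
end;
set have;
/* Start each row with an empty table so the counts are per observation */
h.clear();
array x{*} $ _character_;
/* Tally how often each non-missing value occurs in the current row */
do i=1 to dim(x);
if h.find(key:x{i})=0 then count=count+1;
else count=1;
if not missing(x{i}) then h.replace(key:x{i},data:count);
end;
/* The largest tally in the row is the desired count */
do while(hi.next()=0);
desired_count=max(desired_count,count);
end;
drop i count k;
run;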
Are you always comparing to the value in VAR1? Or is there some other logic?
@Emma2021 wrote:
Across all 4 variables because the value can be anything as in the examples
So you are just checking to see the maximum number of matches anywhere in these 4 variables?
What @Reeza said
Convert to long
data have2;
set have;
n=_n_;
run;
proc transpose data=have2 out=long prefix=values;
by n;
var var1-var4;
run;
Count frequencies
proc freq data=long;
by n;
/* PREFIX=values with one observation per BY group names the transposed column VALUES1 */
table values1/out=_freqs_ noprint;
run;
Find max frequency for each value of N
proc summary data=_freqs_ nway;
class n;
var count;
output out=max_freq max=max_freq;
run;
From here you can do whatever you want with the max_freq values.
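One thing you might do with the result is merge it back onto the original rows. This is a minimal sketch, assuming the HAVE2 and MAX_FREQ data sets from the steps above; the output name WANT2 is made up for illustration. The WHERE= filter keeps only the per-N rows, because PROC SUMMARY with a CLASS statement also writes an overall row (_TYPE_=0) unless NWAY is specified.

```sas
/* WANT2 is a hypothetical name; HAVE2 and MAX_FREQ come from the steps above */
data want2;
merge have2 max_freq(keep=n _type_ max_freq where=(_type_=1));
by n;
drop _type_;
run;
```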
Hello @Emma2021,
Alternatively, you can add the six Boolean values (var_i = var_j) for i, j = 1, 2, 3, 4 with i < j, and then assign the desired count to the possible sums, which are 0, 1, 2, 3 and 6: four distinct values give 0, one pair gives 1, two pairs give 2, three equal values give 3, and four equal values give 6. That is, 0 → 1, 1 → 2, 2 → 2 (unless you want to distinguish cases like "A A B B" [two pairs] from cases like "A A B C" [only one pair], which would make things even easier), 3 → 3, 6 → 4.
Example:
data want;
set have;
dc=whichn(int((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)-1.5),-1,0,1,4);
run;
The INT function and the subtraction of 1.5 map both sums 1 and 2 to the desired count dc=2, as mentioned above.
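To see which sums can actually occur, it may help to compute the raw sum for one row of each equality pattern. The sketch below uses made-up test data (the data set names PATTERNS and CHECK and the variable S are only for illustration); the five rows should give S = 0, 1, 2, 3 and 6, respectively.

```sas
/* Hypothetical test data: one row per equality pattern */
data patterns;
input (var1-var4) ($);
cards;
A B C D
A A B C
A A B B
A A A B
A A A A
;
/* S is the raw sum of the six pairwise comparisons, before any mapping to dc */
data check;
set patterns;
s=(var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4);
run;
proc print data=check noobs;
run;
```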
Here are two variants of my first suggestion, using different ways to perform the mapping from the sum of Boolean values to the desired counts (dc):
data want;
set have;
dc=choosen((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)+1, 1,2,2,3,.,.,4);
run;
data want;
array c[0:6] _temporary_ (1 2 2 3 . . 4);
set have;
dc=c[(var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)];
run;
I think both are easier to understand and also easier to adapt if you want to implement a different mapping "sum → dc" than the one used so far:
sum  dc
0    1
1    2
2    2
3    3
6    4
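For comparison, here is a sketch of a direct count that does not rely on a "sum → dc" mapping at all. It is not one of the suggestions above, just an alternative under the assumption that blank values should not be counted; the helper variables I, J and M are made up for illustration. For each non-missing value it counts how many of the variables hold that value and keeps the largest count.

```sas
data want;
set have;
array x[*] var1-var4;
dc=0;
do i=1 to dim(x);
/* Skip blanks so that two missing values are not treated as a match */
if not missing(x[i]) then do;
m=0;
/* Count how many variables (including x[i] itself) equal x[i] */
do j=1 to dim(x);
m=m+(x[j]=x[i]);
end;
dc=max(dc,m);
end;
end;
drop i j m;
run;
```

With the sample data from the question this gives dc = 3, 2, 1 and 4 for the four rows.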