data have;
input (var1-var4) ($);
cards;
hi hi hi ho
add add hu hi
h j l o
su su su su
;
data want;
if _n_=1 then do;
length k $ 100 count 8;
declare hash h();
declare hiter hi('h');
h.definekey('k');
h.definedata('count');
h.definedone();
end;
set have;
h.clear();
array x{*} $ _character_;
do i=1 to dim(x);
if h.find(key:x{i})=0 then count=count+1;
else count=1;
if not missing(x{i}) then h.replace(key:x{i},data:count);
end;
do while(hi.next()=0);
desired_count=max(desired_count,count);
end;
drop i count k;
run;
Edited.
are you always comparing to the value in VAR1? Or is there some other logic?
@Emma2021 wrote:
Across all 4 variables because the value can be anything as in the examples
So you are just checking to see the maximum number of matches anywhere in these 4 variables?
What @Reeza said
Convert to long
data have2;
set have;
n=_n_;
run;
proc transpose data=have2 out=long prefix=values;
by n;
var var1-var4;
run;
Count frequencies
proc freq data=long;
by n;
table values/out=_freqs_ noprint;
run;
Find max frequency for each value of N
proc summary data=_freqs_;
class n;
var count;
output out=max_freq max=max_freq;
run;
From here you can do whatever you want with the max_freq values.
Hello @Emma2021,
Alternatively, you can add the six Boolean values vari=varj (i,j = 1, 2, 3, 4; i<j) and then assign the desired count to the possible sums, which are 0, 1, 2, 3 and 6. That is, 0 → 1, 1 → 2, 2 → 2 (unless you want to distinguish cases like "A A B B" [two pairs] from cases like "A A B C" [only one pair], which would make things even easier), 3 → 3, 6 → 4.
Example:
data want;
set have;
dc=whichn(int((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)-1.5),-1,0,1,4);
run;
The INT function and subtraction of 1.5 help to map both sums 1 and 2 to the desired count dc=2 (as mentioned above [edit: 2, not 1]).
Here are two variants of my first suggestion, using different ways to perform the mapping from the sum of Boolean values to the desired counts (dc):
data want;
set have;
dc=choosen((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)+1, 1,2,2,3,.,.,4);
run;
data want;
array c[0:6] _temporary_ (1 2 2 3 . . 4);
set have;
dc=c[(var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)];
run;
I think both are easier to understand and also easier to adapt if you want to implement a different mapping "sum → dc" than the one used so far:
sum dc 0 1 1 2 2 2 3 3 6 4
data have;
input (var1-var4) ($);
cards;
hi hi hi ho
add add hu hi
h j l o
su su su su
;
data want;
if _n_=1 then do;
length k $ 100 count 8;
declare hash h();
declare hiter hi('h');
h.definekey('k');
h.definedata('count');
h.definedone();
end;
set have;
h.clear();
array x{*} $ _character_;
do i=1 to dim(x);
if h.find(key:x{i})=0 then count=count+1;
else count=1;
if not missing(x{i}) then h.replace(key:x{i},data:count);
end;
do while(hi.next()=0);
desired_count=max(desired_count,count);
end;
drop i count k;
run;
Edited.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.