Solved: Re: Counting number of the same values

Emma2021 · Posted 08-24-2021 04:49 PM

I have 4 variables and would like to count the same values, for example,

Var1 var2 var3 var4 desired_count
hi hi hi ho 3
add add hu hi 2
h j l o 1
su su su su 4

How can I do that? Thank you!

Ksharp · Posted 08-25-2021 08:33 AM

data have;
input (var1-var4) ($);
cards;
hi hi hi ho 
add add hu hi 
h j l o 
su su su su
;
data want;
 if _n_=1 then do;
   length k $ 100 count 8;
   declare hash h();
   declare hiter hi('h');
   h.definekey('k');
   h.definedata('count');
   h.definedone();
 end;
set have;
h.clear();
array x{*} $ _character_;
do i=1 to dim(x);
 if h.find(key:x{i})=0 then count=count+1;
  else count=1;
if not missing(x{i}) then  h.replace(key:x{i},data:count);
end;
do while(hi.next()=0);
 desired_count=max(desired_count,count);
end;
drop i count k;
run;

Edited.

View solution in original post

PaigeMiller · Posted 08-24-2021 05:11 PM

are you always comparing to the value in VAR1? Or is there some other logic?

--
Paige Miller

Emma2021 · Posted 08-24-2021 05:13 PM

Across all 4 variables because the value can be anything as in the examples

PaigeMiller · Posted 08-24-2021 05:15 PM

@Emma2021 wrote:
Across all 4 variables because the value can be anything as in the examples

So you are just checking to see the maximum number of matches anywhere in these 4 variables?

--
Paige Miller

Emma2021 · Posted 08-24-2021 05:15 PM

Yes

PaigeMiller · Posted 08-24-2021 06:09 PM

What @Reeza said

Convert to long

data have2;
    set have;
    n=_n_;
run;
proc transpose data=have2 out=long prefix=values;
    by n;
    var var1-var4;
run;

Count frequencies

proc freq data=long;
    by n;
    table values/out=_freqs_ noprint;
run;

Find max frequency for each value of N

proc summary data=_freqs_;
    class n;
    var count;
    output out=max_freq max=max_freq;
run;

From here you can do whatever you want with the max_freq values.

--
Paige Miller

Reeza · Posted 08-24-2021 05:13 PM

What would the results be from this data:

Var1 var2 var3 var4 desired_count
hi hi ho ho
add add hu hi
h j l o

Emma2021 · Posted 08-24-2021 05:14 PM

2
2
1

Reeza · Posted 08-24-2021 05:16 PM

Transpose your data, take a proc freq getting the most frequent count and merge that back in is the simplest solution IMO.

FreelanceReinh · Posted 08-24-2021 06:47 PM

Hello @Emma2021,

Alternatively, you can add the six Boolean values vari=varj (i,j = 1, 2, 3, 4; i<j) and then assign the desired count to the possible sums, which are 0, 1, 2, 3 and 6. That is, 0 → 1, 1 → 2, 2 → 2 (unless you want to distinguish cases like "A A B B" [two pairs] from cases like "A A B C" [only one pair], which would make things even easier), 3 → 3, 6 → 4.

Example:

data want;
set have;
dc=whichn(int((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)-1.5),-1,0,1,4);
run;

The INT function and subtraction of 1.5 help to map both sums 1 and 2 to the desired count dc=2 (as mentioned above [edit: 2, not 1]).

FreelanceReinh · Posted 08-25-2021 03:26 AM

Here are two variants of my first suggestion, using different ways to perform the mapping from the sum of Boolean values to the desired counts (dc):

data want;
set have;
dc=choosen((var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)+1, 1,2,2,3,.,.,4);
run;

data want;
array c[0:6] _temporary_ (1 2 2 3 . . 4);
set have;
dc=c[(var1=var2)+(var1=var3)+(var1=var4)+(var2=var3)+(var2=var4)+(var3=var4)];
run;

I think both are easier to understand and also easier to adapt if you want to implement a different mapping "sum → dc" than the one used so far:

Ksharp · Posted 08-25-2021 08:33 AM

data have;
input (var1-var4) ($);
cards;
hi hi hi ho 
add add hu hi 
h j l o 
su su su su
;
data want;
 if _n_=1 then do;
   length k $ 100 count 8;
   declare hash h();
   declare hiter hi('h');
   h.definekey('k');
   h.definedata('count');
   h.definedone();
 end;
set have;
h.clear();
array x{*} $ _character_;
do i=1 to dim(x);
 if h.find(key:x{i})=0 then count=count+1;
  else count=1;
if not missing(x{i}) then  h.replace(key:x{i},data:count);
end;
do while(hi.next()=0);
 desired_count=max(desired_count,count);
end;
drop i count k;
run;

Edited.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away