Count value matches across columns in same row

Accepted Solution Solved
Reply
Contributor kbk
Contributor
Posts: 29
Accepted Solution

Count value matches across columns in same row

Hi All,

I would like to count the maximum number of value columns with the same value across any number of columns in the same row. For a small number of variables, this can be done with a bunch of conditional statements. However, this approach quickly falls apart as the number of variables increases. Is there a more elegant, dynamic approach?  To illustrate:

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

data want;

     set have;

     if col1 = col2 = col3 = col4 then count = 4;

     else if col1 = col2 = col3   then count = 3;

     else if col1 = col2 = col4       then count = 3;

     else if col1 = col3 = col4   then count = 3;

     else if col2 = col3 = col4   then count = 3;

     else if col2 = col3 = col4   then count = 3;

     else if col1 = col2        then count = 2;

     else if col1 = col3        then count = 2;

     else if col1 = col4        then count = 2;

     else if col2 = col3        then count = 2;

     else if col2 = col4        then count = 2;

     else if col3 = col4        then count = 2;

     else count = 1;

run;


I found this code from  Messr. Tabachneck to be close but it counts the number of matching pairs rather than the number of columns with the same value.

data want (drop=i j);

set have;

array vars{*} col1-col4;

count=0;

  do i=1 to dim(vars)-1;

     do j=i+1 to dim(vars);

    count=sum(count, vars(i) eq vars(j));

     end;

  end;

run;


Best,

kbk


Accepted Solutions
Solution
‎04-03-2013 03:14 PM
Respected Advisor
Posts: 3,124

Re: Count value matches across columns in same row

Another brute force approach using Array():

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

data want;

  set have;

  array col col1-col4;

  length _cat $100.;

  _cat=catx(',',of col(*));

do over col;

do _i=1 to dim(col);

_ct+(col=scan(_cat,_i));

end;

count=max(count,_ct);

_ct=0;

end;

drop _:;

run;

Haikuo

View solution in original post


All Replies
Super User
Posts: 17,813

Re: Count value matches across columns in same row

What happens when the following is true:

else if col1 = col2        then count = 2;

  

     else if col3 = col4        then count = 2;


Except if you have more than 4 columns?

Contributor kbk
Contributor
Posts: 29

Re: Count value matches across columns in same row

Reeza, I am assuming you are asking if the column values consist of [a a b b]. In this case the number needed would be 2. Column values of [a a a b b b b] would yield a count of 4, due to the fact of having 4 b's.

Super User
Posts: 17,813

Re: Count value matches across columns in same row

Why not transpose the data, run a proc freq, and take the top one?

Contributor kbk
Contributor
Posts: 29

Re: Count value matches across columns in same row

Reeza, thanks. I have transposed the dataset but am not sure the best way to "run a proc freq and take the top one." 


PROC TRANSPOSE DATA=WORK.have  OUT=WORK.have_tran PREFIX=Column;

     VAR col1-col4;

run;

PROC FREQ DATA=WORK.have_tran;

     TABLES Column1 /     OUT=WORK.Col1WANT;

     TABLES Column2 /     OUT=WORK.Col2WANT;

     TABLES Column3 /     OUT=WORK.Col3WANT;

     TABLES Column4 /     OUT=WORK.Col4WANT;

     TABLES Column5 /     OUT=WORK.Col5WANT;

RUN;


Super User
Posts: 17,813

Re: Count value matches across columns in same row

You have some more efficient suggestions above, but the following is what I had in mind. More of a beginner method I suppose Smiley Happy

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

*add in a row count;

data have_transposed;

set have;

row=_n_;

array col(4) $ col1-col4;

do i=1 to 4;

    column=i;

    value=col(i);

    output;

end;

keep row column value;

run;

*count the results and order them descending;

proc freq data=have_transposed order=freq noprint;

by row;

table value/out=counts;

run;

*take the top one;

data want;

set counts;

by row;

if first.row;

run;

Respected Advisor
Posts: 4,644

Re: Count value matches across columns in same row

Use call routine sortc. Like this :


data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;

data want(keep=best bestCount);
set have;
array c{*} col:;
call sortc(of c{*});
last = c{1};

best = last; /* So that best as the same length as last and col1 */
do i = 1 to dim(c);
     same = c{i} = last;
     count = sum(count, same);
     if count > bestCount then do;
          best = coalescec(last, c{i});
          bestCount = count;
          end;
     if not same then count = 0;
     last = c{i};
     end;
output;
run;

PG

PG
Respected Advisor
Posts: 3,777

Re: Count value matches across columns in same row

I had a similar idea but I don't know how it works. :smileyshocked:


I used peek and poke to save the array and put it back.

data abcd;
   array col[4] $1;
  
input col
  • ;
  •    length a $4;
      
    retain addr;
       if _n_ eq 1 then addr = addrlong(col1);
       a = peekclong(addr,4);
       call sortC(of Col
  • );
  •    j = col1;
       k =
    0;
       c =
    .;
      
    do i = 1 to 4;
         
    if j = col then k+1;
         
    else do; j=col; k=1; end;
          c=c max k;
         
    end;
      
    call pokelong(a,addr,4);
       drop i j k addr;
       cards;
    a b b b
    a b b d
    c b c b
    d d d d
    a d a a
    ;;;;
    Contributor kbk
    Contributor
    Posts: 29

    Re: Count value matches across columns in same row

    Thanks for the help everyone!

    Solution
    ‎04-03-2013 03:14 PM
    Respected Advisor
    Posts: 3,124

    Re: Count value matches across columns in same row

    Another brute force approach using Array():

    data have;

    input col1 $ col2 $ col3 $ col4 $;

    cards;

    a b b b

    a b b d

    c b c b

    d d d d

    a d a a

    ;

    data want;

      set have;

      array col col1-col4;

      length _cat $100.;

      _cat=catx(',',of col(*));

    do over col;

    do _i=1 to dim(col);

    _ct+(col=scan(_cat,_i));

    end;

    count=max(count,_ct);

    _ct=0;

    end;

    drop _:;

    run;

    Haikuo

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 9 replies
    • 3305 views
    • 3 likes
    • 5 in conversation