BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Hi All,

I would like to count the maximum number of value columns with the same value across any number of columns in the same row. For a small number of variables, this can be done with a bunch of conditional statements. However, this approach quickly falls apart as the number of variables increases. Is there a more elegant, dynamic approach?  To illustrate:

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

data want;

     set have;

     if col1 = col2 = col3 = col4 then count = 4;

     else if col1 = col2 = col3   then count = 3;

     else if col1 = col2 = col4       then count = 3;

     else if col1 = col3 = col4   then count = 3;

     else if col2 = col3 = col4   then count = 3;

     else if col2 = col3 = col4   then count = 3;

     else if col1 = col2        then count = 2;

     else if col1 = col3        then count = 2;

     else if col1 = col4        then count = 2;

     else if col2 = col3        then count = 2;

     else if col2 = col4        then count = 2;

     else if col3 = col4        then count = 2;

     else count = 1;

run;


I found this code from  Messr. Tabachneck to be close but it counts the number of matching pairs rather than the number of columns with the same value.

data want (drop=i j);

set have;

array vars{*} col1-col4;

count=0;

  do i=1 to dim(vars)-1;

     do j=i+1 to dim(vars);

    count=sum(count, vars(i) eq vars(j));

     end;

  end;

run;


Best,

kbk

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Another brute force approach using Array():

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

data want;

  set have;

  array col col1-col4;

  length _cat $100.;

  _cat=catx(',',of col(*));

do over col;

do _i=1 to dim(col);

_ct+(col=scan(_cat,_i));

end;

count=max(count,_ct);

_ct=0;

end;

drop _:;

run;

Haikuo

View solution in original post

9 REPLIES 9
Reeza
Super User

What happens when the following is true:

else if col1 = col2        then count = 2;

  

     else if col3 = col4        then count = 2;


Except if you have more than 4 columns?

kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Reeza, I am assuming you are asking if the column values consist of [a a b b]. In this case the number needed would be 2. Column values of [a a a b b b b] would yield a count of 4, due to the fact of having 4 b's.

Reeza
Super User

Why not transpose the data, run a proc freq, and take the top one?

kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Reeza, thanks. I have transposed the dataset but am not sure the best way to "run a proc freq and take the top one." 


PROC TRANSPOSE DATA=WORK.have  OUT=WORK.have_tran PREFIX=Column;

     VAR col1-col4;

run;

PROC FREQ DATA=WORK.have_tran;

     TABLES Column1 /     OUT=WORK.Col1WANT;

     TABLES Column2 /     OUT=WORK.Col2WANT;

     TABLES Column3 /     OUT=WORK.Col3WANT;

     TABLES Column4 /     OUT=WORK.Col4WANT;

     TABLES Column5 /     OUT=WORK.Col5WANT;

RUN;


Reeza
Super User

You have some more efficient suggestions above, but the following is what I had in mind. More of a beginner method I suppose Smiley Happy

data have;

input col1 $ col2 $ col3 $ col4 $;

cards;

a b b b

a b b d

c b c b

d d d d

a d a a

;

*add in a row count;

data have_transposed;

set have;

row=_n_;

array col(4) $ col1-col4;

do i=1 to 4;

    column=i;

    value=col(i);

    output;

end;

keep row column value;

run;

*count the results and order them descending;

proc freq data=have_transposed order=freq noprint;

by row;

table value/out=counts;

run;

*take the top one;

data want;

set counts;

by row;

if first.row;

run;

PGStats
Opal | Level 21

Use call routine sortc. Like this :


data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;

data want(keep=best bestCount);
set have;
array c{*} col:;
call sortc(of c{*});
last = c{1};

best = last; /* So that best as the same length as last and col1 */
do i = 1 to dim(c);
     same = c{i} = last;
     count = sum(count, same);
     if count > bestCount then do;
          best = coalescec(last, c{i});
          bestCount = count;
          end;
     if not same then count = 0;
     last = c{i};
     end;
output;
run;

PG

PG
data_null__
Jade | Level 19

I had a similar idea but I don't know how it works. :smileyshocked:


I used peek and poke to save the array and put it back.

data abcd;
   array col[4] $1;
  
input col
  • ;
  •    length a $4;
      
    retain addr;
       if _n_ eq 1 then addr = addrlong(col1);
       a = peekclong(addr,4);
       call sortC(of Col
  • );
  •    j = col1;
       k =
    0;
       c =
    .;
      
    do i = 1 to 4;
         
    if j = col then k+1;
         
    else do; j=col; k=1; end;
          c=c max k;
         
    end;
      
    call pokelong(a,addr,4);
       drop i j k addr;
       cards;
    a b b b
    a b b d
    c b c b
    d d d d
    a d a a
    ;;;;
    kbk
    Fluorite | Level 6 kbk
    Fluorite | Level 6

    Thanks for the help everyone!

    Haikuo
    Onyx | Level 15

    Another brute force approach using Array():

    data have;

    input col1 $ col2 $ col3 $ col4 $;

    cards;

    a b b b

    a b b d

    c b c b

    d d d d

    a d a a

    ;

    data want;

      set have;

      array col col1-col4;

      length _cat $100.;

      _cat=catx(',',of col(*));

    do over col;

    do _i=1 to dim(col);

    _ct+(col=scan(_cat,_i));

    end;

    count=max(count,_ct);

    _ct=0;

    end;

    drop _:;

    run;

    Haikuo

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 9 replies
    • 7812 views
    • 3 likes
    • 5 in conversation