DATA Step, Macro, Functions and more

how to get number of distinct elements in each row without nested if-else loops

Accepted Solution Solved
Reply
N/A
Posts: 1
Accepted Solution

how to get number of distinct elements in each row without nested if-else loops

I have a table which has multiple columns and large number of rows.  A sub sample give below , in this case Name1-Name4 are different column names which coded with two character values. 

Name1Name2Name3Name4
AABBCCDD
AAAABBEE
FFFFCCCC
AAAAAAAA
BBBBBBZZ

I am interested in finding distinct in number of characters in each row . that is final output table has to look like this

Name1Name2Name3Name4Count
AABBCCDD4
AAAABBEE3
FFFFCCCC2
AAAAAAAA1
BBBBBBZZ2

In the first row all the elements are distinct , so the count is 4. In second row there are 3 distinct values [AA,BB,EE]. Similarly for others

Can you please help to solve this issues without using multiple if-else statements.

Thanks,

Yoonus


Accepted Solutions
Solution
‎10-14-2014 01:37 PM
Super User
Posts: 5,082

Re: how to get number of distinct elements in each row without nested if-else loops

Art,

It's conceivable that blanks should be treated as a valid level in the counting.  But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

   increment=1;

   if names{_J_}=' ' then decrement=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

View solution in original post


All Replies
Super User
Posts: 5,256

Re: how to get number of distinct elements in each row without nested if-else loops

Your data is not in third normal form. If it would, the task is simple.

If you lack of a natural key, set a row id, transpose your data, then use select count(distinct name) group by rowid.

Data never sleeps
Trusted Advisor
Posts: 1,204

Re: how to get number of distinct elements in each row without nested if-else loops

data have;

id+1;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

proc transpose data=have out=have1;

by id;

var name1-name4;

run;

proc sql;

select a.name1,a.name2,a.name3,a.name4,b.distinct_count from have a

left join (select id,count(distinct col1) as distinct_count from have1 group by id) b

on a.id=b.id;

quit;

Super User
Posts: 5,082

Re: how to get number of distinct elements in each row without nested if-else loops

Here's some code that could be stuck in a DATA step to help.

array names {*} name1-name4;

count=1;

do _J_=1 to dim(names)-1;

   increment=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

You could hard-code 4 instead of using the DIM function.  But it looked like the number of elements might actually be much larger in real life.

Good luck.

PROC Star
Posts: 7,363

Re: how to get number of distinct elements in each row without nested if-else loops

's suggestion won't work correctly if you have missing values. I think that the following would get around that:

data want (drop=_Smiley Happy;

  set have;

  array names(*) name1-name4;

  array _names(4) $;

  do _i=1 to 4;

    _names(_i)=names(_i);

  end;

  call sortc(of _names(*));

  count=ifn(missing(_names(dim(names))),0,1);

  do _i=1 to dim(names)-1;

    if not missing(_names(_i)) then do;

      if _names(_i) ne _names(_i+1) then count+1;

    end;

  end;

run;

Solution
‎10-14-2014 01:37 PM
Super User
Posts: 5,082

Re: how to get number of distinct elements in each row without nested if-else loops

Art,

It's conceivable that blanks should be treated as a valid level in the counting.  But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

   increment=1;

   if names{_J_}=' ' then decrement=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

Super Contributor
Posts: 254

Re: how to get number of distinct elements in each row without nested if-else loops

Here is one way:

data have;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

run;

data want;

   set have;

   array k

  • $ Name1 - Name4;
  •    count = 1;

       do i = 1 to dim(k) - 1;

          if k ^= k[i + 1] then count + 1;

       end;

    drop i;

    run;

    The output:

    Obs    Name1    Name2    Name3    Name4    count

      1      AA       BB       CC       DD        4

      2      AA       AA       BB       EE        3

      3      FF       FF       CC       CC        2

      4      AA       AA       AA       AA        1

      5      BB       BB       BB       ZZ        2

    Regular Contributor
    Posts: 180

    Re: how to get number of distinct elements in each row without nested if-else loops

    Hi Muthia,

    Your solution does not work for the following case that give wrong result:

    AA BB CC DD   4

    AA AA BB EE   3

    FF CC FF CC   4

    AA AA AA AA   1

    BB BB ZZ BB   3

    CTorres

    Super Contributor
    Posts: 254

    Re: how to get number of distinct elements in each row without nested if-else loops

    I overlooked the ordering of Values.

    Thanks to Gergely Bathó.


    Using call sortc() will help in this case, but the values will be  in ascending order in the output.


    data want;

       set have;

       array k

  • $ Name1 - Name4;
  •    call sortc(of k

  • );
  •    count = 1;

       do i = 1 to dim(k) - 1;

          if k ^= k[i + 1] then count + 1;

       end;

    drop i;

    run;


    Super User
    Posts: 9,681

    Re: how to get number of distinct elements in each row without nested if-else loops

    data have;
    input (Name1 Name2 Name3 Name4)($);
    datalines;
    AA BB CC DD
    AA AA BB EE
    FF FF CC CC
    AA AA AA AA
    BB BB BB ZZ
    ;
    run;
    data have(drop=i);
    set have;
    array t{100} $32 _temporary_ ;
    array x{*} $ name1-name4;
    n=0;call missing(of t{*});
    do i=1 to dim(x);
    if x{i} not in t then do;n+1;t{n}=x{i};end;
    end;
    run;
    
    

    Xia Keshan

    Contributor
    Posts: 74

    Re: how to get number of distinct elements in each row without nested if-else loops

    [ Edited ]

    Ksharp, thank you for a very thoughtful and efficient solution.

    I modified the array references a little bit -- to make it work in a PROC FCMP.

    Since SAS doesn't have a built-in function like this, I wanted to make my own.

     

    proc datasets nolist lib=work; delete functions; quit;

    options cmplib=_null_;

    proc fcmp outlib=work.functions.smd;

       function count_dist_val(arr_val{*} $ ) varargs ;

       array t{100} $100 _temporary_ ;

       n = 0;

       call missing( of t1-t100);

       do i=1 to dim(arr_val);

          if arr_val{i} not in (t1-t100) then

             do;

                n + 1 ;

                t{n} = arr_val{i} ;

             end;

       end;

       return (n);

       endsub;

    run;

    options cmplib=work.functions;

     

    data have;

    input (name1 name2 name3 name4)($);

       array x{*} $ name1-name4;

       dist_val = count_dist_val(x) ;

       put dist_val= ;

    datalines;

    AA BB CC DD

    AA AA BB EE

    FF FF CC CC

    AA AA AA AA

    BB BB BB ZZ

    ;

    run;

     

     

    Valued Guide
    Posts: 765

    Re: how to get number of distinct elements in each row without nested if-else loops

    [ Edited ]

    Hi, another (belated) idea though you end up with sorted names ...

     

    data x;
    input (name1-name4) (:$2.);
    datalines;
    AA BB CC DD
    AA AA BB EE
    FF FF CC CC
    AA AA AA AA
    BB BB BB ZZ
    AA CC CC AA
    JJ KK KK KK
    ZZ YY ZZ YY
    QQ QQ EE FF
    ;

     

    data y (drop=names);
    length names $30;
    set x;
    call sortc(of name1-name4);
    names = catx(' ',of name: );
    howmany = (count(tranwrd(names,name1,''),name2) ne 0) +
              (count(tranwrd(names,name2,''),name3) ne 0) +
              (count(tranwrd(names,name3,''),name4) ne 0) + 1;
    run;

     

    Or ...

     

    data y (drop=j);
    set x;
    array name(4);
    call sortc(of name1-name4);
    howmany = 1;
    do j=1 to 3;
       howmany + ^^find(tranwrd(catx(' ',of name: ),name(j),''),name(j+1));
    end;
    run;

     

    (^^find ... changes result from location to 1 if found, 0 if not ... thanks to Howard Schreier) 

     

    DATA SET: y
    
    name1    name2    name3    name4    howmany
    
     AA       BB       CC       DD         4
     AA       AA       BB       EE         3
     CC       CC       FF       FF         2
     AA       AA       AA       AA         1
     BB       BB       BB       ZZ         2
     AA       AA       CC       CC         2
     JJ       KK       KK       KK         2
     YY       YY       ZZ       ZZ         2
     EE       FF       QQ       QQ         3
    

     

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 11 replies
    • 617 views
    • 7 likes
    • 10 in conversation