BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_Dude
Calcite | Level 5

I have a table which has multiple columns and large number of rows.  A sub sample give below , in this case Name1-Name4 are different column names which coded with two character values. 

Name1Name2Name3Name4
AABBCCDD
AAAABBEE
FFFFCCCC
AAAAAAAA
BBBBBBZZ

I am interested in finding distinct in number of characters in each row . that is final output table has to look like this

Name1Name2Name3Name4Count
AABBCCDD4
AAAABBEE3
FFFFCCCC2
AAAAAAAA1
BBBBBBZZ2

In the first row all the elements are distinct , so the count is 4. In second row there are 3 distinct values [AA,BB,EE]. Similarly for others

Can you please help to solve this issues without using multiple if-else statements.

Thanks,

Yoonus

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Art,

It's conceivable that blanks should be treated as a valid level in the counting.  But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

   increment=1;

   if names{_J_}=' ' then decrement=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

Your data is not in third normal form. If it would, the task is simple.

If you lack of a natural key, set a row id, transpose your data, then use select count(distinct name) group by rowid.

Data never sleeps
stat_sas
Ammonite | Level 13

data have;

id+1;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

proc transpose data=have out=have1;

by id;

var name1-name4;

run;

proc sql;

select a.name1,a.name2,a.name3,a.name4,b.distinct_count from have a

left join (select id,count(distinct col1) as distinct_count from have1 group by id) b

on a.id=b.id;

quit;

Astounding
PROC Star

Here's some code that could be stuck in a DATA step to help.

array names {*} name1-name4;

count=1;

do _J_=1 to dim(names)-1;

   increment=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

You could hard-code 4 instead of using the DIM function.  But it looked like the number of elements might actually be much larger in real life.

Good luck.

art297
Opal | Level 21

's suggestion won't work correctly if you have missing values. I think that the following would get around that:

data want (drop=_:);

  set have;

  array names(*) name1-name4;

  array _names(4) $;

  do _i=1 to 4;

    _names(_i)=names(_i);

  end;

  call sortc(of _names(*));

  count=ifn(missing(_names(dim(names))),0,1);

  do _i=1 to dim(names)-1;

    if not missing(_names(_i)) then do;

      if _names(_i) ne _names(_i+1) then count+1;

    end;

  end;

run;

Astounding
PROC Star

Art,

It's conceivable that blanks should be treated as a valid level in the counting.  But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

   increment=1;

   if names{_J_}=' ' then decrement=1;

   do _K_=_J_+1 to dim(names);

      if names{_J_} = names{_K_} then increment=0;

   end;

   count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

KachiM
Rhodochrosite | Level 12

Here is one way:

data have;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

run;

data want;

   set have;

   array k

  • $ Name1 - Name4;
  •    count = 1;

       do i = 1 to dim(k) - 1;

          if k ^= k[i + 1] then count + 1;

       end;

    drop i;

    run;

    The output:

    Obs    Name1    Name2    Name3    Name4    count

      1      AA       BB       CC       DD        4

      2      AA       AA       BB       EE        3

      3      FF       FF       CC       CC        2

      4      AA       AA       AA       AA        1

      5      BB       BB       BB       ZZ        2

    CTorres
    Quartz | Level 8

    Hi Muthia,

    Your solution does not work for the following case that give wrong result:

    AA BB CC DD   4

    AA AA BB EE   3

    FF CC FF CC   4

    AA AA AA AA   1

    BB BB ZZ BB   3

    CTorres

    KachiM
    Rhodochrosite | Level 12

    I overlooked the ordering of Values.

    Thanks to Gergely Bathó.


    Using call sortc() will help in this case, but the values will be  in ascending order in the output.


    data want;

       set have;

       array k

  • $ Name1 - Name4;
  •    call sortc(of k

  • );
  •    count = 1;

       do i = 1 to dim(k) - 1;

          if k ^= k[i + 1] then count + 1;

       end;

    drop i;

    run;


    Ksharp
    Super User
    data have;
    input (Name1 Name2 Name3 Name4)($);
    datalines;
    AA BB CC DD
    AA AA BB EE
    FF FF CC CC
    AA AA AA AA
    BB BB BB ZZ
    ;
    run;
    data have(drop=i);
    set have;
    array t{100} $32 _temporary_ ;
    array x{*} $ name1-name4;
    n=0;call missing(of t{*});
    do i=1 to dim(x);
    if x{i} not in t then do;n+1;t{n}=x{i};end;
    end;
    run;
    
    

    Xia Keshan

    agoldma
    Pyrite | Level 9

    Ksharp, thank you for a very thoughtful and efficient solution.

    I modified the array references a little bit -- to make it work in a PROC FCMP.

    Since SAS doesn't have a built-in function like this, I wanted to make my own.

     

    proc datasets nolist lib=work; delete functions; quit;

    options cmplib=_null_;

    proc fcmp outlib=work.functions.smd;

       function count_dist_val(arr_val{*} $ ) varargs ;

       array t{100} $100 _temporary_ ;

       n = 0;

       call missing( of t1-t100);

       do i=1 to dim(arr_val);

          if arr_val{i} not in (t1-t100) then

             do;

                n + 1 ;

                t{n} = arr_val{i} ;

             end;

       end;

       return (n);

       endsub;

    run;

    options cmplib=work.functions;

     

    data have;

    input (name1 name2 name3 name4)($);

       array x{*} $ name1-name4;

       dist_val = count_dist_val(x) ;

       put dist_val= ;

    datalines;

    AA BB CC DD

    AA AA BB EE

    FF FF CC CC

    AA AA AA AA

    BB BB BB ZZ

    ;

    run;

     

     

    MikeZdeb
    Rhodochrosite | Level 12

    Hi, another (belated) idea though you end up with sorted names ...

     

    data x;
    input (name1-name4) (:$2.);
    datalines;
    AA BB CC DD
    AA AA BB EE
    FF FF CC CC
    AA AA AA AA
    BB BB BB ZZ
    AA CC CC AA
    JJ KK KK KK
    ZZ YY ZZ YY
    QQ QQ EE FF
    ;

     

    data y (drop=names);
    length names $30;
    set x;
    call sortc(of name1-name4);
    names = catx(' ',of name: );
    howmany = (count(tranwrd(names,name1,''),name2) ne 0) +
              (count(tranwrd(names,name2,''),name3) ne 0) +
              (count(tranwrd(names,name3,''),name4) ne 0) + 1;
    run;

     

    Or ...

     

    data y (drop=j);
    set x;
    array name(4);
    call sortc(of name1-name4);
    howmany = 1;
    do j=1 to 3;
       howmany + ^^find(tranwrd(catx(' ',of name: ),name(j),''),name(j+1));
    end;
    run;

     

    (^^find ... changes result from location to 1 if found, 0 if not ... thanks to Howard Schreier) 

     

    DATA SET: y
    
    name1    name2    name3    name4    howmany
    
     AA       BB       CC       DD         4
     AA       AA       BB       EE         3
     CC       CC       FF       FF         2
     AA       AA       AA       AA         1
     BB       BB       BB       ZZ         2
     AA       AA       CC       CC         2
     JJ       KK       KK       KK         2
     YY       YY       ZZ       ZZ         2
     EE       FF       QQ       QQ         3
    

     

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 11 replies
    • 3413 views
    • 7 likes
    • 10 in conversation