Solved: Count distinct values in a string

a_ramos · Posted 03-20-2019 05:33 PM

I would like to know how I can count distinct characters in a string

example:

NAME PHONE CHAR_DISTINCT

Ana 21212 2

Pablo 234234 3

Hana 1111111 1

PeterClemmensen · Posted 03-20-2019 06:53 PM

Or using PRXCHANGE Function

data want;
   set have;
   array t(100) $  _temporary_;
   call missing(of t(*));
   do _n_=1 to length(PHONE);
      t(_n_)=char(PHONE,_n_);
   end;
   call sortc(of t(*));
   CHAR_DISTINCT=length(prxchange('s/([0-9])\1+/$1/', -1, cats(of t(*))));
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 03-20-2019 06:08 PM

data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;

data want(keep=NAME PHONE CHAR_DISTINCT);
   set have;
   newphone=substr(PHONE, 1, 1);
   do i=2 to length(PHONE);
      found=find(newphone, substr(PHONE, i, 1), 'it');
      if found=0 then newphone=cats(newphone, substr(PHONE, i, 1));
   end;
   CHAR_DISTINCT=length(newphone);
run;

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 03-20-2019 06:53 PM

Or using PRXCHANGE Function

data want;
   set have;
   array t(100) $  _temporary_;
   call missing(of t(*));
   do _n_=1 to length(PHONE);
      t(_n_)=char(PHONE,_n_);
   end;
   call sortc(of t(*));
   CHAR_DISTINCT=length(prxchange('s/([0-9])\1+/$1/', -1, cats(of t(*))));
run;

The DATA to DATA Step Macro
Blog: SASnrd

a_ramos · Posted 03-22-2019 09:36 AM

Thank you! 🙂

Kurt_Bremser · Posted 03-21-2019 03:52 AM

Just for completeness, a solution with SQL.

data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
run;

data trans;
set have;
length digit $1;
do i = 1 to length(phone);
  digit = substr(phone,i,1);
  output;
end;
drop i;
run;

proc sql;
create table want as
select
  name,
  phone,
  count(distinct digit) as char_distinct
from trans
group by name, phone;
quit;

proc print data=want noobs;
run;

Result:

                      char_
NAME      PHONE     distinct

Ana      21212          2   
Hana     1111111        1   
Pablo    234234         3

@PeterClemmensen's first solution is the best, as it solves everything in one sequential pass through have.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 03-21-2019 07:57 AM

Just for completeness, a solution with Hash Table.

data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
run;
data want;
 if _n_=1 then do;
  length k $ 1;
  declare hash h();
  h.definekey('k');
  h.definedone();
 end;
set have;
 do i = 1 to length(phone);
  k = char(phone,i);
  h.ref(); 
end;
count=h.num_items;
h.clear();
drop k i;
run;

KachiM · Posted 03-21-2019 10:34 AM

Here is another version. Look at the first character of PHONE. Count it. Compress the first character from the Phone. Repeat the above step until the phone has left with one or no character.

data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
run;

data want;
   set have;
   ph = phone;
   count = 0;
   do until(len <= 1);
      d = char(ph,1);
      count + 1;
      ph = compress(ph,d);
      len = length(ph);
   end;
drop ph len d;
run;

Obs 	NAME 	PHONE 	count
1 	Ana 	21212 	2
2 	Pablo 	234234 	3
3 	Hana 	1111111 	1

FreelanceReinh · Posted 03-21-2019 11:42 AM

Note that among the previous solutions only @Kurt_Bremser's produces a count of zero in the case of an empty input string. Of course, you could adapt the other solutions correspondingly if needed. For example, using COMPRESS (like in @KachiM's solution):

data want(drop=_t);
set have;
_t=phone;
do CHAR_DISTINCT=0 by 1 while(_t ne ' ');
  _t=compress(_t,first(_t));
end;
run;

To obtain CHAR_DISTINCT=1 for missing PHONE replace the DO statement with:

do CHAR_DISTINCT=1 by 1 until(_t=' ');

Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Re: Count distinct values in a string

Registration is open