I would like to know how I can count distinct characters in a string
example:
NAME PHONE CHAR_DISTINCT
Ana 21212 2
Pablo 234234 3
Hana 1111111 1
Or using PRXCHANGE Function
data want;
set have;
array t(100) $ _temporary_;
call missing(of t(*));
do _n_=1 to length(PHONE);
t(_n_)=char(PHONE,_n_);
end;
call sortc(of t(*));
CHAR_DISTINCT=length(prxchange('s/([0-9])\1+/$1/', -1, cats(of t(*))));
run;
data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
data want(keep=NAME PHONE CHAR_DISTINCT);
set have;
newphone=substr(PHONE, 1, 1);
do i=2 to length(PHONE);
found=find(newphone, substr(PHONE, i, 1), 'it');
if found=0 then newphone=cats(newphone, substr(PHONE, i, 1));
end;
CHAR_DISTINCT=length(newphone);
run;
Or using PRXCHANGE Function
data want;
set have;
array t(100) $ _temporary_;
call missing(of t(*));
do _n_=1 to length(PHONE);
t(_n_)=char(PHONE,_n_);
end;
call sortc(of t(*));
CHAR_DISTINCT=length(prxchange('s/([0-9])\1+/$1/', -1, cats(of t(*))));
run;
Just for completeness, a solution with SQL.
data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
run;
data trans;
set have;
length digit $1;
do i = 1 to length(phone);
digit = substr(phone,i,1);
output;
end;
drop i;
run;
proc sql;
create table want as
select
name,
phone,
count(distinct digit) as char_distinct
from trans
group by name, phone;
quit;
proc print data=want noobs;
run;
Result:
char_ NAME PHONE distinct Ana 21212 2 Hana 1111111 1 Pablo 234234 3
@PeterClemmensen's first solution is the best, as it solves everything in one sequential pass through have.
Just for completeness, a solution with Hash Table.
data have;
input NAME $ PHONE $;
datalines;
Ana 21212
Pablo 234234
Hana 1111111
;
run;
data want;
if _n_=1 then do;
length k $ 1;
declare hash h();
h.definekey('k');
h.definedone();
end;
set have;
do i = 1 to length(phone);
k = char(phone,i);
h.ref();
end;
count=h.num_items;
h.clear();
drop k i;
run;
Here is another version. Look at the first character of PHONE. Count it. Compress the first character from the Phone. Repeat the above step until the phone has left with one or no character.
data have; input NAME $ PHONE $; datalines; Ana 21212 Pablo 234234 Hana 1111111 ; run; data want; set have; ph = phone; count = 0; do until(len <= 1); d = char(ph,1); count + 1; ph = compress(ph,d); len = length(ph); end; drop ph len d; run; Obs NAME PHONE count 1 Ana 21212 2 2 Pablo 234234 3 3 Hana 1111111 1
Note that among the previous solutions only @Kurt_Bremser's produces a count of zero in the case of an empty input string. Of course, you could adapt the other solutions correspondingly if needed. For example, using COMPRESS (like in @KachiM's solution):
data want(drop=_t);
set have;
_t=phone;
do CHAR_DISTINCT=0 by 1 while(_t ne ' ');
_t=compress(_t,first(_t));
end;
run;
To obtain CHAR_DISTINCT=1 for missing PHONE replace the DO statement with:
do CHAR_DISTINCT=1 by 1 until(_t=' ');
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.