count the number of successive repeating characters in string

Contributor ak2
Posts: 27

Hi! I have a string made up of digits which is 18 characters long. I need to count the number of repeating digits starting from the beginning of the string. That is, given substr(myString,1,1), how many repeating characters are in the rest of the string starting at substr(myString,2,18).

Example:

'00000' should output 4 'coz there are 4 0s following the one at the beginning of the string

'100000' should output 0 'coz there are no following 1s after the first one

'1100000222' should output 1 'coz there is one following 1 after the first one

'02200000' should output 0 'coz there are no 0s following the one at the beginning of the string

'002200000' should output 1 'coz there is one 0 following the one at the beginning of the string

Can anyone help me with this? I'm using SAS EG 7.12.



All Replies
Super User
Posts: 19,156

Re: count the number of successive repeating characters in string

Useful functions CHAR() and COUNTC()

 

This may help you get started:

https://communities.sas.com/t5/SAS-Enterprise-Guide/count-the-number-of-most-occuring-consecutive-ch...

Super User
Posts: 7,431

Re: count the number of successive repeating characters in string

data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
;
run;

data want;
set have;
count = 0;
do while (substr(string,count+2,1) = substr(string,1,1));
  count + 1;
end;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎11-08-2016 08:33 AM
Super User
Posts: 5,366

Re: count the number of successive repeating characters in string

It can get tricky if you have 18 identical digits. Here's a way to work with that:

data want;
set have;
length firstchar $ 1;
firstchar = myString;
count=0;
do i=2 to 18;
   if substr(myString, i, 1) = firstchar then count+1;
   else i=20;   /* could also try leave instead */
end;
drop firstchar i;
run;
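
The comment in the code above mentions LEAVE as an alternative to forcing the loop index past its upper bound. A minimal sketch of that variant, assuming the same input data set have with an 18-character variable myString:

```sas
data want;
  set have;
  length firstchar $ 1;
  firstchar = myString;          /* assignment truncates to the first character */
  count = 0;
  do i = 2 to 18;
    /* stop at the first character that differs from the leading one */
    if substr(myString, i, 1) ne firstchar then leave;
    count + 1;
  end;
  drop firstchar i;
run;
```

Because i never goes beyond 18, SUBSTR is never asked for a position outside the variable, so 18 identical digits give 17 without any log notes.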

Super User
Posts: 7,431

Re: count the number of successive repeating characters in string

Since both the substr() and the char() functions simply return a blank when the index lies outside the size of the string variable, a complete sequence of identical digits poses no problem for my code; run this for a test:

data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
111111111111111111
;
run;

data want;
set have;
count = 0;
do while (char(string,count+2) = char(string,1));
  count + 1;
end;
run;

The result of the 18-digit string is 17, as (IMO) intended by the OP.
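
One edge case worth guarding against: if string is entirely blank, the first character is a blank and every out-of-range position also compares as a blank, so an unbounded DO WHILE would never end. A hedged sketch that adds an explicit bound, assuming the same have data set and string variable as above:

```sas
data want;
  set have;
  count = 0;
  /* lengthn() is 0 for a blank string, so the loop body is skipped entirely; */
  /* otherwise the comparison stops at the last non-blank character           */
  do while (count + 2 <= lengthn(string)
            and char(string, count + 2) = char(string, 1));
    count + 1;
  end;
run;
```

For the test data above this should give the same counts as the unbounded loop, and a blank string gets a count of 0.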

Contributor ak2
Posts: 27

Re: count the number of successive repeating characters in string

This solution is very intuitive and it gave me the correct answer on a very small data set, but it performed badly when I ran it on a larger data set.
Super User
Posts: 5,366

Re: count the number of successive repeating characters in string

Kurt,

I didn't test CHAR, but SUBSTR can have problems when the position is past the end of the string:

4          data test1;
5          test='12345';
6          test2 = substr(test, 6, 1);
7          put _all_;
8          run;

NOTE: Invalid second argument to function SUBSTR at line 6 column 9.
test=12345 test2=  _ERROR_=1 _N_=1
test=12345 test2=  _ERROR_=1 _N_=1
NOTE: The data set WORK.TEST1 has 1 observations and 2 variables.

Super User
Posts: 7,431

Re: count the number of successive repeating characters in string

@Astounding you're right, char() is more robust than substr() when the index runs past the end of the string.
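
For comparison with the SUBSTR log above, the same check can be repeated with CHAR. This is a small sketch; the data step and variable names simply mirror the SUBSTR test:

```sas
data _null_;
  test = '12345';
  /* position 6 is past the end of the 5-character variable */
  test2 = char(test, 6);
  put test= test2= _error_=;
run;
```

Per the documented behaviour of CHAR, an out-of-range position returns a blank rather than raising an invalid-argument note, which is why the DO WHILE loop built on CHAR ends cleanly after 18 identical digits.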

Contributor ak2
Posts: 27

Re: count the number of successive repeating characters in string

Thank you so much for your help! This worked fine. No performance issue.

Super User
Posts: 7,720

Re: count the number of successive repeating characters in string

A purely function-oriented (i.e. non-looping) version:

data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
111111111111111111
;
run;

data want;
  set have;
  num_before=lengthn(scan(string,1,char(string,1),"k"));
  num_after=lengthn(string)-num_before;
run;

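It passes the first character to scan() with the "k" modifier, so every other character acts as a delimiter, and then takes the length of the first word that scan() returns. num_before is therefore the length of the leading run, including the first character itself.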
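
Since the original question asks for the number of repeats after the first character, the same expression can be adapted by subtracting 1. A minimal sketch, assuming the have data set from this post; the variable name count is chosen here for illustration:

```sas
data want;
  set have;
  /* length of the leading run of the first character, minus that first character */
  count = lengthn(scan(string, 1, char(string, 1), "k")) - 1;
run;
```

For a completely blank string the scan() result has length 0, so count would be -1; wrap the expression in max(..., 0) if blank values can occur.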

Super User
Posts: 9,867

Re: count the number of successive repeating characters in string


data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
;
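run;

data want;
 set have;
 pid=prxparse('/^(\d)\1*/');       /* a leading digit followed by zero or more repeats of it */
 call prxsubstr(pid,string,p,l);   /* p = start position, l = length of the match */
 want=l-1;                         /* exclude the first digit itself */
 drop pid p l;
run;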

Contributor ak2
Posts: 27

Re: count the number of successive repeating characters in string

This one works fine as well. Thanks!

☑ This topic is solved.
