Hi! I have a string of digits that is 18 characters long. I need to count how many times the first digit repeats consecutively at the beginning of the string. That is, given substr(myString,1,1), how many copies of that character follow it without interruption, starting at substr(myString,2,17)?
Examples:
'00000' should output 4 because four 0s follow the one at the beginning of the string
'100000' should output 0 because no 1s follow the first one
'1100000222' should output 1 because one 1 follows the first one
'02200000' should output 0 because no 0s directly follow the one at the beginning of the string
'002200000' should output 1 because one 0 directly follows the one at the beginning of the string
Can anyone help me with this? I'm using SAS EG 7.12.
Useful functions: CHAR() and COUNTC()
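As a rough illustration of what these two functions return (a minimal sketch; the variable names first and total are made up for this example), CHAR() picks out a single character by position, while COUNTC() counts every occurrence of the listed characters anywhere in the string:

```sas
data _null_;
  string = '002200000';
  /* CHAR returns the single character at the given position */
  first = char(string, 1);
  /* COUNTC counts all occurrences of that character, not just the leading run */
  total = countc(string, first);
  put first= total=;
run;
```

Here COUNTC counts all seven 0s in the string, so on its own it does not give the length of the leading run; it would still need to be combined with something that stops at the first different digit.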
This may help you get started:
data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
;
run;
data want;
set have;
count = 0;
do while (substr(string,count+2,1) = substr(string,1,1));
count + 1;
end;
run;
It can get tricky if you have 18 identical digits. Here's a way to work with that:
data want;
set have;
length firstchar $ 1;
firstchar = myString;
count=0;
do i=2 to 18;
if substr(myString, i, 1) = firstchar then count+1;
else i=20; /* could also try leave instead */
end;
drop firstchar i;
run;
Since both the substr() and the char() functions simply return a blank when the index lies outside the size of the string variable, a complete sequence of identical digits poses no problem for my code; run this for a test:
data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
111111111111111111
;
run;
data want;
set have;
count = 0;
do while (char(string,count+2) = char(string,1));
count + 1;
end;
run;
The result for the 18-digit string is 17, as (IMO) the OP intended.
Kurt,
I didn't test CHAR, but SUBSTR can have problems.
4 data test1;
5 test='12345';
6 test2 = substr(test, 6, 1);
7 put _all_;
8 run;
NOTE: Invalid second argument to function SUBSTR at line 6 column 9.
test=12345 test2= _ERROR_=1 _N_=1
test=12345 test2= _ERROR_=1 _N_=1
NOTE: The data set WORK.TEST1 has 1 observations and 2 variables.
@Astounding you're right, char() is more robust than substr() when the index runs past the end of the string.
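For comparison, here is a minimal sketch of the same out-of-range test with CHAR() in place of SUBSTR() (it reuses the variable names test and test2 from the log above). According to the SAS documentation, CHAR returns a blank when the position is greater than the length of the string, so this step would be expected to run without the "Invalid second argument" note; it is worth checking the log on your own installation:

```sas
data _null_;
  test = '12345';
  /* Position 6 is one past the end of the 5-character value */
  test2 = char(test, 6);
  /* Write the values and the error flag to the log for inspection */
  put test= test2= _error_=;
run;
```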
Thank you so much for your help! This worked fine. No performance issue.
A purely function-oriented (i.e. non-looping) version:
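data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
111111111111111111
;
run;
data want;
set have;
/* num_before is the length of the leading run, including the first character, */
/* so the count the OP asked for is num_before - 1 */
num_before=lengthn(scan(string,1,char(string,1),"k"));
num_after=lengthn(string)-num_before;
run;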
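With the "k" modifier, SCAN treats every character except the first one as a delimiter, so the first word it returns is the leading run of that character; LENGTHN then gives the length of that run.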
data have;
input string $18.;
cards;
00000
100000
1100000222
02200000
002200000
;
run;
data want;
set have;
pid=prxparse('/^(\d)\1*/');
call prxsubstr(pid,string,p,l);
want=l-1;
drop pid p l;
run;
This one works fine as well. Thanks!