How to identify consecutively repeating characters

pr1 · Posted 04-02-2015 03:15 PM

This code is giving me false positives.

I have character data that holds ID numbers. Each value is a 9-character string made up of the digits 0 to 9. I am trying to identify whether 5 or more consecutive characters are the same. If so, I will create a flag.

I have the code below. It works most of the time but also gives me false positives. For example, it will pick up something like 121341111, where the '1' appears in the string 5 or more times.

I want to flag a value only if a character is present consecutively 5 or more times. 121341111 should not be flagged, as 1 is repeated consecutively only 4 times.

Any ideas?

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i=1 to 6 until (flag=1);
    if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1) = substr(ssn_char, i+3, 1)
      then flag=1;
    if flag = 1 then ssn_rept_chars = ssn_char;
  end;
run;

jwillis · Posted 04-02-2015 03:33 PM

This works. I believe there is a slicker, more elegant way to build checkit.

data have;
  /* quoted literals make ssn a character variable, as in the question */
  ssn = '123456789';
  output;
  ssn = '111116789';
  output;
  ssn = '123455555';
  output;
  ssn = '123333339';
  output;
run;

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i=1 to 5 until (flag=1);
    /* five copies of the character at position i */
    checkit = substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||
              substr(ssn_char, i, 1)||substr(ssn_char, i, 1);
    /* compare with the five characters starting at position i */
    if checkit = substr(ssn_char, i, 1)||substr(ssn_char, i+1, 1)||
                 substr(ssn_char, i+2, 1)||substr(ssn_char, i+3, 1)||substr(ssn_char, i+4, 1)
    then do;
      flag=1;
      put i= checkit= flag=;
      ssn_rept_chars = ssn_char;
    end;
  end;  /* end of the do i= loop */
run;
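One possibly slicker way to build checkit is the REPEAT function, which returns its argument followed by n further copies. The sketch below assumes ssn is a 9-character string, as in the original question; the test rows and the direct use of ssn (rather than ssn_char) are illustrative choices, not part of the code above.

```sas
/* Illustrative test data; ssn is read as a character variable */
data have;
  length ssn $9;
  input ssn $;
  datalines;
123456789
111116789
123455555
121341111
;
run;

data want (drop = i checkit);
  set have;
  length checkit $5;
  flag = 0;
  do i = 1 to 5 until (flag = 1);
    /* repeat(x, 4) returns x plus 4 more copies: 5 characters in total */
    checkit = repeat(substr(ssn, i, 1), 4);
    /* compare with the 5 characters starting at position i */
    if substr(ssn, i, 5) = checkit then flag = 1;
  end;
run;
```

With these rows, 111116789 and 123455555 should end up with flag = 1, while 123456789 and 121341111 stay at 0.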

ballardw · Posted 04-02-2015 04:23 PM

data want;
  set have;
  array a{10} $5 _temporary_ ('00000' '11111' '22222' '33333' '44444' '55555' '66666' '77777' '88888' '99999');
  _i_=1;
  do until (flag=1 or _i_=11);
    flag= (index(ssn,a[_i_])>0);
    _i_+1;
  end;
run;

Perhaps.

Would have to get slick if looking for any character repeated, though.
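If any character (not only the digits 0 to 9) should count, one option that avoids the lookup array is a single pass that tracks the longest run of identical characters. This is only a sketch: max_run and run_len are illustrative names that do not appear in the thread, and ssn is assumed to be a character variable.

```sas
data want (drop = i run_len);
  set have;
  max_run = 1;   /* longest run of identical characters seen so far */
  run_len = 1;   /* length of the run ending at the current position */
  do i = 2 to length(ssn);
    if substr(ssn, i, 1) = substr(ssn, i - 1, 1) then run_len = run_len + 1;
    else run_len = 1;
    max_run = max(max_run, run_len);
  end;
  flag = (max_run >= 5);   /* 1 when some character repeats 5+ times in a row */
run;
```

Because the threshold appears only in the last assignment, the same loop can serve for a different run length by changing the 5.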

Astounding · Posted 04-02-2015 04:26 PM

It's giving you the false positives because you are only comparing 4 characters, not comparing 5 characters. To compare 5 characters, two changes would be needed. First, i should go from 1 to 5, not 1 to 6:

do i=1 to 5 until (flag=1);

Second, add another character to the list of comparisons:

... = substr(ssn_char, i+4, 1) then flag=1;

Good luck.
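Put together, the two changes might look like the sketch below. It assumes ssn is a 9-character variable, as described in the question. The comparisons are written with explicit AND operators purely so that each one is visible, and the assignment to ssn_rept_chars is moved after the loop, which is a stylistic choice rather than part of the fix.

```sas
data want (drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  /* a run of 5 within 9 characters can start at positions 1 to 5 only */
  do i = 1 to 5 until (flag = 1);
    if  substr(ssn_char, i, 1) = substr(ssn_char, i + 1, 1)
    and substr(ssn_char, i, 1) = substr(ssn_char, i + 2, 1)
    and substr(ssn_char, i, 1) = substr(ssn_char, i + 3, 1)
    and substr(ssn_char, i, 1) = substr(ssn_char, i + 4, 1)
      then flag = 1;
  end;
  if flag = 1 then ssn_rept_chars = ssn_char;
run;
```

With this version, 121341111 is no longer flagged, because no position starts a run of five identical characters.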

gergely_batho · Posted 04-03-2015 01:54 AM

data want;
  set have;
  flag=prxmatch('/.*(\d)\1{4,4}.*/',ssn);
run;

Patrick · Posted 04-03-2015 04:28 AM

Clearly a case for a regular expression. As a small variation on gergely_batho's code:

data want;
  set have;
  flag=prxmatch('/.*(\d)\1{4,4}.*/',ssn);
  flag2= prxmatch('/(\d)\1{4}/',ssn)>0;
run;
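To unpack the shorter pattern: (\d) captures a single digit, and \1{4} requires that same captured digit four more times, giving five in a row. PRXMATCH returns the position where the match starts, or 0 when there is none. The sketch below keeps that position as well as the flag; pos and flag_any are illustrative names, and the \S variant (any non-blank character) is an assumption about what a more general check might look like.

```sas
data want;
  set have;
  /* start position of the first run of 5 identical digits, 0 if none */
  pos  = prxmatch('/(\d)\1{4}/', ssn);
  flag = (pos > 0);
  /* same idea for any non-whitespace character, not only digits */
  flag_any = (prxmatch('/(\S)\1{4}/', ssn) > 0);
run;
```

For a value such as 123455555, pos would be 5, since the run of 5s begins at the fifth character.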

MarkWik · Posted 04-03-2015 05:24 AM

Hi, can you please point me to the best and easiest PRX functions documentation for a novice or first-time user to understand comfortably? Many thanks.

Patrick · Posted 04-03-2015 05:32 AM

Hi

Perl Regular Expressions are not SAS specific, so I'm sure there is a lot of material around. I don't know of anything specific I could recommend.

Within SAS:

SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

...and once you understand which SAS functions allow you to use Perl Regular Expressions (functions starting with "prx..") then the most important page is: SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Because Perl Regular Expressions are not SAS specific, a lot of expressions have been published, and searching the Internet will very often turn up something close to what you need.

Oh, and the Tip Sheet can also be useful in the beginning: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

MarkWik · Posted 04-03-2015 05:38 AM

Thanks very much. So does that mean it is generally aimed at people who are already proficient in the Perl scripting language? Hmm, if so, I wonder how many languages a person like me with average to below-average intelligence can learn. I appreciate your very quick response. Cheers

Patrick · Posted 04-03-2015 05:40 AM

I learned it by Googling and trial and error, so you can too!

You don't need to learn Perl for RegEx. Perl just implemented a syntax for regular expressions which became a quasi-standard.
