
How to identify consecutively repeating characters

Occasional Contributor pr1
Posts: 15


This code is giving me false positives.

I have character data containing ID numbers. Each value is a 9-character string made up of the digits 0 to 9. I am trying to identify whether 5 or more consecutive characters are the same. If so, I will create a flag.

I have the code below. It works most of the time, but it also gives me false positives. For example, it picks up a value like 121341111, where '1' occurs in the string 5 or more times but never 5 times in a row.

I want to flag a value only if a character appears 5 or more times consecutively. 121341111 should not be flagged, because '1' repeats consecutively only 4 times.

Any idea?

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i = 1 to 6 until (flag = 1);
    if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1) = substr(ssn_char, i+3, 1)
      then flag = 1;
    if flag = 1 then ssn_rept_chars = ssn_char;
  end;
run;

Regular Contributor
Posts: 217

Re: How to identify consecutively repeating characters

This works. I believe there is a slicker, more elegant way to assign a value to checkit.

data have;
  ssn = 123456789;
  output;
  ssn = 111116789;
  output;
  ssn = 123455555;
  output;
  ssn = 123333339;
  output;
run;

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i = 1 to 5 until (flag = 1);
    /* checkit = the character at position i, written 5 times */
    checkit = substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||
              substr(ssn_char, i, 1)||substr(ssn_char, i, 1);
    /* compare it with the 5 characters starting at position i */
    if checkit = substr(ssn_char, i, 1)||substr(ssn_char, i+1, 1)||
                 substr(ssn_char, i+2, 1)||substr(ssn_char, i+3, 1)||substr(ssn_char, i+4, 1)
      then do;
        flag = 1;
        put i= checkit= flag=;   /* write the match to the log */
        ssn_rept_chars = ssn_char;
      end;
  end;
run;
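For the "slicker way to value checkit", one option is the REPEAT function. REPEAT(x, n) returns x followed by n more copies, so REPEAT(substr(ssn, i, 1), 4) is the character at position i written five times. That can be compared directly with the five-character window substr(ssn, i, 5). This is a sketch that assumes a character SSN, as in the original question. If SSN is numeric, as in the HAVE step above, convert it first, for example with ssn_char = put(ssn, z9.); so that leading zeros are kept.

```sas
/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want(drop = i);
  set have;
  length ssn_rept_chars $9;
  flag = 0;                      /* start at 0 so non-matches get 0, not missing */
  do i = 1 to 5 until (flag = 1);
    /* REPEAT(x, 4) returns x plus 4 more copies: 5 characters in all */
    if repeat(substr(ssn, i, 1), 4) = substr(ssn, i, 5) then do;
      flag = 1;
      ssn_rept_chars = ssn;
    end;
  end;
run;
```

Compared with building checkit by hand, this keeps the "5" in one place: change the 4 and the 5 together to look for a different run length.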

Super User
Posts: 11,343

Re: How to identify consecutively repeating characters

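data want;
  set have;
  /* one five-character run for each digit 0-9 */
  array a{10} $5 _temporary_ ('00000' '11111' '22222' '33333' '44444'
                              '55555' '66666' '77777' '88888' '99999');
  _i_ = 1;
  do until (flag = 1 or _i_ = 11);
    /* INDEX returns the position of a[_i_] within ssn, or 0 if not found */
    flag = (index(ssn, a[_i_]) > 0);
    _i_ + 1;
  end;
run;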

Perhaps.

It would have to get slicker if you were looking for any repeated character, though.
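For the "any character" case, one alternative to listing every possible run in an array is to walk the string once and count the length of the current run, resetting the count whenever the character changes. This is a sketch, not something posted in the thread; it assumes a character SSN, and RUN_LEN is a hypothetical variable name. Because it compares characters rather than digits, it would also catch runs of letters or other symbols.

```sas
/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want(drop = i run_len);
  set have;
  length ssn_rept_chars $9;
  flag = 0;
  run_len = 1;  /* length of the run ending at position i */
  /* LENGTHN ignores trailing blanks, so only the stored characters are scanned */
  do i = 2 to lengthn(ssn) until (flag = 1);
    if substr(ssn, i, 1) = substr(ssn, i - 1, 1) then run_len = run_len + 1;
    else run_len = 1;  /* character changed: start a new run */
    if run_len >= 5 then do;
      flag = 1;
      ssn_rept_chars = ssn;
    end;
  end;
run;
```

The loop makes a single pass over the string, and the threshold of 5 appears only once, so it also works unchanged for strings longer than 9 characters (apart from the $9 length of SSN_REPT_CHARS).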

Super User
Posts: 5,516

Re: How to identify consecutively repeating characters

It's giving you the false positives because you are only comparing 4 characters, not comparing 5 characters.  To compare 5 characters, two changes would be needed.  First, i should go from 1 to 5, not 1 to 6:

do i=1 to 5 until (flag=1);

Second, add another character to the list of comparisons:

... = substr(ssn_char, i+4, 1) then flag=1;
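With both changes applied, the loop from the original question might look like the sketch below. It assumes a character SSN. The comparisons are written here as explicit AND conditions between neighbouring characters, and FLAG is initialised to 0 so that values without a run get 0 rather than missing; both of these are choices made for this sketch, not part of the original code.

```sas
/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  flag = 0;
  /* i stops at 5 so that i+4 never goes past position 9 */
  do i = 1 to 5 until (flag = 1);
    /* five adjacent characters: positions i through i+4 */
    if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1)
       and substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1)
       and substr(ssn_char, i+2, 1) = substr(ssn_char, i+3, 1)
       and substr(ssn_char, i+3, 1) = substr(ssn_char, i+4, 1)
      then flag = 1;
    if flag = 1 then ssn_rept_chars = ssn_char;
  end;
run;
```

With this version, 121341111 is no longer flagged, because its longest run of '1' covers only positions 6 to 9.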

Good luck.

SAS Employee
Posts: 340

Re: How to identify consecutively repeating characters

data want;
  set have;
  flag = prxmatch('/.*(\d)\1{4,4}.*/', ssn);
run;
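In this pattern, (\d) captures one digit and \1{4,4} requires that same digit exactly four more times, i.e. five in a row. A longer run still contains five in a row, so "5 or more" needs no extra handling. If the minimum run length might change, one option is to build the repeat count from a macro variable. This is a sketch; RUN_LEN is a hypothetical macro variable name, and the pattern is in double quotes so that the macro reference resolves before PRXMATCH sees it.

```sas
%let run_len = 5;   /* hypothetical: minimum run length to flag */

/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want;
  set have;
  /* with run_len = 5 the pattern resolves to /(\d)\1{4}/ */
  flag = (prxmatch("/(\d)\1{%eval(&run_len - 1)}/", ssn) > 0);
run;
```

The count is run_len - 1 because the captured digit itself already accounts for the first character of the run.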

Respected Advisor
Posts: 4,173

Re: How to identify consecutively repeating characters

Clearly a case for a regular expression. Here is a small variation on the previous reply's code:

data want;
  set have;
  flag = prxmatch('/.*(\d)\1{4,4}.*/', ssn);
  flag2 = prxmatch('/(\d)\1{4}/', ssn) > 0;
run;
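The difference between the two lines comes from what PRXMATCH returns: the starting position of the first match, or 0 if there is none. Because the leading .* can start matching at position 1, FLAG is 1 whenever a run exists, while the shorter pattern on its own returns the position where the run starts; the "> 0" turns that into a 0/1 flag. The sketch below assumes a character SSN and adds a hypothetical POS variable to show the raw position.

```sas
/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want;
  set have;
  flag  = prxmatch('/.*(\d)\1{4,4}.*/', ssn); /* 1 if a run exists, else 0 */
  pos   = prxmatch('/(\d)\1{4}/', ssn);       /* start of the first run, else 0 */
  flag2 = (pos > 0);                          /* 0/1 flag */
run;

proc print data = want;
run;
```

For these values, POS should be 1 for 111116789, 5 for 123455555, 3 for 123333339 and 0 for the other two, while FLAG and FLAG2 agree on every row.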

Frequent Contributor
Posts: 75

Re: How to identify consecutively repeating characters

Hi, could you please point me to the best and easiest documentation on the PRX functions for a novice or first-time user? Many thanks.

Respected Advisor
Posts: 4,173

Re: How to identify consecutively repeating characters

Hi

Perl regular expressions are not SAS-specific, so I'm sure there is a lot of material around. I don't know of anything specific I could recommend.

Within SAS:

SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

...and once you know which SAS functions support Perl regular expressions (the functions whose names start with "prx"), the most important page is: SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Because Perl regular expressions are not SAS-specific, a lot of expressions have been published, and searching the Internet will very often turn up something close to what you need.

Oh, and the Tip Sheet can also be useful in the beginning: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
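As a first step with the prx functions mentioned above, here is a sketch of a common three-step pattern applied to this thread's problem: PRXPARSE compiles the expression once, PRXMATCH searches a value, and PRXPOSN returns the text captured by a parenthesized group, here the digit that repeats. It assumes a character SSN; RE and REPT_DIGIT are hypothetical variable names.

```sas
/* character test data, one SSN per value */
data have;
  input ssn :$9. @@;
  datalines;
123456789 111116789 121341111 123455555 123333339
;
run;

data want(drop = re);
  set have;
  length rept_digit $1;
  retain re;
  /* compile the pattern once, on the first iteration */
  if _n_ = 1 then re = prxparse('/(\d)\1{4}/');
  /* PRXMATCH returns the start position of the match, 0 if none */
  if prxmatch(re, ssn) then rept_digit = prxposn(re, 1, ssn);
run;
```

RETAIN keeps the compiled pattern ID across observations, since it is only assigned on the first one; REPT_DIGIT stays blank for values without a run.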

Frequent Contributor
Posts: 75

Re: How to identify consecutively repeating characters

Thanks very much. So does that mean it is generally aimed at people who are already proficient in the Perl scripting language? Hmm, if so, I wonder how many languages a person like me, with average to below-average intelligence, can learn. I appreciate your very quick response. Cheers

Respected Advisor
Posts: 4,173

Re: How to identify consecutively repeating characters

I learned it by "Googling" and by trial and error, so you can too!

You don't need to learn Perl to use regular expressions. Perl just implemented a regular expression syntax that became a quasi-standard.
