pr1
Calcite | Level 5

This code is giving me false positives.

I have character data that contains ID numbers. Each ID is a 9-character string made up of the digits 0 to 9. I am trying to identify whether 5 or more consecutive characters are the same. If so, I will create a flag.

I have the code below. It works most of the time but also gives me false positives. For example, it will pick up something like 121341111, where '1' appears in the string 5 or more times.

I want to flag an ID only if a character appears 5 or more times consecutively. 121341111 should not be flagged, because '1' is repeated consecutively only 4 times.

Any ideas?

data want(drop = i) ;
  set have ;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i=1 to 6 until (flag=1);
    /* flag the ID if the characters at positions i to i+3 are all equal */
    if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1) = substr(ssn_char, i+3, 1)
      then flag=1;
    if flag = 1 then ssn_rept_chars = ssn_char;
  end;
run;

9 Replies
jwillis
Quartz | Level 8

This works, although I believe there is a slicker, more elegant way to assign a value to checkit.

data have;
  ssn = 123456789;
  output;
  ssn = 111116789;
  output;
  ssn = 123455555;
  output;
  ssn = 123333339;
  output;
run;

data want(drop = i) ;
  set have ;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  do i=1 to 5 until (flag=1);
    /* five copies of the character at position i */
    checkit = substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||substr(ssn_char, i, 1)||
              substr(ssn_char, i, 1)||substr(ssn_char, i, 1) ;
    /* compare with the five characters starting at position i */
    if checkit = substr(ssn_char, i, 1)||substr(ssn_char, i+1, 1)||
       substr(ssn_char, i+2, 1)||substr(ssn_char, i+3, 1)||substr(ssn_char, i+4, 1)
      then do;
        flag=1;
        put i= checkit= flag=;
        ssn_rept_chars = ssn_char;
      end;
  end;
run;
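A possibly slicker way to build checkit, sketched under the assumption that ssn is already a 9-character string: the REPEAT function returns its first argument repeated n+1 times, so repeat(x, 4) gives five copies, and substr(ssn, i, 5) pulls the whole 5-character window in a single call. The test values below are illustrative only.

```sas
/* Illustrative test data: ssn stored as a 9-character string */
data have;
  input ssn $9.;
  datalines;
123456789
111116789
123455555
123333339
121341111
;
run;

data want(drop = i checkit);
  set have;
  length checkit $5;
  flag = 0;
  do i = 1 to 5 until (flag = 1);
    /* REPEAT(x, 4) returns x repeated 5 times in total */
    checkit = repeat(substr(ssn, i, 1), 4);
    /* compare against the 5-character window starting at position i */
    if substr(ssn, i, 5) = checkit then flag = 1;
  end;
run;
```

With this data, 121341111 gets flag = 0, because its longest run is four 1s.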

ballardw
Super User

data want;
  set have;
  /* one 5-character run for each digit 0 to 9 */
  array a{10}$5 _temporary_ ('00000' '11111' '22222' '33333' '44444' '55555' '66666' '77777' '88888' '99999');
  _i_=1;
  /* stop at the first run found in ssn */
  do until (flag=1 or _i_=11);
    flag= (index(ssn,a[_i_])>0);
    _i_+1;
  end;
run;

Perhaps.

It would have to get slicker if you were looking for any character repeated, though.
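For any character (not just digits), one option is to drop the lookup table and count the length of the current run while scanning the string once. This is a minimal sketch assuming ssn is a character variable; run_len is a hypothetical helper name, and LENGTHN is used so that trailing blanks are not scanned (otherwise a short value padded with blanks could be flagged).

```sas
/* Illustrative test data */
data have;
  input ssn $9.;
  datalines;
123456789
111116789
123333339
121341111
;
run;

data want(drop = i run_len);
  set have;
  flag = 0;
  run_len = 1;  /* length of the run of identical characters ending at position i */
  do i = 2 to lengthn(ssn) until (flag = 1);
    if substr(ssn, i, 1) = substr(ssn, i - 1, 1) then run_len = run_len + 1;
    else run_len = 1;
    if run_len >= 5 then flag = 1;
  end;
run;
```

Because it only tracks the run ending at the current position, the same step works for letters or any other character, and the threshold of 5 appears in one place.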

Astounding
PROC Star

It's giving you false positives because you are comparing only 4 characters, not 5. To compare 5 characters, two changes are needed. First, i should go from 1 to 5, not 1 to 6:

do i=1 to 5 until (flag=1);

Second, add another character to the list of comparisons:

... = substr(ssn_char, i+4, 1) then flag=1;
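With both changes applied, the step from the question might look like the sketch below. It assumes ssn is a 9-character string and keeps the chained comparison from the question; it also sets flag = 0 up front so that rows without a run get 0 rather than a missing value. The test values are illustrative.

```sas
/* Illustrative test data; ssn is assumed to be a 9-character string */
data have;
  input ssn $9.;
  datalines;
121341111
111116789
123455555
;
run;

data want(drop = i);
  set have;
  length ssn_char ssn_rept_chars $9;
  ssn_char = ssn;
  flag = 0;
  /* a 5-character window in a 9-character string can start at positions 1 to 5 */
  do i = 1 to 5 until (flag = 1);
    /* chained comparison as in the question: all five characters must be equal */
    if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1)
       = substr(ssn_char, i+3, 1) = substr(ssn_char, i+4, 1) then flag = 1;
  end;
  if flag = 1 then ssn_rept_chars = ssn_char;
run;
```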

Good luck.

gergely_batho
SAS Employee

data want;
  set have;
  flag=prxmatch('/.*(\d)\1{4,4}.*/',ssn);
run;
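A quick check of how this pattern behaves on a few illustrative values: (\d) captures one digit, and \1{4,4} requires exactly four more copies of that same digit immediately after it. Because the pattern is not anchored, a run of six identical digits (123333339) still matches, since it contains a run of five, while 121341111 does not.

```sas
/* Illustrative test data */
data have;
  input ssn $9.;
  datalines;
121341111
111116789
123333339
;
run;

data want;
  set have;
  /* (\d) captures one digit; \1{4,4} requires four more copies of it.
     The pattern is not anchored, so longer runs also match. */
  flag = prxmatch('/.*(\d)\1{4,4}.*/', ssn);
  put ssn= flag=;
run;
```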

Patrick
Opal | Level 21

Clearly a case for a Regular Expression. As a small variation on the prxmatch() approach:

data want;
  set have;
  flag=prxmatch('/.*(\d)\1{4,4}.*/',ssn);
  flag2= prxmatch('/(\d)\1{4}/',ssn)>0;
run;
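If you also want to know which digit repeats, one option is to compile the pattern once with PRXPARSE and read capture buffer 1 with PRXPOSN after a successful PRXMATCH. A sketch, where re and rep_digit are hypothetical variable names and the test values are illustrative:

```sas
/* Illustrative test data */
data have;
  input ssn $9.;
  datalines;
121341111
111116789
123333339
;
run;

data want(drop = re);
  set have;
  length rep_digit $1;
  retain re;
  /* compile the pattern once, on the first observation */
  if _n_ = 1 then re = prxparse('/(\d)\1{4}/');
  flag = prxmatch(re, ssn) > 0;
  /* capture buffer 1 holds the digit that repeats */
  if flag then rep_digit = prxposn(re, 1, ssn);
run;
```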

MarkWik
Quartz | Level 8

Hi, can you please point me to the best and easiest PRX functions documentation for a novice or first-time user to understand comfortably? Many thanks.

Patrick
Opal | Level 21

Hi

Perl Regular Expressions are not SAS-specific, so I'm sure there is a lot of material around. I don't know of anything specific I could recommend.

Within SAS:

SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

...and once you understand which SAS functions let you use Perl Regular Expressions (the functions starting with "prx"), the most important page is: SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Because Perl Regular Expressions are not SAS-specific, a lot of expressions have been published, and searching the Internet will very often turn up something close to what you need.

Oh, and the Tip Sheet can also be useful in the beginning: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

MarkWik
Quartz | Level 8

Thanks very much. So does that mean it is generally aimed at people who are already proficient in the Perl scripting language? If so, I wonder how many languages a person like me, with average to below-average intelligence, can learn. I appreciate your very quick response. Cheers

Patrick
Opal | Level 21

I could learn it by "Googling" and "trial and error", so you can too!

You don't need to learn Perl for RegEx. Perl just implemented a syntax for Regular Expressions that became a quasi-standard.
