This code is giving me false positives.
I have character data containing ID numbers. Each ID is a 9-character string made up of the digits 0 to 9, stored as character data. I am trying to identify whether the same character appears 5 or more times consecutively. If it does, I will set a flag.
I have the code below. It works most of the time but also gives false positives. For example, it picks up something like 121341111, where '1' appears in the string 5 or more times but not consecutively.
I want to flag a value only if a character appears 5 or more times consecutively. 121341111 should not be flagged, because 1 is repeated consecutively only 4 times.
Any idea?
data want(drop=i);
    set have;
    length ssn_char ssn_rept_chars $9;
    ssn_char = ssn;
    do i = 1 to 6 until (flag = 1);
        if substr(ssn_char, i, 1) = substr(ssn_char, i+1, 1) = substr(ssn_char, i+2, 1) = substr(ssn_char, i+3, 1)
            then flag = 1;
        if flag = 1 then ssn_rept_chars = ssn_char;
    end;
run;
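
One thing that stands out: the IF condition above only looks at four positions (i through i+3), so a run of four identical characters such as the trailing 1111 in 121341111 may be enough to set the flag. Below is a minimal sketch of one possible alternative, not tested against the real data. It assumes ssn is a character variable holding only digits, as described above, and swaps the loop for the PRXMATCH function with a backreference. The dataset and variable names are the ones from the code above.

```sas
/* Sketch: flag IDs in which one digit repeats 5 or more times in a row. */
/* Assumption: ssn is a 9-character variable containing only digits.     */
data want;
    set have;
    length ssn_char ssn_rept_chars $9;
    ssn_char = ssn;
    /* (\d) captures one digit; \1{4} requires four more copies of that  */
    /* same digit immediately after it, i.e. five in a row.              */
    if prxmatch('/(\d)\1{4}/', ssn_char) > 0 then do;
        flag = 1;
        ssn_rept_chars = ssn_char;
    end;
run;
```

With this pattern, 121341111 would be left unflagged (the longest run is four 1s), while a value such as 121111134 would be flagged. As in the original code, flag stays missing for rows that do not match.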