Help using Base SAS procedures

How to identify repeating characters in the string.

Occasional Contributor pr1
Posts: 15

Hello, I have a 9-character data field and I need to identify whether any 3 consecutive characters in that string are the same. Any ideas on how to achieve this?

Thanks.

Pramodini


All Replies
Trusted Advisor
Posts: 1,128

Re: How to identify repeating characters in the string.

Could you please give some example data?

Thanks,
Jag
Super User
Posts: 10,486

Re: How to identify repeating characters in the string.

Does case matter? For example, does "aAa" count as the "same character"? Are any of the characters special characters such as punctuation: ()!@#$%^&*-_=+/><\|][{}

Do multiple spaces count?

Solution
‎03-26-2015 10:46 AM
Super User
Posts: 5,081

Re: How to identify repeating characters in the string.

Assuming you want an exact match on any possible character, here is a way:

data want;
   set have;
   do i=1 to 7 until (flag=1);
      if substr(var, i, 1) = substr(var, i+1, 1) = substr(var, i+2, 1) then flag=1;
   end;
run;

Good luck.

Contributor
Posts: 24

Re: How to identify repeating characters in the string.

Hey. I was getting the correct result, but I'm getting a warning too. So I modified the code a bit and tried again, but I can't figure it out.

Would it be possible for you to tell me why I am getting the warning below and how I can fix it?

Thanks.

data have;
   var="Sasexampleee";output;
   var="Sasexampplee";output;
   var="Sasexammmppl";output;
   var="Sasexampl";output;
run;

data want;
   set have;
   leng=length(var);
   do i=1 to leng;
      if substr(var, i, 1) = substr(var, i+1, 1) = substr(var, i+2, 1) then flag=1;
   end;
run;

NOTE: Invalid second argument to function SUBSTR at line 27 column 52.

NOTE: Invalid second argument to function SUBSTR at line 27 column 30.

NOTE: Invalid second argument to function SUBSTR at line 27 column 52.

var=Sasexampplee leng=12 i=13 flag=. _ERROR_=1 _N_=2

NOTE: Invalid second argument to function SUBSTR at line 27 column 52.

var=Sasexamplee leng=11 i=12 flag=. _ERROR_=1 _N_=4

NOTE: There were 4 observations read from the data set WORK.HAVE.

Respected Advisor
Posts: 4,644

Re: How to identify repeating characters in the string.

When i = leng in your loop, you look at character position i+2, which is beyond the length of the variable. You can simply stop the loop sooner:

data want;
   set have;
   leng=length(var) - 2;
   do i=1 to leng;
      if substr(var, i, 1) = substr(var, i+1, 1) = substr(var, i+2, 1) then flag=1;
   end;
run;

PG
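
The two ideas in this thread, stopping the loop two positions early and leaving the loop at the first hit, can be combined. The following is only a sketch under the thread's assumptions (a character variable named var in a data set named have); the flag=0 default and the drop statement are additions for illustration, not part of the posted answers.

```sas
/* Sketch: flag rows whose VAR contains 3 identical consecutive characters. */
/* Assumes data set HAVE with character variable VAR, as in the posts above. */
data want;
   set have;
   flag = 0;                                  /* start at 0 rather than missing */
   /* Stop 2 positions before the last non-blank character so i+2 stays in range; */
   /* UNTIL leaves the loop as soon as a run of three is found.                  */
   do i = 1 to length(var) - 2 until (flag = 1);
      if substr(var, i, 1) = substr(var, i+1, 1)
         and substr(var, i+1, 1) = substr(var, i+2, 1) then flag = 1;
   end;
   drop i;                                    /* loop counter is not needed in the output */
run;
```

Because LENGTH ignores trailing blanks, padding at the end of var is never compared, but three embedded spaces in a row would still be flagged. For values shorter than 3 characters the loop does not run at all and flag stays 0.
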
Contributor
Posts: 24

Re: How to identify repeating characters in the string.

Worked well. Thanks.

Trusted Advisor
Posts: 1,128

Re: How to identify repeating characters in the string.

You could also try:

data have;
   char='a b b a c c c';
   do i = 1 to 10;
      new2=scan(char,i,' ');
      output;
   end;
run;

data want;
   set have;
   by notsorted new2;
   retain count 0;
   if first.new2 then count=1;
   else count+1;
   if last.new2 and count=3 and new2 ne '';
run;

Thanks,
Jag

Occasional Contributor pr1
Posts: 15

Re: How to identify repeating characters in the string.

THANK YOU all!!!  This works.  

Appreciate all the help. 

Respected Advisor
Posts: 3,124

Re: How to identify repeating characters in the string.

Come on, there has to be a PRX solution for this :)

data test;
infile cards truncover;
input str $ 100.;
flag=prxmatch('m/(\S)\1{2}/o',str)>0;
cards;
adlsfkj888adklfj
alkjahfkldjhklajdhfklj
akljsd******alkdfkj
;

Trusted Advisor
Posts: 1,300

Re: How to identify repeating characters in the string.

Here is an explanation of the regular expression:

(\S)\1{2}

Match the regular expression below and capture its match into backreference number 1 «(\S)»

   Match a single character that is a “non-whitespace character” «\S»

Match the same text as most recently matched by capturing group number 1 «\1{2}»

   Exactly 2 times «{2}»

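The m before the starting delimiter (/) is Perl's optional match-operator prefix. It can be omitted when the delimiter is /, so it makes no difference here.

The o after the ending delimiter (/) is an option that tells SAS to compile the expression once and reuse it for the rest of the DATA step instead of recompiling it on each call. Because the pattern here is a constant, the documentation quoted below indicates it would be compiled only once even without /o.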

See the documentation:

     http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002295977.htm

Compiling a Perl Regular Expression

If perl-regular-expression is a constant or if it uses the /o option, the Perl regular expression is compiled only once. Successive calls to PRXPARSE will not cause a recompile, but will return the regular-expression-id for the regular expression that was already compiled. This behavior simplifies the code because you do not need to use an initialization block (IF _N_ =1) to initialize Perl regular expressions.

Note:   If you have a Perl regular expression that is a constant, or if the regular expression uses the /o option, then calling PRXFREE to free the memory allocation results in the need to recompile the regular expression the next time that it is called by PRXPARSE.

The compile-once behavior occurs when you use PRXPARSE in a DATA step. For all other uses, the perl-regular-expression is recompiled for each call to PRXPARSE.
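
For comparison, here is a minimal sketch of the same pattern compiled explicitly with PRXPARSE in an initialization block, which is the style the documentation says the constant or /o form lets you avoid. It assumes the test data set and str variable from the PRX post above; the names flagged and re are made up for illustration.

```sas
/* Sketch: explicit PRXPARSE version of the (\S)\1{2} check.            */
/* Assumes data set TEST with character variable STR from the PRX post. */
data flagged;
   set test;
   retain re;                                     /* keep the pattern id across rows */
   if _n_ = 1 then re = prxparse('/(\S)\1{2}/');  /* compile the pattern on the first row only */
   flag = prxmatch(re, str) > 0;                  /* PRXMATCH returns 0 when nothing matches */
   drop re;
run;
```

PRXMATCH returns the position of the first match, so comparing it with 0 turns the result into a 1/0 flag, just as in the one-line version.
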

Frequent Contributor
Posts: 115

Re: How to identify repeating characters in the string.

Hi. A small request, and sorry that it's off topic: I have noticed you use APPC functions extremely well compared with many other major contributors. Could you please give me a link to documentation that explains those in detail too? I'd appreciate it very much. Thanks.

Trusted Advisor
Posts: 1,300

Re: How to identify repeating characters in the string.

naveen_srini,

Rather than taking this post off-topic, it would be better to post a new question. I will gladly answer as best I can, as will others, I'm sure.

Respected Advisor
Posts: 4,644

Re: How to identify repeating characters in the string.

Does anyone know if the "compile-once" behavior also occurs in SQL expressions?

Pg

Trusted Advisor
Posts: 1,300

Re: How to identify repeating characters in the string.

PG, it does have a similar behavior in SQL
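
For reference, a sketch of calling the same pattern from PROC SQL. It assumes the test data set and str variable from the PRX post above, and the table name flagged_sql is made up for illustration. It shows only that PRXMATCH can be used in an SQL expression; it does not demonstrate or verify whether the pattern is compiled once.

```sas
/* Sketch: the same check inside PROC SQL.                               */
/* Assumes data set TEST with character variable STR; FLAGGED_SQL is a  */
/* hypothetical output table name.                                       */
proc sql;
   create table flagged_sql as
   select str,
          prxmatch('/(\S)\1{2}/', str) > 0 as flag   /* 1 if any non-blank character repeats 3 times */
   from test;
quit;
```
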
