
Creating a Perl Expression

Contributor
Posts: 27

Creating a Perl Expression

I am a newbie to Perl regular expressions and am hoping that someone can help me with this question. I am trying to identify and remove patient IDs in a free-form text string using a Perl regular expression. The conditions are as follows:

1. The IDs are always 8 bytes long and can contain letters (upper- or lowercase) and numbers.

2. The IDs always contain at least one digit.

Here's the expression I tried to build, but it ends up matching every 8-letter word. A negated character class does not seem to work either.

Here's my code:

id_re=prxparse('s/\b[a-zA-Z0-9]{8}\b/ &id removed& /');

Any ideas on how I can find the 8-byte IDs?


Accepted Solution
09-06-2011 09:21 AM
PROC Star
Posts: 7,366

Re: Creating a Perl Expression

I cross-posted your request to a similar forum (SAS-L), and a friend and SAS/Perl expert, Toby Dunn, offered the following solution to the problem you raised:

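data have;
length stuff $ 80;
input Stuff & ;
cards;
Now is the time for all good men and women
to come to the gh5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
the problems problems.
4567890
45678901
1234_678
1234/678
ABC4EFGH
;

data want;
  set Have ;
  /* The lookahead requires a whole word of exactly 8 letters/digits; the rest of
     the pattern requires at least one digit in that word. The i modifier makes
     [A-Z] match lowercase letters too, and -1 replaces every occurrence. */
  stuff2 = PrxChange( 's/(?=\b[A-Z0-9]{8}\b)\b[A-Z0-9]*\d[A-Z0-9]*\b//oi' , -1 , Stuff ) ;
run;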
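If you also need a record of which IDs were removed (for example, to audit the redaction), the same pattern without the `s///` wrapper can be compiled once with PRXPARSE and walked with CALL PRXNEXT. This is a sketch, not part of the posted solution; the data set and variable names (`sample`, `found_ids`, `text`, `id`) are hypothetical.

```sas
data sample;
   infile datalines truncover;
   input text $80.;
   datalines;
to come to the gh5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
ABC4EFGH
;
run;

/* Hypothetical audit step: one observation per ID found in each line */
data found_ids;
   set sample;
   length id $8;
   retain re;
   /* Same pattern as the PRXCHANGE solution, compiled once on the first iteration */
   if _n_ = 1 then re = prxparse('/(?=\b[A-Z0-9]{8}\b)\b[A-Z0-9]*\d[A-Z0-9]*\b/i');
   start = 1;
   stop = length(text);
   /* CALL PRXNEXT advances START past each match; POS = 0 means no more matches */
   call prxnext(re, start, stop, text, pos, len);
   do while (pos > 0);
      id = substr(text, pos, len);
      output;
      call prxnext(re, start, stop, text, pos, len);
   end;
   keep text id;
run;
```

Running the PRXCHANGE step afterwards still does the actual removal; this step only lists what it would remove.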



All Replies
PROC Star
Posts: 7,366

Creating a Perl Expression

I'm just starting to learn regular expressions, so I can't be of much help.

I think the expression you want is: ^(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{8}$

That would match an eight-character string that contains only letters and/or digits, including at least one digit.

Unfortunately, I don't know how to implement it.
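For readers in the same position, here is a minimal sketch of one way to apply an anchored pattern of this kind, `^(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{8}$`, in SAS: split each line into blank-separated words with SCAN and test each word with PRXMATCH. The data set and variable names (`notes`, `words`, `text`, `word`, `is_id`) are hypothetical, and the sketch assumes IDs are always separated from other text by blanks, so an ID followed by punctuation (for example `45678901,`) would not be flagged.

```sas
data notes;
   infile datalines truncover;
   input text $100.;
   datalines;
to come to the gh5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
;
run;

/* Hypothetical step: one observation per word, with a 0/1 flag for IDs */
data words;
   set notes;
   length word $100;
   retain re;
   /* Compile the pattern once, on the first iteration */
   if _n_ = 1 then re = prxparse('/^(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{8}$/');
   do i = 1 to countw(text, ' ');
      word = scan(text, i, ' ');
      /* TRIM drops the trailing blank padding so that $ can match at the end of the word */
      is_id = (prxmatch(re, trim(word)) > 0);
      output;
   end;
   drop re i;
run;
```

Because `^` and `$` anchor the pattern to the whole word, this flags IDs rather than removing them; the substitution-based replies in this thread redact in place instead.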

SAS Employee
Posts: 104

Re: Creating a Perl Expression

Try this:

data ID_REDACTED;
   infile datalines truncover;
   input text1 $100.;
   /* Replace a blank-delimited 8-character letter/digit token that contains at least one digit */
   text2 = prxchange('s/\s(?=[a-z0-9]*\d)[a-z0-9]{8}\s/ *REDACTED* /i', -1, text1);
   datalines;
This patient ID abcdef01 should be removed.
This is NOT a patient ID abcdefgh and should remain.
This is NOT a patient ID 123455 and should remain.
;
run;

Message was edited by: Mark Jordan to improve the Regular Expression

PROC Star
Posts: 7,366

Creating a Perl Expression

I don't know if the following is quite what you had in mind, but it does appear to eliminate the unwanted IDs:

data have;
  length stuff $20;
  input;
  /* Write one observation per blank-separated word */
  i=1;
  do until (scan(_infile_,i," ") eq "");
    stuff=scan(_infile_,i," ");
    i+1;
    output;
  end;
  cards;
Now is the time for all good men and women
to come to the GG5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
the problems.
4567890
45678901
ABC4EFGH
;

data want notwant;
  set have;
  /* Blank out a word that is exactly 8 letters/digits with at least one digit */
  want = PrxChange( 's/^\b(?=[A-Z0-9]*\d)[A-Z0-9]{8}\b//io' ,
         -1 , Stuff );
  if want ne "" then output want;
  else output notwant;
run;

Frequent Contributor
Posts: 94

Creating a Perl Expression

I find this page quite helpful for building and testing regular expressions: http://gskinner.com/RegExr/ (uses Flash).

Sometimes SAS needs a few adjustments, either for non-standard constructs or for its own quirks, but it's a good site nonetheless. There are a lot of examples, and the tool shows you the meanings of the different codes.

Contributor
Posts: 27

Creating a Perl Expression

Thank you so much!! This seems to do it! 

Thanks to everyone who replied. I have learnt a lot from all of your suggestions!
