SAS09
Calcite | Level 5

I am new to Perl regular expressions and hope someone can help me with this question. I am trying to identify and remove patient IDs from a free-form text string using a Perl regular expression. The conditions are as follows:

1. The IDs are always 8 characters (bytes) long and can contain letters (upper- or lowercase) and digits.

2. The IDs always contain at least one digit.

Here's the expression I tried to build, but it ends up matching every 8-letter word. A negated character class does not seem to work either.

Here's my code:

id_re = prxparse('s/\b[a-zA-Z0-9]{8}\b/ &id removed& /');

Any ideas on how I can find just the 8-byte IDs?

Accepted Solution
art297
Opal | Level 21

I cross-posted your request to a similar forum (SAS-L), and a friend who is a SAS/Perl expert (Toby Dunn) offered the following solution to the problem you raised:

data have;
  length stuff $ 80;
  input Stuff & ;
cards;
Now is the time for all good men and women
to come to the gh5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
the problems problems.
4567890
45678901
1234_678
1234/678
ABC4EFGH
;

data want;
  set Have ;
  /* delete every 8-character alphanumeric word that contains at least one digit */
  /* (i = case-insensitive, o = compile once, -1 = replace all occurrences)     */
  stuff2 = PrxChange( 's/(?=\b[A-Z0-9]{8}\b)\b[A-Z0-9]*\d[A-Z0-9]*\b//oi' , -1 , Stuff ) ;
run;
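The question compiled its pattern once with PRXPARSE and wanted a placeholder left behind instead of deleting the ID outright. A minimal sketch combining both ideas with the lookahead pattern from this solution follows; the data set names, the sample lines, and the *ID REMOVED* placeholder are illustrative assumptions, not part of the original post.

```sas
/* Sample lines (illustrative only) */
data have;
  length stuff $ 80;
  input stuff & ;
datalines;
to come to the gh5567AA aid of their party
the problems problems.
1234_678
ABC4EFGH
;

data redacted;
  set have;
  length stuff2 $ 200;
  retain id_re;
  /* Parse the substitution once, on the first observation, and keep its id */
  if _n_ = 1 then
    id_re = prxparse('s/(?=\b[A-Z0-9]{8}\b)\b[A-Z0-9]*\d[A-Z0-9]*\b/*ID REMOVED*/i');
  /* -1 replaces every ID on the line with the placeholder text */
  stuff2 = prxchange(id_re, -1, stuff);
  drop id_re;
run;
```

Parsing on the first observation and keeping the id with RETAIN compiles the pattern once, which is the same effect the o modifier gives in the one-line version. An underscore counts as a word character, so 1234_678 is treated as a single word and is left alone because "_" is not in [A-Z0-9].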


6 Replies
art297
Opal | Level 21

I'm just starting to learn regular expressions, so I can't be of much help.

I think the expression you want is: ^(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{8}$

That would match an eight-character string that contains only letters and/or digits, including at least one digit.

Unfortunately, I don't know how to implement it.
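With all eight characters restricted to letters and digits, that anchored idea can be applied with PRXMATCH when each value to test is a single word. A minimal sketch, assuming one word per observation; the word list is made up for illustration, and STRIP removes the trailing blanks SAS pads character variables with, which would otherwise stop $ from matching.

```sas
data ids;
  length word $ 20;
  input word $;
  /* PRXMATCH returns the match position (>0), or 0 when there is no match */
  is_id = (prxmatch('/^(?=[a-zA-Z0-9]*\d)[a-zA-Z0-9]{8}$/', strip(word)) > 0);
datalines;
gh5567AA
problems
4567890
45678901
1234_678
ABC4EFGH
;
```

Because the pattern is anchored with ^ and $, it classifies whole values; it will not find an ID inside a longer sentence without first splitting the text into words.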

SASJedi
SAS Super FREQ

Try this:

data ID_REDACTED;
   infile datalines truncover;
   input text1 $100.;
   /* a whitespace-delimited word of exactly 8 word characters containing a digit */
   text2 = prxchange('s/\s(?=\w*\d)\w{8}\s/ *REDACTED* /', -1, text1);
   datalines;
This patient ID abcdef01 should be removed.
This is NOT a patient ID abcdefgh and should remain.
This is NOT a patient ID 123455 and should remain.
;
run;

Message was edited by: Mark Jordan to improve the Regular Expression

art297
Opal | Level 21

I don't know if the following is quite what you had in mind, but it does appear to eliminate the unwanted IDs:

data have;
  length stuff $20;
  input;
  i=1;
  do until (scan(_infile_,i," ") eq "");
    stuff=scan(_infile_,i," ");
    i+1;
    output;
  end;
  cards;
Now is the time for all good men and women
to come to the GG5567AA aid of their party
Or, was it 4567890 or 45678901 that caused
the problems.
4567890
45678901
ABC4EFGH
;

data want notwant;
  set have;
  want = PrxChange( 's/^\b(?=[A-Z0-9]*\d).{8,8}\b//io' ,
         -1 , Stuff );
  if want ne "" then output want;
  else output notwant;
run;

DF
Fluorite | Level 6

I find this page quite helpful for building and testing regular expressions: http://gskinner.com/RegExr/ (uses Flash).

Patterns sometimes need a few adjustments to work in SAS, either because they use non-standard features or because of SAS's own quirks, but it's a good site nonetheless. There are a lot of examples, and the tool shows you what the different tokens mean.
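To check what a pattern actually matches inside SAS, rather than only in an external tester, one option is to list every match in a string with CALL PRXNEXT. A minimal sketch, assuming the lookahead pattern from the accepted solution and a made-up test sentence:

```sas
data matches;
  length text $ 80 found $ 8;
  text = 'to come to the gh5567AA aid of 45678901 problems';
  re = prxparse('/(?=\b[A-Z0-9]{8}\b)\b[A-Z0-9]*\d[A-Z0-9]*\b/i');
  start = 1;                 /* updated by PRXNEXT to resume after each match */
  stop = length(text);       /* last position to search (trailing blanks excluded) */
  call prxnext(re, start, stop, text, position, len);
  do while (position > 0);   /* position is 0 when no further match exists */
    found = substr(text, position, len);
    output;                  /* one observation per match */
    call prxnext(re, start, stop, text, position, len);
  end;
  keep found position len;
run;
```

Each observation records the matched text, its starting column, and its length, which makes it easy to see whether a pattern is picking up words it should not (such as "problems").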

SAS09
Calcite | Level 5

Thank you so much!! This seems to do it! 

Thanks to everyone who replied. I have learnt a lot from all of your suggestions!
