Regular Expression Python to SAS

Super User
Posts: 23,683

I'm converting some code from Python to SAS and I'm stuck at one point: regular expressions. Can someone please confirm that my assumption is correct?

 

I think this line:

df['VALUE'] = df['VALUE'].replace(r'([a-z,A-Z])\w+',0)

 

Basically removes all character values and replaces them with 0?

Is this interpretation correct?


I can't run the program on the file; otherwise I'd test it by running each program and comparing the results.

PROC Star
Posts: 2,339

My Python is basic, but as far as I know the syntax is:

NEW_STR = re.sub(REGEX, REPLACEMENT, OLD_STR)

 

1. You need the re. call for the RegEx parser to be used (what's the value of df['VALUE']?)

2. You need 3 parameters

3. I don't see how numbers are allowed

 

Regardless, the regular expression here matches: 

- one lower case letter or comma or upper case letter

- followed by word characters (this includes digits and underscore)
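
The breakdown above can be checked with Python's standard `re` module. This is only a minimal sketch: the sample strings are made up to probe the edge cases and are not taken from the original data.

```python
import re

# Pattern from the question: one character from the class [a-z,A-Z]
# (a letter or a literal comma), followed by one or more word characters.
PATTERN = re.compile(r'([a-z,A-Z])\w+')

# Hypothetical sample values (assumptions for illustration only).
samples = ['abc', 'a', 'a9', ',9', '9a', '_a_']

# True where the pattern matches somewhere in the string.
results = {s: bool(PATTERN.search(s)) for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

A single letter does not match because `\w+` needs at least one more character, while `,9` does match because the comma is part of the character class.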

 

Frequent Contributor
Posts: 99

 

Are these pandas dataframes?  If so I would do it like this:

 

import re


...


df['VAL'] = df['VAL'].apply(lambda x: re.sub('[A-Z]+', '0', x, flags=re.I))
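
For comparison, here is a plain-Python sketch (no pandas) of what that `re.sub` call does to individual strings, next to the pattern from the question. The helper names `zero_letters` and `zero_original` and the sample values are assumptions for illustration.

```python
import re

def zero_letters(value: str) -> str:
    """Replace each run of ASCII letters with '0' (suggested pattern)."""
    return re.sub('[A-Z]+', '0', value, flags=re.I)

def zero_original(value: str) -> str:
    """Replace each match of the pattern from the question with '0'."""
    return re.sub(r'([a-z,A-Z])\w+', '0', value)

# Hypothetical sample values.
samples = ['Abc9xy', 'a', '123_ab']

comparison = {s: (zero_letters(s), zero_original(s)) for s in samples}

for s, (letters_only, original) in comparison.items():
    print(repr(s), repr(letters_only), repr(original))
```

The two patterns are not equivalent: `[A-Z]+` also replaces single letters and stops at digits, whereas the original pattern needs at least two characters and continues through digits and underscores.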

 

 

Super User
Posts: 23,683

Posted in reply to tomcmacdonald

It is pandas, but I'm going in the opposite direction, from pandas/Python to SAS code. The data is too big to be handled well in Python.

Respected Advisor
Posts: 4,736

@Reeza

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

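data sample;
  infile datalines truncover;
  /* $char10. keeps leading blanks in the test values */
  input source $char10.;
  format target $char10.;
  /* times=1 replaces only the first match; use -1 to replace every match. */
  /* The /i flag is redundant here because the class already lists both cases. */
  target=prxchange('s/[a-z,A-Z]\w+/0/i',1,source);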
  datalines;
Abc9xy
a
a9
   _a
   a_
   _a_
123_a
  123_ab
123a_
 a123a_
 X_123a_
 X_123a_ bb
123_abc de
;
run;
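
For readers coming from the Python side, the same test can be sketched with `re.sub`, whose `count` argument plays a role similar to the second argument of `prxchange`. This assumes the pattern behaves the same in both regex engines for these inputs. The values are a subset of the datalines above with the leading blanks stripped, so the output will not reproduce the padding of the SAS variables.

```python
import re

# Same pattern as in the prxchange call, without the redundant /i flag.
PATTERN = re.compile(r'[a-z,A-Z]\w+')

# Subset of the datalines, leading blanks removed (an assumption).
samples = ['Abc9xy', 'a', 'a9', '123_a', 'X_123a_ bb', '123_abc de']

# count=1 replaces only the first match, like times=1 in prxchange.
first_only = {s: PATTERN.sub('0', s, count=1) for s in samples}

# The default count=0 replaces every match, like times=-1 in prxchange.
all_matches = {s: PATTERN.sub('0', s) for s in samples}

for s in samples:
    print(repr(s), repr(first_only[s]), repr(all_matches[s]))
```

The two variants differ only for values that contain more than one match, such as `X_123a_ bb`.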
PROC Star
Posts: 2,339

@Patrick

 

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

 

You mean:

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter or a comma, the 2nd one can be alphanumeric or an underscore.

Respected Advisor
Posts: 4,736

@ChrisNZ

True, missed the comma.
