Reeza
Super User

I'm converting some code from Python to SAS and I'm stuck at one point: regular expressions. Can someone please confirm that my assumption is correct?

 

I think this line:

df['VALUE'] = df['VALUE'].replace(r'([a-z,A-Z])\w+',0)

 

basically removes all character values and replaces them with 0?

Is this interpretation correct?


I can't run the program on the file; otherwise I'd test it by running each version and comparing the results.

6 REPLIES
ChrisNZ
Tourmaline | Level 20

My Python is basic, but as far as I know the syntax is:

NEW_STR=re.sub(REGEX, REPLACEMENT, OLD_STR)

 

1. You need the re. call for the RegEx parser to be used (what's the value of df['VALUE']?)

2. You need 3 parameters

3. I don't see how numbers are allowed

 

Regardless, the regular expression here matches: 

- one lower case letter or comma or upper case letter

- followed by word characters (this includes digits and underscore)
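To make that reading concrete, here is a minimal Python sketch that uses only the standard re module, not the pandas call from the question. The helper name replace_matches and the sample strings are made up for illustration.

```python
import re

# Pattern copied from the question: one character from the class [a-z,A-Z]
# (a letter or a literal comma), then one or more word characters.
PATTERN = re.compile(r'([a-z,A-Z])\w+')

def replace_matches(text: str) -> str:
    """Replace every match of PATTERN in text with the string '0'."""
    return PATTERN.sub('0', text)

# Illustrative inputs only; they are not taken from the real data.
samples = ['abc', 'a', ',x1', '12ab', 'a b', '123']
for s in samples:
    print(repr(s), '->', repr(replace_matches(s)))
```

Here 'abc' and ',x1' both become '0' and '12ab' becomes '120', while 'a', 'a b' and '123' are left unchanged, because a lone letter leaves nothing for \w+ to consume.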

 

tomcmacdonald
Quartz | Level 8

 

Are these pandas dataframes?  If so I would do it like this:

 

import re


...


df['VAL'] = df['VAL'].apply(lambda x: re.sub('[A-Z]+', '0', x, flags=re.I))

 

 

Reeza
Super User

It is pandas, but I'm going in the opposite direction, from Pandas/Python to SAS code. The data is too big to be handled well in Python.

Patrick
Opal | Level 21

@Reeza

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

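data sample;
  infile datalines truncover;
  /* $char10. keeps leading blanks in the test values */
  input source $char10.;
  format target $char10.;
  /* times=1 replaces only the first match; use -1 to replace every match */
  target=prxchange('s/[a-z,A-Z]\w+/0/i',1,source);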
  datalines;
Abc9xy
a
a9
   _a
   a_
   _a_
123_a
  123_ab
123a_
 a123a_
 X_123a_
 X_123a_ bb
123_abc de
;
run;
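For a rough cross-check on the Python side, the sketch below applies the same pattern to a few of the sample rows with the standard re module. It assumes that count=1 in re.sub corresponds to the times argument of 1 in the PRXCHANGE call above; the function names replace_first and replace_all are hypothetical.

```python
import re

# Same pattern and case-insensitive flag as the PRXCHANGE call above.
PATTERN = re.compile(r'[a-z,A-Z]\w+', re.IGNORECASE)

def replace_first(text: str) -> str:
    """Replace only the first match (assumed to mirror times=1)."""
    return PATTERN.sub('0', text, count=1)

def replace_all(text: str) -> str:
    """Replace every match (assumed to mirror times=-1)."""
    return PATTERN.sub('0', text)

# A subset of the datalines rows, leading blanks included.
rows = ['Abc9xy', 'a', 'a9', '   _a', '   a_', '123_a', '123_abc de']
for row in rows:
    print(repr(row), repr(replace_first(row)), repr(replace_all(row)))
```

The two functions only differ on rows with more than one match: '123_abc de' gives '123_0 de' when only the first match is replaced and '123_0 0' when all are.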
ChrisNZ
Tourmaline | Level 20

@Patrick

 

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

 

You mean:

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter or a comma, the 2nd one can be alphanumeric or an underscore.

Patrick
Opal | Level 21

@ChrisNZ

True, missed the comma.
