Reeza
Super User

I'm converting some code from Python to SAS and am stuck at one point: regular expressions. Can someone please confirm that my assumption is correct?

 

I think this line:

df['VALUE'] = df['VALUE'].replace(r'([a-z,A-Z])\w+',0)

 

Does it basically remove all character values and replace them with 0?

Is this interpretation correct?


I can't run the program on the file; otherwise I'd test it by running each program and comparing the results.

ChrisNZ
Tourmaline | Level 20

My Python is basic, but as far as I know the syntax is:

NEW_STR = re.sub(REGEX, REPLACEMENT, OLD_STR)

 

1. You need the re. call (what's the value of df['VALUE'] ?)  for the RegEx parser to be used

2. You need 3 parameters

3. I don't see how numbers are allowed

 

Regardless, the regular expression here matches:

- one lower case letter, comma, or upper case letter

- followed by one or more word characters (letters, digits, and underscore)
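
A quick way to check this reading is to try the pattern against a few strings with Python's standard re module. This is only a sketch: the sample strings and the helper name first_match are made up for illustration and do not come from the original program.

```python
import re

# Pattern from the question: one character from [a-z,A-Z],
# then one or more word characters (letters, digits, underscore).
PATTERN = re.compile(r'([a-z,A-Z])\w+')

def first_match(text):
    """Return the first substring the pattern matches, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

for sample in ['abc', 'a', 'a9', 'x_1', '123', '12ab']:
    print(repr(sample), '->', first_match(sample))
```

A single letter on its own does not match, because \w+ needs at least one more character after it.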

 

tomcmacdonald
Quartz | Level 8

 

Are these pandas dataframes?  If so I would do it like this:

 

import re


...


df['VAL'] = df['VAL'].apply(lambda x: re.sub('[A-Z]+', '0', x, flags=re.I))
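
Note that [A-Z]+ with re.I is not the same pattern as the one in the question, so the two can give different results. The sketch below compares them on a few invented strings using plain re.sub, without pandas; both function names are hypothetical.

```python
import re

def sub_letters_only(text):
    # Pattern from this reply: each run of letters becomes '0';
    # digits and underscores are left in place.
    return re.sub('[A-Z]+', '0', text, flags=re.I)

def sub_question_pattern(text):
    # Pattern from the question: a letter or comma, plus the word
    # characters that follow it, becomes '0'.
    return re.sub(r'([a-z,A-Z])\w+', '0', text)

for sample in ['Abc9xy', 'a', 'x_1 y2']:
    print(repr(sample), sub_letters_only(sample), sub_question_pattern(sample))
```

For example, 'Abc9xy' keeps its digit under the first pattern but collapses to a single '0' under the second, and a lone 'a' is replaced only by the first.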

 

 

Reeza
Super User

It is pandas, but I'm going in the opposite direction, from pandas/Python to SAS code. The data is too big to be handled well in Python.

Patrick
Opal | Level 21

@Reeza

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

data sample;
  infile datalines truncover;
  input source $char10.;
  format target $char10.;
  /* times=1 replaces only the first match in each value; -1 would replace every match */
  target=prxchange('s/[a-z,A-Z]\w+/0/i',1,source);
  datalines;
Abc9xy
a
a9
   _a
   a_
   _a_
123_a
  123_ab
123a_
 a123a_
 X_123a_
 X_123a_ bb
123_abc de
;
run;
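
For cross-checking against the Python side, here is a rough Python analogue of the PRXCHANGE call above, run on some of the same sample values. It is a sketch only: it has not been checked against actual SAS output, and the name replace_first is made up.

```python
import re

# Same pattern and case-insensitive flag as the PRXCHANGE call.
PATTERN = re.compile(r'[a-z,A-Z]\w+', flags=re.I)

def replace_first(source):
    """Replace only the first match with '0' (count=1 mirrors the times argument of 1)."""
    return PATTERN.sub('0', source, count=1)

samples = ['Abc9xy', 'a', 'a9', '   _a', '   a_', '123_a', '  123_ab', '123_abc de']
for s in samples:
    print(repr(s), '->', repr(replace_first(s)))
```

In '123_abc de' only 'abc' is replaced; 'de' would also match, but the count of 1 stops after the first replacement.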
ChrisNZ
Tourmaline | Level 20

@Patrick

 

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter, the 2nd to n can be alphanumeric or an underscore.

 

You mean:

If I read this RegEx correctly then it needs at least two characters to match; the first character needs to be a letter or a comma, the 2nd one can be alphanumeric or an underscore.
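
To see the effect of the comma inside the character class, the sketch below compares the class as written with a hypothetical letters-only variant. The variant and the sample strings are assumptions for illustration, not part of the original code.

```python
import re

WITH_COMMA = re.compile(r'[a-z,A-Z]\w+')    # class as written in the question
LETTERS_ONLY = re.compile(r'[a-zA-Z]\w+')   # hypothetical variant without the comma

def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for sample in [',5', ',', 'a5']:
    print(repr(sample), matches(WITH_COMMA, sample), matches(LETTERS_ONLY, sample))
```

Only ',5' separates the two: the comma counts as a valid first character in the original class, so a value such as ',5' is matched even though it contains no letters.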

Patrick
Opal | Level 21

@ChrisNZ

True, missed the comma.

