eyp500
Calcite | Level 5

I have a variable, and I'd like to censor any addresses it contains.

e.g.

data _null_;
x = prxchange("s/(\w+) \b(STREET)\b/*LOCATION REMOVED*/",-1, 'I WAS WALKING ON QUEEN STREET IN THE MORNING');

put x=;
run;


But I also want to create exceptions, where prxchange ignores strings like 'THE STREET', as in 'I WAS DRIVING ON THE STREET'.

Is it possible to do this?


Thanks in advance

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

You need to use a negative look-behind assertion.

data _null_;
  x = prxchange("s/(\w+)(?<!THE) \b(STREET)\b/*LOCATION REMOVED*/",-1, 'I WAS WALKING IN THE STREET IN THE MORNING');
  put x=;
  x = prxchange("s/(\w+)(?<!THE) \b(STREET)\b/*LOCATION REMOVED*/",-1, 'I WAS WALKING ON QUEEN STREET IN THE MORNING');
  put x=;
run;

x=I WAS WALKING IN THE STREET IN THE MORNING
x=I WAS WALKING ON *LOCATION REMOVED* IN THE MORNING
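
One caveat with the pattern above: (?<!THE) only checks the three characters before the space, so a street whose name merely ends in THE (for example SMYTHE STREET) would also be left uncensored. Below is a minimal sketch of a stricter variant, assuming the PRX engine accepts \b inside a look-behind the way Perl does. The sample sentences are made up for illustration.

```sas
data _null_;
  length x $200;
  /* (?<!\bTHE) rejects only the whole word THE, not words that merely end in THE */
  rx = "s/\b(\w+)(?<!\bTHE) \bSTREET\b/*LOCATION REMOVED*/";
  /* intended to stay unchanged: THE STREET is not an address */
  x = prxchange(rx, -1, 'I WAS WALKING IN THE STREET IN THE MORNING');
  put x=;
  /* intended to be censored: SMYTHE only ends in THE */
  x = prxchange(rx, -1, 'I WAS WALKING ON SMYTHE STREET IN THE MORNING');
  put x=;
run;
```

The leading \b makes the match start at the beginning of a word, so the whole street name is replaced rather than its tail.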

 


6 REPLIES
Reeza
Super User

@eyp500 wrote:

Is it possible to do this?

Better question - how accurate do you need it to be? If you miss a few, will it matter?

eyp500
Calcite | Level 5

Ideally, we can't afford to miss any actual street names. Over-censoring is acceptable as long as no real address is shown, but the end user would prefer unnecessary censoring to be kept to a minimum. We do have a list of phrases that we know are safe to leave alone, such as 'street lamp', 'street light', and 'residential street'.
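
A safe list like this could be expressed as extra look-around assertions in the prxchange pattern. The sketch below is only an illustration under stated assumptions: the phrases are the ones quoted above plus 'THE STREET' from the original question, the i modifier is added on the assumption that the text may be in mixed case, \b inside a look-behind is assumed to be accepted as it is in Perl, and the sample sentences are invented.

```sas
data _null_;
  length x $200;
  /* the look-behinds skip THE STREET and RESIDENTIAL STREET; */
  /* the look-ahead skips STREET LAMP and STREET LIGHT        */
  rx = "s/\b(\w+)(?<!\bTHE)(?<!\bRESIDENTIAL) \bSTREET\b(?! (LAMP|LIGHT)\b)/*LOCATION REMOVED*/i";
  /* intended: only Queen Street is censored, street lamp is kept */
  x = prxchange(rx, -1, 'There is a street lamp on Queen Street');
  put x=;
  /* intended: left unchanged */
  x = prxchange(rx, -1, 'It is a quiet residential street');
  put x=;
run;
```

Every safe phrase needs its own assertion, so this approach gets unwieldy as the list grows, and it does nothing for misspelled street types.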

PGStats
Opal | Level 21

You would be more likely to achieve this if the street names came from a finite set (such as the street names within a city). When you find one of those, you can confirm from its context that it is being used as a street name.

PG
eyp500
Calcite | Level 5
Unfortunately, we don't have a nationwide list of street names, and the worst part is that the data contains many misspelled street names 😞

eyp500
Calcite | Level 5

Thank you so much! This works perfectly.

 
