milts
Pyrite | Level 9

I have a free-text column in which I need to delete and replace some sections. I'm able to use PRXCHANGE for the parts with a known structure, such as employee ID and postal code, but I also need to mask names and other sensitive information.

Input Free Text:

Name: John Doe

DOB: 1991-10-10

Output Cleansed Text:

Name: XXX

DOB: YYYY-MM-DD

Is there a way to use PRXMATCH so that, when it matches a portion of the text, I can then manipulate the string that immediately follows it?
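A minimal sketch of one way to do this in SAS, assuming the free text sits in one character variable (TEXT, a hypothetical name) whose lines are separated by a line-feed character ('0A'x). Instead of calling PRXMATCH and then editing the following substring by hand, PRXCHANGE with a capture group handles both steps: group 1 matches the label (e.g. Name:) and is written back unchanged through $1, while whatever follows it on that line is replaced.

```sas
/* Hypothetical sample: the whole free text is in one variable, TEXT, */
/* with lines separated by a line-feed character ('0A'x).             */
data have;
  length text $200;
  text = catx('0A'x, 'Name: John Doe', 'DOB: 1991-10-10');
run;

data want;
  set have;
  length clean $200;
  /* Group 1 keeps the "Name:" label and the spaces after it;        */
  /* the rest of that line is replaced by XXX.                       */
  clean = prxchange('s/(\bName:\s*)[^\r\n]*/$1XXX/i', -1, text);
  /* Same idea for a yyyy-mm-dd date that follows a "DOB:" label.    */
  clean = prxchange('s/(\bDOB:\s*)\d{4}-\d{2}-\d{2}/$1YYYY-MM-DD/i', -1, clean);
  /* PRXMATCH on its own can still flag rows that contained a name label. */
  has_name = (prxmatch('/\bName:/i', text) > 0);
run;
```

Each label needs its own pattern (or an alternation such as (Name|Employee Name)), and values that appear without a label, such as a name mentioned in the middle of a sentence, are not caught by this approach.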

Kurt_Bremser
Super User

My first action would be to parse the free text into dataset variables, so that I'd have a dataset with separate columns, like this:

data have;
infile cards dlm=',' truncover;
/* Read a comma-delimited name and a yyyy-mm-dd date of birth */
input name :$20. dob :yymmdd10.;
format dob yymmddd10.;
cards;
John Doe,1991-10-10
John Miller,2001-01-30
;
run;

Then it's very simple:

data want;
set have;
/* For matching rows, mask the name and clear the date of birth */
if index(name,'John Doe') > 0
then do;
  name = 'XXX';
  dob = .;
end;
run;
milts
Pyrite | Level 9

Thanks for your input. However, my data won't be that clean. The values I need to cleanse and mask will all be stored in a single field (for example, this entire response would be one field), which is a bit of a challenge in SAS, unlike in Python or R, where I can tokenize all the strings in the field. I'm just trying to weigh my options and decide which tool is most suitable for my requirement.

Cheers!
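For comparison, a single free-text field can also be tokenized in SAS with COUNTW and SCAN. The sketch below uses a made-up sample sentence and an assumed delimiter list (' ,.;:'); it writes one observation per token and masks tokens that look like yyyy-mm-dd dates. Other masking rules could be applied to TOKEN inside the same loop.

```sas
/* Hypothetical one-field input, similar to a free-text reply. */
data have;
  length id 8 text $500;
  id = 1;
  text = 'Hi, my name is John Doe and my DOB is 1991-10-10.';
  output;
run;

/* Split TEXT into one observation per token (word).           */
/* Consecutive delimiters such as ", " count as one by default. */
data tokens;
  set have;
  length token $100;
  n_tokens = countw(text, ' ,.;:');
  do position = 1 to n_tokens;
    token = scan(text, position, ' ,.;:');
    /* Mask tokens that look like yyyy-mm-dd dates; TRIM removes */
    /* the trailing blanks so the $ anchor can match.            */
    if prxmatch('/^\d{4}-\d{2}-\d{2}$/', trim(token)) then token = 'YYYY-MM-DD';
    output;
  end;
  keep id position token;
run;
```

Splitting on delimiters discards the original punctuation and spacing, so for masking in place, PRXCHANGE on the whole field is often simpler; tokenizing is more useful when each word has to be checked against a list, such as a table of known names.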
