I have a free-text column in which I need to delete or replace some sections. I can use prxchange for values with a known structure, such as employee IDs and postal codes, but I also need to mask names and other sensitive information.
Input Free Text:
Name: John Doe
DOB: 1991-10-10
Output Cleansed Text:
Name: XXX
DOB: YYYY-MM-DD
Is there a way to use prxmatch so that, when it matches a portion of the text (such as a label), I can manipulate the string that immediately follows it?
My first step would be to parse the free text into dataset variables, so that I'd have a dataset with separate columns, like this:
data have;
  infile cards dlm=',' truncover;
  /* read the name as text and the date of birth as a SAS date */
  input name :$20. dob :yymmdd10.;
  format dob yymmddd10.;
cards;
John Doe,1991-10-10
John Miller,2001-01-30
;
run;
Then it's very simple:
data want;
  set have;
  /* mask the name and blank out the date of birth for matching rows */
  if index(name,'John Doe') > 0 then do;
    name = 'XXX';
    dob = .;
  end;
run;
Thanks for your input. However, my data won't be that clean. The values I need to cleanse and mask are all stored in a single field (this response would be a good example: everything would be in one field). That is a bit of a challenge in SAS, whereas in Python or R I could tokenize all the strings in the field. I'm just trying to weigh up which tool would be the most suitable for my requirement.
Cheers!
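For text that stays in a single field, one possible approach is to let prxchange do the match and the replacement in one step: capture the label in a group, then write the label back followed by the mask. The sketch below is only an illustration under some assumptions that are not from the thread: the variable names text and cleansed are hypothetical, each sensitive value is assumed to follow a literal label (Name: or DOB:), a name is assumed to consist of letter-only words that end at punctuation or the end of the text, and dates are assumed to be in YYYY-MM-DD form. Names that appear without a label would not be caught by this.

```sas
/* Hypothetical sample data: one free-text field per row */
data have;
  infile cards truncover;
  input text $char200.;
cards;
Name: John Doe
DOB: 1991-10-10
Name: Jane Roe, DOB: 2001-01-30, called today
;
run;

data want;
  set have;
  length cleansed $200;
  /* $1 writes the captured label back; the words after it become XXX. */
  /* Assumption: a name is one or more letter-only words.              */
  cleansed = prxchange('s/(Name:\s*)[A-Za-z]+(\s+[A-Za-z]+)*/$1XXX/i', -1, text);
  /* Same idea for the date; assumption: dates look like 1991-10-10.   */
  cleansed = prxchange('s/(DOB:\s*)\d{4}-\d{2}-\d{2}/$1YYYY-MM-DD/i', -1, cleansed);
run;
```

The -1 argument asks prxchange to replace every occurrence in the field, so several labelled values in one row are handled in a single call. With the sample rows above, the intent is that "Name: John Doe" becomes "Name: XXX" and "DOB: 1991-10-10" becomes "DOB: YYYY-MM-DD". If names can be followed directly by other words with no punctuation in between, the name pattern would mask too much and would need tightening.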