02-27-2017 02:24 AM
I have a free-text column in which I need to delete and replace some sections. I'm able to use prxchange for values with known structures, such as employee ID or postal code, but I also need to mask names and other sensitive information.
Input Free Text:
Name: John Doe
Output Cleansed Text:
Is there a way to use prxmatch so that, when it matches a portion of the text (such as a label), I can manipulate the string that immediately follows it?
02-27-2017 03:50 AM
My first step would be to parse the free text into dataset variables, so I'd have a dataset with separate columns, like this:
data have;
infile cards dlm=',' truncover;
input name :$20. dob :yymmdd10.;
format dob yymmddd10.;
cards;
John Doe,1991-10-10
John Miller,2001-01-30
;
run;
Then it's very simple:
data want;
set have;
if index(name,'John Doe') > 0 then do;
  name = 'XXX';
  dob = .;
end;
run;
02-28-2017 10:47 PM
Thanks for your input. However, my data won't be that clean. The values I need to cleanse and mask will all be stored in a single field (this response would be an example: everything would be in one field), which is a bit of a challenge in SAS, unlike Python or R, where I can tokenize all the strings in the field. I'm just trying to weigh which tool would be the most suitable for my requirement.
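For text held in a single field, one option that stays within SAS is to let prxchange match the label and replace only what follows it, using a capture group to keep the label itself. The following is a minimal sketch, not a tested solution for the real data: the variable name free_text, the sample rows, and the assumption that a name ends at the next comma are all made up for illustration.

```sas
/* Hypothetical sample data: one free-text field per row */
data have;
  infile cards truncover;
  input free_text $char200.;
  cards;
Name: John Doe, Employee ID: 12345
Please call back, name: Jane Roe, postal code A1B 2C3
;
run;

data want;
  set have;
  length clean $ 200;
  /* (Name:\s*) captures the label and is written back as $1;       */
  /* [^,]+ is the value after the label, assumed to end at a comma. */
  /* -1 replaces every occurrence; the i flag ignores case.         */
  clean = prxchange('s/(Name:\s*)[^,]+/$1XXX/i', -1, free_text);
run;
```

The same pattern could be repeated for other labels, but it only works where the sensitive value is introduced by a recognizable label and has a predictable end; names that appear without a label in running text would need a different approach.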