Dear Expert.
I have a table with Comments column which contains Names. I want to hide the names from the comments.
Here is sample table and desired out put.
The actual Table
firsname | lastname | comments |
George | John | George John is drug addicted |
James | Bond | Mr. Bond is not cooperating guy |
Rachel | Bob | Rachel is not a team leader |
Desired Out put
firsname | lastname | comments |
George | John | XXXXXX is drug addicted |
James | Bond | XXXXXXX is not Aacooperating guy |
Rachel | Bob | XXXXXXX is not a team leader |
thanks
Hi:
Something like this should do it. Once you understand how TRANWRD works, then you can get rid of Mr., Ms., Jr., etc using the same technique.
cynthia
data have;
input firstname $
lastname $
comments & $20.;
cards;
George
John
George John is drug addicted
James
Bond
Mr. Bond is not cooperating guy
Rachel
Bob
Rachel is not a team leader
;
run;
data want;
set have;
pid=cats('s/(Mr\.)?\s*\b(',firstname,'|',lastname,')\b/xxxxx/i');
want=prxchange(pid,-1,comments);
drop pid;
run;
While I found KSharp's approach quite interesting, it runs almost 30 times slower than Cynthia's approach.
I'm not sure why Cynthia added all of the checking conditions, as I don't think that they're necessary. I think that something like the following will do what you want:
data have;
informat firstname $15.;
informat lastname $15.;
informat comments $60.;
input firstname lastname comments &;
cards;
George John 1 George John is johndrug addicted
George John 2 The list of addicts includes George John
George John 3 The list of addicts includes Mr. George John
George John 4 The list of addicts includes Mr. George John and others
James Bond 5 Mr. Bond is not a cooperating guy
Rachel Bob 6 Rachel is not a team leader
Kim Jones 7 Mrs. Kim Jones isn't here
Kim Jones 8 Ms. Kim Jones isn't here
Kim Jones 9 Does Miss Kim Jones live here?
;
data want (drop=_:);
length comments $80;
set have;
comments=tranwrd(comments, trim(firstname), "xxxxx");
comments=tranwrd(comments, trim(lastname), "xxxxx");
comments=tranwrd(comments, "Mr.", "xxxxx");
comments=tranwrd(comments, "Mrs.", "xxxxx");
comments=tranwrd(comments, "Ms.", "xxxxx");
comments=tranwrd(comments, "Miss", "xxxxx");
comments=tranwrd(comments, "Dr.", "xxxxx");
comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;
Art, CEO, AnalystFinder.com
Dear Art297;
Is there away to hide any Name's and title from the comments. Sometimes I dont have the first and last name and they can add any names inside the comments column. How can I hide if they type any Name.
Thanks.
Here is some Python code that claims to accomplish the task: https://thetokenizer.com/2013/08/25/how-to-easily-recognize-peoples-names-in-a-text/ . Of course, short of running Python, you could access the Wikipedia name file mentioned in the article, load the names in an array, and use either the in operator or find function.
Art, CEO, AnalystFinder.com
Hi, Art:
You are correct, the checking additions are really not necessary. Neither is the creation of the NEWCOMMENT variable. However, I have been doing this a LONG time and sure as I would blank out the information in the COMMENT variable, some manager would come back to me and say, "Well, we really need to be able to have the managers and adminstrators see the real comment." And then I'd have to make NEWCOMMENT. Or else, they would say we don't ALWAYS want to blank out first name if it is John or Fred, thus requiring some kind of IF condition. So I generally try to write code that has some hope of surviving the first assault of changes.
And, I really didn't need X and Y either, but wanted a way for a newer programmer to be able to see what values were returned from those functions as separate variables just in case the request came for more refinements of the blanking out.
cynthia
Arthur.T, It is nice to meet you here again. I am really sorry that I have no time to visit SAS-L. Cynthia's code has a fault. if name is not a word ,tranwrd will still change it , see this : George John George Johnis drug addicted So although prxchange is slower but more flexible .
Xia,
Nice to hear from you as well and Happy New Year! Actually, the way Cynthia originally proposed the code, it didn't have the limitation you mentioned.
Here is a revised version that corrects for the problem you mentioned AND still runs more than 30 time faster than the regular expression method:
data want (drop=_:);
length comments $80;
set have;
if findw(comments,trim(firstname)) then
comments=tranwrd(comments, trim(firstname), "xxxxx");
if findw(comments,trim(lastname)) then
comments=tranwrd(comments, trim(lastname), "xxxxx");
if findw(comments,"Mr.") then
comments=tranwrd(comments, "Mr.", "xxxxx");
if findw(comments,"Mrs.") then
comments=tranwrd(comments, "Mrs.", "xxxxx");
if findw(comments,"Ms.") then
comments=tranwrd(comments, "Ms.", "xxxxx");
if findw(comments,"Miss") then
comments=tranwrd(comments, "Miss", "xxxxx");
if findw(comments,"Dr.") then
comments=tranwrd(comments, "Dr.", "xxxxx");
comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;
Art, CEO, AnalystFinder.com
Xia,
Upon more thought, yes, the regex solution IS more precise. The last solution i offered won't work correctly if there are both desired and undesired instances within the same record, e.g.,
George John is Johnis drug addicted
Art, CEO, AnalystFinder.com
Hi.
If any name is possible at any place inside comments, I would suggest gathering all available names in a first pass and then with a second pass doing all the substitutions based on this list.
If you want, you can also add to the list any additional names.
%let NAMES=MR. MRS. MS. MISS DR.; * add aditional names here, if needed;
* first pass, gather every firstname and lastname into a list;
data _null_;
set have;
if not index(cat(' ',symget('NAMES'),' '),FIRSTNAME) then
call symput('NAMES',catx(' ',symget('NAMES'),upcase(FIRSTNAME)));
if not index(cat(' ',symget('NAMES'),' '),LASTNAME) then
call symput('NAMES',catx(' ',symget('NAMES'),upcase(LASTNAME)));
run;
%put NAMES=&NAMES; * list names to mask;
* second pass, mask names;
data want;
length _WORD $200 _SEP $1;
set have;
drop _:; * drop aux vars;
_COMMENTS=COMMENTS; _COMMENTS=''; _SEP=''; * init aux vars;
* process one word;
do _I=1 to countw(COMMENTS);
_COMMENTS=catx(_SEP,_COMMENTS,_WORD);
_WORD=scan(COMMENTS,_I,' ');
* if match, mask word;
if index(" &NAMES ",cat(' ',strip(upcase(_WORD)),' ')) then do;
_WORD=repeat('X',lengthn(_WORD)-1);
_SEP='X';
end;
else _SEP=' '; * clear masking separator;
end;
COMMENTS=catx(_SEP,_COMMENTS,_WORD);
run;
Hope it helps.
Daniel Santos @ www.cgd.pt
Note to the solution above.
There is 64KB limit for the list (macro variable).
If it is expected to go above that, the same code could be easily modified to load the names into an hash table.
Hope it helps.
Daniel Santos @ www.cgd.pt
Daniel-Santos,
The problem is i dont have any list of names and they can enter as many names as they want. How do I hide the names when i dont have list of names.
thanks,
Hi.
Well, if you are looking for some kind of function that will identy names and surnames for you, I'm sorry I don't know any that can do that.
You'll have to priorly make the list youself.
Daniel Santos @ www.cgd.pt
I haven't experimented with it myself, but the following R package looks like it might be able to identify people:
https://cran.r-project.org/web/packages/openNLP/openNLP.pdf
If it can, you can run R from SAS/IML
HTH,
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.