DATA Step, Macro, Functions and more

Hide the name inside the text

Reply
Contributor
Posts: 57

Hide the name inside the text

Dear Expert.

 

I have  a table with Comments column which contains Names. I want to hide the names from the comments.

Here is sample table and desired out put.

 

The actual Table

firsname

lastname

comments

George

John

George John is drug addicted

James

Bond

Mr. Bond is not cooperating guy

Rachel

Bob

Rachel is not a team leader           

 

Desired Out put

firsname

lastname

comments

George

John

XXXXXX is drug addicted

James

Bond

XXXXXXX is not Aacooperating guy

Rachel

Bob

XXXXXXX is not a team leader           

 

thanks

SAS Super FREQ
Posts: 8,868

Re: Hide the name inside the text

Hi:

  Something like this should do it. Once you understand how TRANWRD works, then you can get rid of Mr., Ms., Jr., etc using the same technique.

cynthia

use_tranwrd.png

Super User
Posts: 10,035

Re: Hide the name inside the text

[ Edited ]
data have;
input firstname $
lastname $
comments & $20.;
cards;
George
John
George John is drug addicted
James
Bond
Mr. Bond is not cooperating guy
Rachel
Bob
Rachel is not a team leader   
;
run;
data want;
 set have;
 pid=cats('s/(Mr\.)?\s*\b(',firstname,'|',lastname,')\b/xxxxx/i');
 want=prxchange(pid,-1,comments);
 drop pid;
run;
PROC Star
Posts: 7,477

Re: Hide the name inside the text

While I found KSharp's approach quite interesting, it runs almost 30 times slower than Cynthia's approach.

 

I'm not sure why Cynthia added all of the checking conditions, as I don't think that they're necessary.  I think that something like the following will do what you want:

 

data have;
informat firstname $15.;
informat lastname $15.;
informat comments $60.;
input firstname lastname comments &;
cards;
George John 1 George John is johndrug addicted
George John 2 The list of addicts includes George John
George John 3 The list of addicts includes Mr. George John
George John 4 The list of addicts includes Mr. George John and others
James Bond 5 Mr. Bond is not a cooperating guy
Rachel Bob 6 Rachel is not a team leader
Kim Jones 7 Mrs. Kim Jones isn't here
Kim Jones 8 Ms. Kim Jones isn't here
Kim Jones 9 Does Miss Kim Jones live here?
;

data want (drop=_Smiley Happy;
length comments $80;
set have;
comments=tranwrd(comments, trim(firstname), "xxxxx");
comments=tranwrd(comments, trim(lastname), "xxxxx");
comments=tranwrd(comments, "Mr.", "xxxxx");
comments=tranwrd(comments, "Mrs.", "xxxxx");
comments=tranwrd(comments, "Ms.", "xxxxx");
comments=tranwrd(comments, "Miss", "xxxxx");
comments=tranwrd(comments, "Dr.", "xxxxx");
comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 57

Re: Hide the name inside the text

Dear Art297;

 

  Is there away to hide any Name's  and title from the comments. Sometimes I dont have the first and last name and they can add any names inside the comments column. How can I hide if they type any Name.

 

Thanks.

PROC Star
Posts: 7,477

Re: Hide the name inside the text

Here is some Python code that claims to accomplish the task: https://thetokenizer.com/2013/08/25/how-to-easily-recognize-peoples-names-in-a-text/ . Of course, short of running Python, you could access the Wikipedia name file mentioned in the article, load the names in an array, and use either the in operator or find function.

 

Art, CEO, AnalystFinder.com

SAS Super FREQ
Posts: 8,868

Re: Hide the name inside the text

[ Edited ]

Hi, Art:
You are correct, the checking additions are really not necessary. Neither is the creation of the NEWCOMMENT variable. However, I have been doing this a LONG time and sure as I would blank out the information in the COMMENT variable, some manager would come back to me and say, "Well, we really need to be able to have the managers and adminstrators see the real comment." And then I'd have to make NEWCOMMENT. Or else, they would say we don't ALWAYS want to blank out first name if it is John or Fred, thus requiring some kind of IF condition. So I generally try to write code that has some hope of surviving the first assault of changes.

 

  And, I really didn't need X and Y either, but wanted a way for a newer programmer to be able to see what values were returned from those functions as separate variables just in case the request came for more refinements of the blanking out.

cynthia

Super User
Posts: 10,035

Re: Hide the name inside the text

Arthur.T,
It is nice to meet  you here again.
I am really sorry that I have no time to visit SAS-L.

Cynthia's code has a fault. if name is not a word ,tranwrd will still change it , see this :
George John    George Johnis drug addicted

So although prxchange is slower but more flexible .
PROC Star
Posts: 7,477

Re: Hide the name inside the text

Xia,

 

Nice to hear from you as well and Happy New Year! Actually, the way Cynthia originally proposed the code, it didn't have the limitation you mentioned.

 

Here is a revised version that corrects for the problem you mentioned AND still runs more than 30 time faster than the regular expression method:

 

data want (drop=_Smiley Happy;
 length comments $80;
 set have;
 if findw(comments,trim(firstname)) then
  comments=tranwrd(comments, trim(firstname), "xxxxx");
 if findw(comments,trim(lastname)) then
  comments=tranwrd(comments, trim(lastname), "xxxxx");
 if findw(comments,"Mr.") then
  comments=tranwrd(comments, "Mr.", "xxxxx");
 if findw(comments,"Mrs.") then
  comments=tranwrd(comments, "Mrs.", "xxxxx");
 if findw(comments,"Ms.") then
  comments=tranwrd(comments, "Ms.", "xxxxx");
 if findw(comments,"Miss") then
  comments=tranwrd(comments, "Miss", "xxxxx");
 if findw(comments,"Dr.") then
  comments=tranwrd(comments, "Dr.", "xxxxx");
 comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;

 

Art, CEO, AnalystFinder.com

PROC Star
Posts: 7,477

Re: Hide the name inside the text

Xia,

 

Upon more thought, yes, the regex solution IS more precise. The last solution i offered won't work correctly if there are both desired and undesired instances within the same record, e.g., 

George John is Johnis drug addicted

 Art, CEO, AnalystFinder.com

 

Contributor
Posts: 24

Re: Hide the name inside the text

Hi.

 

If any name is possible at any place inside comments, I would suggest gathering all available names in a first pass and then with a second pass doing all the substitutions based on this list.

 

If you want, you can also add to the list any additional names.

 

%let NAMES=MR. MRS. MS. MISS DR.; * add aditional names here, if needed;

 

* first pass, gather every firstname and lastname into a list;

data _null_;
     set have;
     if not index(cat(' ',symget('NAMES'),' '),FIRSTNAME) then
        call symput('NAMES',catx(' ',symget('NAMES'),upcase(FIRSTNAME)));
     if not index(cat(' ',symget('NAMES'),' '),LASTNAME) then
        call symput('NAMES',catx(' ',symget('NAMES'),upcase(LASTNAME)));
run;

 

%put NAMES=&NAMES; * list names to mask;

 

* second pass, mask names;

data want;
     length _WORD $200 _SEP $1;
     set have;
     drop _:; * drop aux vars;
     _COMMENTS=COMMENTS; _COMMENTS=''; _SEP=''; * init aux vars;

     * process one word;

     do _I=1 to countw(COMMENTS);
        _COMMENTS=catx(_SEP,_COMMENTS,_WORD);
        _WORD=scan(COMMENTS,_I,' ');                
        * if match, mask word;

        if index(" &NAMES ",cat(' ',strip(upcase(_WORD)),' ')) then do;
           _WORD=repeat('X',lengthn(_WORD)-1);
           _SEP='X';
        end;
        else _SEP=' '; * clear masking separator;
     end;
     COMMENTS=catx(_SEP,_COMMENTS,_WORD);
run;

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

Contributor
Posts: 24

Re: Hide the name inside the text

Posted in reply to Daniel-Santos

Note to the solution above.

 

There is 64KB limit for the list (macro variable). 

 

If it is expected to go above that, the same code could be easily modified to load the names into an hash table.

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

Contributor
Posts: 57

Re: Hide the name inside the text

Posted in reply to Daniel-Santos

Daniel-Santos,

 

The problem is i dont have any list of names and they can enter as many names as they want. How do I hide the names when i dont have list of names.

 

thanks,

  

Contributor
Posts: 24

Re: Hide the name inside the text

Hi.

 

Well, if you are looking for some kind of function that will identy names and surnames for you, I'm sorry I don't know any that can do that.

 

You'll have to priorly make the list youself.

 

Daniel Santos @ www.cgd.pt

PROC Star
Posts: 7,477

Re: Hide the name inside the text

Posted in reply to Daniel-Santos

I haven't experimented with it myself, but the following R package looks like it might be able to identify people:

https://cran.r-project.org/web/packages/openNLP/openNLP.pdf

 

If it can, you can run R from SAS/IML

 

HTH,

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 14 replies
  • 383 views
  • 7 likes
  • 5 in conversation