BookmarkSubscribeRSS Feed
tekish
Quartz | Level 8

Dear Expert.

 

I have  a table with Comments column which contains Names. I want to hide the names from the comments.

Here is sample table and desired out put.

 

The actual Table

firsname

lastname

comments

George

John

George John is drug addicted

James

Bond

Mr. Bond is not cooperating guy

Rachel

Bob

Rachel is not a team leader           

 

Desired Out put

firsname

lastname

comments

George

John

XXXXXX is drug addicted

James

Bond

XXXXXXX is not Aacooperating guy

Rachel

Bob

XXXXXXX is not a team leader           

 

thanks

14 REPLIES 14
Cynthia_sas
SAS Super FREQ

Hi:

  Something like this should do it. Once you understand how TRANWRD works, then you can get rid of Mr., Ms., Jr., etc using the same technique.

cynthia

use_tranwrd.png

Ksharp
Super User
data have;
input firstname $
lastname $
comments & $20.;
cards;
George
John
George John is drug addicted
James
Bond
Mr. Bond is not cooperating guy
Rachel
Bob
Rachel is not a team leader   
;
run;
data want;
 set have;
 pid=cats('s/(Mr\.)?\s*\b(',firstname,'|',lastname,')\b/xxxxx/i');
 want=prxchange(pid,-1,comments);
 drop pid;
run;
art297
Opal | Level 21

While I found KSharp's approach quite interesting, it runs almost 30 times slower than Cynthia's approach.

 

I'm not sure why Cynthia added all of the checking conditions, as I don't think that they're necessary.  I think that something like the following will do what you want:

 

data have;
informat firstname $15.;
informat lastname $15.;
informat comments $60.;
input firstname lastname comments &;
cards;
George John 1 George John is johndrug addicted
George John 2 The list of addicts includes George John
George John 3 The list of addicts includes Mr. George John
George John 4 The list of addicts includes Mr. George John and others
James Bond 5 Mr. Bond is not a cooperating guy
Rachel Bob 6 Rachel is not a team leader
Kim Jones 7 Mrs. Kim Jones isn't here
Kim Jones 8 Ms. Kim Jones isn't here
Kim Jones 9 Does Miss Kim Jones live here?
;

data want (drop=_:);
length comments $80;
set have;
comments=tranwrd(comments, trim(firstname), "xxxxx");
comments=tranwrd(comments, trim(lastname), "xxxxx");
comments=tranwrd(comments, "Mr.", "xxxxx");
comments=tranwrd(comments, "Mrs.", "xxxxx");
comments=tranwrd(comments, "Ms.", "xxxxx");
comments=tranwrd(comments, "Miss", "xxxxx");
comments=tranwrd(comments, "Dr.", "xxxxx");
comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;

 

Art, CEO, AnalystFinder.com

 

tekish
Quartz | Level 8

Dear Art297;

 

  Is there away to hide any Name's  and title from the comments. Sometimes I dont have the first and last name and they can add any names inside the comments column. How can I hide if they type any Name.

 

Thanks.

art297
Opal | Level 21

Here is some Python code that claims to accomplish the task: https://thetokenizer.com/2013/08/25/how-to-easily-recognize-peoples-names-in-a-text/ . Of course, short of running Python, you could access the Wikipedia name file mentioned in the article, load the names in an array, and use either the in operator or find function.

 

Art, CEO, AnalystFinder.com

Cynthia_sas
SAS Super FREQ

Hi, Art:
You are correct, the checking additions are really not necessary. Neither is the creation of the NEWCOMMENT variable. However, I have been doing this a LONG time and sure as I would blank out the information in the COMMENT variable, some manager would come back to me and say, "Well, we really need to be able to have the managers and adminstrators see the real comment." And then I'd have to make NEWCOMMENT. Or else, they would say we don't ALWAYS want to blank out first name if it is John or Fred, thus requiring some kind of IF condition. So I generally try to write code that has some hope of surviving the first assault of changes.

 

  And, I really didn't need X and Y either, but wanted a way for a newer programmer to be able to see what values were returned from those functions as separate variables just in case the request came for more refinements of the blanking out.

cynthia

Ksharp
Super User
Arthur.T,
It is nice to meet  you here again.
I am really sorry that I have no time to visit SAS-L.

Cynthia's code has a fault. if name is not a word ,tranwrd will still change it , see this :
George John    George Johnis drug addicted

So although prxchange is slower but more flexible .
art297
Opal | Level 21

Xia,

 

Nice to hear from you as well and Happy New Year! Actually, the way Cynthia originally proposed the code, it didn't have the limitation you mentioned.

 

Here is a revised version that corrects for the problem you mentioned AND still runs more than 30 time faster than the regular expression method:

 

data want (drop=_:);
 length comments $80;
 set have;
 if findw(comments,trim(firstname)) then
  comments=tranwrd(comments, trim(firstname), "xxxxx");
 if findw(comments,trim(lastname)) then
  comments=tranwrd(comments, trim(lastname), "xxxxx");
 if findw(comments,"Mr.") then
  comments=tranwrd(comments, "Mr.", "xxxxx");
 if findw(comments,"Mrs.") then
  comments=tranwrd(comments, "Mrs.", "xxxxx");
 if findw(comments,"Ms.") then
  comments=tranwrd(comments, "Ms.", "xxxxx");
 if findw(comments,"Miss") then
  comments=tranwrd(comments, "Miss", "xxxxx");
 if findw(comments,"Dr.") then
  comments=tranwrd(comments, "Dr.", "xxxxx");
 comments=tranwrd(tranwrd(comments, "xxxxx xxxxx", "xxxxx"), "xxxxx xxxxx", "xxxxx");
run;

 

Art, CEO, AnalystFinder.com

art297
Opal | Level 21

Xia,

 

Upon more thought, yes, the regex solution IS more precise. The last solution i offered won't work correctly if there are both desired and undesired instances within the same record, e.g., 

George John is Johnis drug addicted

 Art, CEO, AnalystFinder.com

 

Daniel-Santos
Obsidian | Level 7

Hi.

 

If any name is possible at any place inside comments, I would suggest gathering all available names in a first pass and then with a second pass doing all the substitutions based on this list.

 

If you want, you can also add to the list any additional names.

 

%let NAMES=MR. MRS. MS. MISS DR.; * add aditional names here, if needed;

 

* first pass, gather every firstname and lastname into a list;

data _null_;
     set have;
     if not index(cat(' ',symget('NAMES'),' '),FIRSTNAME) then
        call symput('NAMES',catx(' ',symget('NAMES'),upcase(FIRSTNAME)));
     if not index(cat(' ',symget('NAMES'),' '),LASTNAME) then
        call symput('NAMES',catx(' ',symget('NAMES'),upcase(LASTNAME)));
run;

 

%put NAMES=&NAMES; * list names to mask;

 

* second pass, mask names;

data want;
     length _WORD $200 _SEP $1;
     set have;
     drop _:; * drop aux vars;
     _COMMENTS=COMMENTS; _COMMENTS=''; _SEP=''; * init aux vars;

     * process one word;

     do _I=1 to countw(COMMENTS);
        _COMMENTS=catx(_SEP,_COMMENTS,_WORD);
        _WORD=scan(COMMENTS,_I,' ');                
        * if match, mask word;

        if index(" &NAMES ",cat(' ',strip(upcase(_WORD)),' ')) then do;
           _WORD=repeat('X',lengthn(_WORD)-1);
           _SEP='X';
        end;
        else _SEP=' '; * clear masking separator;
     end;
     COMMENTS=catx(_SEP,_COMMENTS,_WORD);
run;

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

Daniel-Santos
Obsidian | Level 7

Note to the solution above.

 

There is 64KB limit for the list (macro variable). 

 

If it is expected to go above that, the same code could be easily modified to load the names into an hash table.

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

tekish
Quartz | Level 8

Daniel-Santos,

 

The problem is i dont have any list of names and they can enter as many names as they want. How do I hide the names when i dont have list of names.

 

thanks,

  

Daniel-Santos
Obsidian | Level 7

Hi.

 

Well, if you are looking for some kind of function that will identy names and surnames for you, I'm sorry I don't know any that can do that.

 

You'll have to priorly make the list youself.

 

Daniel Santos @ www.cgd.pt

art297
Opal | Level 21

I haven't experimented with it myself, but the following R package looks like it might be able to identify people:

https://cran.r-project.org/web/packages/openNLP/openNLP.pdf

 

If it can, you can run R from SAS/IML

 

HTH,

Art, CEO, AnalystFinder.com

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2824 views
  • 7 likes
  • 5 in conversation