BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
somebody
Lapis Lazuli | Level 10

I would like to clean company names. If the variable NAME contains INC, CORP, CLASS etc then I would like to remove them. Right now I use the following:

if find(name,' CORP ') then name_cleaned = tranwrd(name,' CORP ', ' ');

which works fine. However, if the name contains 2 words (e.g. ABC CORP CLASS A) that I want to delete, the above would only remove the word CORP and leave the word CLASS in the name. I can write another line of code such as 

if find(name,' CORP CLASS ') then name_cleaned = tranwrd(name,' CORP CLASS ', ' ');

This works fine, but if I have 20 words which I need to remove, doing so is long and might not be ideal. So what function can I use to tell SAS that if NAME contains these words, remove them or replace them with a space?

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Note that to find words, the findw function is better.

In this case you need to add \b in the expression:

NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);

You might also want to remove double spaces.

NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));

 

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Regular Expressions are perfect for this type of work.

This takes care of case too:

NEWNAME=prxchange('s/CORP|CLASS|LTD|INC//i',-1,NAME);

 

Shmuel
Garnet | Level 18

Adapt next (updated) tested code to your needs:

%let list_to_clean = one two three four five;
%let n2c = %sysfunc(countw(&list_to_clean)); 
data _null_;
length list_to_clean $100;
     list2clean = "&list_to_clean";
     do i=1 to &n2c;
        list_to_clean = trim(list_to_clean) || ' ' || quote(trim(scan(list2clean,i)));
     end;
     call symput('List_to_Clean',trim(list_to_clean));
run;
%put XXX= &list_to_clean;

*%let list_to_clean = "one" "two" "three" "four" "five";


data have;
  length string $30;
  infile cards truncover;
  input string $30. ;
cards;
there is one tree
I have three sons
raise two fingers
Nothing to Clean ; run; data want; set have; array w2c {&n2c} $ (&list_to_clean);
string1 = string; do i=1 to countw(string); w2check = scan(string,i); if w2check in w2c then string1 = tranwrd(string,strip(w2check),' '); end; keep string string1; run;

 

ChrisNZ
Tourmaline | Level 20

Note that to find words, the findw function is better.

In this case you need to add \b in the expression:

NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);

You might also want to remove double spaces.

NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));

 

andreas_lds
Jade | Level 19

Just one thing to add: when using tranwrd or translate it is not necessary to wrap calling those function in an if-then-statement. If those function don't find something to replace, they return the unchanged first parameter of the functions.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1966 views
  • 5 likes
  • 4 in conversation