I would like to clean company names. If the variable NAME contains INC, CORP, CLASS etc then I would like to remove them. Right now I use the following:
if find(name,' CORP ') then name_cleaned = tranwrd(name,' CORP ', ' ');
which works fine. However, if the name contains 2 words (e.g. ABC CORP CLASS A) that I want to delete, the above would only remove the word CORP and leave the word CLASS in the name. I can write another line of code such as
if find(name,' CORP CLASS ') then name_cleaned = tranwrd(name,' CORP CLASS ', ' ');
This works fine, but if I have 20 words which I need to remove, doing so is long and might not be ideal. So what function can I use to tell SAS that if NAME contains these words, remove them or replace them with a space?
Note that to find words, the findw function is better.
In this case you need to add \b in the expression:
NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);
You might also want to remove double spaces.
NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));
Regular Expressions are perfect for this type of work.
This takes care of case too:
NEWNAME=prxchange('s/CORP|CLASS|LTD|INC//i',-1,NAME);
Adapt next (updated) tested code to your needs:
%let list_to_clean = one two three four five;
%let n2c = %sysfunc(countw(&list_to_clean));
data _null_;
length list_to_clean $100;
list2clean = "&list_to_clean";
do i=1 to &n2c;
list_to_clean = trim(list_to_clean) || ' ' || quote(trim(scan(list2clean,i)));
end;
call symput('List_to_Clean',trim(list_to_clean));
run;
%put XXX= &list_to_clean;
*%let list_to_clean = "one" "two" "three" "four" "five";
data have;
length string $30;
infile cards truncover;
input string $30. ;
cards;
there is one tree
I have three sons
raise two fingers
Nothing to Clean
;
run;
data want;
set have;
array w2c {&n2c} $ (&list_to_clean);
string1 = string;
do i=1 to countw(string);
w2check = scan(string,i);
if w2check in w2c
then string1 = tranwrd(string,strip(w2check),' ');
end;
keep string string1;
run;
Note that to find words, the findw function is better.
In this case you need to add \b in the expression:
NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);
You might also want to remove double spaces.
NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));
Just one thing to add: when using tranwrd or translate it is not necessary to wrap calling those function in an if-then-statement. If those function don't find something to replace, they return the unchanged first parameter of the functions.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.