I would like to clean company names. If the variable NAME contains INC, CORP, CLASS etc then I would like to remove them. Right now I use the following:
if find(name,' CORP ') then name_cleaned = tranwrd(name,' CORP ', ' ');
which works fine. However, if the name contains 2 words (e.g. ABC CORP CLASS A) that I want to delete, the above would only remove the word CORP and leave the word CLASS in the name. I can write another line of code such as
if find(name,' CORP CLASS ') then name_cleaned = tranwrd(name,' CORP CLASS ', ' ');
This works fine, but if I have 20 words which I need to remove, doing so is long and might not be ideal. So what function can I use to tell SAS that if NAME contains these words, remove them or replace them with a space?
Note that to find words, the findw function is better.
In this case you need to add \b in the expression:
NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);
You might also want to remove double spaces.
NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));
Regular Expressions are perfect for this type of work.
This takes care of case too:
NEWNAME=prxchange('s/CORP|CLASS|LTD|INC//i',-1,NAME);
Adapt next (updated) tested code to your needs:
%let list_to_clean = one two three four five;
%let n2c = %sysfunc(countw(&list_to_clean));
data _null_;
length list_to_clean $100;
list2clean = "&list_to_clean";
do i=1 to &n2c;
list_to_clean = trim(list_to_clean) || ' ' || quote(trim(scan(list2clean,i)));
end;
call symput('List_to_Clean',trim(list_to_clean));
run;
%put XXX= &list_to_clean;
*%let list_to_clean = "one" "two" "three" "four" "five";
data have;
length string $30;
infile cards truncover;
input string $30. ;
cards;
there is one tree
I have three sons
raise two fingers
Nothing to Clean
;
run;
data want;
set have;
array w2c {&n2c} $ (&list_to_clean);
string1 = string;
do i=1 to countw(string);
w2check = scan(string,i);
if w2check in w2c
then string1 = tranwrd(string,strip(w2check),' ');
end;
keep string string1;
run;
Note that to find words, the findw function is better.
In this case you need to add \b in the expression:
NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);
You might also want to remove double spaces.
NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));
Just one thing to add: when using tranwrd or translate it is not necessary to wrap calling those function in an if-then-statement. If those function don't find something to replace, they return the unchanged first parameter of the functions.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.