BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
somebody
Lapis Lazuli | Level 10

I would like to clean company names. If the variable NAME contains INC, CORP, CLASS etc then I would like to remove them. Right now I use the following:

if find(name,' CORP ') then name_cleaned = tranwrd(name,' CORP ', ' ');

which works fine. However, if the name contains 2 words (e.g. ABC CORP CLASS A) that I want to delete, the above would only remove the word CORP and leave the word CLASS in the name. I can write another line of code such as 

if find(name,' CORP CLASS ') then name_cleaned = tranwrd(name,' CORP CLASS ', ' ');

This works fine, but if I have 20 words which I need to remove, doing so is long and might not be ideal. So what function can I use to tell SAS that if NAME contains these words, remove them or replace them with a space?

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Note that to find words, the findw function is better.

In this case you need to add \b in the expression:

NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);

You might also want to remove double spaces.

NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));

 

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Regular Expressions are perfect for this type of work.

This takes care of case too:

NEWNAME=prxchange('s/CORP|CLASS|LTD|INC//i',-1,NAME);

 

Shmuel
Garnet | Level 18

Adapt next (updated) tested code to your needs:

%let list_to_clean = one two three four five;
%let n2c = %sysfunc(countw(&list_to_clean)); 
data _null_;
length list_to_clean $100;
     list2clean = "&list_to_clean";
     do i=1 to &n2c;
        list_to_clean = trim(list_to_clean) || ' ' || quote(trim(scan(list2clean,i)));
     end;
     call symput('List_to_Clean',trim(list_to_clean));
run;
%put XXX= &list_to_clean;

*%let list_to_clean = "one" "two" "three" "four" "five";


data have;
  length string $30;
  infile cards truncover;
  input string $30. ;
cards;
there is one tree
I have three sons
raise two fingers
Nothing to Clean ; run; data want; set have; array w2c {&n2c} $ (&list_to_clean);
string1 = string; do i=1 to countw(string); w2check = scan(string,i); if w2check in w2c then string1 = tranwrd(string,strip(w2check),' '); end; keep string string1; run;

 

ChrisNZ
Tourmaline | Level 20

Note that to find words, the findw function is better.

In this case you need to add \b in the expression:

NAME = prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME);

You might also want to remove double spaces.

NAME = compbl(prxchange('s/\b(CLASS|LTD|INC)\b//i',-1,NAME));

 

andreas_lds
Jade | Level 19

Just one thing to add: when using tranwrd or translate it is not necessary to wrap calling those function in an if-then-statement. If those function don't find something to replace, they return the unchanged first parameter of the functions.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1655 views
  • 5 likes
  • 4 in conversation