- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am attempting to remove certain strings from company names in my data. That is, I would like to remove things like 'LTD', 'CO', 'INC', 'LLC', etc. I'm also looking to get rid of leading and trailing blanks and punctuation. So, I am able to do this for one use case as you see below. But I don't want to keep writing new lines for each tranwrd as I am unable to put a list of strings in tranwrd. What I'd like to do is have some sort of loop that sends a list of strings through the tranwrd function (applying to the same field - company_name). Like define a list of strings as a macro variable and send that through a loop within a macro or something like that.
data MyCompanies; set CompanyListing; Company_Name=tranwrd(strip(compress(CompanyName,,"p")),"LTD",''); run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data MYCOMPANIES;
* set COMPANYLISTING;
COMPANYNAME='Incredible Company Ltd.';
COMPANY_NAME=prxchange('s/(\b(LTD|CO|INC|LLC)\b)//i',1,left(compress(COMPANYNAME,,'p')));
putlog COMPANY_NAME=;
run;
COMPANY_NAME=Incredible Company
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data MYCOMPANIES;
* set COMPANYLISTING;
COMPANYNAME='Incredible Company Ltd.';
COMPANY_NAME=prxchange('s/(\b(LTD|CO|INC|LLC)\b)//i',1,left(compress(COMPANYNAME,,'p')));
putlog COMPANY_NAME=;
run;
COMPANY_NAME=Incredible Company
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@JediApprentice wrote:
What I'd like to do is have some sort of loop that sends a list of strings through the tranwrd function (applying to the same field - company_name). Like define a list of strings as a macro variable and send that through a loop within a macro or something like that.
data MyCompanies; set CompanyListing; Company_Name=tranwrd(strip(compress(CompanyName,,"p")),"LTD",''); run;
You would use an ARRAY, not a macro here. Use a temporary array to store the words you want to remove and then use a loop to remove them. You aren't factoring in variable CaSE though, so you may want to do that as well, eg. ltD, LTD, ltd, Ltd, Limited are all different and TRANWRD will not replace them.
data mycomp;
set listing;
array _remove(*) _temporary_ $ ('LTD', 'INC', 'CO');
do i=1 to dim(_remove);
company_name = tranwrd(strip(compress(companyName, ,"p")), _remove(i), '');
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Reeza , when I try this on my data, I get this error (the structure is the same as yours)
array rmve_strs(*) _temporary_ $ ('LTD','LLC','INC','CORPORATION','COMPANY','LP');
_
22
200
ERROR: The non-variable based array rmve_strs has been defined with zero elements.
ERROR 22-322: Syntax error, expecting one of the following: (, ;.
ERROR 200-322: The symbol is not recognized and will be ignored.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1. It needs a dimension apparently so replace the * with the number of elements
2. Move the $ before the keyword _TEMPORARY_
@JediApprentice wrote:
Hi @Reeza , when I try this on my data, I get this error (the structure is the same as yours)
array rmve_strs(*) _temporary_ $ ('LTD','LLC','INC','CORPORATION','COMPANY','LP'); _ 22 200 ERROR: The non-variable based array rmve_strs has been defined with zero elements. ERROR 22-322: Syntax error, expecting one of the following: (, ;. ERROR 200-322: The symbol is not recognized and will be ignored.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza Hmm it runs without errors or warnings but still doesn't seem to be working, not removing those strings. The only thing it removes is the punctuation. In the output it shows the value of i as 7 all the way down, like it's stuck on one iteration outside the dimensions of the array. There is an info message in the log, not sure if it's relevant: Character variables have defaulted to a length of 200 at the places given by: (Line):(Column). Truncation can result. That is for company_name.
this is the current code:
data test;
set CompanyListing;
array rmve_strs(6) $ _temporary_ ('LTD','LLC','INC','CORPORATION','COMPANY','LP');
do i=1 to dim(rmve_strs);
company_name = tranwrd(strip(compress(CompanyName, ,"p")), rmve_strs(i), '');
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just like @ChrisNZ said, but taking care of spaces and punctuation. Adjust to your needs:
data MYCOMPANIES;
do COMPANYNAME = 'Big Corp., Local Division', 'Incredible Company Ltd.';
COMPANYNAME=prxchange('s/\s*\b(LTD|CO|INC|LLC|CORP)\b\.?\s*//io', 1, COMPANYNAME);
putlog COMPANYNAME=;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My understanding is that OP wants
I.B.M.
to become
IBM
and
Big Corp., Local Division
to become
Big Local Division