Hello,
How can I remove the 'place_name' part from a company name?
For example, given the following company names:
NAME |
454 LIFE SCIENCES, A ROCHE COMPANY |
A & A MATERIAL CORPORATION |
A A HISTROM CORPORATION |
A & A MANUFACTURING COMPANY |
A. AHLSTROM / A FINNISH CORPORATION |
A. STUCKI COMPANY, A DELAWARE CORPORATION |
AB VOLVO, A SWEDISH BODY CORPORATE |
ACCESS DATA CORPORATION A BROADRIDGE COMPANY |
AMERITECH PLASTICS INCORPORATED (A DELAWARE CORPORATION) |
ANEMOSTAT PRODUCTS DIVISION, DYNAMICS CORP. OF AMERCA, A NEW YORK CORPORATION |
ANEST IWATA CORPORATION (A JAPANESE CORPORATION) |
ANHUI HE AN INFORMATION TECHNOLOGY COMPANY |
ANHUI HUAI AN CHEMICAL GROUP COMPANY |
ANHUI JIN AN KANG BIOTECHNOLOGY COMPANY |
I would like to
1. change phrases like 'A JAPANESE CORPORATION' to 'CORP', and
2. delete phrases like '(A JAPANESE CORPORATION)' entirely, because they are enclosed in '( )'.
However, some of the words in the examples are not place names (for example, 'A ROCHE COMPANY' and 'A MATERIAL CORPORATION'), so I tried the following code:
data want;
set have;
NAME=prxchange('s/(\.|\s)AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\sCOMPANY$/ CO/',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/\(AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\sCOMPANY\)/ /',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/(\.|\s|\()AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\s(BODY\s)?CORPORAT(ION|E)\)?/ CORP/',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/\(AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\s(BODY\s)?CORPORAT(ION|E)\)/ /',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/(\.|\s|\()AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\s(BODY\s)?LIMITED\sCOMPANY$/ CO/',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/\(AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\s(BODY\s)?LIMITED\sCOMPANY\)/ /',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/(\.|\s|\()AN?\s(BELGIAN|BRITISH|BVI|CA(LIFORNIA)?|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|HONG\sKONG|GERMAN|IRISH|ISRAEL|JAPAN(ESE)?|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sHAMPSHIRE|NEW\sYORK|NEVADA|NJ|OREGON|PENNSYLVANIA|RHODE\sISLANDS?|RICHMOND|SINGAPORE|SPANISH|SWEDISH|SWISS|TEXAS|UTAH|VIRGINIA|WASHINGTON)\s(BODY\s)?LIMITED\sLIABILITY\sCOMPANY$/ LLC/',-1,cat(strip(compbl(NAME))));
NAME=prxchange('s/\(AN?\s(BELGIAN|BRITISH|BVI|CALIFORNIA|CAYMAN\sISLAND|DELAWARE|FINNISH|FRENCH|GERMAN|JAPANESE|KENTUCKY|LOUISIANA|MACHINERYNJ|NEW\sYORK|RHODE\sISLANDS?|SINGAPORE|SWEDISH|SWISS|UTAH)\s(BODY\s)?LIMITED\sLIABILITY\sCOMPANY\)/ /',-1,cat(strip(compbl(NAME))));
run;
But the code takes too long to run.
I have a large amount of data and need to handle many types of company suffix (such as 'COMPANY', 'LIMITED COMPANY', 'CORPORATION', 'CORPORATE', 'COOPERATIVE', 'LIMITED LIABILITY COMPANY').
I expect to get a result like this:
NAME | New_Name | changed |
454 LIFE SCIENCES, A ROCHE COMPANY | 454 LIFE SCIENCES, A ROCHE COMPANY | |
A & A MATERIAL CORPORATION | A & A MATERIAL CORPORATION | |
A A HISTROM CORPORATION | A A HISTROM CORPORATION | |
A & A MANUFACTURING COMPANY | A & A MANUFACTURING COMPANY | |
A. AHLSTROM / A FINNISH CORPORATION | A. AHLSTROM CORP | 1 |
A. STUCKI COMPANY, A DELAWARE CORPORATION | A. STUCKI COMPANY CORP | 1 |
AB VOLVO, A SWEDISH BODY CORPORATE | AB VOLVO CORP | 1 |
ACCESS DATA CORPORATION A BROADRIDGE COMPANY | ACCESS DATA CORPORATION A BROADRIDGE COMPANY | |
AMERITECH PLASTICS INCORPORATED (A DELAWARE CORPORATION) | AMERITECH PLASTICS INCORPORATED | 1 |
ANEMOSTAT PRODUCTS DIVISION, DYNAMICS CORP. OF AMERCA, A NEW YORK CORPORATION | ANEMOSTAT PRODUCTS DIVISION, DYNAMICS CORP. OF AMERCA CORP | 1 |
ANEST IWATA CORPORATION (A JAPANESE CORPORATION) | ANEST IWATA CORPORATION | 1 |
ANHUI HE AN INFORMATION TECHNOLOGY COMPANY | ANHUI HE AN INFORMATION TECHNOLOGY COMPANY | |
ANHUI HUAI AN CHEMICAL GROUP COMPANY | ANHUI HUAI AN CHEMICAL GROUP COMPANY | |
ANHUI JIN AN KANG BIOTECHNOLOGY COMPANY | ANHUI JIN AN KANG BIOTECHNOLOGY COMPANY | |
Could you please give me some suggestions about this?
Is there any way to simplify the code?
Thanks in advance.
1. Why do you use the CAT function?
2. Call the COMPBL function once, at the start, rather than in every statement.
3. Regular expressions are very powerful but expensive to run, so some slowness is expected.
4. The INDEX function is very fast. Test the string before calling the PRXCHANGE function:
if index(NAME,'COMPANY') then NAME=prxchange(...
5. Try adding an o after the last slash of the regex so that it is compiled only once.
6. Can you use 1 instead of -1 as the second argument, so that only the first match is replaced?
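Putting the tips above together, here is a rough, untested sketch of what the step might look like. The macro variable `places`, the variable names `rx_paren`, `rx_corp` and `rx_co`, and the length of `New_Name` are my own assumptions, and the place list is shortened for illustration. Instead of the o option it compiles each pattern once with PRXPARSE on the first row, which has the same aim as tip 5.

```sas
/* Assumption: the place list lives in one macro variable so it is typed once.
   Shortened for illustration; extend it with the full list from the question. */
%let places = DELAWARE|FINNISH|JAPAN(ESE)?|NEW\sYORK|SWEDISH;

data want;
  set have;
  length New_Name $200;              /* assumed length; match it to NAME */
  retain rx_paren rx_corp rx_co;

  if _n_ = 1 then do;
    /* Compile each pattern once and reuse the IDs on every row.
       Double quotes are needed so that &places resolves. */
    rx_paren = prxparse("s/\s*\(AN?\s(&places)\s(BODY\s)?(LIMITED\s)?(CORPORAT(ION|E)|COMPANY)\)//");
    rx_corp  = prxparse("s/[\s,\/]*\sAN?\s(&places)\s(BODY\s)?CORPORAT(ION|E)\s*$/ CORP/");
    rx_co    = prxparse("s/[\s,\/]*\sAN?\s(&places)\s(LIMITED\s)?COMPANY\s*$/ CO/");
  end;

  New_Name = compbl(NAME);           /* tip 2: collapse blanks once */

  /* tip 4: cheap INDEX tests gate the expensive regex calls */
  if index(New_Name, '(A') then
    New_Name = prxchange(rx_paren, 1, New_Name);   /* tip 6: times = 1 */

  if index(New_Name, ' A ') or index(New_Name, ' AN ') then do;
    New_Name = prxchange(rx_corp, 1, New_Name);
    New_Name = prxchange(rx_co, 1, New_Name);
  end;

  /* Flag rows whose name was altered; unchanged rows keep a missing value. */
  if New_Name ne compbl(NAME) then changed = 1;

  drop rx_:;
run;
```

The `\s*$` at the end of the tail patterns is there because SAS pads character variables with trailing blanks, so a bare `$` would not match without STRIP. Other suffixes (LIMITED LIABILITY COMPANY, COOPERATIVE, and so on) would each need their own pattern or an extra alternative in the same style.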