Dear all,
When I use the following codes
data HAVE;
input NAME :& $500.;
if find(NAME,' CO ') then do;
if prxmatch('/(.*) CO\s?[\(.*\)|\[.*\]|\{.*\}|''.*''|".*"](.*)/',NAME ) then do;
NAME_B=prxchange('s/(.*) CO\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/\1 CO \2/',1,NAME );
NAME_address=prxchange('s/(.*) CO\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/\3/',1,NAME );
end;
else do;
NAME_B=substr(NAME,1,find(NAME,' CO ')+2);
NAME_address=strip(substr(NAME,find(NAME,' CO ')+4, length(NAME)));
end;
end;
cards;
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY'
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY"
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD
3M CO (1986) A LTD IN UK
3M CO 'US' A UK CO
3M CO "UK" A UK LTD
3M CO A UK LTD
run; I got
| NAME | NAME_B | NAME_address |
| 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | 3M CO ( | MINNESOTA MINING AND MANUFACTURING COMPANY) |
| 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | 3M CO ' | MINNESOTA MINING AND MANUFACTURING COMPANY' |
| 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | 3M CO " | MINNESOTA MINING AND MANUFACTURING COMPANY" |
| 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK | 3M CO ( | MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK |
| 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO | 3M CO ' | MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO |
| 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD | 3M CO " | MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD |
| 3M CO (1986) A LTD IN UK | 3M CO ( | 1986) A LTD IN UK |
| 3M CO 'US' A UK CO | 3M CO ' | US' A UK CO |
| 3M CO "UK" A UK LTD | 3M CO " | UK" A UK LTD |
| 3M CO A UK LTD | 3M CO | A UK LTD |
However, I expect to get
| NAME | NAME_B | NAME_address |
| 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | |
| 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | |
| 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | |
| 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK | 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | A LTD IN UK |
| 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO | 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | A UK CO |
| 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD | 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | A UK LTD |
| 3M CO (1986) A LTD IN UK | 3M CO (1986) | A LTD IN UK |
| 3M CO 'US' A UK CO | 3M CO 'US' | A UK CO |
| 3M CO "UK" A UK LTD | 3M CO "UK" | A UK LTD |
| 3M CO A UK LTD | 3M CO | A UK LTD |
I expect to 1.separate the name variable into two different variables (which are NAME_B, NAME_address) based on the company suffix 'CO' and 2. the string among (), [], {}, ' ', " " should be put in the NAME_B variable.
Could you please give me some suggestions about this?
thanks in advance.
See if this helps.
data HAVE(drop= start LENGTH prx_id);
input NAME :& $500.;
Length NAME_B $500. NAME_address $100.;
if _n_=1 then prx_id=prxparse("/A\s[UL][KT][\sD][\w\s][\w]+\s?(UK)*/");
retain prx_id;
call prxsubstr(prx_id,NAME,START,LENGTH);
if START > 0 then do;
NAME_B=substrn(NAME,1,start-1);
NAME_address=substrn(NAME,start,LENGTH);
end;
else do;
NAME_B=Name;
NAME_address='';
end;
cards;
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY'
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY"
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD
3M CO (1986) A LTD IN UK
3M CO 'US' A UK CO
3M CO "UK" A UK LTD
3M CO A UK LTD
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.