Dear all,
When I use the following codes
data HAVE;
input NAME :& $500.;
if find(NAME,' CO ') then do;
if prxmatch('/(.*) CO\s?[\(.*\)|\[.*\]|\{.*\}|''.*''|".*"](.*)/',NAME ) then do;
NAME_B=prxchange('s/(.*) CO\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/\1 CO \2/',1,NAME );
NAME_address=prxchange('s/(.*) CO\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/\3/',1,NAME );
end;
else do;
NAME_B=substr(NAME,1,find(NAME,' CO ')+2);
NAME_address=strip(substr(NAME,find(NAME,' CO ')+4, length(NAME)));
end;
end;
cards;
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY'
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY"
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD
3M CO (1986) A LTD IN UK
3M CO 'US' A UK CO
3M CO "UK" A UK LTD
3M CO A UK LTD
run;
I got
NAME | NAME_B | NAME_address |
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | 3M CO ( | MINNESOTA MINING AND MANUFACTURING COMPANY) |
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | 3M CO ' | MINNESOTA MINING AND MANUFACTURING COMPANY' |
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | 3M CO " | MINNESOTA MINING AND MANUFACTURING COMPANY" |
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK | 3M CO ( | MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK |
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO | 3M CO ' | MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO |
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD | 3M CO " | MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD |
3M CO (1986) A LTD IN UK | 3M CO ( | 1986) A LTD IN UK |
3M CO 'US' A UK CO | 3M CO ' | US' A UK CO |
3M CO "UK" A UK LTD | 3M CO " | UK" A UK LTD |
3M CO A UK LTD | 3M CO | A UK LTD |
However, I expect to get
NAME | NAME_B | NAME_address |
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | |
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | |
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | |
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK | 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | A LTD IN UK |
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO | 3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' | A UK CO |
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD | 3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" | A UK LTD |
3M CO (1986) A LTD IN UK | 3M CO (1986) | A LTD IN UK |
3M CO 'US' A UK CO | 3M CO 'US' | A UK CO |
3M CO "UK" A UK LTD | 3M CO "UK" | A UK LTD |
3M CO A UK LTD | 3M CO | A UK LTD |
I expect to 1.separate the name variable into two different variables (which are NAME_B, NAME_address) based on the company suffix 'CO' and 2. the string among (), [], {}, ' ', " " should be put in the NAME_B variable.
Could you please give me some suggestions about this?
thanks in advance.
See if this helps.
data HAVE(drop= start LENGTH prx_id);
input NAME :& $500.;
Length NAME_B $500. NAME_address $100.;
if _n_=1 then prx_id=prxparse("/A\s[UL][KT][\sD][\w\s][\w]+\s?(UK)*/");
retain prx_id;
call prxsubstr(prx_id,NAME,START,LENGTH);
if START > 0 then do;
NAME_B=substrn(NAME,1,start-1);
NAME_address=substrn(NAME,start,LENGTH);
end;
else do;
NAME_B=Name;
NAME_address='';
end;
cards;
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY'
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY"
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) A LTD IN UK
3M CO 'MINNESOTA MINING AND MANUFACTURING COMPANY' A UK CO
3M CO "MINNESOTA MINING AND MANUFACTURING COMPANY" A UK LTD
3M CO (1986) A LTD IN UK
3M CO 'US' A UK CO
3M CO "UK" A UK LTD
3M CO A UK LTD
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.