Thanks @ChrisNZ, I really appreciate your code. I am running it as shown below:

%MACRO CompnayNameM(no=,Company_name=);
    data step9.Patstat_total_hrm_Step22;
        set step9.Patstat_total_hrm_Step22;
        /* Add a leading and a trailing blank so the suffix patterns can anchor on blanks */
        &Company_name._S22=&Company_name._S21;
        &Company_name._S22=cat(' ', &Company_name._S22,' ');
        /* Standardise the suffixes to CO LTD, LTD and CO (first match of each pattern only) */
        &Company_name._S221=prxchange('s/( LTD[^\w\d]+CO[^\w\d]* | CO[^\w\d]+LTD[^\w\d]* )/ CO LTD /',1,&Company_name._S221 );
        &Company_name._S221=prxchange('s/( ,?L ?\.? ?T ?\.? ?D ?\.? ?,? | [^\w\d]*LIMITED[^\w\d]* | [^\w\d]*LTD[^\w\d]* )/ LTD /',1,&Company_name._S221 );
        /* Note: in the next pattern, the blank right after the closing parenthesis is part of the search pattern */
        &Company_name._S221=prxchange('s/( ,?C ?\.? ?O ?\.? ?,? | [^\w\d]*CO[^\w\d]* | [^\w\d]*COMPA(N|M)Y[^\w\d]* ) / CO /',1,&Company_name._S221);
        &Company_name._S221=prxchange('s/( LTD +CO | CO +LTD )/ CO LTD /',1,&Company_name._S221);
        /* If anything changed, split at the last standardised suffix (the leading wildcard group is greedy) */
        if &Company_name._S221 ne &Company_name._S22 then do;
            &Company_name._B=prxchange('s/(.*)( CO LTD | LTD | CO )(.*)/$1$2/',1,&Company_name._S221);
            &Company_name._address=prxchange('s/(.*)( CO LTD | LTD | CO ) *(.*)/$3/',1,&Company_name._S221);
        end;
    run;
%MEND CompnayNameM;
%CompnayNameM(no=1,Company_name=HRM_L2)
%CompnayNameM(no=2,Company_name=PERSON_NAME)
Run;

However, a value like '3M COMPANY (MINNESOTA MINING AND MANUFACTURING COMPANY)' is converted to '3M COMPANY (MINNESOTA MINING AND MANUFACTURING CO'. Is it possible, in this step, to handle strings enclosed in (), [] or {} so that they (1) stay attached to the preceding string and (2) are not processed?

For '3M COMPANY (MINNESOTA MINING AND MANUFACTURING COMPANY)', I just expect to get '3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)', with nothing split into a new variable. For 'ACE SPORTS LIMITED TAIWAN BRANCH (B.V.I.)', however, I expect to get 'ACE SPORTS LTD', with 'TAIWAN BRANCH (B.V.I.)' split into a new variable named &Company_name._address. Could you please give me some suggestions about this?

Edit: Besides, for the value ' ARCH TIMBER PROTECTION LIMITED, A PRIVATE LIMITED COMPANY ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM', I get ' ARCH TIMBER PROTECTION LTD A PRIVATE LIMITED CO' and 'ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM', whereas I expect to have ' ARCH TIMBER PROTECTION LTD' and 'A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM'. I know this is relatively difficult to process, so I think I can do it step by step.

In summary, I expect to separate the 'name' variable into two new variables: the first is 'name_B' and the second is 'address'.

1. Standardise the company suffixes (which has been done by your code).
2. Split the strings based on the standardised suffixes. If the 'name' variable includes two different company suffixes (such as 'CO' and 'LTD'), two observations should be created. Strings enclosed in (), [], {}, ' ' or " " should be ignored and simply stay attached to the string before the '(', '[', '{', ' or ".

For example, for the original data:

NAME
3M COMPANY (MINNESOTA MINING AND MANUFACTURING COMPANY)
ACE SPORTS LIMITED TAIWAN BRANCH (B.V.I.)
ARCH TIMBER PROTECTION LIMITED, A PRIVATE LIMITED COMPANY ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM

I expect step 1 to give:

NAME
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
ACE SPORTS LTD TAIWAN BRANCH (B.V.I.)
ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM

and step 2 to give:

NAME | name_B | name_address
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) | 3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY) |
ACE SPORTS LTD TAIWAN BRANCH (B.V.I.) | ACE SPORTS LTD | TAIWAN BRANCH (B.V.I.)
ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM ARCH TIMBER PROTECTION LTD A PRIVATE LTD ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM

Could you please give me some suggestions about how to do it? Thanks very much.
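
To make step 2 more concrete, here is a rough sketch of one way it might be approached. It is not a tested solution. It assumes step 1 has already standardised the suffixes, cuts each name at the first opening bracket, and then splits only the part before the bracket at the first standardised suffix. The data set WANT and the variables HEAD, TAIL, P, NAME_B and NAME_ADDRESS are illustrative names, not part of the macro above. Quoted segments and the creation of several observations per name are not handled.

```sas
/* Sketch only: split step-1 output at the first standardised suffix outside brackets. */
/* WANT, HEAD, TAIL and P are hypothetical names introduced for this illustration.     */
data want;
    length name name_B name_address head tail $200;
    infile datalines truncover;
    input name $200.;

    /* Position of the first opening bracket; 0 when there is none */
    p = findc(name, '([{');
    if p > 0 then do;
        head = substrn(name, 1, p - 1);  /* text before the bracket */
        tail = substr(name, p);          /* bracketed text onwards, left unprocessed */
    end;
    else do;
        head = name;
        tail = ' ';
    end;

    /* Pad with blanks so a suffix at the start or end of HEAD has a blank on both sides */
    head = cat(' ', strip(head), ' ');

    if prxmatch('/ (CO LTD|LTD|CO) /', head) then do;
        /* The lazy quantifier makes the split happen at the first suffix, not the last */
        name_B       = strip(prxchange('s/^(.*?) (CO LTD|LTD|CO) .*$/$1 $2/', 1, head));
        name_address = strip(prxchange('s/^.*? (CO LTD|LTD|CO) (.*)$/$2/', 1, head));
        /* The bracketed text follows whichever part immediately precedes it */
        if missing(name_address) then name_B = catx(' ', name_B, tail);
        else name_address = catx(' ', name_address, tail);
    end;
    else name_B = name;  /* no standardised suffix before the bracket: keep the name whole */

    drop head tail p;
    datalines;
3M CO (MINNESOTA MINING AND MANUFACTURING COMPANY)
ACE SPORTS LTD TAIWAN BRANCH (B.V.I.)
ARCH TIMBER PROTECTION LTD A PRIVATE LTD CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM
;
run;
```

The intent is that the 3M name stays whole in name_B with an empty name_address, the ACE name splits into 'ACE SPORTS LTD' and 'TAIWAN BRANCH (B.V.I.)', and the ARCH name splits after the first 'LTD'. The key difference from the macro above is the lazy `(.*?)`, since a greedy `(.*)` splits at the last suffix instead.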