Dear all,
I am processing a variable whose values include 'LTD', 'CO', 'CO LTD', 'PLC', or 'CORP', and I am trying to separate the company name from the address/introduction information that follows it. However, some values contain at least two company suffixes (i.e., LTD, CO, CO LTD, PLC, CORP), and I want each value to be split only once, at the first suffix. I do not know how to do this. Could you please give me some suggestions?
For the following data,
data HAVE;
input NAME_S221 :& $500.;
if find(NAME_S221,' LTD ')>0 and find(NAME_S221,' LTD PARTNERSHIP ')=0 and prxmatch('/(.*) LTD\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/',NAME_S221 )=0 then do;
NAME_B=substr(NAME_S221,1,find(NAME_S221,' LTD ')+3);
NAME_address=strip(substr(NAME_S221,find(NAME_S221,' LTD ')+5, length(NAME_S221)));
end;
if find(NAME_S221,' CO ')>0 and prxmatch('/(.*) CO\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/',NAME_S221 )=0 then do;
NAME_B=substr(NAME_S221,1,find(NAME_S221,' CO ')+2);
NAME_address=strip(substr(NAME_S221,find(NAME_S221,' CO ')+4, length(NAME_S221)));
end;
if find(NAME_S221,' CO LTD ')>0 and prxmatch('/(.*) CO LTD\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/',NAME_S221 )=0 then do;
NAME_B=substr(NAME_S221,1,find(NAME_S221,' CO LTD ')+6);
NAME_address=strip(substr(NAME_S221,find(NAME_S221,' CO LTD ')+8, length(NAME_S221)));
end;
if find(NAME_S221,' PLC ')>0 and prxmatch('/(.*) PLC\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)(.*)/',NAME_S221 )=0 then do;
NAME_B=substr(NAME_S221,1,find(NAME_S221,' PLC ')+3);
NAME_address=strip(substr(NAME_S221,find(NAME_S221,' PLC ')+5, length(NAME_S221)));
end;
if find(NAME_S221,' CORP ')>0 and prxmatch('/(.*) CORP\s?([(\(.*\))|(\[.*\])|(\{.*\})|(''.*'')|(".*")]+)/',NAME_S221 )=0 then do;
NAME_B=substr(NAME_S221,1,find(NAME_S221,' CORP ')+4);
NAME_address=strip(substr(NAME_S221,find(NAME_S221,' CORP ')+6, length(NAME_S221)));
end;
cards;
BOTO (LICENSES) LTD AN ISLE OF MAN CO OF 3/F
BOTO (LICENSES) LTD AN ISLE OF MAN LTD OF 3/F
BPB LTD A U.K. CORP
BROADCOM UK LTD A DELAWARE CORP
ARCH TIMBER PROTECTION LTD A PRIVATE LIMITED CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM
AVDEL SYSTEMS LTD A BRITISH CO
run;
I get
NAME_S221 | NAME_B | NAME_address |
BOTO (LICENSES) LTD AN ISLE OF MAN CO OF 3/F | BOTO (LICENSES) LTD AN ISLE OF MAN CO | OF 3/F |
BOTO (LICENSES) LTD AN ISLE OF MAN LTD OF 3/F | BOTO (LICENSES) LTD | AN ISLE OF MAN LTD OF 3/F |
BPB LTD A U.K. CORP | BPB LTD A U.K. CORP | |
BROADCOM UK LTD A DELAWARE CORP | BROADCOM UK LTD A DELAWARE CORP | |
ARCH TIMBER PROTECTION LTD A PRIVATE LIMITED CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM | ARCH TIMBER PROTECTION LTD A PRIVATE LIMITED CO | ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM |
AVDEL SYSTEMS LTD A BRITISH CO | AVDEL SYSTEMS LTD A BRITISH CO | |
However, I want the variable 'NAME_S221' to be split only once. Namely, I expect to get
NAME_S221 | NAME_B | NAME_address |
BOTO (LICENSES) LTD AN ISLE OF MAN CO OF 3/F | BOTO (LICENSES) LTD | AN ISLE OF MAN CO OF 3/F |
BOTO (LICENSES) LTD AN ISLE OF MAN LTD OF 3/F | BOTO (LICENSES) LTD | AN ISLE OF MAN LTD OF 3/F |
BPB LTD A U.K. CORP | BPB LTD | A U.K. CORP |
BROADCOM UK LTD A DELAWARE CORP | BROADCOM UK LTD | A DELAWARE CORP |
ARCH TIMBER PROTECTION LTD A PRIVATE LIMITED CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM | ARCH TIMBER PROTECTION LTD | A PRIVATE LIMITED CO ORGANISED UNDER THE LAWS OF THE UNITED KINDGOM |
AVDEL SYSTEMS LTD A BRITISH CO | AVDEL SYSTEMS LTD | A BRITISH CO |
When you are searching for values that may appear as part of another key value, such as your 'CO' and 'LTD', which are both part of 'CO LTD', you likely should process the LONGER value first.
You may need an ELSE before the second and subsequent "if find(NAME_S221, ...)>0" clauses.
As written, every one of those comparisons is executed, so a later match overwrites the result of an earlier one, which seems to be causing part of your concern.
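A minimal sketch of that suggestion, assuming the HAVE data set from the question and reusing its offsets. The data set name WANT and the helper variables pos and len are made up for illustration, and the prxmatch bracket checks from the original code are left out for brevity (they can be AND-ed back into each condition). Note that an IF/ELSE IF chain picks the suffix by priority, not by leftmost position; for the six sample rows ' LTD ' comes first in every value, so the split should land where the expected output shows.

```sas
data WANT;
  set HAVE(keep=NAME_S221);   /* HAVE is the data set built in the question */
  length NAME_B NAME_address $500;

  /* Longest suffix first, so ' CO LTD ' is not taken for ' CO ' or ' LTD '. */
  /* ELSE stops the chain at the first condition that is true.               */
  if find(NAME_S221,' CO LTD ')>0 then do;
    pos=find(NAME_S221,' CO LTD '); len=6;
  end;
  else if find(NAME_S221,' LTD ')>0
      and find(NAME_S221,' LTD PARTNERSHIP ')=0 then do;
    pos=find(NAME_S221,' LTD '); len=3;
  end;
  else if find(NAME_S221,' CO ')>0 then do;
    pos=find(NAME_S221,' CO '); len=2;
  end;
  else if find(NAME_S221,' PLC ')>0 then do;
    pos=find(NAME_S221,' PLC '); len=3;
  end;
  else if find(NAME_S221,' CORP ')>0 then do;
    pos=find(NAME_S221,' CORP '); len=4;
  end;
  else pos=0;                 /* no suffix found: leave both outputs blank */

  if pos>0 then do;
    /* pos is the blank before the suffix, so pos+len is its last letter. */
    NAME_B=substr(NAME_S221,1,pos+len);
    /* SUBSTRN tolerates a start position past the end of the value. */
    NAME_address=strip(substrn(NAME_S221,pos+len+2));
  end;

  drop pos len;
run;
```

If a value could contain a lower-priority suffix to the left of a higher-priority one (for example ' CO ' before ' LTD '), the chain would still split at the higher-priority suffix; comparing the positions returned by FIND and taking the smallest would be one way to handle that case.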