Hi all,
I'm trying to match two different firm names using COMPGED (SPEDIS or SOUNDEX could be used as alternative methods).
Before that, I'd like to make the firm names as similar as possible by removing abbreviations at the end
(e.g. CO LTD, PTE LTD, Limited, INC, Incorporated, AG, SpA, Corp).
The simplest way would be the TRANWRD function, but I'm afraid it would replace not only the abbreviations but also letters that are part of the firm names. (Say I want to remove 'Corp' at the end of firm names: TRANWRD would turn 'Corpastta SpA' into 'astta SpA'.)
So what is the best way to do this, and has anyone done similar work?
Maybe I should use a regular expression?
Hello,
You can use a Perl regular expression for pattern matching.
data have;
infile datalines truncover;
input word $50.;
datalines;
Corpastta AB Crop
Corpastta Crop AB
AB Corpastta Crop
AB Corpastta
Crop AB Corpastta
ABCrop Corpastta
;
run;
data want;
set have;
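/* Locate the first stand-alone "Crop" (0 if none); 'i' ignores case */
position=prxmatch('m/ Crop | Crop|^Crop /io',word);
length new_word1 new_word2 $50;
/* IF/ELSE instead of IFC: IFC evaluates every argument, so SUBSTR would log invalid-argument notes when position is 0 or 1 */
if position>1 then new_word1=substr(word,1,position-1);
else if position=0 then new_word1=word;
/* The matched text is 5 characters (blank + Crop, or Crop + blank); SUBSTRN tolerates a start past the end */
if position>0 then new_word2=substrn(word,position+5);
required_word=catx(' ',new_word1,new_word2);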
run;
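You need to include the blanks in the strings that you're looking for. The pattern has three alternatives:
'm/ Crop | Crop|^Crop /io'
' Crop ' : a blank before and after the word (the word is in the middle).
' Crop' : a leading blank, meant for the word at the end of the line. There is no $ anchor, so this also matches a longer word that starts with Crop.
'^Crop ' : ^ (caret) anchors the start of the string, followed by the word and a blank.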
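For the original question (dropping legal suffixes only at the end of a name), a single PRXCHANGE call with an end anchor may be enough. This is a minimal sketch, not tested against real firm data: the dataset names firms and firms_clean, the variables name and clean_name, the sample rows, and the exact suffix list are all assumptions made for illustration.

```sas
/* Hypothetical sample data for illustration only */
data firms;
infile datalines truncover;
input name $50.;
datalines;
Corpastta SpA
Acme Corp
Example Co Ltd
Sample Pte Ltd
Corporate Holdings Incorporated
;
run;

data firms_clean;
set firms;
length clean_name $50;
/* \s+ requires a blank before the suffix and \s*$ anchors it to the end, */
/* so "Corp" inside "Corpastta" is left alone. \.? allows a trailing dot. */
/* TRIM removes the padding blanks; 'i' makes the match case-insensitive. */
clean_name=prxchange('s/\s+(co\s+ltd|pte\s+ltd|limited|incorporated|inc|corp|ag|spa)\.?\s*$//i',1,trim(name));
run;
```

Because the suffix has to be preceded by a blank and followed by the end of the string, 'Corpastta SpA' should lose only the 'SpA'. The suffix list is the one from the question and would need extending for other legal forms.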
Please post example data in a usable form. See https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... for details on how to create usable data.
If it has delimiters, then use them, e.g.:
data want;
length want $200;
test="Something co";
do i=1 to countw(test," ");
if scan(test,i," ") ne "co" then want=catx(" ",want,scan(test,i," "));
end;
run;
Of course, that only shows one removal, with spaces as the delimiter, but you get the idea. Without test data in the form of a data step, I can't take it any further.
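The same word-by-word loop can be extended to several abbreviations by checking each token against a list. The following is only a sketch under assumptions: the test value, the token variable, and the list of abbreviations are made up for illustration.

```sas
data want;
length want $200;
test="Something Pte Ltd";
do i=1 to countw(test," ");
token=scan(test,i," ");
/* Compare in upper case so "co", "Co" and "CO" are all dropped */
if upcase(token) not in ("CO","LTD","PTE","LIMITED","INC","INCORPORATED","CORP","AG","SPA")
then want=catx(" ",want,token);
end;
drop i token;
run;
```

Note that this drops a listed token wherever it appears, not only at the end, so a name that legitimately starts with one of these words (for example "AG") would lose it too.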