I have a string that contains duplicate words, along with other words that differ from them by only a few characters. When I try to identify the duplicates with the FIND/FINDW functions, these similar words are treated as the same word and one of them is removed. For example, JAPAN and JAPANESE are distinct words in the same string, but FIND/FINDW treats them as the same word and deletes one of them. The same happens with FATEST and FATESTCD. How can I match words exactly, using FIND/FINDW or PRXMATCH, so that only true duplicates are removed?
data have;
input string :$200.;
infile datalines dlm=',';
datalines;
apple orange kiwi apple grapes strawberry peach kiwi peach
China USA UK Australia Japanese USA UK Australian Japan Chinase
FOOTBALL BasketBall basketball Hockey football
FACAT FATESTCD FATEST FAOBJ STDT STDTC VISIT VISITNUM
;
data want(keep=string newstring);
set have;
newstring=scan(string, 1, ' ');
do i=2 to countw(string,' ');
word=scan(string, i, ' ');
found=find(newstring, word, 'it');
/* fnd=findw(newstring, word, 'it');*/
if found=0 then newstring=catx(' ', newstring, word);
end;
run;
Maybe it is too early for my brain, but what do you expect as the result?
Current output (newstring):
apple orange kiwi grapes strawberry peach
China USA UK Australia Japanese Australian Chinase ------> Japan is deleted here, though it is a unique word
FOOTBALL BasketBall Hockey
FACAT FATESTCD FAOBJ STDT STDTC VISIT VISITNUM ------> FATEST is deleted here, though it is a unique word
Expected output:
apple orange kiwi grapes strawberry peach
China USA UK Australia Japanese Australian Japan Chinase
FOOTBALL BasketBall Hockey
FACAT FATESTCD FATEST FAOBJ STDT STDTC VISIT VISITNUM
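The third argument of FINDW is the list of delimiter characters. You can't skip that argument if you want to pass modifiers, because a character string in the third position is read as delimiters, not as modifiers. So try
found = findw(newstring, word, ' ', 'sit');
The S modifier has to be added because T trims trailing blanks from the third argument, too, which would otherwise leave the delimiter list empty. S adds space characters back to the list of delimiters.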
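To show where that call fits, here is a minimal sketch of the full step with the FINDW call swapped in. It assumes the HAVE dataset from the question and that words are separated only by blanks. The LENGTH statement is my addition to keep WORD and NEWSTRING from being truncated.

```sas
data want(keep=string newstring);
  set have;
  length word $30 newstring $200;
  /* start with the first word, then append each later word only if it is new */
  newstring = scan(string, 1, ' ');
  do i = 2 to countw(string, ' ');
    word = scan(string, i, ' ');
    /* ' ' = delimiter list, 'sit' = space delimiters, ignore case, trim blanks */
    found = findw(newstring, word, ' ', 'sit');
    if found = 0 then newstring = catx(' ', newstring, word);
  end;
run;
```

Because FINDW only matches whole words, Japan is no longer found inside Japanese, and FATEST is no longer found inside FATESTCD. The I modifier still treats football and FOOTBALL as the same word.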
This is my favorite document on prxmatch and other perl expression SAS functions:
https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
I use this all of the time.
Take a look!
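If you prefer PRXMATCH, a whole-word test can be sketched with the \b word-boundary metacharacter. This is only a sketch under assumptions: it uses the HAVE dataset from the question, the output name WANT_PRX is made up for illustration, and it assumes the words contain only letters, digits, and underscores. Anything with regex metacharacters would need escaping first.

```sas
data want_prx(keep=string newstring);
  set have;
  length word $30 newstring $200;
  newstring = scan(string, 1, ' ');
  do i = 2 to countw(string, ' ');
    word = scan(string, i, ' ');
    /* builds a pattern such as /\bJapan\b/i : whole word, case-insensitive */
    found = prxmatch(cats('/\b', word, '\b/i'), newstring);
    if found = 0 then newstring = catx(' ', newstring, word);
  end;
run;
```

The pattern is rebuilt on every iteration, so the expression is recompiled for each word. For a small dataset like this one that should not matter, but FINDW is probably the simpler choice here.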
Hi @keen_sas ,
I do not see any issue in your program. You may add lengths for newstring and word.
Here is the code I tried, which seems to give what you want:
data want(keep=string newstring);
set have;
length word $30 newstring $200;
newstring=scan(string, 1, ' ');
do i=2 to countw(string,' ');
word=scan(string, i, ' ');
found=find(newstring, word, 'it');
if found=0 then newstring=catx(' ', newstring, word);
end;
run;
I didn't see anything wrong in the output if you are using FINDW().
data want(keep=string newstring);
set have;
newstring=scan(string, 1, ' ');
do i=2 to countw(string,' ');
word=scan(string, i, ' ');
/* found=find(newstring, word, 'it'); */
found=findw(newstring, word, 'it');
if found=0 then newstring=catx(' ', newstring, word);
end;
run;