Hello,
How can I find neighbouring repeated words in a string and extract the repeated word?
For example, for the table 'have':
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
I would like to get
APPLE LTD LTD | LTD
because it contains 'LTD LTD': the same word twice in a row. I would then like a new variable that holds the repeated word, which in this example is 'LTD'.
The rows
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
should not be extracted: although each contains the same word twice, the two occurrences are not next to each other.
Could you please give me some suggestions? Thanks in advance.
data have;
/* INFILE goes before INPUT; dlm=',' makes each whole line (spaces included) one value */
infile datalines dlm=',';
input string :$200.;
string=upcase(string);
datalines;
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
;
run;
Hello @Alexxxxxxx, are you expecting just one set of repeating words in the string, or possibly more? If more, what should the result look like?
Hello,
For the table
name |
APPLE LTD LTD |
USA Australia Japan USA |
FOOTBALL LTD FOOTBALL LP |
I expect to get
name | repeat |
APPLE LTD LTD | LTD |
Understood. In that case it is as simple as:
data have;
input string :$200.;
infile datalines dlm=',';
string=upcase(string);
datalines;
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
;
run;
data want;
set have;
do _n_=2 to countw(string,' ');
if scan(string,_n_,' ')=scan(string,_n_-1,' ') then want=scan(string,_n_,' ');
end;
run;
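Note that the step above writes every row of have and leaves want blank where no neighbouring words match. If only the matching rows are needed, as in the expected output, a subsetting IF after the loop is one way to get there. A minimal sketch, assuming the have data set above; the data set name want_only and the index variable _i are made up for illustration:

```sas
data want_only;
  set have;
  length repeat $200;
  /* compare each word with the word just before it */
  do _i = 2 to countw(string, ' ');
    if scan(string, _i, ' ') = scan(string, _i - 1, ' ') then
      repeat = scan(string, _i, ' ');
  end;
  /* subsetting IF: keep only rows where a neighbouring repeat was found */
  if not missing(repeat);
  drop _i;
run;
```

Like the step above, this keeps only the last repeated word in each string, so it only fits the case of one repeat per row.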
My question, though, is what should happen if there is more than one set of repeating words.
For example:
APPLE LTD LTD INC INC
Hello,
Thank you for the reminder.
If that happens, I expect to get both of them,
like this:
name | repeat |
APPLE LTD LTD INC INC | LTD |
APPLE LTD LTD INC INC | INC |
Could you please give me some suggestions about this?
thanks a lot.
Hi @Alexxxxxxx
Some fun stuff
data have;
input string :$200.;
infile datalines dlm=',';
string=upcase(string);
datalines;
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
APPLE LTD LTD INC INC
;
run;
data want;
if _n_ = 1 then do; /* create the hash and its iterator once, not on every row */
dcl hash H () ;
h.definekey ("repeat") ;
h.definedata ("repeat") ;
h.definedone () ;
dcl hiter hi('h');
end;
set have;
do _n_=2 to countw(string,' ');
if scan(string,_n_,' ')=scan(string,_n_-1,' ') then do;
repeat= scan(string,_n_,' ');
h.replace();
end;
end;
do while(hi.next()=0);
output;
end;
h.clear();
run;
Try this:
data have ;
input @1 str & $upcase30. ;
cards ;
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
APPLE LTD LTD INC INC
;
run ;
data want (drop = _:) ;
set have ;
length _s $ 32767 repeat $ 30 ;
do _x = 1 to countw (str) ;
repeat = scan (str, _x) ;
if repeat ne scan (str, _x + 1) or findw (_s, repeat) then continue ;
output ;
_s = catx (" ", _s, repeat) ;
end ;
run ;
Kind regards
Paul D.
data have;
input string :$200.;
infile datalines dlm=',';
string=upcase(string);
datalines;
APPLE LTD LTD
USA Australia Japan USA
FOOTBALL LTD FOOTBALL LP
APPLE LTD LTD INC INC
;
run;
data want(drop=st s l);
if _N_ = 1 then _iorc_=prxparse('/\b(\w+)\b\s*(\1)\b/');
set have;
st=string;
do while (prxmatch(_iorc_, st));
repeat=prxposn(_iorc_, 2, st);
output;
call prxposn(_iorc_, 2, s, l);
st=substr(st, s+l+1);
end;
run;
Result:
string | repeat |
APPLE LTD LTD | LTD |
APPLE LTD LTD INC INC | LTD |
APPLE LTD LTD INC INC | INC |
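For anyone unsure what the pattern does: (\w+) captures a word, \1 requires that same word again, and the \b anchors stop it from matching only part of a longer word. A small sketch to try it on a few literals, assuming nothing beyond the pattern above; the test strings and the variable names rx, s and pos are only for illustration:

```sas
data _null_;
  /* same pattern as above: a word, optional whitespace, then the same word again */
  rx = prxparse('/\b(\w+)\b\s*(\1)\b/');
  length s $40;
  do s = 'APPLE LTD LTD', 'FOOTBALL LTD FOOTBALL LP', 'USA USAGE';
    /* prxmatch returns the start position of the match, or 0 if there is none */
    pos = prxmatch(rx, s);
    put s= pos=;
  end;
run;
```

Only the first string should give a nonzero pos. The last one is there to show that 'USA' followed by 'USAGE' is not treated as a repeat.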