I have a dataset with two string columns that each contain multiple words (locations, for example).
Rowwise, I want to flag observations where the two strings have any words in common.
Does anyone know a way to achieve this?
Example strings and expected output:
col1 | col2 | flag
London, United Kingdom | British Columbia | 0
Mexico City | New Mexico | 1
Boston | New York | 0
United States of America | South America | 1
Welcome to the SAS Community! 🙂
You can do something like this
data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom | British Columbia
Mexico City | New Mexico
Boston | New York
United States of America| South America
;
data want(drop=i);
set have;
flag=0;
do i=1 to countw(col2);
if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
flag=1;
end;
end;
run;
Result:
col1 col2 flag London, United Kingdom British Columbia 0 Mexico City New Mexico 1 Boston New York 0 United States of America South America 1
Welcome to the SAS Community! 🙂
You can do something like this
data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom | British Columbia
Mexico City | New Mexico
Boston | New York
United States of America| South America
;
data want(drop=i);
set have;
flag=0;
do i=1 to countw(col2);
if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
flag=1;
end;
end;
run;
Result:
col1 col2 flag London, United Kingdom British Columbia 0 Mexico City New Mexico 1 Boston New York 0 United States of America South America 1
Hi @astrand
Here is something you can try. please note that it is case sensitive.
Best,
data have;
input col1 $ 1-24 col2 $ 26-42;
cards;
London, United Kingdom British Columbia
Mexico City New Mexico
Boston New York
;
run;
/*Counts the max number of words in a the first column*/
proc sql noprint;
select max(countw(col1))
into: maxword
from have;
quit;
data want;
set have;
array wordloc (&maxword) $ ;
array flagloc (&maxword) ;
do i=1 to &maxword;
wordloc(i) = scan(col1,i);
if index(col2,trim(wordloc(i)))>0 and wordloc(i) ne "" then flagloc(i)=1;
end;
flag = sum(of flagloc(*));
if flag = . then flag =0;
drop i;
run;
proc print data=want;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.