I have a dataset with two string columns that each contain multiple words (locations, for example).
Rowwise, I want to flag observations where the two strings have any words in common.
Does anyone know a way to achieve this?
Example strings and expected output:
col1 | col2 | flag
London, United Kingdom | British Columbia | 0
Mexico City | New Mexico | 1
Boston | New York | 0
United States of America | South America | 1
Welcome to the SAS Community! 🙂
You can do something like this
data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom | British Columbia
Mexico City | New Mexico
Boston | New York
United States of America| South America
;
data want(drop=i);
set have;
flag=0;
do i=1 to countw(col2);
if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
flag=1;
end;
end;
run;
Result:
col1 col2 flag London, United Kingdom British Columbia 0 Mexico City New Mexico 1 Boston New York 0 United States of America South America 1
Welcome to the SAS Community! 🙂
You can do something like this
data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom | British Columbia
Mexico City | New Mexico
Boston | New York
United States of America| South America
;
data want(drop=i);
set have;
flag=0;
do i=1 to countw(col2);
if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
flag=1;
end;
end;
run;
Result:
col1 col2 flag London, United Kingdom British Columbia 0 Mexico City New Mexico 1 Boston New York 0 United States of America South America 1
Hi @astrand
Here is something you can try. please note that it is case sensitive.
Best,
data have;
input col1 $ 1-24 col2 $ 26-42;
cards;
London, United Kingdom British Columbia
Mexico City New Mexico
Boston New York
;
run;
/*Counts the max number of words in a the first column*/
proc sql noprint;
select max(countw(col1))
into: maxword
from have;
quit;
data want;
set have;
array wordloc (&maxword) $ ;
array flagloc (&maxword) ;
do i=1 to &maxword;
wordloc(i) = scan(col1,i);
if index(col2,trim(wordloc(i)))>0 and wordloc(i) ne "" then flagloc(i)=1;
end;
flag = sum(of flagloc(*));
if flag = . then flag =0;
drop i;
run;
proc print data=want;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.