BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
astrand
Calcite | Level 5

I have a dataset with two string columns that each contain multiple words (locations, for example).

 

Rowwise, I want to flag observations where the two strings have any words in common.

 

Does anyone know a way to achieve this?

 

Example strings and expected output:

 

col1 | col2 | flag

London, United Kingdom | British Columbia  | 0

Mexico City | New Mexico | 1

Boston | New York | 0

United States of America | South America | 1

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Welcome to the SAS Community! 🙂

 

You can do something like this

 

data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom  | British Columbia
Mexico City             | New Mexico      
Boston                  | New York        
United States of America| South America   
;

data want(drop=i);
    set have;
    flag=0;
    do i=1 to countw(col2);
      if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
        flag=1;
      end;    
    end;
run;

Result:

 

col1                      col2             flag
London, United Kingdom    British Columbia 0
Mexico City               New Mexico       1
Boston                    New York         0
United States of America  South America    1

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Welcome to the SAS Community! 🙂

 

You can do something like this

 

data have;
input col1 $ 1-24 col2 $ 27-43;
infile datalines dlm='|';
datalines;
London, United Kingdom  | British Columbia
Mexico City             | New Mexico      
Boston                  | New York        
United States of America| South America   
;

data want(drop=i);
    set have;
    flag=0;
    do i=1 to countw(col2);
      if findw(col1, scan(col2, i, ' '), ' ', 'sit') then do;
        flag=1;
      end;    
    end;
run;

Result:

 

col1                      col2             flag
London, United Kingdom    British Columbia 0
Mexico City               New Mexico       1
Boston                    New York         0
United States of America  South America    1
astrand
Calcite | Level 5
This worked perfectly. Thank you!
ed_sas_member
Meteorite | Level 14

Hi @astrand 

 

Here is something you can try. please note that it is case sensitive.

 

Best,

 

data have;
	input col1 $ 1-24 col2 $ 26-42;
	cards;
London, United Kingdom   British Columbia
Mexico City              New Mexico
Boston                   New York
	;
run;

/*Counts the max number of words in a the first column*/
proc sql noprint;
	select max(countw(col1))
	into: maxword
	from have;
quit;

data want;
	set have;
	
	array wordloc (&maxword) $ ;
	array flagloc (&maxword)   ;
	
	do i=1 to &maxword;
		wordloc(i) = scan(col1,i);
		if index(col2,trim(wordloc(i)))>0 and wordloc(i) ne "" then flagloc(i)=1;
	end;
		flag = sum(of flagloc(*));
		if flag = . then flag =0;
	drop i;
	
run;

proc print data=want;
run;
Astounding
PROC Star
A tip to speed up the process...

Stop processing once you find a match. For example, if you have this statement:

do i=1 to countw(col2) ;

switch to:

do i=1 to countw(col2) until (flag=1) ;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 727 views
  • 5 likes
  • 4 in conversation