SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hi,

 

I have a data file which has two string variables that contain multiple words each:

 

V1                             V2             

abc def ghi jkl           abc

def ghi jkl   abc         abc

abcdef                      abc

abc                           abc def

 

Each string can contain one or more words. What I'm trying to do is identify records where any of the full words in V2 is found (also as a full word) in V1. So the first, second and last records would be matches, but the third record would not because the value in V2 does not match to a full word value found in V1. 

 

I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach. 

 

Any help is much appreciated. 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is how I would do it

 

data have;
input V1 $ 1-16 V2 $ 17-25;
datalines;
abc def ghi jkl abc    
def ghi jkl abc abc    
abcdef          abc    
abc             abc def
;

data want (drop=i);
    set have;
    do i = 1 to countw(V2);
        if findw(V1, scan(V2, i)) then do;
            output;
            leave;
        end;
    end;
run;

 

Result:

 

V1               V2 
abc def ghi jkl  abc 
def ghi jkl abc  abc 
abc              abc def 

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach. 

 

I think you need a DO loop in a DATA step, along with the COUNTW and SCAN and FIND functions. So, it's not clear what you did or whether it can be improved. Can you show us what you have tried?

--
Paige Miller
Walternate
Obsidian | Level 7

Basically a lot of scan functions:

 

if scan(v1, 1, ' ') = scan(v2, 1, ' ') or

  scan(v1, 2, ' ') = scan(v2, 1, ' ') etc. 

 

This is not a great approach because the strings can have up to 9 words (and it's super clunky).

PeterClemmensen
Tourmaline | Level 20

Here is how I would do it

 

data have;
input V1 $ 1-16 V2 $ 17-25;
datalines;
abc def ghi jkl abc    
def ghi jkl abc abc    
abcdef          abc    
abc             abc def
;

data want (drop=i);
    set have;
    do i = 1 to countw(V2);
        if findw(V1, scan(V2, i)) then do;
            output;
            leave;
        end;
    end;
run;

 

Result:

 

V1               V2 
abc def ghi jkl  abc 
def ghi jkl abc  abc 
abc              abc def 

 

PaigeMiller
Diamond | Level 26

Thanks, @PeterClemmensen , this is exactly what I had in mind.

--
Paige Miller

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1563 views
  • 1 like
  • 3 in conversation