SAS Programming

Matching full words across two columns

Walternate · Posted 02-26-2020 09:57 AM

Hi,

I have a data file which has two string variables that contain multiple words each:

V1               V2
abc def ghi jkl  abc
def ghi jkl abc  abc
abcdef           abc
abc              abc def

Each string can contain one or more words. I'm trying to identify records where any full word in V2 also appears as a full word in V1. The first, second and last records would be matches, but the third would not, because the value in V2 (abc) does not match a full word in V1 (abcdef).

I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach.

Any help is much appreciated.

PeterClemmensen · Posted 02-26-2020 10:16 AM

Here is how I would do it

data have;
input V1 $ 1-16 V2 $ 17-25;
datalines;
abc def ghi jkl abc    
def ghi jkl abc abc    
abcdef          abc    
abc             abc def
;

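data want (drop=i);
    set have;
    /* Check each word of V2 in turn */
    do i = 1 to countw(V2);
        /* FINDW matches whole words only, so abc does not match abcdef */
        if findw(V1, scan(V2, i)) then do;
            output;   /* write the record once */
            leave;    /* stop at the first matching word */
        end;
    end;
run;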

Result:

V1               V2 
abc def ghi jkl  abc 
def ghi jkl abc  abc 
abc              abc def
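
A possible variation on the step above, offered as a sketch rather than as part of the original answer: keep every record and set a flag instead of filtering. It assumes the have data set created above, treats only blanks as word delimiters, and ignores case through the FINDW i modifier. The names want_flag and match are made up for illustration.

```sas
/* Sketch: flag matching records instead of dropping the others.   */
/* Assumes the HAVE data set from the reply above.                 */
data want_flag (drop=i);
    set have;
    match = 0;
    /* Stop looping as soon as one word of V2 is found in V1 */
    do i = 1 to countw(V2, ' ') while (match = 0);
        /* Third argument: blank is the only delimiter.        */
        /* Fourth argument: 'i' makes the comparison ignore case. */
        if findw(V1, scan(V2, i, ' '), ' ', 'i') > 0 then match = 1;
    end;
run;
```

With the sample data this should set match to 1 for the first, second and fourth records and to 0 for the third, in line with the result shown above.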


PaigeMiller · Posted 02-26-2020 10:01 AM

Walternate wrote: "I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach."

I think you need a DO loop in a DATA step, along with the COUNTW, SCAN and FIND functions. It's not clear what you did, though, or whether it can be improved. Can you show us what you have tried?

--
Paige Miller
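
A side note on the function choice, as a small sketch and not part of the post above: FIND looks for a substring, while FINDW looks for a whole word, which matters for the third sample record (abcdef versus abc). The variable names substring_hit and word_hit are made up for illustration.

```sas
/* Sketch: substring search versus whole-word search */
data _null_;
    V1   = 'abcdef';
    word = 'abc';
    substring_hit = find(V1, word);    /* position of the substring: 1 */
    word_hit      = findw(V1, word);   /* no whole-word match: 0       */
    put substring_hit= word_hit=;      /* write both values to the log */
run;
```

For the full-word requirement in the question, FINDW (or a word-by-word comparison with SCAN) is therefore the safer choice.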

Walternate · Posted 02-26-2020 10:08 AM

Basically a lot of scan functions:

if scan(v1, 1, ' ') = scan(v2, 1, ' ') or
   scan(v1, 2, ' ') = scan(v2, 1, ' ') etc.

This is not a great approach because the strings can have up to 9 words (and it's super clunky).
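
For what it's worth, the pairwise SCAN comparisons described above can be generalized with two nested loops, so the number of words no longer has to be hard-coded. This is only a sketch under assumptions: it uses a data set named have with variables V1 and V2, treats blanks as the only delimiter, and the names want_nested and match are hypothetical.

```sas
/* Sketch: compare every word of V1 with every word of V2 */
data want_nested (drop=i j);
    set have;
    match = 0;
    do i = 1 to countw(V1, ' ');
        do j = 1 to countw(V2, ' ');
            /* Same test as the hand-written SCAN comparisons */
            if scan(V1, i, ' ') = scan(V2, j, ' ') then match = 1;
        end;
    end;
run;
```

This does more comparisons than strictly necessary, since it keeps looping after a match is found, but it stays the same length whether the strings hold two words or nine.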

PaigeMiller · Posted 02-26-2020 10:20 AM

Thanks, @PeterClemmensen , this is exactly what I had in mind.

--
Paige Miller
