Hi,
I have a data file which has two string variables that contain multiple words each:
V1 V2
abc def ghi jkl abc
def ghi jkl abc abc
abcdef abc
abc abc def
Each string can contain one or more words. What I'm trying to do is identify records where any of the full words in V2 is found (also as a full word) in V1. So the first, second and last records would be matches, but the third record would not because the value in V2 does not match to a full word value found in V1.
I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach.
Any help is much appreciated.
Here is how I would do it
data have;
input V1 $ 1-16 V2 $ 17-25;
datalines;
abc def ghi jkl abc
def ghi jkl abc abc
abcdef abc
abc abc def
;
data want (drop=i);
set have;
do i = 1 to countw(V2);
if findw(V1, scan(V2, i)) then do;
output;
leave;
end;
end;
run;
Result:
V1 V2 abc def ghi jkl abc def ghi jkl abc abc abc abc def
I did come up with a way to do this using multiple scan functions but it's really clunky and inefficient. I'm hoping for a more efficient and automated approach.
I think you need a DO loop in a DATA step, along with the COUNTW and SCAN and FIND functions. So, it's not clear what you did or whether it can be improved. Can you show us what you have tried?
Basically a lot of scan functions:
if scan(v1, 1, ' ') = scan(v2, 1, ' ') or
scan(v1, 2, ' ') = scan(v2, 1, ' ') etc.
This is not a great approach because the strings can have up to 9 words (and it's super clunky).
Here is how I would do it
data have;
input V1 $ 1-16 V2 $ 17-25;
datalines;
abc def ghi jkl abc
def ghi jkl abc abc
abcdef abc
abc abc def
;
data want (drop=i);
set have;
do i = 1 to countw(V2);
if findw(V1, scan(V2, i)) then do;
output;
leave;
end;
end;
run;
Result:
V1 V2 abc def ghi jkl abc def ghi jkl abc abc abc abc def
Thanks, @PeterClemmensen , this is exactly what I had in mind.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.