Hi
I am trying to find all instances when two words are present in a text field.
In MS SQL I can use CONTAINS(t1.NOTES, 'NEAR((Apples, Red),3)'), meaning the words Apples and Red are located near each other.
When I try this in SAS EG 71. I get ERROR: Function CONTAINS could not be located.
I also tried prxmatch("!(Apples|Red)!i", t1.NOTES)> 0, however either word shows up in the results.
So if we have
1 - Apple Granny Green
2 - Apple Spartan Red
3 - Apple Jazz Red
4 - Apple Yellow Delicious
#2 and #3 are valid
I am doing this in Proc SQL.
Thanks
Here's one way to do it:
First data step sets up your test data
Then 2 examples using data step and PROC SQL
data got ;
infile cards ;
input string $30. ;
cards;
Apple Granny Green
Apple Spartan Red
Apple Jazz Red
Apple Yellow Delicious
Oranges and Pears
;
data want ;
set got ;
x=prxmatch('/Red/',string) ;
y=prxmatch('/Apple/',string) ;
if x and y then
put "Red Apple - " string ;
else
if x then
put "Red Only - " string ;
else
if y then
put "Apple Only - " string ;
else
put "None - " string ;
run ;
proc sql ;
select *
from got
where prxmatch('/Red/',string) and prxmatch('/Apple/',string)
;
quit ;
Hi @R_Auger,
The 'e' modifier of the FINDW function in conjunction with absolute differences of word positions could be useful to implement something similar to the proximity term NEAR in MS SQL.
Example:
data have;
input n text $50.;
cards;
1 Apple Granny Green
2 Apple Spartan Red
3 Apple Jazz Red
4 Apple Yellow Delicious
5 apple cool Jazz Juicy Dark Red Fruit
6 Apple cool Juicy Dark red Fruit
7 big red super fresh apple
8 red really red super fresh apple
;
proc sql;
create table want(drop=_:) as
select *, findw(text,'apple', ' ', 'ei') as _p1,
findw(text,'red', ' ', 'ei') as _p2
from have
having _p1 & _p2 & abs(_p1-_p2)<5;
quit;
Result:
n text 2 Apple Spartan Red 3 Apple Jazz Red 6 Apple cool Juicy Dark red Fruit 7 big red super fresh apple
Item 8 shows the limitation of FINDW, though: It finds only the first occurrence of a word so that duplicates could be a problem. Here, _p1=6 and _p2=1 violate the proximity condition.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.