data search engine (based on text)

Super Contributor
Posts: 441

Hi,

 

Suppose that I have the following data:

description

A car

1 car 23j @

Car

cars

2 trucks

One truck

One truck and one car

What I would like to do is create a new data set that keeps only the rows where the description contains either "car" or "truck" as a whole word (ignoring case), so the new data table is:

 

description

A car

1 car 23j @

Car

One truck

One truck and one car

 

Thank you!

 


All Replies
Respected Advisor
Posts: 4,925

Re: data search engine (based on text)

Use IF FINDW(description, "car", " ", "i") > 0 OR FINDW(description, "truck", " ", "i") > 0;

PG
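
As a minimal sketch of how that subsetting IF fits into a complete program, assuming the input data set is called have and the output want (both names are assumptions, as is the 30-character length):

```sas
/* Sample data from the question; the variable length of 30 is an assumption */
data have;
  infile datalines truncover;
  input description $30.;
  datalines;
A car
1 car 23j @
Car
cars
2 trucks
One truck
One truck and one car
;
run;

/* Keep rows where "car" or "truck" appears as a whole word.    */
/* ' ' makes the blank the only word delimiter; 'i' ignores case */
data want;
  set have;
  if findw(description, 'car', ' ', 'i') > 0
     or findw(description, 'truck', ' ', 'i') > 0;
run;
```

Because FINDW looks for whole words, "cars" and "2 trucks" should be dropped, which matches the table asked for in the question.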
Trusted Advisor
Posts: 1,137

Re: data search engine (based on text)

Alternatively, you could use a Perl regular expression, such as:

prxmatch('m/(car|truck)\b/i',description)>0
Thanks,
Jag
Respected Advisor
Posts: 4,925

Re: data search engine (based on text)

Posted in reply to Jagadishkatam

@Jagadishkatam, you would need a word boundary before the words too: 'm/\b(car|truck)\b/i'. Otherwise the pattern will also match text such as "I was struck by lightning".

PG
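
To see what the leading \b changes, a small sketch along these lines can be run. The variable names and the test strings other than the "struck" sentence are made up for illustration:

```sas
/* Compare the pattern with and without a leading word boundary */
data _null_;
  length text $40;
  do text = 'I was struck by lightning', 'One truck', '2 trucks';
    /* only a trailing \b: "truck" inside "struck" can still match */
    no_leading = prxmatch('m/(car|truck)\b/i', text) > 0;
    /* \b on both sides: the word must stand on its own */
    both_sides = prxmatch('m/\b(car|truck)\b/i', text) > 0;
    put text= no_leading= both_sides=;
  end;
run;
```

With boundaries on both sides, the "struck" sentence should no longer count as a match, while "One truck" still does.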
Trusted Advisor
Posts: 1,137

Re: data search engine (based on text)

@PGStats, thank you for the correction.
Thanks,
Jag
Super Contributor
Posts: 441

Re: data search engine (based on text)

Hi,

 

Thank you for replying!

 

One small extension of the question: if I have a list of words, say 10, is it possible to pass the whole list instead of specifying each word individually?

 

 

Thank you!

Frequent Contributor
Posts: 108

Re: data search engine (based on text)

/*
Yes, you can. In the FIND call, (word,"car","truck","ANYTHINGYOUWANT"), you specify which words you need to find.
dataiwant is a new data set created with the result.
*/

data have dataiwant; 
input word$ 30.;
matchingword = find(word,"car","truck");
if matchingword > 0 then output dataiwant;
drop  matchingword;
datalines; 
A car
1 car 23j @
Car
cars
2 trucks
One truck
One truck and one car
;
run;
Proc Print data = dataiwant;
run;
Super User
Posts: 7,050

Re: data search engine (based on text)

Posted in reply to pearsoninst

That is not the right syntax for the FIND() function.
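
For reference, FIND takes one substring per call (its optional extra arguments are modifiers and a start position), and it matches substrings, so "cars" would also be found. One possible repair of the example above, sketched with FINDW and a loop over a blank-separated list. The variables wordlist, target, i and found are hypothetical names introduced here:

```sas
data dataiwant;
  infile datalines truncover;
  input word $30.;
  length wordlist $200 target $30;
  retain wordlist 'car truck';          /* hypothetical list; add more words here */
  found = 0;
  /* test each listed word in turn and stop at the first hit */
  do i = 1 to countw(wordlist, ' ') while (not found);
    target = scan(wordlist, i, ' ');
    /* strip() removes trailing blanks so only the word itself is searched */
    if findw(word, strip(target), ' ', 'i') > 0 then found = 1;
  end;
  if found;                             /* keep only rows with a whole-word match */
  drop wordlist target i found;
  datalines;
A car
1 car 23j @
Car
cars
2 trucks
One truck
One truck and one car
;
run;

proc print data=dataiwant;
run;
```

Keeping the list in one character value means a tenth word is a change to the list only, not another FINDW call.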

Solution
‎11-28-2015 11:39 PM
Trusted Advisor
Posts: 1,137

Re: data search engine (based on text)

Yes, there is a way to pass the whole list to the function rather than each word individually: create a macro variable that holds the words. The details depend on which function you are going to use.

 

For example, as I mentioned in my previous reply, prxmatch can be used to check for the words. It expects the alternatives to be separated by a pipe '|', so the macro variable must hold the words separated by pipes.

 

Please try something like the following:

 

proc sql;
select distinct words into : word separated by '|' from have;
quit;

data have;
set want;
if prxmatch('m/\b(&word)\b/i',description)>0;
run;

Thanks,
Jag
Super User
Posts: 7,050

Re: data search engine (based on text)

Posted in reply to Jagadishkatam

Don't you need to use double quotes to have the macro variable reference resolve?

Trusted Advisor
Posts: 1,137

Re: data search engine (based on text)

Thank you, @Tom, for the correction. It should be prxmatch("m/\b(&word)\b/i",description)>0; with the single quotes replaced by double quotes, so that the macro variable reference resolves.
Thanks,
Jag
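
Putting the corrections from this thread together, a hedged end-to-end sketch might look like the following. It assumes the search words live in a separate, hypothetical data set wordlist (variable words), that the input is have and the output is want; in the posted step the have and want names appear to be reversed relative to the question.

```sas
/* Hypothetical lookup table holding the search words */
data wordlist;
  infile datalines truncover;
  input words $20.;
  datalines;
car
truck
;
run;

/* Sample descriptions from the question */
data have;
  infile datalines truncover;
  input description $30.;
  datalines;
A car
1 car 23j @
Car
cars
2 trucks
One truck
One truck and one car
;
run;

/* Build a pipe-separated list, here car|truck, in macro variable WORD */
proc sql noprint;
  select distinct words into :word separated by '|'
  from wordlist;
quit;

/* Double quotes let &word resolve inside the pattern */
data want;
  set have;
  if prxmatch("m/\b(&word)\b/i", description) > 0;
run;
```

This sketch assumes the words contain no regular-expression metacharacters; words with characters such as . or ( would need escaping before they are placed in the pattern.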