vomer
Obsidian | Level 7

Hi everyone,

I am wondering how I can do the following:

I have a data set with about 6000 rows. Each row has a field containing a text string, something like: This is a sample text string

I was wondering what I can do to "search" all the rows of information for specific key words and then only keep those in my new smaller data set.

Basically I want to say: if the words "this" "now" "then" appear in the column then keep that row or identify it somehow.

Any sample code would be greatly appreciated!

1 ACCEPTED SOLUTION
PGStats
Opal | Level 21

Editor's note: Thanks for all the contributions, some great ideas. This is a good case for the PRX functions, and this answer illustrates their capability.

PRX to the rescue!

data have;
length string $100;
input string &;
datalines;
This is a sample text string
That is another string
And then a last string
;

data want;
/* Search string for:
     a word boundary,
     then the first string or the second string or ... or the last string,
     then a word boundary.
   Make the search case insensitive.
   Compile the pattern only once. */
set have(where=(prxmatch("/\b(this|now|then)\b/io", string)));
run;

proc print noobs; run;
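
Since the question also allows for identifying matching rows rather than dropping the rest, the same pattern can set a flag instead of filtering. This is a minimal sketch that assumes the data set have built above; the names flagged and has_keyword are made up for illustration.

```sas
data flagged;
   set have;
   /* PRXMATCH returns the position of the first match, or 0 if there is none */
   has_keyword = (prxmatch("/\b(this|now|then)\b/io", string) > 0);
run;
```

With the three sample rows, has_keyword should be 1 for the first and third rows and 0 for the second.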

 

PG

8 REPLIES
Reeza
Super User

Try the find() and index() functions.

There are also the PRX functions, if you're interested in learning them.

vomer
Obsidian | Level 7

I tried scan and index - both give me this error:

function call has too many arguments

I have a bunch of keywords I am searching for in my code. Like this:

got_value=SCAN(var_1,"key1","key2" etc.);

Reeza
Super User

I think only the PRX functions let you search for multiple keywords in a single function call.

SCAN extracts a word from a string by position; it does not search the string.

if find(var_1, "word1")>0 or
   find(var_1, "word2")>0 then got_value=1;

is one way, but probably not optimal.

vomer
Obsidian | Level 7

I looked up prx, but cannot get it to work. This sample I modified seems to be what I need, but it is not working:

data _null_;
   if _N_ = 1 then
   do;
      retain patternID;
      /* The i option specifies a case insensitive search. */
      pattern = "/Mount|TGH|Joe|Michael|Sunny|Toro|Bay|Brid|Provi/i";
      patternID = prxparse(pattern);
   end;
   input service_provider $90.;
   call prxsubstr(patternID, service_provider, position);
   if position ^= 0 then
   do;
      match = substr(service_provider, position);
      put match:$QUOTE. "found in " service_provider:$QUOTE.;
   end;
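
As posted, the step shows no data source for the INPUT statement and no DATALINES or RUN to close it, which may be part of why it does not work. Below is a minimal runnable sketch of the same idea. The three data lines are invented for illustration, TRUNCOVER is added so that reading $90. does not run past the end of a short line, and the optional length argument of CALL PRXSUBSTR is used so that only the matched text is printed.

```sas
data _null_;
   /* Keep the compiled pattern ID across iterations */
   retain patternID;
   if _N_ = 1 then
      /* The i option specifies a case insensitive search */
      patternID = prxparse("/Mount|TGH|Joe|Michael|Sunny|Toro|Bay|Brid|Provi/i");

   infile datalines truncover;
   input service_provider $90.;

   /* position = start of the first match (0 if none), length = its length */
   call prxsubstr(patternID, service_provider, position, length);
   if position ^= 0 then
   do;
      match = substr(service_provider, position, length);
      put match:$QUOTE. "found in " service_provider:$QUOTE.;
   end;
   datalines;
Mount Hope Clinic
Acme Services
St. Michael Centre
;
```

To read from an existing data set instead, the INFILE, INPUT and DATALINES parts would be replaced by a SET statement.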

NishunkSaxena
Calcite | Level 5

You can use INDEX or FIND, but on large data sets try a WHERE statement instead of an IF condition, since it filters faster, as in the following code:

proc sql;
create table col_data as
select memname from sashelp.vtable;
quit;

data final;
set col_data;
where index(upcase(memname),upcase("attribute"))>0;
run;

Thanks

ballardw
Super User

You may want the FINDW, find word, function. And you can generally search for only one value at a time.

Also unless you are 100 percent positive that the words are always in the same case you may want to either UPCASE or LOWCASE the searched string and make sure the value searched for is in the same case.

findword1 = (findw(upcase(string), 'WORD')>0);

scan PULLS a word by position in a string.
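
FINDW can also ignore case itself through its modifier argument, which avoids the UPCASE call. This is a small sketch that assumes the data set have and the variable string from the accepted solution; the data set name flagged is illustrative. A blank is given as the only word delimiter, so punctuation next to a word would not count as a boundary here.

```sas
data flagged;
   set have;
   /* 3rd argument: word delimiters; 4th argument 'i': ignore case */
   findword1 = (findw(string, 'then', ' ', 'i') > 0);
run;
```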

Tom
Super User

Try the INDEXW function.

If the list is short, you can just type them all out.

if indexw(upcase(string),'WORD1')
or indexw(upcase(string),'WORD2')
...
;
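
For a longer list, typing out each call becomes tedious; one option is to hold the keywords in a temporary array and loop over them. This is a sketch under the assumption that have and string are as in the accepted solution; the array name kw, the loop index i, and the variable found are illustrative.

```sas
data want;
   set have;
   /* Keywords stored in upper case to match UPCASE(string) */
   array kw[3] $8 _temporary_ ('THIS' 'NOW' 'THEN');
   found = 0;
   /* Stop looping as soon as one keyword is found */
   do i = 1 to dim(kw) until (found);
      /* STRIP removes the padding blanks from the array element */
      if indexw(upcase(string), strip(kw[i])) > 0 then found = 1;
   end;
   if found;   /* subsetting IF: keep only rows containing a keyword */
   drop i found;
run;
```

To flag rows instead of filtering them, drop the subsetting IF and keep found in the output.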

