BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
L777
Fluorite | Level 6

I have a dataset with one variable in a really messy format. I'd like to keep the rows with this variable value containing a list of words in both lower or upper case. There are around 20 words for this list.

My question is like this post:

https://communities.sas.com/t5/SAS-Procedures/Search-in-a-dataset-for-units-only-containing-words-in...

But my word list is longer and I'd like to ignore the influence of lower or upper case. 

How should I write the SAS code? Thank you.

this is my code, but after running it, I get 0 observations.

 

%let med_list = a b c d ...;

data data_filter;
set data;

if find(RAW_NAME,"&med_list", 'i') > 0;
run;

 

Additionally, I also would like to change the RAW_Name to the matched word in med_list. How can I do it?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

If I understand your question right then below should work.

data have;
  input raw_name $80. ;
cards;
John Smith 
Liz Wood 
Eric Little
;

%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));

data want;
  set have;
  _prxid=prxparse("/\b(&list2)\b/i");
  call prxsubstr(_prxid,trim(raw_name),_pos,_len);
  if _pos>0 then 
    do;
      raw_name=substr(raw_name,_pos,_len);
      output;
    end;
  drop _:;
run;

proc print data=want;
run;

View solution in original post

8 REPLIES 8
SASKiwi
PROC Star

We can't test this for you until you add some test data to search through. Add a DATALINES statement to your DATA step and add a few rows of data we can test your program with. 

Tom
Super User Tom
Super User

You either have to add a LOOP.  You could loop over the list of words you want to find.  Or you could loop over the list of words in the string.

 

Or you could try to convert your string into a regular expression by using | as the delimiter in the list.

L777
Fluorite | Level 6

I tried to use | as the delimiter in the list, but it still does not work. I wonder how to write it in a loop?

Tom
Super User Tom
Super User

Here how to test if any of a list of many words exist in a single (long) character variable.

data data_filter;
   set data;
   array list [4] $5 _temporary_ ('a' 'b' 'c' 'd');
   found=0;
   do index=1 to dim(list) while (not found);
      found = 0<findw(RAW_NAME,list[index],' ','ir');
   end;
run;

This is using spaces as the delimiter between the words in the long string.  The I modifier means to ignore case of letters.  The R modifier strips the delimiters from the word being searched for.

 

To make the list of words to find flexible you just need to have the list of words as quoted words.  (You can get that from your example macro variable by using the %QLIST() macro).  The number of words in the list. (You can get that using the the COUNTW() function.)  And a maximum length for each word.  (Just pick something that it longer than the longest word you are looking for.)

 

%let list=a b c d;
%let n=%sysfunc(countw(&list,%str( )));
data data_filter;
   set data;
   array list [&n] $5 _temporary_ %qlist(&list);
   found=0;
   do index=1 to dim(list) while (not found);
      found = 0<findw(RAW_NAME,list[index],' ','ir');
   end;
run;

Or build them from from a dataset.  So if they are in a variable named WORD in a dataset named WORD_LIST you can use PROC SQL to make macros variables LIST, MAXLEN and N like this and then use them to define the temporary array.

proc sql noprint;
select
   quote(trim(word),"'") 
 , max(length(word))
into :list separated by ' '
   , :maxlen trimmed     
from word_list
;
%let n=&sqlobs;

data date_filter;
   set data;
   array list [&n] $&maxlen (&list) ;
....
Tom
Super User Tom
Super User

Using | works fine for me.

data have;
  input raw_name $80. ;
cards;
John Smith 
Liz Wood 
Eric Little
;

%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));

data want;
  set have;
  found = 0<prxmatch("/\b(&list2)\b/i",raw_name);
run;

Result

Tom_0-1688095118233.png

 

L777
Fluorite | Level 6

Thank you so much! It works. I have one more question. How to replace the original value with the searched value directly if found=1. Let Liz Wood=Wood, and Eric Little= Eric, and delete the rows with found=0. 

Tom
Super User Tom
Super User

@L777 wrote:

Thank you so much! It works. I have one more question. How to replace the original value with the searched value directly if found=1. Let Liz Wood=Wood, and Eric Little= Eric, and delete the rows with found=0. 


Probably easiest in that case to remember the actual location where the "word" was found rather than converting it to a binary result.   You can take the first word from that location using SUBSTR() and SCAN().  (You can probably also do it directly with CALL SCAN() but that is more complicated.

 

So if FOUND has the location where the first match was made you could do what you want by using:

if found = 0 then delete;
raw_name=scan(substr(RAW_NAME,found),1,' ');

 

Patrick
Opal | Level 21

If I understand your question right then below should work.

data have;
  input raw_name $80. ;
cards;
John Smith 
Liz Wood 
Eric Little
;

%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));

data want;
  set have;
  _prxid=prxparse("/\b(&list2)\b/i");
  call prxsubstr(_prxid,trim(raw_name),_pos,_len);
  if _pos>0 then 
    do;
      raw_name=substr(raw_name,_pos,_len);
      output;
    end;
  drop _:;
run;

proc print data=want;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 905 views
  • 2 likes
  • 4 in conversation