I have a dataset with one variable in a really messy format. I'd like to keep the rows with this variable value containing a list of words in both lower or upper case. There are around 20 words for this list.
My question is like this post:
But my word list is longer and I'd like to ignore the influence of lower or upper case.
How should I write the SAS code? Thank you.
this is my code, but after running it, I get 0 observations.
%let med_list = a b c d ...;
data data_filter;
set data;
if find(RAW_NAME,"&med_list", 'i') > 0;
run;
Additionally, I also would like to change the RAW_Name to the matched word in med_list. How can I do it?
If I understand your question right then below should work.
data have;
input raw_name $80. ;
cards;
John Smith
Liz Wood
Eric Little
;
%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));
data want;
set have;
_prxid=prxparse("/\b(&list2)\b/i");
call prxsubstr(_prxid,trim(raw_name),_pos,_len);
if _pos>0 then
do;
raw_name=substr(raw_name,_pos,_len);
output;
end;
drop _:;
run;
proc print data=want;
run;
We can't test this for you until you add some test data to search through. Add a DATALINES statement to your DATA step and add a few rows of data we can test your program with.
You either have to add a LOOP. You could loop over the list of words you want to find. Or you could loop over the list of words in the string.
Or you could try to convert your string into a regular expression by using | as the delimiter in the list.
I tried to use | as the delimiter in the list, but it still does not work. I wonder how to write it in a loop?
Here how to test if any of a list of many words exist in a single (long) character variable.
data data_filter;
set data;
array list [4] $5 _temporary_ ('a' 'b' 'c' 'd');
found=0;
do index=1 to dim(list) while (not found);
found = 0<findw(RAW_NAME,list[index],' ','ir');
end;
run;
This is using spaces as the delimiter between the words in the long string. The I modifier means to ignore case of letters. The R modifier strips the delimiters from the word being searched for.
To make the list of words to find flexible you just need to have the list of words as quoted words. (You can get that from your example macro variable by using the %QLIST() macro). The number of words in the list. (You can get that using the the COUNTW() function.) And a maximum length for each word. (Just pick something that it longer than the longest word you are looking for.)
%let list=a b c d;
%let n=%sysfunc(countw(&list,%str( )));
data data_filter;
set data;
array list [&n] $5 _temporary_ %qlist(&list);
found=0;
do index=1 to dim(list) while (not found);
found = 0<findw(RAW_NAME,list[index],' ','ir');
end;
run;
Or build them from from a dataset. So if they are in a variable named WORD in a dataset named WORD_LIST you can use PROC SQL to make macros variables LIST, MAXLEN and N like this and then use them to define the temporary array.
proc sql noprint;
select
quote(trim(word),"'")
, max(length(word))
into :list separated by ' '
, :maxlen trimmed
from word_list
;
%let n=&sqlobs;
data date_filter;
set data;
array list [&n] $&maxlen (&list) ;
....
Using | works fine for me.
data have;
input raw_name $80. ;
cards;
John Smith
Liz Wood
Eric Little
;
%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));
data want;
set have;
found = 0<prxmatch("/\b(&list2)\b/i",raw_name);
run;
Result
Thank you so much! It works. I have one more question. How to replace the original value with the searched value directly if found=1. Let Liz Wood=Wood, and Eric Little= Eric, and delete the rows with found=0.
@L777 wrote:
Thank you so much! It works. I have one more question. How to replace the original value with the searched value directly if found=1. Let Liz Wood=Wood, and Eric Little= Eric, and delete the rows with found=0.
Probably easiest in that case to remember the actual location where the "word" was found rather than converting it to a binary result. You can take the first word from that location using SUBSTR() and SCAN(). (You can probably also do it directly with CALL SCAN() but that is more complicated.
So if FOUND has the location where the first match was made you could do what you want by using:
if found = 0 then delete;
raw_name=scan(substr(RAW_NAME,found),1,' ');
If I understand your question right then below should work.
data have;
input raw_name $80. ;
cards;
John Smith
Liz Wood
Eric Little
;
%let list=wood ERIC ;
%let list2=%sysfunc(translate(&list,|,%str( )));
data want;
set have;
_prxid=prxparse("/\b(&list2)\b/i");
call prxsubstr(_prxid,trim(raw_name),_pos,_len);
if _pos>0 then
do;
raw_name=substr(raw_name,_pos,_len);
output;
end;
drop _:;
run;
proc print data=want;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.