Hi everyone,
I'm trying to use FINDW to find a list of words in string variables. Since I have a long list of words that were originally taken from Excel, I put them all into a macro variable. So far I have the syntax below, but each time I run it, I get an error message.
I have a feeling it has to do with the fact some of the words in my list have spaces, parentheses and dashes, but I'm not sure how to account for that in my code.
Any help is greatly appreciated!
data test;
set data;
if findw(lowcase(var1),&alldrugs, '|','I S') >0 or
findw(lowcase(var2),&alldrugs, '|','I S') >0 or
findw(lowcase(var3),&alldrugs, '|','I S') >0 or
findw(lowcase(var4),&alldrugs, '|','I S') >0 or
findw(lowcase(var5),&alldrugs, '|','I S') >0
then drug='Yes' ;
else drug='No';
run;
or this?
%let alldrugs=orchids|lillies|roses|daisy;
%let ndrugs=%sysfunc(countw(&alldrugs,|));
data customer;
input var1 $100.;
cards;
The house was surrounded with various objects. Some objects being boxes, shoes, and roses.
The house was surrounded with various objects. Some objects being boxes, lillies, and frogs.
The house was surrounded with various objects. Some objects being boxes, oranges and frogs.
;
data a;
array index_drug $ index_drug1-index_drug4;
array drug_word $ drug_word1-drug_word&ndrugs;
set customer;
cust_no=_n_;
do i=1 to &ndrugs;
drug_word(i)=scan("&alldrugs",i,"|");
index_drug(i)=findw(lowcase(var1),compress(lowcase(drug_word(i))));
end;
if sum(of index_drug:)>0 then index=1; else index=0;
drop var1 i drug_word:;
run;
data test;
set data;
if findw(lowcase(var1),&alldrugs, '|','I S') >0 or
findw(lowcase(var2),&alldrugs, '|','I S') >0 or
findw(lowcase(var3),&alldrugs, '|','I S') >0 or
findw(lowcase(var4),&alldrugs, '|','I S') >0 or
findw(lowcase(var5),&alldrugs, '|','I S') >0 /*or this is the error*/
then drug='Yes' ;
else drug='No';
run;
or then drug='Yes' ; /*is invalid*/
Sorry, that's a mistake in the example. In the original code there isn't an or at the end. I've corrected the question.
Can you show us a portion of &alldrugs?
One correction is that &alldrugs must be in double quotes
if findw(lowcase(var1),"&alldrugs", '|','I S') >0 or
etc.
If that's not it, then please paste the entire SAS log, including the code and the error message (not just the error message) as text and not as a screen capture, by clicking on the {i} icon and pasting the text into the window that appears. Do not paste the text into your message any other way.
Good point! I forgot to add those in. It removes the error message, but I still end up with no observations for drug='Yes' when there should be.
In terms of the &alldrugs variable, it was created with the following syntax:
proc sql;
select drug_name into: alldrugs separated by ' '
from druglist;
quit;
%put &alldrugs;
and gave the drugs output into a table. Below are 4 of the drugs listed from the top.
1-(3-CHLOROPHENYL)PIPERAZINE
1-(5-METHOXY-1H-INDOL-3-YL)-2-PROPANAMINE
10-HYDROXYCARBAZEPINE
10-OH CARBAZEPINE
I also should mention, some of the drugs have comma's so they are written as 1,3-dimethoxy. Which is why I have the data as pipe delimited.
Thanks for the help!
Are expecting one value of VAR1 to have ALL of those drug names IN THAT EXACT ORDER?
Or did you want to test if the value of VAR1 is in that list of drug names?
Did you mean:
findw("&alldrug",var1,....)
@JoanneR wrote:
Good point! I forgot to add those in. It removes the error message, but I still end up with no observations for drug='Yes' when there should be.
In terms of the &alldrugs variable, it was created with the following syntax:
proc sql; select drug_name into: alldrugs separated by ' ' from druglist; quit; %put &alldrugs;
and gave the drugs output into a table. Below are 4 of the drugs listed from the top.
1-(3-CHLOROPHENYL)PIPERAZINE
1-(5-METHOXY-1H-INDOL-3-YL)-2-PROPANAMINE
10-HYDROXYCARBAZEPINE
10-OH CARBAZEPINE
Thanks for the help!
@JoanneR wrote:
and gave the drugs output into a table. Below are 4 of the drugs listed from the top.
1-(3-CHLOROPHENYL)PIPERAZINE
1-(5-METHOXY-1H-INDOL-3-YL)-2-PROPANAMINE
10-HYDROXYCARBAZEPINE
10-OH CARBAZEPINE
These drugs are upper case. Your original code tested if lowcase(var1) etc. contains these drugs, and you have a case mismatch, you will never get "Yes" until you fix that.
I tried changing that to make both lower case using the two following forms, but both don't seem to work.
data test;
set data;
if findw(lowcase(var1),lowcase("&alldrugs"),' ', 'I S') >0
then drug= 'Yes';
else drug='No';
run;
data test;
set data;
if findw(lowcase(var1),compress(lowcase("&alldrugs")),' ', 'I S') >0
then drug= 'Yes';
else drug='No';
run;
Thank you!
As @ballardw said, we need to see your data now — a portion of it where you expect to get a match with one of the drugs in &alldrugs.
@JoanneR wrote:
Hi everyone,
I'm trying to use FINDW to find a list of words in string variables. Since I have a long list of words that were originally taken from Excel, I put them all into a macro variable. So far I have the syntax below, but each time I run it, I get an error message.
I have a feeling it has to do with the fact some of the words in my list have spaces, parentheses and dashes, but I'm not sure how to account for that in my code.
Any help is greatly appreciated!
data test; set data; if findw(lowcase(var1),&alldrugs, '|','I S') >0 or findw(lowcase(var2),&alldrugs, '|','I S') >0 or findw(lowcase(var3),&alldrugs, '|','I S') >0 or findw(lowcase(var4),&alldrugs, '|','I S') >0 or findw(lowcase(var5),&alldrugs, '|','I S') >0 then drug='Yes' ; else drug='No'; run;
And your example data containing VAR1, Var2, Var3 etc is where?
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
And you should provide the actual value of that macro variable.
I suspect the basic problem is order of the variables. The first argument of FINDW is the value to search in and the second parameter is what to search for. Does that sound like what you are doing?
See this small example:
data example; longstring= "this is a string with several words"; x ="string with"; y= findw(longstring,x); z= findw(x,longstr); run;
Your FINDW code I think looks more like the Z version above.
Due to the sensitivity of this data I cannot provide further info. Instead, I will provide an example to better explain my question. Lets say I have an Excel spreadsheet with a list of fake flowers
Flowers:
1. Orchids
2. Lilies
3. Roses
4. 1,4,5-Daisy
5. Dahlia
6. 3,4-Lilac(2-Lilies)Daisy
7. Common lilac
8. False goats beard
Say these flowers were placed into &alldrugs. Now in my data I have an observation with the following report under var1:
"The house was surrounded with various objects. Some objects being boxes, shoes, and roses."
I want sas to read through that statement and see if the keywords from &alldrugs are mentioned in that statement. If not then to say "No".
My syntax was originally this, but it takes too long to run and I would like it to be more efficient by using the syntax I provided in the original question.
drug=1;
i=1;
do while ((scan("&alldrugs",i,"|") ne '') and drug>0);
i+1; /*Number of the drug found on the keyword list -1 */
drug_word=scan("&alldrugs",i-1,"|");
indx_drug1=findw(lowcase(var1),compress(lowcase(drug_word)));
indx_drug2=findw(lowcase(var2),compress(lowcase(drug_word)));
indx_drug3=findw(lowcase(var3),compress(lowcase(drug_word)));
indx_drug4=findw(lowcase(var4),compress(lowcase(drug_word)));
indx_drug5=findw(lowcase(var5),compress(lowcase(drugd_word)));
indx_drug6=findw(lowcase(var6),compress(lowcase(drug_word)));
indx_drug7=findw(lowcase(var7),compress(lowcase(drug_word)));
if indx_drug1>0 or indx_drug2>0 or .... indx_drug7>0 then drug=1;
else drug=drug+1;
It's fine if you give us a made up example, because real data is sensitive.
This works for me.
%let alldrugs=orchids|lillies|roses|daisy;
data a;
var1="The house was surrounded with various objects. Some objects being boxes, shoes, and roses.";
i=1;
do while (scan("&alldrugs",i,"|") ne '');
i+1; /*Number of the drug found on the keyword list -1 */
drug_word=scan("&alldrugs",i-1,"|");
indx_drug1=findw(lowcase(var1),compress(lowcase(drug_word)));
output;
end;
run;
Do you know if there's a way to run that without having to write everything out? The keylist has about 200 words, and the variables have around 50k observations with a different response on each.
Thank you! I really appreciate all the input that's been provided.
Is this what you mean?
data a;
array index_drug $ index_drug1-index_drug4;
array drug_word $ drug_word1-drug_word4;
var1="The house was surrounded with various objects. Some objects being boxes, shoes, and roses.";
do i=1 to countw("&alldrugs",'|');
drug_word(i)=scan("&alldrugs",i,"|");
index_drug(i)=findw(lowcase(var1),compress(lowcase(drug_word(i))));
end;
if sum(of index_drug:)>0 then index=1; else index=0;
drop var1 i;
run;
You can drop additional variables from the data set if desired.
or this?
%let alldrugs=orchids|lillies|roses|daisy;
%let ndrugs=%sysfunc(countw(&alldrugs,|));
data customer;
input var1 $100.;
cards;
The house was surrounded with various objects. Some objects being boxes, shoes, and roses.
The house was surrounded with various objects. Some objects being boxes, lillies, and frogs.
The house was surrounded with various objects. Some objects being boxes, oranges and frogs.
;
data a;
array index_drug $ index_drug1-index_drug4;
array drug_word $ drug_word1-drug_word&ndrugs;
set customer;
cust_no=_n_;
do i=1 to &ndrugs;
drug_word(i)=scan("&alldrugs",i,"|");
index_drug(i)=findw(lowcase(var1),compress(lowcase(drug_word(i))));
end;
if sum(of index_drug:)>0 then index=1; else index=0;
drop var1 i drug_word:;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.