Also, I was wrong earlier about the case of the text. Your FINDW command uses the modifier 'i' which then ignores the case of the text when finding matches.
See if this also helps.
proc sql noprint; select quote(strip(name)) into : namelist separated by ',' from sashelp.class ; quit; %let namecount = &sqlobs.; %put Names are &namelist. count is &namecount.; data example; infile datalines truncover; informat var1 var2 $35.; input var1 1-35 /var2 1-35 ; datalines; First var string with no name and a second with no name Example of one name that is John in two lines And more names like Alice, Mary and james ; run; data want; set example; array name{&namecount.} $20 _temporary_ (&namelist.); array v var1 var2; wordcount=0; do i= 1 to dim(name); do j=1 to dim(v); wordcount= wordcount+ (findw(v[j],strip(name[i]),' ','IPQRO')>0); end; end; drop i j; run;
I am still not sure if you need an over all count of the drugs in all the phrases only, in each phrase, or some combination. This only does an over all count and only uses two variables to search as I am not going to spend a lot time making dummy data when I really don't know how well it might replicate your situation.
I does use a data set value to populate the list.
In your particular case you need to ensure that the length of the variable in the array name statement, the 20, is at least as long as the longest value in your word list.
The &sqlobs macro variable is an automatic count of the result from the last Proc SQL. It is also very likely to be resent so capture it into a known variable. The value is needed because you must specify the number of elements in the _temporary_ array. The SQL places quotes and commas so the single macro variable resolves properly for the _temporary_ array values.
@JoanneR wrote:
Due to the sensitivity of this data I cannot provide further info. Instead, I will provide an example to better explain my question. Lets say I have an Excel spreadsheet with a list of fake flowers
Flowers:
1. Orchids
2. Lilies
3. Roses
4. 1,4,5-Daisy
5. Dahlia
6. 3,4-Lilac(2-Lilies)Daisy
7. Common lilac
8. False goats beard
Say these flowers were placed into &alldrugs. Now in my data I have an observation with the following report under var1:
"The house was surrounded with various objects. Some objects being boxes, shoes, and roses."
I want sas to read through that statement and see if the keywords from &alldrugs are mentioned in that statement. If not then to say "No".
My syntax was originally this, but it takes too long to run and I would like it to be more efficient by using the syntax I provided in the original question.
drug=1; i=1; do while ((scan("&alldrugs",i,"|") ne '') and drug>0); i+1; /*Number of the drug found on the keyword list -1 */ drug_word=scan("&alldrugs",i-1,"|"); indx_drug1=findw(lowcase(var1),compress(lowcase(drug_word))); indx_drug2=findw(lowcase(var2),compress(lowcase(drug_word))); indx_drug3=findw(lowcase(var3),compress(lowcase(drug_word))); indx_drug4=findw(lowcase(var4),compress(lowcase(drug_word))); indx_drug5=findw(lowcase(var5),compress(lowcase(drugd_word))); indx_drug6=findw(lowcase(var6),compress(lowcase(drug_word))); indx_drug7=findw(lowcase(var7),compress(lowcase(drug_word))); if indx_drug1>0 or indx_drug2>0 or .... indx_drug7>0 then drug=1; else drug=drug+1;
Do you actually need all of the different index variables for other work or do you just want the any are found? It isn't quite clear whether you need to count the number of drugs that appear in all of the phrases, one phrase or what. Which is why we ask for a concrete example. The values do not have to match your data as long as the behavior of the need can be demonstrated.
This is a first step to a long process. As you said, my intent is to get the cases where at least one word from the keylist appeared under the variables for each observation.
The data is quite complex which is why it has been difficult to provide a concise example. I am looking at a total of 12 variables that each contain string variables (in the example they are vars1-7). The process in my original syntax was to find the words from the keylist and see if they are present in the responses for each observation. The word needs to match exactly as written in the keylist. Not every case will have a the keywords, and therefore will not be cases.
@JoanneR wrote:
This is a first step to a long process. As you said, my intent is to get the cases where at least one word from the keylist appeared under the variables for each observation.
The data is quite complex which is why it has been difficult to provide a concise example. I am looking at a total of 12 variables that each contain string variables (in the example they are vars1-7). The process in my original syntax was to find the words from the keylist and see if they are present in the responses for each observation. The word needs to match exactly as written in the keylist. Not every case will have a the keywords, and therefore will not be cases.
If you need to know which variable found contained a key word then you could add and array to my example that has a Found value for each of the VAR variables. Instead of the single accumulator you would need to set the accumulator between the I and J loops to reset and then save after the end of the J loop if you need a NUMBER of matches, You can also use the I value when the count increments to keep identifiers of the found key words as that seems like it may be of interest.
Really providing some sort of example input data and the desired result for that is the way to ask questions. We are sort of wandering around with not knowing exactly the requirement such as :
Any of the key words found in any of the variables
How many of the key words found in any of the variables
Which of the key words found in any of the variables
Any of the key words found in each of the variables
How many of the key words found in each of the variables
Which of the key words found in each of the variables
This example creates a count per Var variable of the number of words matched and creates a delimited list of the words found from the key word list. If your values might contain commas then use a different delimiter in the CATX function call that you do not have in our key words.
proc sql noprint; select quote(strip(name)) into : namelist separated by ',' from sashelp.class ; quit; %let namecount = &sqlobs.; %put Names are &namelist. count is &namecount.; data example; infile datalines truncover; informat var1 var2 $35.; input var1 1-35 /var2 1-35 ; datalines; First var string with no name and a second with no name Example of one name that is John in two lines And more names like Alice, Mary and james ; run; data want; set example; array name{&namecount.} $20 _temporary_ (&namelist.); array v var1 var2; /* the 2 below matches the number of VAR variables*/ /* fcount will have the number of found words per var*/ array fcount (2); /* this will hold a list of the found words*/ array fwords (2) $ 200; do i= 1 to dim(name); do j=1 to dim(v); t= (findw(v[j],strip(name[i]),' ','IPQRO')>0); fcount[j]= sum(fcount[j], t); if t then fwords[j]= catx(',',fwords[j],strip(name[i])); end; end; drop i j t; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.