BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PaigeMiller
Diamond | Level 26

Also, I was wrong earlier about the case of the text. Your FINDW command uses the modifier 'i' which then ignores the case of the text when finding matches.

--
Paige Miller
ballardw
Super User

See if this also helps.

 

proc sql noprint;
   select quote(strip(name)) into : namelist separated by ','
   from sashelp.class
   ;
quit;
%let namecount = &sqlobs.;

%put Names are &namelist. count is &namecount.;

data example;
   infile datalines truncover;
   informat var1 var2 $35.;
   input var1 1-35
         /var2 1-35
   ;
datalines;
First var string with no name
and a second with no name
Example of one name 
that is John in two lines
And more names like Alice, Mary
and james
;
run;



data want;
   set example;
   array name{&namecount.} $20 _temporary_ (&namelist.);
   array v var1 var2;
   wordcount=0;
   do i= 1 to dim(name);
      do j=1 to dim(v);
         wordcount= wordcount+ (findw(v[j],strip(name[i]),' ','IPQRO')>0);
      end;
   end;
   drop i j;
run;

I am still not sure if you need an over all count of the drugs in all the phrases only, in each phrase, or some combination. This only does an over all count and only uses two variables to search as I am not going to spend a lot time making dummy data when I really don't know how well it might replicate your situation.

I does use a data set value to populate the list.

In your particular case you need to ensure that the length of the variable in the array name statement, the 20, is at least as long as the longest value in your word list.

 

The &sqlobs macro variable is an automatic count of the result from the last Proc SQL. It is also very likely to be resent so capture it into a known variable. The value is needed because you must specify the number of elements in the _temporary_ array. The SQL places quotes and commas so the single macro variable resolves properly for the _temporary_ array values.

ballardw
Super User

@JoanneR wrote:

Due to the sensitivity of this data I cannot provide further info. Instead, I will provide an example to better explain my question. Lets say I have an Excel spreadsheet with a list of fake flowers

 

Flowers:

1. Orchids

2. Lilies

3. Roses

4. 1,4,5-Daisy

5. Dahlia

6. 3,4-Lilac(2-Lilies)Daisy

7. Common lilac

8. False goats beard

 

Say these flowers were placed into &alldrugs. Now in my data I have an observation with the following report under var1:

"The house was surrounded with various objects. Some objects being boxes, shoes, and roses."

 

I want sas to read through that statement and see if the keywords from &alldrugs are mentioned in that statement. If not then to say "No". 

 

My syntax was originally this, but it takes too long to run and I would like it to be more efficient by using the syntax I provided in the original question.

	drug=1;
	i=1;
	do while ((scan("&alldrugs",i,"|") ne '') and drug>0);		
	  i+1;																/*Number of the drug found on the keyword list -1 */
	  drug_word=scan("&alldrugs",i-1,"|");							
	  indx_drug1=findw(lowcase(var1),compress(lowcase(drug_word)));
	  indx_drug2=findw(lowcase(var2),compress(lowcase(drug_word)));
	  indx_drug3=findw(lowcase(var3),compress(lowcase(drug_word)));
	  indx_drug4=findw(lowcase(var4),compress(lowcase(drug_word)));
	  indx_drug5=findw(lowcase(var5),compress(lowcase(drugd_word)));
	  indx_drug6=findw(lowcase(var6),compress(lowcase(drug_word)));
	  indx_drug7=findw(lowcase(var7),compress(lowcase(drug_word)));

if indx_drug1>0 or indx_drug2>0 or .... indx_drug7>0 then drug=1;
else drug=drug+1;

Do you actually need all of the different index variables for other work or do you just want the any are found? It isn't quite clear whether you need to count the number of drugs that appear in all of the phrases, one phrase or what. Which is why we ask for a concrete example. The values do not have to match your data as long as the behavior of the need can be demonstrated.

 

 

JoanneR
Calcite | Level 5

This is a first step to a long process. As you said, my intent is to get the cases where at least one word from the keylist appeared under the variables for each observation. 

 

The data is quite complex which is why it has been difficult to provide a concise example. I am looking at a total of 12 variables that each contain string variables (in the example they are vars1-7). The process in my original syntax was to find the words from the keylist and see if they are present in the responses for each observation. The word needs to match exactly as written in the keylist. Not every case will have a the keywords, and therefore will not be cases.

ballardw
Super User

@JoanneR wrote:

This is a first step to a long process. As you said, my intent is to get the cases where at least one word from the keylist appeared under the variables for each observation. 

 

The data is quite complex which is why it has been difficult to provide a concise example. I am looking at a total of 12 variables that each contain string variables (in the example they are vars1-7). The process in my original syntax was to find the words from the keylist and see if they are present in the responses for each observation. The word needs to match exactly as written in the keylist. Not every case will have a the keywords, and therefore will not be cases.


If you need to know which variable found contained a key word then you could add and array to my example that has a Found value for each of the VAR variables. Instead of the single accumulator you would need to set the accumulator between the I and J loops to reset and then save after the end of the J loop if you need a NUMBER of matches, You can also use the I value when the count increments to keep identifiers of the found key words as that seems like it may be of interest.

 

Really providing some sort of example input data and the desired result for that is the way to ask questions. We are sort of wandering around with not knowing exactly the requirement such as :

Any of the key words found in any of the variables

How many of the key words found in any of the variables

Which of the key words found in any of the variables

Any of the key  words found in each of the variables

How many of the key  words found in each of the variables

Which of the key  words found in each of the variables

 

This example creates a count per Var variable of the number of words matched and creates a delimited list of the words found from the key word list. If your values might contain commas then use a different delimiter in the CATX function call that you do not have in our key words.

 

proc sql noprint;
   select quote(strip(name)) into : namelist separated by ','
   from sashelp.class
   ;
quit;
%let namecount = &sqlobs.;

%put Names are &namelist. count is &namecount.;

data example;
   infile datalines truncover;
   informat var1 var2 $35.;
   input var1 1-35
         /var2 1-35
   ;
datalines;
First var string with no name
and a second with no name
Example of one name 
that is John in two lines
And more names like Alice, Mary
and james
;
run;



data want;
   set example;
   array name{&namecount.} $20 _temporary_ (&namelist.);
   array v var1 var2;
   /* the 2 below matches the number of VAR variables*/
   /* fcount will have the number of found words per var*/
   array fcount (2);
   /* this will hold a list of the found words*/
   array fwords (2) $ 200;
   
   do i= 1 to dim(name);
      do j=1 to dim(v);
         t= (findw(v[j],strip(name[i]),' ','IPQRO')>0);
         fcount[j]= sum(fcount[j], t);
         if t then fwords[j]= catx(',',fwords[j],strip(name[i]));
      end;
   end;
   drop i j t;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 3867 views
  • 2 likes
  • 5 in conversation