I am using Enterprise Guide 7.13 and SAS version 9.4. I have two datasets. One has several field including a text field with sentences in it. The other dataset I created made up of key words. I want to search each of the fields in the first dataset to see if they contain one of the key words in the second dataset. If a row in the first dataset contains one of the key words, I want to append that word to the dataset for that row. If there is no match the new column would be empty. The end goal is to count the number of matches for each keyword. I have code that will do the matching and output those matches to a new dataset, but not append the key word as a separate variable.
I have an excel sheet enclosed that illustrates what I am looking for. The code below creates a dataset of the matches.
data _null_;
set dataset2 end=last;
if _n_ = 1 then call execute('data dataset3; set dataset1; ');
call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));
if last then call execute('run;');
RUN;
Based on this code you posted :
data _null_;
set dataset2 end=last;
if _n_ = 1 then call execute('data dataset3; set dataset1; ');
call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));
if last then call execute('run;');
RUN;
You are running a data step like:
data dataset3;
set dataset1;
if findw(upcase(information),"NURSE") > 0 then output;
if findw(upcase(information),"DOCTOR") > 0 then output;
if findw(upcase(information),"POLICE") > 0 then output;
...
run;
Why not just join the two datasets directly?
proc sql;
create table dataset3 as
select a.information,b.word
from dataset1 a
right join dataset2 b
on findw(upcase(a.information),upcase(strip(b.word)))
;
quit;
Or reduce the time and space and just store the counts.
proc sql;
create table dataset3 as
select b.word,count(*) as n_matches
from dataset1 a
right join dataset2 b
on findw(upcase(a.information),upcase(strip(b.word)))
group by 1
;
quit;
Can you post your datasets in a more usable form for us to work with?
What does the output look like if two or more of the words in dataset2 are found in the same sentence?
Are the matches supposed to be case sensitive? i.e. does "Military" match "military"?
What are the rules for a word that appears in a composite use such as "police" in "policeman".
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Based on this code you posted :
data _null_;
set dataset2 end=last;
if _n_ = 1 then call execute('data dataset3; set dataset1; ');
call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));
if last then call execute('run;');
RUN;
You are running a data step like:
data dataset3;
set dataset1;
if findw(upcase(information),"NURSE") > 0 then output;
if findw(upcase(information),"DOCTOR") > 0 then output;
if findw(upcase(information),"POLICE") > 0 then output;
...
run;
Why not just join the two datasets directly?
proc sql;
create table dataset3 as
select a.information,b.word
from dataset1 a
right join dataset2 b
on findw(upcase(a.information),upcase(strip(b.word)))
;
quit;
Or reduce the time and space and just store the counts.
proc sql;
create table dataset3 as
select b.word,count(*) as n_matches
from dataset1 a
right join dataset2 b
on findw(upcase(a.information),upcase(strip(b.word)))
group by 1
;
quit;
That was perfect Tom. Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.