BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pangea17
Quartz | Level 8

I am using Enterprise Guide 7.13 and SAS version 9.4.  I have two datasets. One has several field including a text field with sentences in it.  The other dataset I created made up of key words.  I want to search each of the fields in the first dataset to see if they contain one of the key words in the second dataset. If a row in the first dataset contains one of the key words, I want to append that word to the dataset for that row.  If there is no match the new column would be empty.  The end goal is to count the number of matches for each keyword.  I have code that will do the matching and output those matches to a new dataset, but not append the key word as a separate variable.

 

I have an excel sheet enclosed that illustrates what I am looking for.  The code below creates a dataset of the matches.

 

data _null_;

set dataset2 end=last;

if _n_ = 1 then call execute('data dataset3; set dataset1; ');

call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));

if last then call execute('run;');

RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Based on this code you posted :

data _null_;
  set dataset2 end=last;
  if _n_ = 1 then call execute('data dataset3; set dataset1; ');
  call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));
  if last then call execute('run;');
RUN;

You are running a data step like:

data dataset3;
   set dataset1; 
   if findw(upcase(information),"NURSE") > 0 then output;
   if findw(upcase(information),"DOCTOR") > 0 then output;
   if findw(upcase(information),"POLICE") > 0 then output;
   ...
run;

Why not just join the two datasets directly?

proc sql;
create table dataset3 as
  select a.information,b.word
  from dataset1 a
  right join dataset2 b
  on findw(upcase(a.information),upcase(strip(b.word)))
;
quit;

Or reduce the time and space and just store the counts.

proc sql;
create table dataset3 as
  select b.word,count(*) as n_matches
  from dataset1 a
  right join dataset2 b
  on findw(upcase(a.information),upcase(strip(b.word)))
  group by 1
;
quit;

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Can you post your datasets in a more usable form for us to work with?

ballardw
Super User

What does the output look like if two or more of the words in dataset2 are found in the same sentence?

Are the matches supposed to be case sensitive? i.e. does "Military" match "military"?

What are the rules for a word that appears in a composite use such as "police" in "policeman".

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

pangea17
Quartz | Level 8
I believe SAS is case sensitive, so it is best to upcase everything so that you get a match. Just looking for an exact match. In your example policeman would not be a match for police. Trying to keep things simple for the time being. I will settle for the first work matching to be the only result.


Tom
Super User Tom
Super User

Based on this code you posted :

data _null_;
  set dataset2 end=last;
  if _n_ = 1 then call execute('data dataset3; set dataset1; ');
  call execute(cats('if findw(upcase(information),"',upcase(word),'") > 0 then output;'));
  if last then call execute('run;');
RUN;

You are running a data step like:

data dataset3;
   set dataset1; 
   if findw(upcase(information),"NURSE") > 0 then output;
   if findw(upcase(information),"DOCTOR") > 0 then output;
   if findw(upcase(information),"POLICE") > 0 then output;
   ...
run;

Why not just join the two datasets directly?

proc sql;
create table dataset3 as
  select a.information,b.word
  from dataset1 a
  right join dataset2 b
  on findw(upcase(a.information),upcase(strip(b.word)))
;
quit;

Or reduce the time and space and just store the counts.

proc sql;
create table dataset3 as
  select b.word,count(*) as n_matches
  from dataset1 a
  right join dataset2 b
  on findw(upcase(a.information),upcase(strip(b.word)))
  group by 1
;
quit;
pangea17
Quartz | Level 8

That was perfect Tom. Thanks!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 5262 views
  • 1 like
  • 4 in conversation