SAS Enterprise Guide

OscarBoots1 · Posted 03-05-2017 10:21 PM

Hi Forum,

I have a list of several hundred Keywords & short phrases 2 or 3 words together.

I also have a long list of unstructured text where I want to know if the Keywords & phrases appear exactly as I've got in the list.

If they do appear, I want to export the ID field & then export the Keywords/Phrases out delimited by a comma on the same row as the ID.

I want to export to a new table in the existing Project.

I've had a look at some functions but maybe a Data Step Loop would be the most effective method?

Can anyone assist with the Data Step code I would need to do this or suggest a better solution?

I'm using SAS EG Vers 7.13 64 bit.

Thanks

Shmuel · Posted 03-05-2017 10:45 PM

Look for hash iterate method within a datastep, using functions like find or index.

It would be more helpful if you post some test examples of your key words and of your text dataset.

OscarBoots1 · Posted 03-05-2017 11:33 PM

Hi Shmuel,

I've posted a quick Create table & Insert statement to help.

Note, I've added some punctuation to show that the data is a bit messy but I'm just after the words/phrases regardless of punctuation or case.

I want to find & extract the words/phrases in the 'TEXT TO FIND' table from the UNSTRUCTURED TEXT' Table.

I was thinking of a Loop that runs the first word/phrase from the 'TEXT TO FIND' table through all the records in the UNSTRUCTURED TEXT' Table extract the matches & do the same with the next word/phrase.

Cheers

proc sql;
   create table RDW.UNSTRUCTURED_TEXT
       (Check_String char(3000));

insert into RDW.UNSTRUCTURED_TEXT
    values('The quick brown dog, jumped over-the lazy dog')
    values('The QUICK GREEN dog jumped over, the lazy dog')
    values('The quick snake slithered over  the lazy dog')
	values('The dog quickly Jumped over the lazy Fox')
	values('The Quick; brown_dog! jumped over the Lazy dog');

title 'RDW.UNSTRUCTURED_TEXT';

select *
   from RDW.UNSTRUCTURED_TEXT;

proc sql;
   create table RDW.TEXT_TO_FIND
       (Check_String char(300));

insert into RDW.TEXT_TO_FIND
    values('QUICK GREEN dog')
    values('quickly Jumped over')
    values('slithered over  the lazy dog')
	values('The Quick;')
	values('snake');

title 'RDW.TEXT_TO_FIND';

select *
   from RDW.TEXT_TO_FIND;

proc printto; run;

TomKari · Posted 03-06-2017 05:00 AM

This may give you some ideas:

http://support.sas.com/resources/papers/proceedings14/1717-2014.pdf

Tom

OscarBoots1 · Posted 03-06-2017 05:02 PM

Thanks TomKari,

It's good but a little beyond my data Step ken at the moment.

Could I get a quick example using my posted data?

I need to crawl before I can walk. : )

ballardw · Posted 03-06-2017 07:04 PM

Your data sets did not have a ID variables so I don't know which one was supposed to. I put one into one of the sets, this approach would then just select the correct id. I'm not exactly sure what your desired output is. This may be a start.

proc sql;
   create table work.UNSTRUCTURED_TEXT
       (id int, Check_String char(3000));

   insert into work.UNSTRUCTURED_TEXT
    values(1,'The quick brown dog, jumped over-the lazy dog')
    values(2,'The QUICK GREEN dog jumped over, the lazy dog')
    values(3,'The quick snake slithered over  the lazy dog')
	 values(4,'The dog quickly Jumped over the lazy Fox')
	 values(5,'The Quick; brown_dog! jumped over the Lazy dog');
  select id,check_string 
  from work.UNSTRUCTURED_TEXT;
quit;

proc sql;
   create table work.TEXT_TO_FIND
       (Check_String char(300));

insert into work.TEXT_TO_FIND
    values('QUICK GREEN dog')
    values('quickly Jumped over')
    values('slithered over  the lazy dog')
	values('The Quick;')
	values('snake');

quit;

proc sql;
   create table want as
   select a.id,a.check_string, b.check_string as Found_text
   from work.UNSTRUCTURED_TEXT as a,work.TEXT_TO_FIND as b
   where index(a.check_string,strip(b.check_string))>0
   order by a.id;
quit;

I will make zero claim for effeciency other than in code maitenance or readability.

Note: Putting YOUR libraries into code can be very problematic...

OscarBoots1 · Posted 03-11-2017 07:25 PM

Thanks ballardw,

Very useful!

TomKari · Posted 03-07-2017 04:50 PM

I think this is pretty close:

data PhraseList;

input SearchPhrase $50.;

cards4;

QUICK GREEN dog

quickly Jumped over

slithered over the lazy dog

The Quick;

snake

;;;;

run;

data Have;

input Comment $50.;

SeqNo = _n_;

cards4;

The quick brown dog, jumped over-the lazy dog

The QUICK GREEN dog jumped over, the lazy dog

The quick snake slithered over the lazy dog

The dog quickly Jumped over the lazy Fox

The Quick; brown_dog! jumped over the Lazy dog

;;;;

run;

data Inter1(keep=SeqNo SearchPhrase);

if _n_ = 1 then do;

if 0 then

set PhraseList;

declare hash P(dataset:'PhraseList', multidata:'y');

P.definekey('SearchPhrase');

P.definedone();

end;

declare hiter iter("P");

set Have;

rc = iter.first();

do while (rc=0);

if find(Comment, SearchPhrase, 't') then output;

rc = iter.next();

end;

run;

data Want(keep=SeqNo SearchResults);

length SearchResults $10000;

retain SearchResults;

set Inter1;

by SeqNo;

if first.SeqNo

then SearchResults = SearchPhrase;

else SearchResults = catx(",",SearchResults, SearchPhrase);

if last.SeqNo then output;

run;

proc datasets lib=work nolist;

delete PhraseList;

delete Have;

delete Inter1;

quit;

OscarBoots1 · Posted 03-11-2017 07:34 PM

Thanks TomKari,

That's great

Much appreciated!

SAS Enterprise Guide

Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Re: Find Keywords or Phrases inside a block of unstructured text

Effective Markup of Unstructured Text - Part 1

Reading unstructured text file into sas

Extracting line from free text that contains a keyword

PROC REPORT compute block: How to map a variable value to the line tex...

Macro to extract keyword text

Follow Us

What is...

SAS Enterprise Guide

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...