BookmarkSubscribeRSS Feed
OscarBoots1
Quartz | Level 8

Hi Forum,

 

I have a list of several hundred Keywords & short phrases 2 or 3 words together.

 

I also have a long list of unstructured text where I want to know if the Keywords & phrases appear exactly as I've got in the list.

 

If they do appear, I want to export the ID field & then export the Keywords/Phrases out delimited by a comma on the same row as the ID.

 

I want to export to a new table in the existing Project.

 

I've had a look at some functions but maybe a Data Step Loop would be the most effective method?

 

Can anyone assist with the Data Step code I would need to do this or suggest a better solution?

 

I'm using SAS EG Vers 7.13 64 bit.

 

Thanks 

8 REPLIES 8
Shmuel
Garnet | Level 18

Look for hash iterate method within a datastep, using functions like find or index.

 

It would be more helpful if you post some test examples of your key words and of your text dataset.

OscarBoots1
Quartz | Level 8

Hi Shmuel,

 

I've posted a quick Create table & Insert statement to help.

 

Note, I've added some punctuation to show that the data is a bit messy but I'm just after the words/phrases regardless of punctuation or case.

 

I want to find & extract the words/phrases in the 'TEXT TO FIND' table from the UNSTRUCTURED TEXT' Table.

 

I was thinking of a Loop that runs the first word/phrase from the 'TEXT TO FIND' table through all the records in the UNSTRUCTURED TEXT' Table extract the matches & do the same with the next word/phrase.  

 

Cheers

 

proc sql;
   create table RDW.UNSTRUCTURED_TEXT
       (Check_String char(3000));

insert into RDW.UNSTRUCTURED_TEXT
    values('The quick brown dog, jumped over-the lazy dog')
    values('The QUICK GREEN dog jumped over, the lazy dog')
    values('The quick snake slithered over  the lazy dog')
	values('The dog quickly Jumped over the lazy Fox')
	values('The Quick; brown_dog! jumped over the Lazy dog');

title 'RDW.UNSTRUCTURED_TEXT';

select *
   from RDW.UNSTRUCTURED_TEXT;

proc sql;
   create table RDW.TEXT_TO_FIND
       (Check_String char(300));

insert into RDW.TEXT_TO_FIND
    values('QUICK GREEN dog')
    values('quickly Jumped over')
    values('slithered over  the lazy dog')
	values('The Quick;')
	values('snake');

title 'RDW.TEXT_TO_FIND';

select *
   from RDW.TEXT_TO_FIND;

proc printto; run;

 

OscarBoots1
Quartz | Level 8

Thanks TomKari,

 

It's good but a little beyond my data Step ken at the moment.

 

Could I get a quick example using my posted data?

 

I need to crawl before I can walk.  :  )

ballardw
Super User

Your data sets did not have a ID variables so I don't know which one was supposed to. I put one into one of the sets, this approach would then just select the correct id. I'm not exactly sure what your desired output is. This may be a start.

 

proc sql;
   create table work.UNSTRUCTURED_TEXT
       (id int, Check_String char(3000));

   insert into work.UNSTRUCTURED_TEXT
    values(1,'The quick brown dog, jumped over-the lazy dog')
    values(2,'The QUICK GREEN dog jumped over, the lazy dog')
    values(3,'The quick snake slithered over  the lazy dog')
	 values(4,'The dog quickly Jumped over the lazy Fox')
	 values(5,'The Quick; brown_dog! jumped over the Lazy dog');
  select id,check_string 
  from work.UNSTRUCTURED_TEXT;
quit;

proc sql;
   create table work.TEXT_TO_FIND
       (Check_String char(300));

insert into work.TEXT_TO_FIND
    values('QUICK GREEN dog')
    values('quickly Jumped over')
    values('slithered over  the lazy dog')
	values('The Quick;')
	values('snake');

quit;

proc sql;
   create table want as
   select a.id,a.check_string, b.check_string as Found_text
   from work.UNSTRUCTURED_TEXT as a,work.TEXT_TO_FIND as b
   where index(a.check_string,strip(b.check_string))>0
   order by a.id;
quit;

 

I will make zero claim for effeciency other than in code maitenance or readability.

Note: Putting YOUR libraries into code can be very problematic...

OscarBoots1
Quartz | Level 8

Thanks ballardw,

 

Very useful!

TomKari
Onyx | Level 15

I think this is pretty close:

 

data PhraseList;

input SearchPhrase $50.;

cards4;

QUICK GREEN dog

quickly Jumped over

slithered over the lazy dog

The Quick;

snake

;;;;

run;

data Have;

input Comment $50.;

SeqNo = _n_;

cards4;

The quick brown dog, jumped over-the lazy dog

The QUICK GREEN dog jumped over, the lazy dog

The quick snake slithered over the lazy dog

The dog quickly Jumped over the lazy Fox

The Quick; brown_dog! jumped over the Lazy dog

;;;;

run;

data Inter1(keep=SeqNo SearchPhrase);

if _n_ = 1 then do;

if 0 then

set PhraseList;

declare hash P(dataset:'PhraseList', multidata:'y');

P.definekey('SearchPhrase');

P.definedone();

end;

declare hiter iter("P");

set Have;

rc = iter.first();

do while (rc=0);

if find(Comment, SearchPhrase, 't') then output;

rc = iter.next();

end;

run;

data Want(keep=SeqNo SearchResults);

length SearchResults $10000;

retain SearchResults;

set Inter1;

by SeqNo;

if first.SeqNo

then SearchResults = SearchPhrase;

else SearchResults = catx(",",SearchResults, SearchPhrase);

if last.SeqNo then output;

run;

proc datasets lib=work nolist;

delete PhraseList;

delete Have;

delete Inter1;

quit;

OscarBoots1
Quartz | Level 8

Thanks TomKari,

 

That's great

 

Much appreciated!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1967 views
  • 0 likes
  • 4 in conversation