Hi Forum,
I have a list of several hundred Keywords & short phrases 2 or 3 words together.
I also have a long list of unstructured text where I want to know if the Keywords & phrases appear exactly as I've got in the list.
If they do appear, I want to export the ID field & then export the Keywords/Phrases out delimited by a comma on the same row as the ID.
I want to export to a new table in the existing Project.
I've had a look at some functions but maybe a Data Step Loop would be the most effective method?
Can anyone assist with the Data Step code I would need to do this or suggest a better solution?
I'm using SAS EG Vers 7.13 64 bit.
Thanks
Look for hash iterate method within a datastep, using functions like find or index.
It would be more helpful if you post some test examples of your key words and of your text dataset.
Hi Shmuel,
I've posted a quick Create table & Insert statement to help.
Note, I've added some punctuation to show that the data is a bit messy but I'm just after the words/phrases regardless of punctuation or case.
I want to find & extract the words/phrases in the 'TEXT TO FIND' table from the UNSTRUCTURED TEXT' Table.
I was thinking of a Loop that runs the first word/phrase from the 'TEXT TO FIND' table through all the records in the UNSTRUCTURED TEXT' Table extract the matches & do the same with the next word/phrase.
Cheers
proc sql; create table RDW.UNSTRUCTURED_TEXT (Check_String char(3000)); insert into RDW.UNSTRUCTURED_TEXT values('The quick brown dog, jumped over-the lazy dog') values('The QUICK GREEN dog jumped over, the lazy dog') values('The quick snake slithered over the lazy dog') values('The dog quickly Jumped over the lazy Fox') values('The Quick; brown_dog! jumped over the Lazy dog'); title 'RDW.UNSTRUCTURED_TEXT'; select * from RDW.UNSTRUCTURED_TEXT; proc sql; create table RDW.TEXT_TO_FIND (Check_String char(300)); insert into RDW.TEXT_TO_FIND values('QUICK GREEN dog') values('quickly Jumped over') values('slithered over the lazy dog') values('The Quick;') values('snake'); title 'RDW.TEXT_TO_FIND'; select * from RDW.TEXT_TO_FIND; proc printto; run;
This may give you some ideas:
http://support.sas.com/resources/papers/proceedings14/1717-2014.pdf
Tom
Thanks TomKari,
It's good but a little beyond my data Step ken at the moment.
Could I get a quick example using my posted data?
I need to crawl before I can walk. : )
Your data sets did not have a ID variables so I don't know which one was supposed to. I put one into one of the sets, this approach would then just select the correct id. I'm not exactly sure what your desired output is. This may be a start.
proc sql; create table work.UNSTRUCTURED_TEXT (id int, Check_String char(3000)); insert into work.UNSTRUCTURED_TEXT values(1,'The quick brown dog, jumped over-the lazy dog') values(2,'The QUICK GREEN dog jumped over, the lazy dog') values(3,'The quick snake slithered over the lazy dog') values(4,'The dog quickly Jumped over the lazy Fox') values(5,'The Quick; brown_dog! jumped over the Lazy dog'); select id,check_string from work.UNSTRUCTURED_TEXT; quit; proc sql; create table work.TEXT_TO_FIND (Check_String char(300)); insert into work.TEXT_TO_FIND values('QUICK GREEN dog') values('quickly Jumped over') values('slithered over the lazy dog') values('The Quick;') values('snake'); quit; proc sql; create table want as select a.id,a.check_string, b.check_string as Found_text from work.UNSTRUCTURED_TEXT as a,work.TEXT_TO_FIND as b where index(a.check_string,strip(b.check_string))>0 order by a.id; quit;
I will make zero claim for effeciency other than in code maitenance or readability.
Note: Putting YOUR libraries into code can be very problematic...
Thanks ballardw,
Very useful!
I think this is pretty close:
data PhraseList;
input SearchPhrase $50.;
cards4;
QUICK GREEN dog
quickly Jumped over
slithered over the lazy dog
The Quick;
snake
;;;;
run;
data Have;
input Comment $50.;
SeqNo = _n_;
cards4;
The quick brown dog, jumped over-the lazy dog
The QUICK GREEN dog jumped over, the lazy dog
The quick snake slithered over the lazy dog
The dog quickly Jumped over the lazy Fox
The Quick; brown_dog! jumped over the Lazy dog
;;;;
run;
data Inter1(keep=SeqNo SearchPhrase);
if _n_ = 1 then do;
if 0 then
set PhraseList;
declare hash P(dataset:'PhraseList', multidata:'y');
P.definekey('SearchPhrase');
P.definedone();
end;
declare hiter iter("P");
set Have;
rc = iter.first();
do while (rc=0);
if find(Comment, SearchPhrase, 't') then output;
rc = iter.next();
end;
run;
data Want(keep=SeqNo SearchResults);
length SearchResults $10000;
retain SearchResults;
set Inter1;
by SeqNo;
if first.SeqNo
then SearchResults = SearchPhrase;
else SearchResults = catx(",",SearchResults, SearchPhrase);
if last.SeqNo then output;
run;
proc datasets lib=work nolist;
delete PhraseList;
delete Have;
delete Inter1;
quit;
Thanks TomKari,
That's great
Much appreciated!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.