Reading Text

Contributor afs
Contributor
Posts: 28

I want to read the last sentence in a long text of length 350. Sentences can be separated by punctuation marks: !, ?, or a period. I don't want to use  No leading blanks should be there. I don't know how to write code for this.

 

 


All Replies
Respected Advisor
Posts: 4,919

Re: Reading Text

You can use either SCAN() or regular expression matching. Here is how to do the latter:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]*[.?!]/");
start = 1;
stop=length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = substr(text, pos, len);
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep text sentence;
run;

proc print; run;

 

PG
Contributor afs
Contributor
Posts: 28

Re: Reading Text

It worked, thanks. But the result shows both the text and the sentence. Is it possible to see just the sentence?
Respected Advisor
Posts: 4,919

Re: Reading Text

Use

keep sentence;

instead of

keep text sentence;

PG
Respected Advisor
Posts: 4,919

Re: Reading Text

Same, but slightly improved and better tested:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
First sentence. Last sentence with missing punctuation
First sentence!!! Last sentence, with emphatic punctuation ?!

First sentence is also the last.
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]+([.?!]+|$)/");
start = 1;
stop = length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = left(substr(text, pos, len));
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep sentence;
run;

proc print; run;
PG
Trusted Advisor
Posts: 1,553

Re: Reading Text

The INDEXC function looks forward for the first occurrence of any character from a given list of characters.

As I couldn't find a similar function that looks backwards, I will use a loop that searches for the end of the previous sentence:

 

length paragraph $350;  /* contains your long string */

 

do i = length(trim(paragraph)) to 1 by -1;

      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */

         then leave;             /* exit the loop with i pointing to the last delimiter found */

end;

last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );

Respected Advisor
Posts: 4,173

Re: Reading Text

Contributor afs
Contributor
Posts: 28

Re: Reading Text

Dear Patrick, thanks. I will study the link.
Trusted Advisor
Posts: 1,553

Re: Reading Text

Thank you @Patrick, I had skipped over the relevant modifier when reading the FINDC documentation.

This function shortens the code to:

     i = findc(paragraph , '.?!' , 'B');   /* the 'B' modifier searches backwards, from right to left */

     sentence = substr(paragraph, i+1,  length(trim(paragraph)) - i );

     put sentence=;    /* write the sentence to the log */

Super User
Posts: 10,018

Re: Reading Text

Can't you use SCAN()?

 

 want = scan(paragraph ,-1, '.?!' );   
Respected Advisor
Posts: 4,919

Re: Reading Text

@Ksharp, you would have to use something like

 

sentence = coalescec(scan(text ,-1, '.?!' ), scan(text ,-2, '.?!' ));

 

but you would still miss the last punctuation character.

PG
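To illustrate why the fallback to the second-to-last word is needed, here is a minimal sketch. It assumes the text sits in a fixed-length character variable, so the blank padding after the final punctuation mark is returned by SCAN as an (empty) last word. The data set name scan_demo and the sample text are made up for the illustration.

```sas
data scan_demo;
    length text sentence $350;
    text = 'First sentence. Last sentence?';

    /* Only . ? ! are delimiters here, so the trailing blanks after the  */
    /* final '?' form a blank last word. COALESCEC then falls back to    */
    /* the word before it. LEFT() drops the blank that follows the       */
    /* previous delimiter.                                               */
    sentence = left(coalescec(scan(text, -1, '.?!'), scan(text, -2, '.?!')));

    put sentence=;   /* write the result to the log */
run;
```

As noted above, the closing punctuation mark is not part of the result, because SCAN treats it as a delimiter.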
Solution
‎11-11-2016 01:44 PM
Trusted Advisor
Posts: 1,553

Re: Reading Text

Since INDEXC, FINDC and SCAN will all miss the final delimiter character (. ? !), a slight change to the loop code may do the job:

do i = length(trim(paragraph)) - 1 to 1 by -1;      /* start one character before the end to skip the closing punctuation */

      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */

         then leave;             /* exit the loop with i pointing to the end of the previous sentence */

end;

/* i is 0 if no earlier delimiter was found, so the whole text is returned; left() removes the leading blank after the delimiter */
last_sentence = left(substr(paragraph, i+1, length(trim(paragraph)) - i ));
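The same idea can probably be written without a loop by giving FINDC a negative start position, which makes it search from right to left starting at that position. The following is only a sketch under that assumption; the data set name last_sentence_demo, the variable len and the sample lines are invented for the example.

```sas
data last_sentence_demo;
    length paragraph last_sentence $350;
    infile datalines truncover;
    input paragraph $char350.;

    len = length(paragraph);          /* length without trailing blanks */

    /* Negative start position: search backwards, beginning one         */
    /* character before the end so the closing punctuation is skipped.  */
    /* FINDC returns 0 when no delimiter is found (or when len is 1).   */
    i = findc(paragraph, '.?!', -(len - 1));

    /* Everything after position i is the last sentence, including its  */
    /* closing punctuation. LEFT() removes the leading blank.           */
    last_sentence = left(substr(paragraph, i + 1));

    keep last_sentence;
    datalines;
First sentence. Last sentence?
Only one sentence.
;
run;

proc print; run;
```

Like the loop, this treats every punctuation mark as a sentence end, so a text that ends in repeated punctuation such as ?! would still need the regular expression approach shown earlier in the thread.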

☑ This topic is solved.
