I want to read the last sentence of a long text stored in a variable of length 350. Sentences can be separated by !, ? or a period. The result should have no leading blanks. I don't know how to write the code for this.
You can use either SCAN() or regular expression matching. Here is how to do the latter:
data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use No leading blanks be there. I don't know how to develop a code for this .
;
data lastSentence;
    set have;
    length sentence $350;
    /* parse the pattern once; the sum statement retains prx across rows */
    if not prx then prx + prxParse("/[^.?!]*[.?!]/");
    start = 1;
    stop = length(text);
    /* walk through every match; the last one found is the last sentence */
    call prxNext(prx, start, stop, text, pos, len);
    do while(pos > 0);
        sentence = substr(text, pos, len);
        call prxNext(prx, start, stop, text, pos, len);
    end;
    keep text sentence;
run;
proc print; run;
If you only want the extracted sentence in the output data set, use
keep sentence;
instead of
keep text sentence;
Same, but slightly improved and better tested:
data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use No leading blanks be there. I don't know how to develop a code for this .
First sentence. Last sentence with missing punctuation
First sentence!!! Last sentence, with emphatic punctuation ?!
First sentence is also the last.
;
data lastSentence;
    set have;
    length sentence $350;
    /* one or more non-terminators, then one or more terminators or the end of the text */
    if not prx then prx + prxParse("/[^.?!]+([.?!]+|$)/");
    start = 1;
    stop = length(text);
    call prxNext(prx, start, stop, text, pos, len);
    do while(pos > 0);
        /* left() removes the blank that follows the previous sentence */
        sentence = left(substr(text, pos, len));
        call prxNext(prx, start, stop, text, pos, len);
    end;
    keep sentence;
run;
proc print; run;
The INDEXC function searches forward for the first occurrence of any character from a given list of characters.
As I couldn't find a similar function that searches backwards, I would use a loop to find the end of the previous sentence:
length paragraph $350; /* contains your long string */
do i = length(trim(paragraph)) to 1 by -1;
    if substr(paragraph,i,1) in ('.' , '!' , '?') /* you may add any other characters you want */
    then leave; /* exit the loop with i pointing to the end of the previous sentence */
end;
last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );
You can search from right to left with findc()
Thank you @Patrick, I had skipped over the relevant modifier when reading the FINDC documentation.
This function shortens the code to:
i = findc(paragraph , '.?!' , 'B'); /* the 'B' modifier searches backwards; it must be a quoted string */
sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );
put sentence=; /* write the sentence to the log */
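For anyone who wants to try the FINDC approach as a complete step, here is a minimal sketch. The data set name demo, the variable len and the two sample rows are made up for illustration, and the modifier is assumed to be passed as the quoted string 'B'. It also guards the case where the text ends with punctuation: there the backward search lands on the final character, nothing follows it, and SUBSTR would be asked for a length of 0.

```sas
data demo;
    infile datalines truncover;
    length paragraph sentence $350;
    input paragraph $char350.;
    len = length(paragraph);              /* position of the last non-blank character */
    i = findc(paragraph, '.?!', 'B');     /* 'B' searches from right to left */
    /* SUBSTR needs a length of at least 1, so guard the case i = len */
    if i < len then sentence = left(substr(paragraph, i + 1, len - i));
    else sentence = ' ';
    put i= len= sentence=;                /* write the values to the log */
datalines;
First sentence. Last sentence without punctuation
First sentence. Last sentence with punctuation.
;
```

For the first row the sentence after the period is returned; for the second row sentence stays blank, because the last delimiter is the final character of the text.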
Can't you use scan()?
want = scan(paragraph ,-1, '.?!' );
@Ksharp, you would have to use something like
sentence = coalescec(scan(text ,-1, '.?!' ), scan(text ,-2, '.?!' ));
but you would still miss the last punctuation character.
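A small sketch to compare the two SCAN calls side by side. The data set name scan_demo, the variables s1 and s2 and the sample text are illustrative only. Presumably the COALESCEC fallback is there for the case where the last "word" consists only of the trailing blanks of the fixed-length variable; either way, the delimiters themselves are never part of what SCAN returns.

```sas
data scan_demo;
    length text s1 s2 $350;
    text = 'First sentence. Last sentence!';
    /* last word when '.', '?' and '!' are the only delimiters */
    s1 = scan(text, -1, '.?!');
    /* fall back to the second-to-last word if the last one is blank */
    s2 = coalescec(scan(text, -1, '.?!'), scan(text, -2, '.?!'));
    put s1= / s2=;
run;
```

In the log, s2 shows the words of the last sentence but not the closing "!", which is the limitation mentioned above.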
Since each of these functions (INDEXC, FINDC, SCAN) will miss the final delimiter character (. ? !),
a slight change to the loop code may do the job:
do i = length(trim(paragraph)) - 1 to 1 by -1; /* start one position before the final character */
    if substr(paragraph,i,1) in ('.' , '!' , '?') /* you may add any other characters you want */
    then leave; /* exit the loop with i pointing to the end of the previous sentence */
end;
last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );
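To make the loop easy to test, here is a sketch that wraps it in a complete DATA step. The data set name last_sentences, the variable len and the sample rows are assumptions for illustration, and left() is added only to satisfy the "no leading blanks" requirement from the question.

```sas
data last_sentences;
    infile datalines truncover;
    length paragraph last_sentence $350;
    input paragraph $char350.;
    len = length(paragraph);
    /* start one position left of the final character so closing punctuation is skipped */
    do i = len - 1 to 1 by -1;
        if substr(paragraph, i, 1) in ('.', '!', '?') then leave;
    end;
    /* i = 0 when no earlier delimiter exists: the whole text is the last sentence */
    last_sentence = left(substr(paragraph, i + 1, len - i));
    keep paragraph last_sentence;
datalines;
First sentence. Last sentence with missing punctuation
First sentence! Second one? Last sentence.
First sentence is also the last.
;
proc print data=last_sentences; run;
```

One limitation of this sketch: if a text ends with several punctuation characters in a row, such as "?!", the loop stops at the "?" and only the final "!" is returned.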