afs
Calcite | Level 5

I want to read the last sentence in a long text of length 350. Sentences can be separated by punctuation marks: !, ?, or a period. I don't want to use ... There should be no leading blanks. I don't know how to develop code for this.

 

 

1 ACCEPTED SOLUTION

Shmuel
Garnet | Level 18

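Any of the functions INDEXC, FINDC, or SCAN will miss the final delimiter character (. ? !), so a slight change to the loop code may do the job. Starting the search one character before the end skips the closing punctuation mark, so it stays part of the last sentence:

 

n = length(paragraph);   /* LENGTH already ignores trailing blanks, so TRIM is not needed */
do i = n - 1 to 1 by -1;   /* start at n - 1 to skip the closing punctuation mark */
      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */
         then leave;             /* exit the loop with i pointing at the end of the previous sentence */
end;                     /* i is 0 here if no earlier terminator was found */

last_sentence = left(substr(paragraph, i+1, n - i));   /* LEFT removes the leading blank */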


11 REPLIES
PGStats
Opal | Level 21

You can use either SCAN() or regular expression matching. Here is how to do the latter:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]*[.?!]/");
start = 1;
stop=length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = substr(text, pos, len);
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep text sentence;
run;

proc print; run;

 

PG
afs
Calcite | Level 5
It worked, thanks. But the result shows both the text and the sentence. Is it possible to see just the result?
PGStats
Opal | Level 21

keep sentence;

 

instead of 

 

keep text sentence;

PG
PGStats
Opal | Level 21

Same, but slightly improved and better tested:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
First sentence. Last sentence with missing punctuation
First sentence!!! Last sentence, with emphatic punctuation ?!

First sentence is also the last.
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]+([.?!]+|$)/");
start = 1;
stop = length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = left(substr(text, pos, len));
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep sentence;
run;

proc print; run;
PG
Shmuel
Garnet | Level 18

The INDEXC function searches forward for the first occurrence of any character from a given set of characters.

As I couldn't find a similar function that searches backwards, I will use a loop to find the end of the previous sentence:

 

length paragraph $350;  /* contains your long string */

 

do i = length(trim(paragraph)) to 1 by -1;
      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */
         then leave;             /* exit the loop with i pointing at the end of the previous sentence */
end;
/* caution: if the text itself ends with . ! or ? the loop stops on that final character, so nothing follows it */

last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );

afs
Calcite | Level 5
Dear Patrick, thanks. I will study the link.
Shmuel
Garnet | Level 18

Thank you @Patrick, I skipped over the relevant modifier when reading the FINDC documentation.

 

This function shortens the code to:

     i = findc(paragraph , '.?!' , 'B');   /* the 'B' modifier searches backwards; it must be a quoted string */

     sentence = substr(paragraph, i+1,  length(trim(paragraph)) - i );

 

     put sentence=;    /* write the sentence to the log */
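
When the text itself ends with a terminator, a backward search from the very end stops on that final character and leaves nothing after it. A minimal sketch of one way around this, assuming FINDC's optional start-position argument behaves as documented (a negative value starts at that position and searches to the left): begin the search one character before the last non-blank character. The sample text and the variable names n and i are only illustrative.

```sas
data _null_;
    length paragraph sentence $350;
    paragraph = 'First sentence!!! Is this the last one?';   /* illustrative sample text */

    n = length(paragraph);      /* position of the last non-blank character */

    /* Search leftward starting at position n - 1, so a terminator in the */
    /* final position is skipped and stays part of the last sentence.     */
    /* FINDC returns 0 when nothing is found (or when n - 1 is 0).        */
    i = findc(paragraph, '.?!', 'B', -(n - 1));

    /* Everything after position i is the last sentence; LEFT drops the leading blank. */
    sentence = left(substr(paragraph, i + 1, n - i));

    put sentence=;              /* write the sentence to the log */
run;
```

For the sample text this should leave the closing question mark attached to the sentence. Text that ends in several punctuation marks in a row (for example ?!) would still be split between them, so the regular expression approach remains the more robust choice for that case.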

Ksharp
Super User

Can't you use scan()?

 

 want = scan(paragraph ,-1, '.?!' );   
PGStats
Opal | Level 21

@Ksharp, you would have to use something like

 

sentence = coalescec(scan(text ,-1, '.?!' ), scan(text ,-2, '.?!' ));

 

but you would still miss the last punctuation character.

PG
