afs
Calcite | Level 5

I want to read the last sentence of a long text of length 350. Sentences can be separated by marks such as !, ?, or a period. I don't want to use  There should be no leading blanks. I don't know how to write code for this.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Since each of the functions INDEXC, FINDC, and SCAN will miss the delimiter character (. ? !), a slight change to the loop code may do the job:

 

do i = length(trim(paragraph)) - 1 to 1 by -1;   /* start one character before the end so the closing punctuation is kept */

      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */

         then leave;             /* exit the loop with i pointing to the end of the previous sentence */

end;

last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );
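
For anyone who wants to try the loop end to end, here is a minimal sketch of it inside a complete DATA step. The data set names (have_loop, want_loop) and the sample lines are assumptions made up for illustration, and LEFT() is added only because the question asks for no leading blanks.

```sas
/* Sketch only: have_loop, want_loop and the sample rows are hypothetical. */
data have_loop;
    infile datalines truncover;
    input paragraph $char350.;
    datalines;
First sentence. Second one! Is this the last sentence?
First sentence. Last sentence with missing punctuation
Only one sentence.
;

data want_loop;
    set have_loop;
    length last_sentence $350;
    n = length(paragraph);   /* position of the last non-blank character */
    /* Start at n - 1 so the closing punctuation stays with the sentence. */
    do i = n - 1 to 1 by -1;
        if substr(paragraph, i, 1) in ('.', '!', '?') then leave;
    end;
    /* i is 0 when no earlier delimiter exists, so the whole text is kept. */
    last_sentence = left(substr(paragraph, i + 1, n - i));
    keep last_sentence;
run;

proc print data=want_loop; run;
```

Note that a text ending in two punctuation marks (for example ?!) would still be split between them, because the loop only skips the very last character.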


11 REPLIES
PGStats
Opal | Level 21

You can use either SCAN() or regular expression matching. Here is how to do the latter:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]*[.?!]/");
start = 1;
stop=length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = substr(text, pos, len);
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep text sentence;
run;

proc print; run;

 

PG
afs
Calcite | Level 5
It worked, thanks. But the result shows both the text and the sentence. Is it possible to see just the result?
PGStats
Opal | Level 21

keep sentence;

 

instead of 

 

keep text sentence;

PG
PGStats
Opal | Level 21

Same, but slightly improved and better tested:

 

data have;
infile datalines truncover;
input text $char350.;
datalines;
i want to read the last sentence in a long text having length 350 .it can be separated by marks, or ! or ? or period. I don't want to use  No leading blanks be there.   I don't know how to develop a code for this .
First sentence. Last sentence with missing punctuation
First sentence!!! Last sentence, with emphatic punctuation ?!

First sentence is also the last.
;

data lastSentence;
set have;
length sentence $350;
if not prx then prx + prxParse("/[^.?!]+([.?!]+|$)/");
start = 1;
stop = length(text);
call prxNext(prx, start, stop, text, pos, len);
do while(pos>0);
    sentence = left(substr(text, pos, len));
    call prxNext(prx, start, stop, text, pos, len);
    end;
keep sentence;
run;

proc print; run;
PG
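
As a possible variation on the PRXNEXT loop above, the pattern can be anchored at the end of the string so that a single PRXMATCH call finds the last sentence directly. This is only a sketch: the pattern, the data set names (have_rx, last_rx) and the sample rows are assumptions, not something posted in this thread.

```sas
/* Sketch only: have_rx, last_rx and the pattern are hypothetical. */
data have_rx;
    infile datalines truncover;
    input text $char350.;
    datalines;
First sentence. Last sentence with missing punctuation
First sentence!!! Last sentence, with emphatic punctuation ?!
First sentence is also the last.
;

data last_rx;
    set have_rx;
    length sentence $350;
    retain prx;
    /* Group 1: a run of non-delimiters plus any closing punctuation,  */
    /* followed only by white space up to the end of the string.       */
    if _n_ = 1 then prx = prxParse('/([^.?!]+[.?!]*)\s*$/');
    if prxMatch(prx, text) then sentence = left(prxPosn(prx, 1, text));
    keep sentence;
run;

proc print data=last_rx; run;
```

Anchoring with $ avoids walking through every sentence, at the cost of some backtracking, which should not matter for strings of 350 characters.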
Shmuel
Garnet | Level 18

The INDEXC function looks forward for the first occurrence of any character from a given list of characters.

As I couldn't find a similar function that looks backwards, I will use a loop that searches for the end of the previous sentence:

 

length paragraph $350;  /* contains your long string */

 

do i = length(trim(paragraph)) to 1 by -1;

      if substr(paragraph,i,1) in ('.' , '!' , '?')      /* you may add any other characters you want */

         then leave;             /* exit the loop with i pointing to the end of the previous sentence */

end;

last_sentence = substr(paragraph, i+1, length(trim(paragraph)) - i );

afs
Calcite | Level 5
Dear Patrick, thanks. I will study the link.
Shmuel
Garnet | Level 18

Thank you @Patrick, I skipped over the right modifier when reading the FINDC documentation.

 

This function shortens the code to:

     i = findc(paragraph , '.?!' , 'B');   /* the 'B' modifier (a quoted string) searches backwards */

     sentence = substr(paragraph, i+1,  length(trim(paragraph)) - i );

 

     put sentence=;    /* write the sentence to the log */
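
One way the FINDC idea might be extended so that the closing punctuation is kept: first locate the last character that is neither punctuation nor a blank (the K modifier searches for characters not in the list), then search leftward from there using a negative start position. This is a sketch under those assumptions. The data set name demo_findc, the variable last_char and the sample text are made up for illustration.

```sas
/* Sketch only: demo_findc, last_char and the sample text are hypothetical. */
data demo_findc;
    length paragraph sentence $350;
    paragraph = 'First sentence!!! Last sentence, with emphatic punctuation ?!';
    /* B searches from the right; K looks for a character NOT in the list, */
    /* so this skips trailing blanks and the closing punctuation.          */
    last_char = findc(paragraph, '.?! ', 'BK');
    if last_char > 0 then do;
        /* A negative start position searches leftward from that position. */
        i = findc(paragraph, '.?!', -last_char);
        /* i is 0 when there is no earlier delimiter: keep the whole text. */
        sentence = left(substr(paragraph, i + 1));
    end;
    put sentence=;
run;
```

Because the second search starts at the last ordinary character, endings such as ?! stay attached to the sentence.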

Ksharp
Super User

Can't you use scan()?

 

 want = scan(paragraph ,-1, '.?!' );   
PGStats
Opal | Level 21

@Ksharp, you would have to use something like

 

sentence = coalescec(scan(text ,-1, '.?!' ), scan(text ,-2, '.?!' ));

 

but you would still miss the last punctuation character.

PG
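
If the missing punctuation is the only obstacle to SCAN, one possible workaround is CALL SCAN, which returns the position of a word rather than its text, so the sentence can be taken from that position to the end of the string. This is a sketch only. The data set name demo_scan and the sample text are assumptions, and TRIM() is used so that the padding blanks of a fixed-length variable are not counted as a final word.

```sas
/* Sketch only: demo_scan and the sample text are hypothetical. */
data demo_scan;
    length text sentence $350;
    text = 'First sentence. Is this the last sentence?';
    /* pos and len receive the start and length of the last word. */
    call scan(trim(text), -1, pos, len, '.?!');
    /* Take everything from the start of that word, punctuation included. */
    if pos > 0 then sentence = left(substr(text, pos));
    put sentence=;
run;
```

This avoids the COALESCEC fallback because the trimmed string no longer ends in a blank "word" after the final delimiter.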
