<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I extract multiple instance of a pattern of text in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815830#M321994</link>
    <description></description>
    <pubDate>Tue, 31 May 2022 14:01:18 GMT</pubDate>
    <dc:creator>cqr525</dc:creator>
    <dc:date>2022-05-31T14:01:18Z</dc:date>
    <item>
      <title>How do I extract multiple instance of a pattern of text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815830#M321994</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large block of text from which I am trying to extract sentences. The passages I'm interested in begin with the word "failed" and end with a period. However, the text sometimes includes several instances of a "failed to" phrase, and the code I'm using now captures only the first instance, not each one.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is an example of the large block of text I am working with:&lt;/P&gt;&lt;P&gt;Based on document review and interview, it was determined John failed to properly put away all&amp;nbsp; materials used during construction. This could result in damage to the work place and possible injury to co-works. It was also noted that Dave failed to secure the ladder at the end of his shift. Additionally, Deborah failed to properly shut down her computer before leaving for the day.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the code I have been using.
Another possible start phrase is "it was determined"&amp;nbsp; and another possible end phrase is "Findings", but I'm really primarily concerned with extracting between "failed" and the first period.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data test3;&lt;BR /&gt;set test2;&lt;BR /&gt;failed = index(text,'failed');&lt;BR /&gt;determined = index(text,'it was determined');&lt;BR /&gt;findings = index(text,'Findings');&lt;BR /&gt;if findings ne 0 then do;&lt;BR /&gt;if failed ne 0 then do;&lt;BR /&gt;tmp = substr(text,failed+0);&lt;BR /&gt;put tmp;&lt;BR /&gt;pos2 = index(tmp,"Findings");&lt;BR /&gt;Extract1 = substr(tmp,1,pos2-1);&lt;BR /&gt;put Extract1;&lt;BR /&gt;end;&lt;BR /&gt;else if failed = 0 then do;&lt;BR /&gt;tmp2 = substr(text,determined+0);&lt;BR /&gt;put tmp2;&lt;BR /&gt;pos4 = index(tmp2,"Findings");&lt;BR /&gt;Extract2 = substr(tmp2,1,pos4-1);&lt;BR /&gt;put Extract2;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;if findings = 0 then do;&lt;BR /&gt;if failed ne 0 then do;&lt;BR /&gt;tmp = substr(text,failed+0);&lt;BR /&gt;put tmp;&lt;BR /&gt;pos2 = index(tmp,'.');&lt;BR /&gt;Extract1 = substr(tmp,1,pos2-1);&lt;BR /&gt;put Extract1;&lt;BR /&gt;end;&lt;BR /&gt;else if failed = 0 then do;&lt;BR /&gt;tmp2 = substr(text,determined+0);&lt;BR /&gt;put tmp2;&lt;BR /&gt;pos4 = index(tmp2,'.');&lt;BR /&gt;Extract2 = substr(tmp2,1,pos4-1);&lt;BR /&gt;put Extract2;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;keep text Extract1 Extract2;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for any help!&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 14:01:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815830#M321994</guid>
      <dc:creator>cqr525</dc:creator>
      <dc:date>2022-05-31T14:01:18Z</dc:date>
    </item>
    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815861#M322007</link>
      <description>&lt;P&gt;Take a look at the&amp;nbsp;&lt;A href="https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n1obc9u7z3225mn1npwnassehff0.htm" target="_self"&gt;CALL PRXNEXT Routine&lt;/A&gt;&amp;nbsp;documentation.&lt;/P&gt;
&lt;P&gt;Using the example code, with the regular expression adjusted:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   /* Match from "failed" through the first period that follows (non-greedy). */
   ExpressionID = prxparse('/failed.*?\./');
   text = 'The woods have a failed here for some reason. bat, cat, and failed here with some other text. a rat!';
   start = 1;
   stop = length(text);
   /* Use PRXNEXT to find the first instance of the pattern, */
   /* then use DO WHILE to find all further instances.       */
   /* PRXNEXT changes the start parameter so that searching  */
   /* begins again after the last match.                     */
   call prxnext(ExpressionID, start, stop, text, position, length);
   do while (position &amp;gt; 0);
      found = substr(text, position, length);
      put found= position= length=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 31 May 2022 15:06:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815861#M322007</guid>
      <dc:creator>AMSAS</dc:creator>
      <dc:date>2022-05-31T15:06:34Z</dc:date>
    </item>
    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815874#M322012</link>
      <description>&lt;P&gt;This worked and put all the instances of failed into the log. Is there a way to extract them into a new variable instead of being put in the log?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 15:45:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815874#M322012</guid>
      <dc:creator>cqr525</dc:creator>
      <dc:date>2022-05-31T15:45:38Z</dc:date>
    </item>
    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815970#M322045</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/426577"&gt;@cqr525&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;This worked and put all the instances of failed into the log. Is there a way to extract them into a new variable instead of being put in the log?&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basic approach:&lt;/P&gt;
&lt;P&gt;1) Replace data _null_ with: Data yourdatasetnamegoeshere.&lt;/P&gt;
&lt;P&gt;2) Found becomes the new variable. If you replace&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;put found= position= length=;&lt;/LI-CODE&gt;
&lt;P&gt;with&lt;/P&gt;
&lt;PRE&gt;Output;&lt;/PRE&gt;
&lt;P&gt;the step writes the current record, including the variables ExpressionID, Text, Start, Stop, Found, Position and Length, to the data set each time a match is found. Use a DROP statement to keep any of those variables out of the data set. For example, this statement keeps Start and Stop out:&lt;/P&gt;
&lt;PRE&gt;drop start stop;&lt;/PRE&gt;</description>
      <pubDate>Tue, 31 May 2022 22:04:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/815970#M322045</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-05-31T22:04:39Z</dc:date>
    </item>
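    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <description>&lt;P&gt;A minimal sketch of the two changes described in the previous reply, assembled into one DATA step. The output data set name failed_long and the $500 length for Found are assumptions for illustration; test2 and text are the names from the original question. The length variable from the earlier example is renamed len here only to keep it visually distinct from the LENGTH statement.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data failed_long;                 /* hypothetical output data set name */
   set test2;                     /* input data set from the original question */
   length found $500;             /* assumed maximum length of one match */
   retain ExpressionID;
   /* Compile the pattern once, on the first row only. */
   if _n_ = 1 then ExpressionID = prxparse('/failed.*?\./');
   start = 1;
   stop = length(text);
   call prxnext(ExpressionID, start, stop, text, position, len);
   do while (position &amp;gt; 0);
      found = substr(text, position, len);
      output;                     /* write one row per match */
      call prxnext(ExpressionID, start, stop, text, position, len);
   end;
   drop ExpressionID start stop position len;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the only OUTPUT statement is inside the loop, this writes one row per match: a block of text with three matches yields three rows, and a block with no match yields none.&lt;/P&gt;</description>
    </item>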
    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/816030#M322065</link>
      <description>This is great and worked, thank you very much!&lt;BR /&gt;&lt;BR /&gt;My last question: is it possible to output each extraction as its own variable? I'll have multiple blocks of text to look through, and if all the found text is in one variable, the output will be very long and messy to read.&lt;BR /&gt;So ideally my variables would be:&lt;BR /&gt;&lt;BR /&gt;Text, Found1, Found2, Found3, ... with each Found being an instance of "failed to..." within a block of text.&lt;BR /&gt;&lt;BR /&gt;I really appreciate all the help!</description>
      <pubDate>Wed, 01 Jun 2022 15:34:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-extract-multiple-instance-of-a-pattern-of-text/m-p/816030#M322065</guid>
      <dc:creator>cqr525</dc:creator>
      <dc:date>2022-06-01T15:34:50Z</dc:date>
    </item>
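    <item>
      <title>Re: How do I extract multiple instance of a pattern of text</title>
      <description>&lt;P&gt;One possible way to get Found1, Found2, ... as separate variables is to store each match in an array element instead of writing a row per match. This is a sketch under assumptions, not a solution tested on your data: the data set name failed_wide, the limit of 10 matches per block, the $500 length, and the counter n_found are all placeholders to adjust.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data failed_wide;                          /* hypothetical output data set name */
   set test2;                              /* input data set from the original question */
   /* Assumed limits: at most 10 matches per block, 500 characters each. */
   array found{10} $500 found1-found10;
   retain ExpressionID;
   /* Compile the pattern once, on the first row only. */
   if _n_ = 1 then ExpressionID = prxparse('/failed.*?\./');
   start = 1;
   stop = length(text);
   n_found = 0;                            /* matches stored for this row */
   call prxnext(ExpressionID, start, stop, text, position, len);
   do while (position &amp;gt; 0 and n_found &amp;lt; dim(found));
      n_found = n_found + 1;
      found{n_found} = substr(text, position, len);
      call prxnext(ExpressionID, start, stop, text, position, len);
   end;
   drop ExpressionID start stop position len;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This keeps one row per block of text. Matches beyond the tenth are ignored by this sketch, so compare n_found with the array size if your blocks may contain more.&lt;/P&gt;</description>
    </item>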
  </channel>
</rss>

