<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Need to extract text between characters which includes quotation marks in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920500#M362522</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/264750"&gt;@RandoDando&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's create a few more test strings as sample data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input StringVar &amp;amp;$100.;
cards;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;b&amp;gt;bla&amp;lt;/b&amp;gt;&amp;lt;abc&amp;gt;xyz&amp;lt;/abc&amp;gt;&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;
TEST&amp;lt;/a&amp;gt;
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now let's fix your first approach:&lt;/P&gt;
&lt;PRE&gt;data want(drop=pos:);
set have;
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;pos0=find(StringVar, '&amp;lt;a ');&lt;/STRONG&gt;&lt;/FONT&gt;
pos1=findc(StringVar, '&amp;gt;'&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;, pos0+3&lt;/STRONG&gt;&lt;/FONT&gt;);
pos2=&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;find&lt;/STRONG&gt;&lt;/FONT&gt;(StringVar, '&amp;lt;/a&amp;gt;'&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;, pos1+1&lt;/FONT&gt;&lt;/STRONG&gt;);
NewVar = &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;substrn&lt;/STRONG&gt;&lt;/FONT&gt;(StringVar, &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os1+1, &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os2-1-&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;1&lt;/FONT&gt;&lt;/STRONG&gt;);
run;&lt;/PRE&gt;
&lt;P&gt;To make it more robust (against strings such as&amp;nbsp;&lt;FONT face="courier new,courier"&gt;'NOT&amp;gt;THIS&amp;lt;/a&amp;gt;'&lt;/FONT&gt;), you may want to add IF conditions like&lt;/P&gt;
&lt;PRE&gt;data want(drop=pos:);
set have;
pos0=find(StringVar, '&amp;lt;a ');
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos0 then&lt;/STRONG&gt; &lt;/FONT&gt;pos1=findc(StringVar, '&amp;gt;', pos0+3);
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos1 then&lt;/STRONG&gt;&lt;/FONT&gt; pos2=find(StringVar, '&amp;lt;/a&amp;gt;', pos1+1);
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos2 then&lt;/STRONG&gt; &lt;/FONT&gt;NewVar = substrn(StringVar, pos1+1, pos2-1-pos1);
run;&lt;/PRE&gt;
&lt;P&gt;possibly with nested DO-END blocks if performance is an issue. You could also modify your second approach analogously.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that the separation of the fourth and fifth observation prevents &lt;FONT face="courier new,courier"&gt;'TEST'&lt;/FONT&gt; from being found.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Mar 2024 19:04:35 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2024-03-15T19:04:35Z</dc:date>
    <item>
      <title>Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920485#M362516</link>
      <description>&lt;P&gt;&amp;nbsp;I have a raw data file which is importing some text columns with tags on either side of the data.&amp;nbsp; The tags are not exactly the same on the left side for each case, and include no space.&amp;nbsp; An example would be this:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I want to extract from that PROPER DATA.&lt;/P&gt;
&lt;P&gt;The alphanumeric portion after href= on the left side can vary from row to row.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've tried this, but it does nothing.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;pos1=findc(StringVar, '&amp;gt;');
pos2=findc(StringVar, '&amp;lt;/a&amp;gt;');
NewVar = substr(StringVar, nos1+1, nos2-1-nos2);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I've tried this and had the same result.&amp;nbsp; No change.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;pos1=findc(StringVar, '_blank"&amp;gt;');
pos2=findc(StringVar, '&amp;lt;/a&amp;gt;');
NewVar = substr(StringVar, nos1+1, nos2-1-nos2);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;How can I extract that text?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 17:32:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920485#M362516</guid>
      <dc:creator>RandoDando</dc:creator>
      <dc:date>2024-03-15T17:32:48Z</dc:date>
    </item>
    <item>
      <title>Re: Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920496#M362519</link>
      <description>&lt;P&gt;Assuming the only time the &amp;lt; and &amp;gt; symbols are in your data is for the tags, you could use the SCAN function and treat those as delimiters:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data want ;
  str='&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;' ;
  length want $50 ;
  want=scan(str,2,'&amp;lt;&amp;gt;') ;
  put want= ;
run ;

want=PROPER DATA
NOTE: The data set WORK.WANT has 1 observations and 2 variables.
&lt;/PRE&gt;</description>
      <pubDate>Fri, 15 Mar 2024 18:28:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920496#M362519</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-03-15T18:28:09Z</dc:date>
    </item>
    <item>
      <title>Re: Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920497#M362520</link>
      <description>&lt;P&gt;If your example is representative of all the lines you need to read (unlikely, I suspect), then perhaps:&lt;/P&gt;
&lt;PRE&gt;data example;
   string='&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;';
   want = scan(string,2,'&amp;lt;&amp;gt;');
run;&lt;/PRE&gt;
&lt;P&gt;SCAN lets you specify the characters that delimit the words in a string. Because your line starts with a &amp;lt;, that leading delimiter is skipped and the first word is everything up to the next &amp;lt; or &amp;gt; character; the text you want is therefore the second word.&lt;/P&gt;
&lt;P&gt;Before using the target variable (Want above), assign it a length that can hold the longest expected value.&lt;/P&gt;
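&lt;P&gt;As a rough illustration of how SCAN splits the example string, here is a minimal sketch. The variable names word and i are made up for this illustration, and COUNTW is used only to count the words:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   length word $100;
   string='&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;';
   /* Only &amp;lt; and &amp;gt; act as delimiters; leading and trailing delimiters are skipped */
   do i = 1 to countw(string,'&amp;lt;&amp;gt;');
      word = scan(string,i,'&amp;lt;&amp;gt;');
      put i= word=;   /* word 1: contents of the opening tag, word 2: PROPER DATA, word 3: /a */
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;One caveat under the same assumptions: when the element is empty, so that &amp;gt; is immediately followed by &amp;lt;/a&amp;gt;, the adjacent delimiters are treated as one and the second word would be /a rather than a blank.&lt;/P&gt;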
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would expect strange results. Your use of&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;nos2-1-nos2)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;would always result in a value of -1 for any nonmissing nos2 (nos2-1-nos2 is equivalent to nos2-nos2-1). When SUBSTR is given a negative length, it tends to do odd things, as documented in the online help:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;DIV class="xisDoc-syntax"&gt;
&lt;DIV class="xisDoc-syntaxDescription"&gt;
&lt;DIV class="xisDoc-optionalArgGroup"&gt;
&lt;DIV id="n00ca3ifw9gzexn1k4g0t5ge9f2i" class="xisDoc-argDescriptionPair"&gt;
&lt;DIV class="xisDoc-argumentDescription"&gt;
&lt;SECTION class="xisDoc-tableWrap"&gt;
&lt;TABLE class="xisDoc-summary"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TH class="xisDoc-summaryInteraction"&gt;Interaction&lt;/TH&gt;
&lt;TD class="xisDoc-summaryText"&gt;If &lt;EM class="xisDoc-userSuppliedValue"&gt;length&lt;/EM&gt; is zero, &lt;STRONG&gt;a negative value&lt;/STRONG&gt;, or larger than the length of the expression that remains in &lt;EM class="xisDoc-userSuppliedValue"&gt;string&lt;/EM&gt; after &lt;EM class="xisDoc-userSuppliedValue"&gt;position&lt;/EM&gt;, SAS extracts the remainder of the expression. SAS also sets _ERROR_ to 1 and prints a note to the log indicating that the &lt;EM class="xisDoc-userSuppliedValue"&gt;length&lt;/EM&gt; argument is invalid.&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/SECTION&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;There should be notes in the log.&lt;/P&gt;
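&lt;P&gt;For comparison, here is a small sketch of that documented behavior next to SUBSTRN, which accepts a zero or negative length and returns an empty result instead. The variable names s, a and b are made up for this illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   s = 'ABCDEF';
   /* SUBSTR: a negative length is invalid; per the documentation quoted above, */
   /* SAS extracts the remainder of the string and writes a note to the log     */
   a = substr(s, 3, -1);
   /* SUBSTRN: a length of zero or less yields a zero-length string */
   b = substrn(s, 3, -1);
   put a= b=;
run;&lt;/CODE&gt;&lt;/PRE&gt;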
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 18:28:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920497#M362520</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-03-15T18:28:50Z</dc:date>
    </item>
    <item>
      <title>Re: Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920499#M362521</link>
      <description>This seems to work. Thanks</description>
      <pubDate>Fri, 15 Mar 2024 19:03:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920499#M362521</guid>
      <dc:creator>RandoDando</dc:creator>
      <dc:date>2024-03-15T19:03:56Z</dc:date>
    </item>
    <item>
      <title>Re: Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920500#M362522</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/264750"&gt;@RandoDando&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's create a few more test strings as sample data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input StringVar &amp;amp;$100.;
cards;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;b&amp;gt;bla&amp;lt;/b&amp;gt;&amp;lt;abc&amp;gt;xyz&amp;lt;/abc&amp;gt;&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;
TEST&amp;lt;/a&amp;gt;
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now let's fix your first approach:&lt;/P&gt;
&lt;PRE&gt;data want(drop=pos:);
set have;
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;pos0=find(StringVar, '&amp;lt;a ');&lt;/STRONG&gt;&lt;/FONT&gt;
pos1=findc(StringVar, '&amp;gt;'&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;, pos0+3&lt;/STRONG&gt;&lt;/FONT&gt;);
pos2=&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;find&lt;/STRONG&gt;&lt;/FONT&gt;(StringVar, '&amp;lt;/a&amp;gt;'&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;, pos1+1&lt;/FONT&gt;&lt;/STRONG&gt;);
NewVar = &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;substrn&lt;/STRONG&gt;&lt;/FONT&gt;(StringVar, &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os1+1, &lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os2-1-&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;p&lt;/STRONG&gt;&lt;/FONT&gt;os&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;1&lt;/FONT&gt;&lt;/STRONG&gt;);
run;&lt;/PRE&gt;
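&lt;P&gt;The switch from FINDC to FIND for the closing tag matters because FINDC looks for the first occurrence of any single character from its list, whereas FIND looks for the whole substring. A minimal sketch on the original example string (p_findc and p_find are names chosen just for this illustration):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
StringVar='&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;';
/* FINDC: returns 1 here, because the string starts with &amp;lt;, one of the listed characters */
p_findc=findc(StringVar, '&amp;lt;/a&amp;gt;');
/* FIND: returns the position where the substring &amp;lt;/a&amp;gt; begins */
p_find=find(StringVar, '&amp;lt;/a&amp;gt;');
put p_findc= p_find=;
run;&lt;/CODE&gt;&lt;/PRE&gt;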
&lt;P&gt;To make it more robust (against strings such as&amp;nbsp;&lt;FONT face="courier new,courier"&gt;'NOT&amp;gt;THIS&amp;lt;/a&amp;gt;'&lt;/FONT&gt;), you may want to add IF conditions like&lt;/P&gt;
&lt;PRE&gt;data want(drop=pos:);
set have;
pos0=find(StringVar, '&amp;lt;a ');
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos0 then&lt;/STRONG&gt; &lt;/FONT&gt;pos1=findc(StringVar, '&amp;gt;', pos0+3);
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos1 then&lt;/STRONG&gt;&lt;/FONT&gt; pos2=find(StringVar, '&amp;lt;/a&amp;gt;', pos1+1);
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;if pos2 then&lt;/STRONG&gt; &lt;/FONT&gt;NewVar = substrn(StringVar, pos1+1, pos2-1-pos1);
run;&lt;/PRE&gt;
&lt;P&gt;possibly with nested DO-END blocks if performance is an issue. You could also modify your second approach analogously.&lt;/P&gt;
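&lt;P&gt;A sketch of how the nested DO-END variant might look. It should produce the same NewVar values as the IF version, but it skips the later searches as soon as one of them fails:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want(drop=pos:);
set have;
pos0=find(StringVar, '&amp;lt;a ');
if pos0 then do;
  /* look for the end of the opening tag only if an opening tag was found */
  pos1=findc(StringVar, '&amp;gt;', pos0+3);
  if pos1 then do;
    /* look for the closing tag only after the opening tag has ended */
    pos2=find(StringVar, '&amp;lt;/a&amp;gt;', pos1+1);
    if pos2 then NewVar = substrn(StringVar, pos1+1, pos2-1-pos1);
  end;
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;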
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that the separation of the fourth and fifth observation prevents &lt;FONT face="courier new,courier"&gt;'TEST'&lt;/FONT&gt; from being found.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 19:04:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920500#M362522</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2024-03-15T19:04:35Z</dc:date>
    </item>
    <item>
      <title>Re: Need to extract text between characters which includes quotation marks</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920629#M362595</link>
      <description>&lt;P&gt;This is an ideal scenario for a Perl regular expression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input StringVar &amp;amp;$100.;
cards;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;b&amp;gt;bla&amp;lt;/b&amp;gt;&amp;lt;abc&amp;gt;xyz&amp;lt;/abc&amp;gt;&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;PROPER DATA&amp;lt;/a&amp;gt;
&amp;lt;a href="a213w9245992999Vrh" target="_blank"&amp;gt;
TEST&amp;lt;/a&amp;gt;
;
data want;
 set have;
 want=prxchange('s/&amp;lt;.+?&amp;gt;//',-1,StringVar);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 18 Mar 2024 05:11:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Need-to-extract-text-between-characters-which-includes-quotation/m-p/920629#M362595</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-03-18T05:11:39Z</dc:date>
    </item>
  </channel>
</rss>

