<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic extract contents within html tags in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733282#M228507</link>
    <description>&lt;P&gt;Hi everyone, I am trying to extract an article from a website. The web page has its paragraphs in the following structure:&lt;/P&gt;
&lt;P&gt;&amp;lt;p&amp;gt;&lt;SPAN&gt;Our &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; Over the last 100 years over 500 species have already gone extinct. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.&amp;lt;/p&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;How do I extract the content within the html tags?&lt;/P&gt;</description>
    <pubDate>Tue, 13 Apr 2021 12:29:48 GMT</pubDate>
    <dc:creator>kaziumair</dc:creator>
    <dc:date>2021-04-13T12:29:48Z</dc:date>
    <item>
      <title>extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733282#M228507</link>
      <description>&lt;P&gt;Hi everyone, I am trying to extract an article from a website. The web page has its paragraphs in the following structure:&lt;/P&gt;
&lt;P&gt;&amp;lt;p&amp;gt;&lt;SPAN&gt;Our &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; Over the last 100 years over 500 species have already gone extinct. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.&amp;lt;/p&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;How do I extract the content within the html tags?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Apr 2021 12:29:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733282#M228507</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-04-13T12:29:48Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733288#M228511</link>
      <description>&lt;P&gt;If each tag is always on a line of its own, it's easy:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
infile datalines truncover;
retain flag 0;
input line $200.;
if line = "&amp;lt;p&amp;gt;" then flag = 1;
else do;
  if line = "&amp;lt;/p&amp;gt;" then flag = 0;
  if flag then output;
end;
drop flag;
datalines;
&amp;lt;p&amp;gt;
Our universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction. Over the last 100 years over 500 species have already gone extinct.
If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.
&amp;lt;/p&amp;gt;
;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 13 Apr 2021 12:23:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733288#M228511</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-04-13T12:23:10Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733291#M228512</link>
      <description>Hi, sorry, I made a mistake in the example. The tags are not on individual lines; the paragraphs look like the following:&lt;BR /&gt;&amp;lt;p&amp;gt;Our &lt;BR /&gt;universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction. Over the last 100 years over 500 species have already gone extinct.&lt;BR /&gt;If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.&amp;lt;/p&amp;gt;</description>
      <pubDate>Tue, 13 Apr 2021 12:28:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733291#M228512</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-04-13T12:28:28Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733296#M228513</link>
      <description>&lt;P&gt;I modified the code. It can now also deal with lines where &amp;lt;p&amp;gt; is not at position 1 and &amp;lt;/p&amp;gt; is not at the end of the line, or where both appear on one line (it does not handle cases where more than one paragraph appears on one input line):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
infile datalines truncover;
retain flag 0;
input line $200.;
pos = index(line,'&amp;lt;p&amp;gt;');
if pos
then do;
  flag = 1;
  line = substr(line,pos + 3);
end;
pos = index(line,'&amp;lt;/p');
if pos
then do;
  line = substr(line,1,pos - 1);
  output;
  flag = 0;
end;
else if flag then output;
drop flag pos;
datalines;
some uninteresting text
&amp;lt;p&amp;gt;Our
universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction. Over the last 100 years over 500 species have already gone extinct.
If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.&amp;lt;/p&amp;gt;
xxx&amp;lt;p&amp;gt;this is a test&amp;lt;/p&amp;gt;yyy
more uninteresting text
xxx&amp;lt;p&amp;gt;another
test&amp;lt;/p&amp;gt;zzz
even more uninteresting text
;&lt;/CODE&gt;&lt;/PRE&gt;
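&lt;P&gt;The limitation noted above (more than one paragraph on one input line) could be handled by scanning each line in a loop. This is only a minimal sketch, assuming the tags are exactly &amp;lt;p&amp;gt; and &amp;lt;/p&amp;gt; with no attributes; the variable names rest and text are made up for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
infile datalines truncover;
retain flag 0;                /* 1 while we are inside a paragraph */
length line rest text $200;
input line $200.;
rest = line;                  /* part of the line not yet scanned */
do while (rest ne '');
  if not flag then do;
    /* outside a paragraph: look for the next opening tag */
    pos = index(rest,'&amp;lt;p&amp;gt;');
    if pos = 0 then leave;
    flag = 1;
    rest = substrn(rest,pos + 3);
  end;
  else do;
    /* inside a paragraph: look for the closing tag */
    pos = index(rest,'&amp;lt;/p&amp;gt;');
    if pos = 0 then do;
      /* paragraph continues on the next input line */
      text = rest;
      output;
      leave;
    end;
    if pos gt 1 then do;
      text = substrn(rest,1,pos - 1);
      output;
    end;
    flag = 0;
    rest = substrn(rest,pos + 4);
  end;
end;
keep text;
datalines;
some uninteresting text
aaa&amp;lt;p&amp;gt;one&amp;lt;/p&amp;gt;bbb&amp;lt;p&amp;gt;two&amp;lt;/p&amp;gt;ccc
xxx&amp;lt;p&amp;gt;another
test&amp;lt;/p&amp;gt;zzz
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each fragment found between the tags is written as its own observation, so a paragraph that spans several input lines still produces several rows. SUBSTRN is used instead of SUBSTR so that a tag at the very end of the 200-character buffer does not trigger an invalid-argument note.&lt;/P&gt;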
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Apr 2021 12:51:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733296#M228513</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-04-13T12:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733297#M228514</link>
      <description>&lt;P&gt;Then remove these HTML tags.&lt;/P&gt;
&lt;PRE&gt;data want;
infile cards truncover;
input line $200.;
want=prxchange('s/&amp;lt;.+?&amp;gt;//o',-1,line);
datalines;
&amp;lt;p&amp;gt;Our
universal abuse of natural resources has created an imbalance in nature, contributing to the beginning of extinction. Over the last 100 years over 500 species have already gone extinct.
If we do not act now many more animals face extinction over the next 30 years including Orangutans, Rhinos, Polar Bears, Gorillas, Gibbons, Chimpanzees to name a few.&amp;lt;/p&amp;gt;
;

proc print;run;&lt;/PRE&gt;</description>
      <pubDate>Tue, 13 Apr 2021 12:45:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733297#M228514</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-04-13T12:45:06Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733300#M228515</link>
      <description>Thanks a lot, it worked</description>
      <pubDate>Tue, 13 Apr 2021 12:59:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733300#M228515</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-04-13T12:59:17Z</dc:date>
    </item>
    <item>
      <title>Re: extract contents within html tags</title>
      <link>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733304#M228517</link>
      <description>Hi, thanks for taking the time to help.</description>
      <pubDate>Tue, 13 Apr 2021 13:08:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/extract-contents-within-html-tags/m-p/733304#M228517</guid>
      <dc:creator>kaziumair</dc:creator>
      <dc:date>2021-04-13T13:08:23Z</dc:date>
    </item>
  </channel>
</rss>

