<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to Extract information between line(s) from HTML in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321211#M70876</link>
    <description>&lt;P&gt;Hello everybody,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to extract information from a website. As I am very new to SAS, I don't know how to get the information/paragraphs between certain lines. I've been thinking about this for several days already T.T Please help!!!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question 1:&lt;/STRONG&gt;&amp;nbsp;to get the usertitle&amp;nbsp;information between&amp;nbsp;&amp;lt;span class="usertitle"&amp;gt; and&amp;nbsp;&lt;SPAN&gt;&amp;lt;/span&amp;gt;, and also to mark that information as missing when there is nothing&amp;nbsp;between &amp;lt;span class="usertitle"&amp;gt; and the nested &amp;lt;span style="font-weight: ..."&amp;gt; tag.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;Member&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;Junior Member&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;&amp;lt;span style="font-weight: bold; color: black;"&amp;gt;Not your guy, fwiend...&amp;lt;/span&amp;gt;&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&amp;nbsp;2:&lt;/STRONG&gt; to extract the reply content between&amp;nbsp;&lt;SPAN&gt;&amp;lt;blockquote class="postcontent restore "&amp;gt; and&amp;nbsp;&amp;lt;/blockquote&amp;gt;, and to remove the&amp;nbsp;&amp;lt;br /&amp;gt; tags from the output.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;lt;blockquote class="postcontent restore "&amp;gt;&lt;BR /&gt;replied content line 1 -- omit details here for brevity&amp;lt;br /&amp;gt;&lt;BR /&gt;&amp;lt;br /&amp;gt;&lt;BR /&gt;&lt;SPAN&gt;replied content line 2&amp;nbsp;-- omit details here for brevity&lt;/SPAN&gt;.&amp;lt;br /&amp;gt;&lt;BR /&gt;&amp;lt;br /&amp;gt;&lt;BR /&gt;&lt;SPAN&gt;replied content line 3&amp;nbsp;-- 
omit details here for brevity&lt;/SPAN&gt;.&lt;BR /&gt;&amp;lt;/blockquote&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 26 Dec 2016 22:00:38 GMT</pubDate>
    <dc:creator>may0423</dc:creator>
    <dc:date>2016-12-26T22:00:38Z</dc:date>
    <item>
      <title>How to Extract information between line(s) from HTML</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321211#M70876</link>
      <description>&lt;P&gt;Hello everybody,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to extract information from a website. As I am very new to SAS, I don't know how to get the information/paragraphs between certain lines. I've been thinking about this for several days already T.T Please help!!!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question 1:&lt;/STRONG&gt;&amp;nbsp;to get the usertitle&amp;nbsp;information between&amp;nbsp;&amp;lt;span class="usertitle"&amp;gt; and&amp;nbsp;&lt;SPAN&gt;&amp;lt;/span&amp;gt;, and also to mark that information as missing when there is nothing&amp;nbsp;between &amp;lt;span class="usertitle"&amp;gt; and the nested &amp;lt;span style="font-weight: ..."&amp;gt; tag.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;Member&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;Junior Member&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;BR /&gt;&amp;lt;span style="font-weight: bold; color: black;"&amp;gt;Not your guy, fwiend...&amp;lt;/span&amp;gt;&lt;BR /&gt;&amp;lt;/span&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&amp;nbsp;2:&lt;/STRONG&gt; to extract the reply content between&amp;nbsp;&lt;SPAN&gt;&amp;lt;blockquote class="postcontent restore "&amp;gt; and&amp;nbsp;&amp;lt;/blockquote&amp;gt;, and to remove the&amp;nbsp;&amp;lt;br /&amp;gt; tags from the output.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;lt;blockquote class="postcontent restore "&amp;gt;&lt;BR /&gt;replied content line 1 -- omit details here for brevity&amp;lt;br /&amp;gt;&lt;BR /&gt;&amp;lt;br /&amp;gt;&lt;BR /&gt;&lt;SPAN&gt;replied content line 2&amp;nbsp;-- omit details here for brevity&lt;/SPAN&gt;.&amp;lt;br /&amp;gt;&lt;BR /&gt;&amp;lt;br /&amp;gt;&lt;BR /&gt;&lt;SPAN&gt;replied content line 3&amp;nbsp;-- 
omit details here for brevity&lt;/SPAN&gt;.&lt;BR /&gt;&amp;lt;/blockquote&amp;gt;&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2016 22:00:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321211#M70876</guid>
      <dc:creator>may0423</dc:creator>
      <dc:date>2016-12-26T22:00:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract information between line(s) from HTML</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321213#M70878</link>
      <description>&lt;P&gt;You could read line by line and look for '&amp;lt;span class="usertitle"&amp;gt;', or you could let the INPUT statement do the search for you, as in:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;input&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; @ &lt;/FONT&gt;&lt;FONT color="#800080" face="Sasfont"&gt;'&amp;lt;span class="usertitle"&amp;gt;'&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; / _line_ :&amp;amp;&lt;/FONT&gt;&lt;FONT color="#008080" face="Sasfont"&gt;$200.&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; ;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
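&lt;LI&gt;The&amp;nbsp;&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;@ '&amp;lt;span class="usertitle"&amp;gt;'&lt;/STRONG&gt;&lt;/EM&gt; pointer control tells SAS to search for the specified string, moving on to later lines if it is not on the current one, and to place the pointer just after it.&lt;/LI&gt;
&lt;LI&gt;The '/' moves the pointer to the beginning of the next line, where the title text is.&lt;/LI&gt;
&lt;LI&gt;The remainder reads a character variable named _LINE_ of up to 200 characters. The ':' modifier applies the $200. informat in list-input style, so reading stops at a blank instead of always taking 200 columns, and the '&amp;amp;' modifier relaxes that to allow single embedded blanks, so reading stops only at two consecutive blanks or the end of the line. That is why you get "Junior Member" instead of just "Junior".&lt;/LI&gt;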
&lt;/UL&gt;
&lt;P&gt;Then&amp;nbsp;all you have to do is check the contents of the _LINE_ variable for the unwanted markup and assign USERTITLE accordingly, as below. Note that the "=:" operator is different from the ordinary "=":&amp;nbsp; it compares only the first X characters, where X is the length of the shorter character value, so here it acts as a "starts with" test.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also you'll be reading from an external file, so use the INFILE statement to point the INPUT operation to the right source.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Sasfont"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; want;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;&amp;nbsp; length&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; usertitle $&lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Sasfont"&gt;20&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Sasfont"&gt; ;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;&amp;nbsp; input&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; @ &lt;/FONT&gt;&lt;FONT color="#800080" face="Sasfont"&gt;'&amp;lt;span class="usertitle"&amp;gt;'&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; / _line_ :&amp;amp;&lt;/FONT&gt;&lt;FONT color="#008080" face="Sasfont"&gt;$200.&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; ;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;&amp;nbsp; if&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; not (_line_ =: &lt;/FONT&gt;&lt;FONT color="#800080" face="Sasfont"&gt;'&amp;lt;span style="font-weight:'&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt;) &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;then&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; usertitle=_line_;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;&amp;nbsp; drop&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt; _line_;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Sasfont"&gt;datalines4&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;Member&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;/span&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;Junior Member&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;/span&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;span class="usertitle"&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;span style="font-weight: bold; color: black;"&amp;gt; Not your guy, fwiend...&amp;lt;/span&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;/span&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;;;;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Sasfont"&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Sasfont"&gt;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That solves your first request.&amp;nbsp; And you can use the same tools to begin solving the second.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2016 22:41:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321213#M70878</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2016-12-26T22:41:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract information between line(s) from HTML</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321215#M70880</link>
      <description>&lt;P&gt;Hi mkeintz,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for the quick response. What if my data comes from&amp;nbsp;a URL where there are 25 usertitles? The&amp;nbsp;datalines4; approach does not seem to work for me.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2016 23:27:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321215#M70880</guid>
      <dc:creator>may0423</dc:creator>
      <dc:date>2016-12-26T23:27:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract information between line(s) from HTML</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321217#M70881</link>
      <description>&lt;P&gt;I adjusted it a little bit. It works now!!!!!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much&amp;nbsp;&lt;SPAN class="login-bold"&gt;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461" target="_self"&gt;mkeintz&lt;/A&gt;&amp;nbsp;&lt;img id="smileyvery-happy" class="emoticon emoticon-smileyvery-happy" src="https://communities.sas.com/i/smilies/16x16_smiley-very-happy.png" alt="Smiley Very Happy" title="Smiley Very Happy" /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Dec 2016 00:21:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321217#M70881</guid>
      <dc:creator>may0423</dc:creator>
      <dc:date>2016-12-27T00:21:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to Extract information between line(s) from HTML</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321218#M70882</link>
      <description>&lt;P&gt;When SAS reads data from lines placed directly after the DATA step code (rather than from an external file), the DATALINES statement is needed to tell SAS that the program code has ended and the data is about to start.&amp;nbsp; I should have told you that when you read from an external file, the DATALINES statement is not needed; the INFILE statement points INPUT at the file instead.&amp;nbsp; The reason it's DATALINES4 rather than DATALINES is that otherwise SAS would take the first semicolon in the data (such as the ones in style="font-weight: bold; color: black;") to indicate end-of-data.&amp;nbsp; DATALINES4 tells SAS that 4 consecutive semicolons are required to indicate end of data.&amp;nbsp; (So with an external file you can drop the line of 4 semicolons as well.)&amp;nbsp;&lt;/P&gt;
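&lt;P&gt;For the page with 25 usertitles, a minimal sketch (not tested against the real site) could read the page directly through the FILENAME statement's URL access method and point INFILE at it, so neither DATALINES4 nor the ;;;; line is needed. The fileref PAGE and the address https://www.example.com/thread.html are placeholders, and the sketch assumes, as in the sample, that each title sits on the line right after &amp;lt;span class="usertitle"&amp;gt;.&lt;/P&gt;
&lt;PRE&gt;/* PAGE and the URL are placeholders: substitute your own address */
filename page url "https://www.example.com/thread.html" lrecl=32767;

data want;
  length usertitle $20;
  infile page;   /* the external source replaces DATALINES4 and ;;;; */
  /* find the next usertitle span, then read the title on the following line */
  input @'&amp;lt;span class="usertitle"&amp;gt;' / _line_ :&amp;amp;$200.;
  /* leave USERTITLE missing when the "title" is a styled &amp;lt;span&amp;gt; */
  if not (_line_ =: '&amp;lt;span style="font-weight:') then usertitle = _line_;
  drop _line_;
run;&lt;/PRE&gt;
&lt;P&gt;Each pass through the DATA step searches forward for the next usertitle, so WANT should get one row per title found; the step ends when the search runs past the end of the page. If the site needs a proxy or a login, FILENAME URL has further options such as PROXY=, USER= and PASS= that are not shown here.&lt;/P&gt;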
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
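&lt;P&gt;For the second question, one possible approach (a sketch using the sample lines from the question, not a tested solution) is to read each whole line into the automatic _INFILE_ variable with a bare INPUT statement, switch a retained flag on at &amp;lt;blockquote class="postcontent restore "&amp;gt; and off at &amp;lt;/blockquote&amp;gt;, and in between strip the &amp;lt;br /&amp;gt; tags with TRANWRD and append the text with CATX. The data set name REPLIES, the variables INSIDE and REPLY, and the $2000 length are illustrative choices, not anything from the thread.&lt;/P&gt;
&lt;PRE&gt;data replies (keep=reply);
  length reply $2000;              /* assumed maximum reply length */
  retain inside 0 reply;           /* keep the flag and the text across lines */
  infile datalines;                /* point this at a file or URL fileref for real pages */
  input;                           /* bare INPUT: the whole line goes into _INFILE_ */
  if index(_infile_, '&amp;lt;blockquote class="postcontent restore') then do;
    inside = 1;                    /* a reply starts here */
    reply = ' ';
  end;
  else if inside and index(_infile_, '&amp;lt;/blockquote&amp;gt;') then do;
    inside = 0;                    /* the reply ends: write one row */
    output;
  end;
  else if inside then
    /* drop the &amp;lt;br /&amp;gt; tags; CATX skips lines that become blank */
    reply = catx(' ', reply, tranwrd(_infile_, '&amp;lt;br /&amp;gt;', ' '));
datalines4;
&amp;lt;blockquote class="postcontent restore "&amp;gt;
replied content line 1&amp;lt;br /&amp;gt;
&amp;lt;br /&amp;gt;
replied content line 2.&amp;lt;br /&amp;gt;
&amp;lt;br /&amp;gt;
replied content line 3.
&amp;lt;/blockquote&amp;gt;
;;;;
run;&lt;/PRE&gt;
&lt;P&gt;With these sample lines, REPLIES should end up with one row whose REPLY holds the three content lines joined by single blanks. Real pages may write the tag as &amp;lt;br&amp;gt; or &amp;lt;br/&amp;gt;, or put &amp;lt;/blockquote&amp;gt; on the same line as the last text; this sketch does not handle those cases (text on the closing line would be dropped).&lt;/P&gt;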
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Dec 2016 01:33:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-Extract-information-between-line-s-from-HTML/m-p/321218#M70882</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2016-12-27T01:33:24Z</dc:date>
    </item>
  </channel>
</rss>

