<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: webscraping the SEC in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/347501#M80283</link>
    <description>&lt;P&gt;I noticed on your SEC link that right at the top of the results table there is an RSS link.&amp;nbsp; This is essentially XML formatted data that is a bit easier to read into SAS (using the RSS link and passing in &amp;amp;count=100 for more data):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/* temp location for XML data */
filename resp temp;

/* get request from sec api */
proc http
 url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;amp;CIK=0000726728&amp;amp;type=DEF%2014%25&amp;amp;dateb=&amp;amp;owner=exclude&amp;amp;start=0&amp;amp;count=100&amp;amp;output=atom"
 method= "GET"
 out=resp;
run;

/* use automap with XML libname engine */
filename tempMap Temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can browse the resulting data sets in WORK to pull out the pieces you need (or join them with the other tables).&amp;nbsp; A lot more information on using XML in SAS is here:&amp;nbsp; &lt;A href="http://support.sas.com/rnd/base/xmlengine/" target="_blank"&gt;http://support.sas.com/rnd/base/xmlengine/&lt;/A&gt; (my example just scratches the surface).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, a web search on "SEC API" points to other sources as well.&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
    <pubDate>Wed, 05 Apr 2017 20:07:51 GMT</pubDate>
    <dc:creator>DaveHorne</dc:creator>
    <dc:date>2017-04-05T20:07:51Z</dc:date>
    <item>
      <title>webscraping the SEC</title>
      <link>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/346884#M80034</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am trying to extract data from the Securities and Exchange Commission.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wrote some code that reads a search-results page, and from that page I want to extract the links to other data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/* fileref pointing at the EDGAR search-results page */
filename link url 
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;amp;CIK=o&amp;amp;type=DEF+14&amp;amp;dateb=&amp;amp;owner=exclude&amp;amp;count=10";

data web;
  /* read each line of the page at its actual length */
  infile link length=len lrecl=32767;
  input line $varying32767. len;
  /* keep only the lines that contain a link into the EDGAR archives */
  p=find(line,'a href="Archives/edgar/data/');
  if p then output;
run;
&lt;/PRE&gt;
&lt;P&gt;That page has 10 "Documents" buttons, and it is their links that I am trying to extract. But I get the error message "the message received was unexpected or badly formatted". Is there a way to remedy this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 04 Apr 2017 01:49:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/346884#M80034</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2017-04-04T01:49:10Z</dc:date>
    </item>
    <item>
      <title>Re: webscraping the SEC</title>
      <link>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/346888#M80035</link>
      <description>&lt;P&gt;SAS is probably the last tool I'd use for webscraping.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Import.io is a free and easy-to-use tool.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Selenium is another free tool, though slightly more difficult to use.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you're on a Mac, the built-in Automator has several examples.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Apr 2017 01:58:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/346888#M80035</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-04-04T01:58:15Z</dc:date>
    </item>
    <item>
      <title>Re: webscraping the SEC</title>
      <link>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/347501#M80283</link>
      <description>&lt;P&gt;I noticed on your SEC link that right at the top of the results table there is an RSS link.&amp;nbsp; This is essentially XML formatted data that is a bit easier to read into SAS (using the RSS link and passing in &amp;amp;count=100 for more data):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/* temp location for XML data */
filename resp temp;

/* get request from sec api */
proc http
 url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;amp;CIK=0000726728&amp;amp;type=DEF%2014%25&amp;amp;dateb=&amp;amp;owner=exclude&amp;amp;start=0&amp;amp;count=100&amp;amp;output=atom"
 method= "GET"
 out=resp;
run;

/* use automap with XML libname engine */
filename tempMap Temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can browse the resulting data sets in WORK to pull out the pieces you need (or join them with the other tables).&amp;nbsp; A lot more information on using XML in SAS is here:&amp;nbsp; &lt;A href="http://support.sas.com/rnd/base/xmlengine/" target="_blank"&gt;http://support.sas.com/rnd/base/xmlengine/&lt;/A&gt; (my example just scratches the surface).&lt;/P&gt;
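&lt;P&gt;As a rough sketch of that inspection step: the table and column names that AUTOMAP generates depend on the structure of the feed, so rather than assume any names, the step below simply lists whatever PROC COPY wrote to WORK. It assumes the code above has already run in the same session; WORK may also hold unrelated data sets from earlier steps.&lt;/P&gt;
&lt;PRE&gt;/* describe every data set currently in WORK, */
/* including the ones copied from the SEC libref */
proc contents data=work._all_;
run;
&lt;/PRE&gt;
&lt;P&gt;From that listing you can pick the table that holds the filing links and print or join it as needed.&lt;/P&gt;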
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, a web search on "SEC API" points to other sources as well.&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2017 20:07:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/webscraping-the-SEC/m-p/347501#M80283</guid>
      <dc:creator>DaveHorne</dc:creator>
      <dc:date>2017-04-05T20:07:51Z</dc:date>
    </item>
  </channel>
</rss>

