webscraping the SEC

Super Contributor
Posts: 413

Hi,

 

I am trying to extract data from the Securities and Exchange Commission.

 

I wrote some code that takes me directly to a results page, and from there I want to extract the links to other data.

 

/* point a fileref at the EDGAR search results page */
filename link url 
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=o&type=DEF+14&dateb=&owner=exclude&count=10";

/* read the page line by line and keep only the lines that contain a filing link */
data web;
    infile link length=len lrecl=32767;
    input line $varying32767. len;
    p = find(line, 'a href="Archives/edgar/data/');
    if p then output;
run;

The page at that link has 10 "Documents" buttons, and it is their links that I am trying to extract. But I get the error message "the message received was unexpected or badly formatted". Is there a way to remedy this?

 

Thank you!

Super User
Posts: 17,868

Re: webscraping the SEC

SAS is probably the last tool I'd use for webscraping. 

 

Import.io is a free, easy-to-use tool. 

Selenium is another free tool, though slightly more difficult to use. 

 

If you're on a Mac, the built-in Automator has several examples. 

 

SAS Employee
Posts: 9

Re: webscraping the SEC

I noticed that your SEC link has an RSS link right at the top of the results table. The feed is essentially XML-formatted data, which is a bit easier to read into SAS. The example here uses the RSS link and passes in &count=100 for more data:

 

/* temp location for XML data */
filename resp temp;

/* get request from sec api; the single quotes keep the macro processor from trying to resolve &CIK, &type, etc. */
proc http
    url='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000726728&CIK=0000726728&type=DEF%2014%25&dateb=&owner=exclude&start=0&count=100&output=atom'
    method="GET"
    out=resp;
run;

/* use automap with XML libname engine */
filename tempMap Temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;

 

 

You can peruse the output tables in WORK to pull out the pieces you may need (or join them with the other tables). A lot more information on using XML in SAS is here: http://support.sas.com/rnd/base/xmlengine/ (my example just scratches the surface).
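If you want to see what automap produces before pointing it at the live feed, here is a minimal sketch that runs without any network access. The XML content, the example.com URLs, and the names demo, demomap and demox are all made up for illustration; the real feed has many more elements, and the generated table and column names will differ.

```sas
/* write a tiny, made-up Atom-like document to a temporary file */
filename demo temp;
data _null_;
    file demo;
    put '<?xml version="1.0" encoding="UTF-8"?>';
    put '<feed>';
    put '<entry><title>Sample filing A</title><link href="https://example.com/a"/></entry>';
    put '<entry><title>Sample filing B</title><link href="https://example.com/b"/></entry>';
    put '</feed>';
run;

/* let the XMLV2 engine generate a map for it, same as in the SEC example */
filename demomap temp;
libname demox xmlv2 xmlfileref=demo xmlmap=demomap automap=replace;

/* list the tables that automap created */
proc datasets library=demox;
run;
quit;
```

The listing should show which tables were generated, and from there you can decide which ones to copy to WORK and join.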

 

Also, a web search on "SEC API" points to other sources as well.

Hope this helps.
