
webscraping the SEC

Super Contributor
Posts: 459




I am trying to extract data from the Securities and Exchange Commission (SEC) website.


I wrote code that takes me directly to a results page, and from there I want to extract the links to other data:


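/* replace with the URL of the EDGAR results page */
filename link url "<EDGAR results page URL>";

data web;
  infile link length=len lrecl=32767;
  input line $varying32767. len;
  /* position of the first archive link on this line, 0 if none */
  p=find(line,'a href="Archives/edgar/data/');
  if p then do;
    output;  /* keep lines that contain an archive link */
  end;
run;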


On that page there are 10 "Documents" buttons, and their links are what I am trying to extract. However, I get the error message "The message received was unexpected or badly formatted." Is there a way to remedy this?


Thank you!
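The error quoted above usually points to a failure while setting up the secure (HTTPS) connection, not to the DATA step logic; on Windows it is the text of a TLS handshake error. One workaround worth trying, sketched here under the assumption that your SAS release includes PROC HTTP, is to download the page to a temporary file first and then parse the saved copy. The URL is a placeholder, the User-Agent value is a hypothetical example (the SEC's fair-access guidance asks automated clients to identify themselves), and the table name `links` and variable `href` are illustrative names, not part of the original post.

```sas
/* Sketch: fetch the page with PROC HTTP, then parse the local copy.    */
/* The URL and User-Agent values below are placeholders, not real ones. */
filename page temp;

proc http
  url='<EDGAR results page URL>'   /* single quotes: no macro resolution */
  method="GET"
  out=page;
  headers "User-Agent"="Your Name your.email@example.com";  /* hypothetical */
run;

/* hypothetical output table LINKS with one HREF per matching line */
data links;
  length href $512;
  infile page length=len lrecl=32767;
  input line $varying32767. len;
  p = find(line, 'a href="Archives/edgar/data/');
  if p then do;
    /* skip the 8 characters of 'a href="' and read up to the closing quote; */
    /* only the first link on each line is captured                          */
    href = scan(substr(line, p + 8), 1, '"');
    output;
  end;
  keep href;
run;
```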

Super User
Posts: 23,980

Re: webscraping the SEC

SAS is probably the last tool I'd use for webscraping. is a free and easy-to-use tool.

Selenium is another free tool, though slightly more difficult to use.


If you're on a Mac, the built-in Automator has several examples.


SAS Employee
Posts: 22

Re: webscraping the SEC


I noticed that on your SEC link, right at the top of the results table, there is an RSS link. It returns XML-formatted data, which is a bit easier to read into SAS (use the RSS link and append &count=100 to get more entries):


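/* temp location for XML data */
filename resp temp;

/* GET request to the SEC; replace the placeholder with the RSS link */
/* single quotes keep SAS from treating &count as a macro reference  */
proc http
 url='<RSS link from the results page>&count=100'
 method="GET"
 out=resp;
run;

/* use automap with XML libname engine */
filename tempMap temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;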



You can peruse the output tables in WORK to pull out the pieces you need (or join them with the other tables). Much more information on using XML in SAS is here:  (my example just scratches the surface)
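As a starting point for perusing WORK, here is a small sketch that assumes the fileref `tempMap` and the PROC COPY step from the code above have already run. It lists the data sets now in WORK and writes the automap-generated XML map to the log; the map shows how each XML element was turned into tables and columns, including the ordinal key columns automap typically generates for joining parent and child tables.

```sas
/* list the data sets currently in WORK (includes those PROC COPY added) */
proc sql;
  select memname, nobs, nvar
    from dictionary.tables
    where libname = 'WORK' and memtype = 'DATA';
quit;

/* write the automap-generated XML map to the log to see   */
/* which tables and columns (including join keys) it built */
data _null_;
  infile tempMap;
  input;
  put _infile_;
run;
```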


Also, a web search on "SEC API" points to other sources as well.

Hope this helps.
