ilikesas
Barite | Level 11

Hi,

 

I am trying to extract data from the Securities and Exchange Commission.

 

I wrote some code that goes directly to a URL, and from that page I want to extract the links to other data.

 

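/* URL fileref pointing at the EDGAR company search results */
filename link url
  "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=o&type=DEF+14&dateb=&owner=exclude&count=10";

/* read the page line by line and keep only lines that link into the EDGAR archives */
data web;
  infile link length=len lrecl=32767;
  input line $varying32767. len;
  p = find(line, 'a href="Archives/edgar/data/');
  if p then output;
run;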

The page at that URL has 10 "Documents" buttons, and it is their links that I am trying to extract. But I get the error message "the message received was unexpected or badly formatted". Is there a way to remedy this?

 

Thank you!

2 REPLIES
Reeza
Super User

SAS is probably the last tool I'd use for web scraping.

 

Import.io is a free, easy-to-use tool.

Selenium is another free tool, though slightly more difficult to use.

 

If you're on a Mac, the built-in Automator has several examples.

 

DaveHorne
SAS Employee

I noticed that on your SEC link, right at the top of the results table, there is an RSS link. This is essentially XML-formatted data, which is a bit easier to read into SAS. The example below uses the RSS link and passes in &count=100 for more data:

 

/* temp location for XML data */
filename resp temp;

/* get request from sec api */
proc http
 url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000726728&CIK=0000726728&type=DEF%2014%25&dateb=&owner=exclude&start=0&count=100&output=atom"
 method= "GET"
 out=resp;
run;

/* use automap with XML libname engine */
filename tempMap Temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;

 

 

You can peruse the output files in WORK to pull out the pieces you may need (or join with the other tables).  A lot more information on using XML in SAS is here:  http://support.sas.com/rnd/base/xmlengine/  (my example just scratches the surface)
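
As a rough first look, something like the following should list whatever tables the copy step placed in WORK. The table names are generated by the automap from the structure of the feed, so none are assumed here; this is only a sketch of where to start.

```sas
/* list every table now in WORK without printing each table's full details */
/* (the table names come from the automap, so check them before joining)   */
proc contents data=work._all_ nods;
run;
```

Once you know the table names, an ordinary DATA step or PROC SQL join can pull out the fields you need.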

 

Also, a web search on "SEC API" points to other sources as well.

Hope this helps.
