ilikesas
Barite | Level 11

Hi,

I am trying to extract data from the Securities and Exchange Commission (SEC).

I wrote code that takes me directly to a page, and from there I want to extract the links to other data.

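/* single quotes keep the & characters from being read as macro references */
filename link url
  'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=o&type=DEF+14&dateb=&owner=exclude&count=10';

data web;
  /* read each line of the page as one long character value */
  infile link length=len lrecl=32767;
  input line $varying32767. len;
  /* keep only lines that contain a link into the EDGAR archives */
  p = find(line, 'a href="Archives/edgar/data/');
  if p then output;
run;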

On that page there are 10 "Documents" buttons, and their links are what I am trying to extract. But I get the error message "the message received was unexpected or badly formatted". Is there a way to remedy this?
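
A minimal sketch of one possible workaround, not a confirmed fix: that error text appears to come from the Windows TLS (HTTPS) layer, so, assuming it is the FILENAME URL access method that fails against sec.gov, the page could instead be downloaded with PROC HTTP (the procedure used in the SAS Employee reply below) and then parsed from a temporary file. The fileref PAGE, the data set DOC_LINKS, and the variable DOC_LINK are hypothetical names; the extraction assumes each href value is enclosed in double quotes and keeps only the first matching href on each line.

```sas
/* Hypothetical workaround: download the page with PROC HTTP first */
filename page temp;

proc http
  url='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=o&type=DEF+14&dateb=&owner=exclude&count=10'
  method="GET"
  out=page;
run;

/* Read the saved HTML and keep the href value from each matching line */
data doc_links;
  infile page length=len lrecl=32767;
  input line $varying32767. len;
  length doc_link $ 512;
  p = find(line, 'a href="Archives/edgar/data/');
  if p then do;
    /* skip past 'a href="' (8 characters) and read up to the closing quote */
    doc_link = scan(substr(line, p + 8), 1, '"');
    output;
  end;
  drop p;
run;
```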

Thank you!

Reeza
Super User

SAS is probably the last tool I'd use for web scraping.

Import.io is a free and easy-to-use tool.

Selenium is another free, though slightly more difficult to use, tool.

If you're on a Mac, the built-in Automator has several examples.

DaveHorne
SAS Employee

I noticed that right at the top of the results table on your SEC link there is an RSS link. This is essentially XML-formatted data that is a bit easier to read into SAS (use the RSS link and pass in &count=100 for more data):

/* temp location for XML data */
filename resp temp;

/* GET request to the SEC EDGAR company browse page with Atom output;
   single quotes keep the & characters from being read as macro references */
proc http
 url='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000726728&CIK=0000726728&type=DEF%2014%25&dateb=&owner=exclude&start=0&count=100&output=atom'
 method="GET"
 out=resp;
run;

/* use automap with XML libname engine */
filename tempMap Temp;
libname sec xmlv2 xmlfileref=resp xmlmap=tempMap automap=replace;

/* copy data to work to view more details */
proc copy in=sec out=work;
run;

You can peruse the output data sets in WORK to pull out the pieces you need (or join them with the other tables). Much more information on using XML in SAS is here: http://support.sas.com/rnd/base/xmlengine/ (my example just scratches the surface).
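
If it helps to see what the automap produced before joining anything, a sketch like the one below lists every table in the SEC library and the columns that look like generated join keys. It assumes the XMLV2 automap adds ..._ORDINAL key columns for relating parent and child tables (as it usually does); the output data set JOIN_KEYS is a hypothetical name.

```sas
/* List every table the automap created, without per-table detail */
proc contents data=sec._all_ nods;
run;

/* Find columns that look like generated ordinal keys for joining tables */
proc sql;
  create table join_keys as
  select memname, name, type
  from dictionary.columns
  where libname = 'SEC' and upcase(name) like '%ORDINAL%'
  order by memname, name;
quit;
```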

Also, a web search on "SEC API" points to other sources as well.

Hope this helps.
