BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BillJones
Calcite | Level 5

Hello,

I'm trying to use SAS to acquire data from the internet.  More specifically, I'm interested in financial data on publicly traded companies.  I have code which downloads the financial information.  This part works fine.  The issue is that the data may span across several observations.  My plan to is tag all the observations with relevant data and then use proc transpose to combine the multiple observations into one observation where I can use nested if statements and substrate/index functions to extract the information.  I'm attempting to use a DO UNTIL loop to mark the records with relevant information, but can't get it to work properly.  My program crashes whenever the data spans across more than one observation.  Please see my code below.  Does anyone have an idea of what I'm doing wrong?  Any help or advice would be greatly appreciated.  Thanks so much!


Regards,

Bill

*DOWNLOAD FINANCIAL DATA;

filename in url 'http://finance.yahoo.com/q/bs?s=AAPL' debug lrecl=8192;

DATA OUT.YAHOO_HTML;

infile in length=len;

input record $varying8192. len;

RUN;

*PARSE HTML;

%MACRO BAL_SHEET_ITEM(YAHOO_NAME,SAS_NAME);

DATA OUT.&SAS_NAME.1;

SET OUT.YAHOO_HTML;

SYMBOL='AAPL';

*LIMIT DATA SET TO RECORDS WITH RELEVANT INFO;

KEEPDATA = 0;

%LET BEG_DATA=INDEX(record,&YAHOO_NAME.); /* BEGINNING OF DATA*/

%LET END_DATA=index(record,' </td></tr><tr>');/*END OF DATA*/

IF &BEG_DATA. > 0 THEN DO UNTIL (&END_DATA.>0);

KEEPDATA=1;

END;

IF KEEPDATA=1;

RUN;

%MEND;

%BAL_SHEET_ITEM('Cash And Cash Equivalents',CASH_AND_EQUIV);

%BAL_SHEET_ITEM('Inventory',INVENTORY);

%BAL_SHEET_ITEM('Short Term Investments',SHORT_TERM_ASSETS);

%BAL_SHEET_ITEM('Net Receivables',NET_REC);

%BAL_SHEET_ITEM('Inventory',INVENTORY);

%BAL_SHEET_ITEM('Other Current Assets',OTH_CURR_ASSETS);

%BAL_SHEET_ITEM('Accounts Payable',ACCTS_PAYABLE);

%BAL_SHEET_ITEM('Short/Current Long Term Debt',SHORT_CURR_LT_DEBT);

%BAL_SHEET_ITEM('Other Current Liabilities',OTH_CURR_LIAB);

%BAL_SHEET_ITEM('Long Term Debt',LT_DEBT);

%BAL_SHEET_ITEM('Other Liabilities',OTH_LIAB);

%BAL_SHEET_ITEM('Deferred Long Term Liability Charges',DEF_LT_LIAB);

%BAL_SHEET_ITEM('Minority Interest',MIN_INT);

%BAL_SHEET_ITEM('Negative Goodwill',NEG_GOODWILL);

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why not let SAS input statement handle it for you instead of stuffing the html into a character variable?

For example to get the column headers you could use something like:

data headers ;

infile in flowover ;

input @'Period Ending' @;

array datestr(4) $12 ;

do col=1 to 4;

   input @ '<b>'  datestr(col) $12. @;

end;

run;

If the problem is that key text fields like "Period Ending" is flowed across rows then perhaps you can pull in the file and write it back out without the end of line breaks.

You can begin to automate it by making the SYMBOL used in the URL a macro variable.

For the numbers you might need to first read in the value as a character string and then parse it to eliminate '&nbsp' or other HTML codes.

%let symbol=AAPL;

filename in url "http://finance.yahoo.com:80/q/bs?s=&symbol" lrecl=8192;

data cash ;

infile in flowover ;

symbol="&symbol";

input @'Cash And Cash Equivalents' @;

array cash (4) ;

do col=1 to 4;

   input @ '<td align="right">'  string $12. @;

   cash(col)=input(scan(string,1,'&<'),??comma12.);

end;

output;

stop;

drop col string;

run;

data _null_;

set ;

put (_all_) (=);

run;

symbol=AAPL cash1=12053000 cash2=10746000 cash3=27084000 cash4=18383000

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

Why not let SAS input statement handle it for you instead of stuffing the html into a character variable?

For example to get the column headers you could use something like:

data headers ;

infile in flowover ;

input @'Period Ending' @;

array datestr(4) $12 ;

do col=1 to 4;

   input @ '<b>'  datestr(col) $12. @;

end;

run;

If the problem is that key text fields like "Period Ending" is flowed across rows then perhaps you can pull in the file and write it back out without the end of line breaks.

You can begin to automate it by making the SYMBOL used in the URL a macro variable.

For the numbers you might need to first read in the value as a character string and then parse it to eliminate '&nbsp' or other HTML codes.

%let symbol=AAPL;

filename in url "http://finance.yahoo.com:80/q/bs?s=&symbol" lrecl=8192;

data cash ;

infile in flowover ;

symbol="&symbol";

input @'Cash And Cash Equivalents' @;

array cash (4) ;

do col=1 to 4;

   input @ '<td align="right">'  string $12. @;

   cash(col)=input(scan(string,1,'&<'),??comma12.);

end;

output;

stop;

drop col string;

run;

data _null_;

set ;

put (_all_) (=);

run;

symbol=AAPL cash1=12053000 cash2=10746000 cash3=27084000 cash4=18383000

BillJones
Calcite | Level 5

Tom,

Wow, thank you so much.  The code works great.  Your solution is way more efficient than what I had in mind.  Evidently, I must have sustained a severe head injury prior to thinking about this problem???  I'm busy modifying the code to combine and then transpose the data with the purpose of creating one table that has the symbol, balance sheet items, and financial statement dates.

Thanks again for your assistance.

-Bill

BillJones
Calcite | Level 5

Quick follow-up question: one of the balance sheet items (Long Term Debt) is completely contained in another balance sheet item (Short/Current Long Term Debt).  I want both of these variables.  Whenever I run the code the code below, I cannot download the values for 'Long Term Debt' since 'Short/Current Long Term Debt' comes first in the HTML.  The code pulls in the values for 'Short/Current Long Term Debt' instead of 'Long Term Debt'.  Does anyone have any suggestions on how to handle this type of situation?  Is there an option in the INPUT statement that will allow me to select the 2nd instance of a phrase?  Any help would be greatly appreciated.


Thank you!

Bill


%let symbol=AAPL;

filename in url "http://finance.yahoo.com:80/q/bs?s=&symbol" lrecl=8192;

data cash ;

infile in flowover ;

symbol="&symbol";

input @'Long Term Debt' @;

array cash (4) ;

do col=1 to 4;

   input @ '<td align="right">'  string $12. @;

   cash(col)=input(scan(string,1,'&<'),??comma12.);

end;

output;

stop;

drop col string;

run;

Peter_C
Rhodochrosite | Level 12

Repeat that INPUT @'Long Term ' before loading the variables

Tom
Super User Tom
Super User

Be more precise in what you search for?  I see this in the source for AAPL.

>Long Term Debt<

>Short/Current Long Term Debt<


So if you search for them with he leading > (from the <td> tag) then it will be unique.

BillJones
Calcite | Level 5

Thanks for the comments. 

Tom K. - Good to know.  For the webpage in question, the record length of 8192 appears to be working fine.  I'll keep your suggestion in mind when I start to branch out and look at other webpages.

Tom - I like your solution.  Currently I use an IF statement to determine when to repeat the input command.  I'll change my code to be more precise in my search which should improve execution.

Cheers,

Bill

BillJones
Calcite | Level 5

Peter,

Thank you for the suggestion.  It solved the problem.

Regards,

Bill

TomKari
Onyx | Level 15

One other caution. It is definitely possible for HTML to exceed 8192 bytes. Whenever I pull it in, I stuff it into a 32767 character field, and then reduce it once I know what the max record length is.

Tom (the other Tom, that is!)

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5084 views
  • 0 likes
  • 4 in conversation