topic Good text book on extracting data from HTML(Web) in SAS Procedures

Good text book on extracting data from HTML(Web)

VX_Xc — Wed, 11 Jan 2012 02:10:31 GMT

I need to extract messy data from a website. Could anyone recommend a good textbook that covers how to extract data efficiently from the web, please?

Thank you.

art297 — Wed, 11 Jan 2012 03:48:14 GMT

Seunghoon,

Do you mean automatically or as in copy/paste? If it is the latter, I'll be doing an SGF presentation on the topic in April, titled 'Copy and Paste Almost Anything'. I already presented a draft of the paper at one of my local user group meetings and you can find it at:

http://torsas.ca/page18.php

HTH,

Art

VX_Xc — Wed, 11 Jan 2012 04:05:35 GMT

I meant automatically. For example, I would like to learn a PROC (with its many optional statements) that extracts data from an HTML file when I give it the address of a website or the path to an .html file.

Maybe there isn't one? Then I would have to use DATA steps with a lot of @'<tag>' pointers in the INPUT statement, which would not be very practical.

But thanks for the link. I will have a look; it looks promising.

art297 — Wed, 11 Jan 2012 04:08:28 GMT

Then you want to look into PROC HTTP and PROC SOAP. Do a search on the discussion forums for either. If you include my ID or FriedEgg's ID in the search, I'm sure that will help to eliminate much of the noise.
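As a rough illustration of the PROC HTTP route, here is a minimal sketch, not a definitive recipe. It assumes a SAS release that ships PROC HTTP and a site reachable over plain HTTP; the fileref `resp`, the dataset `raw_html`, and the variable `line` are hypothetical names chosen for the example.

```sas
/* Temporary fileref to hold the response body (hypothetical name) */
filename resp temp;

/* Fetch the page and save the raw HTML to the fileref */
proc http
    url="http://www.sas.com"
    method="GET"
    out=resp;
run;

/* Read the saved HTML one record at a time for later parsing */
data raw_html;
    infile resp lrecl=32767 truncover;
    input line $char32767.;
run;
```

PROC HTTP only retrieves the page; pulling fields out of the markup still takes a DATA step or similar parsing afterwards.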

Ksharp — Wed, 11 Jan 2012 05:06:45 GMT

Yes. You can do it.

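filename x url 'http://www.sas.com';        /* read the page through the URL access method */
data want(where=(line is not missing));     /* keep only non-empty text fragments */
  infile x dsd dlm='<>' lrecl=32767;        /* treat < and > as delimiters */
  input @'>' line : $400. @@;               /* jump past the next '>', read up to the next delimiter, hold the record */
run;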

Ksharp