Good text book on extracting data from HTML(Web)

Contributor
Posts: 53

I need to extract messy data from a website. Could anyone recommend a good textbook that covers how to extract data efficiently from the web, please?

Thank you.

PROC Star
Posts: 7,357

Seunghoon,

Do you mean automatically or as in copy/paste?  If it is the latter, I'll be doing an SGF presentation on the topic in April, titled 'Copy and Paste Almost Anything'.  I already presented a draft of the paper at one of my local user group meetings and you can find it at:

http://torsas.ca/page18.php

HTH,

Art

Contributor
Posts: 53

I meant automatically. For example, I would like to learn a PROC (with many optional statements) that extracts data from an HTML file if I give it the address of a website or the path to an .html file.

Maybe there isn't one? Then I would have to use DATA steps with a lot of @<tag> arguments in the INPUT statement, which would not be very practical.

But thanks for the link. I will have a look; it looks promising.

PROC Star
Posts: 7,357

Then you want to look into PROC HTTP and PROC SOAP.  Search the discussion forums for either.  If you include my id or FriedEgg's id in the search, that should help eliminate much of the noise.
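As a rough illustration of the first suggestion, here is a minimal sketch that fetches a page with PROC HTTP and loads the raw response into a data set for later parsing. The URL, the fileref name resp, and the data set name raw_html are placeholders chosen for this example, not anything from the thread, and the step only retrieves the text; it does not parse the HTML.

```sas
/* Assumption: www.example.com stands in for the page you actually want. */
filename resp temp;

/* Fetch the page and write the response body to the temporary file. */
proc http
    url="http://www.example.com/"
    method="GET"
    out=resp;
run;

/* Read the saved response one line per observation. */
data raw_html;
    infile resp lrecl=32767 truncover;
    input line $char32767.;
run;
```

From there, the lines in raw_html can be parsed with ordinary DATA step logic.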

Super User
Posts: 9,671

Yes. You can do it.

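/* Read the page through the URL access method. */
filename x url 'http://www.sas.com';
data want(where=(line is not missing));
  /* Treat < and > as delimiters so the text between tags becomes a field. */
  infile x dsd dlm='<>' lrecl=32767;
  /* Jump past the next '>', read up to the next delimiter, and hold the line. */
  input @ '>' line : $400. @@;
run;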
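The step above works by treating the angle brackets as delimiters and repeatedly jumping to the text that follows each '>'. A comparable approach, sketched here under the assumption that each tag opens and closes on the same line, is to read whole lines and strip the tags with a regular expression. The variable names raw and text and the data set name want_text are hypothetical.

```sas
filename x url 'http://www.sas.com';

data want_text;
    infile x lrecl=32767 truncover;
    length raw text $32767;
    input raw $char32767.;
    /* Remove everything that looks like <...>; -1 means replace all matches. */
    /* Tags that span more than one line are not handled by this sketch. */
    text = strip(prxchange('s/<[^>]*>//', -1, raw));
    /* Keep only lines that still contain some text. */
    if not missing(text);
    keep text;
run;
```

This keeps one observation per source line rather than one per text fragment, which may be easier to work with when the surrounding context matters.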


Ksharp
