VX_Xc
Calcite | Level 5

I need to extract messy data from a website. Could anyone recommend a good textbook that covers how to extract data from the web efficiently, please?

Thank you.

art297
Opal | Level 21

Seunghoon,

Do you mean automatically or as in copy/paste?  If it is the latter, I'll be doing an SGF presentation on the topic in April, titled 'Copy and Paste Almost Anything'.  I already presented a draft of the paper at one of my local user group meetings and you can find it at:

http://torsas.ca/page18.php

HTH,

Art

VX_Xc
Calcite | Level 5

I meant automatically. For example, I would like to learn about a PROC (with its many optional statements) that extracts data from an HTML file when I give it the address of a website or the path to an .html file.

Maybe there isn't one? Then I would have to use DATA steps with a lot of @<tag> arguments in the INPUT statement, which would not be very practical.

But thanks for the link. I will have a look; it looks promising.

art297
Opal | Level 21

Then you want to look into proc http and proc soap.  Do a search on the discussion forums for either.  If you include my id or friedeggs id in the search, I'm sure that will help to eliminate much of the noise.
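
For the HTTP route, here is a minimal sketch of fetching a page with PROC HTTP and loading the raw response into a data set, one record per line. The URL and the names resp and raw_html are placeholders chosen for illustration, not anything from this thread's data, and the sketch assumes the page is reachable without a proxy or authentication.

```sas
/* Temporary fileref that will hold the HTTP response body */
filename resp temp;

/* Fetch the page; the URL is a placeholder */
proc http
    url="http://www.sas.com"
    method="GET"
    out=resp;
run;

/* Read the saved response as raw text, one observation per line */
data raw_html;
    infile resp lrecl=32767 truncover;
    input line $char32767.;
run;
```

PROC HTTP only retrieves the document; pulling fields out of the markup still has to be done afterwards, for example in a DATA step.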

Ksharp
Super User

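Yes, you can do it with a DATA step.

/* Read the page directly through the URL access method */
filename x url 'http://www.sas.com';
/* Keep only the non-blank pieces of text */
data want(where=(line is not missing));
   /* Treat < and > as delimiters so each value ends at the next tag */
   infile x dsd dlm='<>' lrecl=32767;
   /* Jump to just after the next '>', read one value, hold the line */
   input @'>' line : $400. @@;
run;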


Ksharp
