11-17-2016 12:11 PM - edited 11-17-2016 12:35 PM
Thanks for your help,
11-18-2016 01:51 PM
You may need to be a bit more detailed with your question.
Does the example program run for the given site and return the expected, or at least usable, data?
Are you asking how to modify this program to access other sites, or to use different keywords?
Please note that attempting to read PDF files is likely to be a less-than-joyous experience. So are you attempting to download PDFs or HTML?
Your first other URL shows a page that implies it is expecting some kind of query, so you likely need to change the URL, but I have no clue what to change it to.
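As a rough illustration of what "changing the URL" could mean for a page that expects a query: search pages usually take their input as parameters appended after a `?`. The sketch below uses Python's standard library; the base URL `https://example.com/search` and the parameter names `q` and `page` are placeholders I made up, since I don't know what the real site expects.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return base_url + "?" + urlencode(params)


# Hypothetical site and parameter names, for illustration only.
url = build_search_url(
    "https://example.com/search",
    {"q": "risk management", "page": 1},
)
print(url)
```

You would have to inspect the real page (for example, run a search in the browser and look at the address bar) to find the actual parameter names.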
11-18-2016 02:15 PM
The program works for the website that is currently in the program. After running it, each chapter comes up with the word one is looking for. I want to access other websites with the same keywords. The keywords can be changed, but the problem is that whenever the website is changed, the program does not work as well. The content is mainly HTML. I already have a program that scans PDFs.
12-22-2017 09:21 AM
I've published some general guidance about scraping data from web pages with SAS in this blog post.
Others have written papers on the topic:
SAS Text Miner (as @Patrick mentioned) has a built-in capability for crawling web sites with the %TMFILTER macro - and is designed to be more robust, with safeguards for performance and web-crawling etiquette.
11-18-2016 10:22 PM - edited 11-18-2016 10:24 PM
Ideally you'd have the SAS Text Analytics bundle licensed as this would give you everything you need (and more).
I'm sure there are ways to do everything in Foundation SAS (possibly with the help of calling some 3rd party tools out of SAS, like Tika), but I'd assume it's going to cost you a lot of effort to get it right, and every change to your sources will cause you a lot of additional work.
If you don't have access to SAS Text Analytics, or at least some of its sub-components like Web Crawler, then consider using Python for at least the data retrieval and data prep part of your task.
Python is an open source programming environment which integrates quite well with SAS (and it will integrate even better in future releases).
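To give an idea of what the Python side of the retrieval and prep step might look like, here is a minimal sketch using only the standard library. It is not the original program from this thread: the function names (`fetch_html`, `extract_text`, `count_keywords`) are my own, and it assumes the pages are plain HTML that can be fetched without a login or JavaScript rendering.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def count_keywords(text, keywords):
    """Count case-insensitive whole-word matches for each keyword."""
    counts = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def fetch_html(url, timeout=30):
    """Download a page and decode it using the charset the server reports."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be along the lines of `count_keywords(extract_text(fetch_html(some_url)), ["keyword1", "keyword2"])`, with the URL and keywords replaced by your own. Because the text extraction does not depend on one site's page layout, switching to a different website should mostly mean changing the URL. The counts could then be written to a CSV file and read into SAS for the analysis part.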