Thanks for your help,
You may need to be a bit more detailed with your question.
Does the example program run for the given site and return the expected, or at least usable, data?
Are you asking how to modify this program to access other sites? With different keywords?
Please note that attempting to read PDF files is likely to be a less-than-joyous experience. So are you attempting to download PDFs? Or HTML?
Your first other URL shows a page that implies it is expecting some kind of query, so you likely need to change the URL, but I have no clue to what.
The program works for the website that is currently in the program. After running it, each chapter comes up with the word that one is looking for. I would like to access other websites with the same keywords. The keywords can be changed, but the problem is that whenever the website is changed, the program does not work as well. It is mainly HTML. I already have a program that scans PDFs.
I've published some general guidance about scraping data from web pages with SAS in this blog post.
While your program is good and works well with the one style of page that you designed it for, it's a big challenge to build something that works for every web site out there. The diversity of web pages and how they are produced (HTML, JavaScript, DIV tags vs TABLE tags, etc.) is immense.
Others have written papers on the topic:
SAS Text Miner (as @Patrick mentioned) has a built-in capability for crawling web sites with the %TMFILTER macro - and is designed to be more robust, with safeguards for performance and web-crawling etiquette.
Ideally you'd have the SAS Text Analytics bundle licensed as this would give you everything you need (and more).
I'm sure there are ways to do everything in Foundation SAS (possibly with the help of third-party tools such as Tika called from SAS), but I'd assume it's going to cost you a lot of effort to get it right, and every change to your sources will cause you a lot of additional work.
If you don't have access to SAS Text Analytics, or at least some of its sub-components like Web Crawler, then consider using Python for at least the data retrieval and data prep part of your task.
Python is an open source programming environment which integrates quite well with SAS (and it will integrate even better in future releases).
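To give a rough idea of what the Python side of the retrieval and prep step might look like, here is a minimal sketch using only the standard library. It is not your SAS program translated, just an illustration: the names `TextExtractor`, `html_to_text`, `count_keywords` and `fetch_html` are made up for this example, and it assumes the pages are plain HTML that does not need JavaScript to render. It strips the markup regardless of whether the page uses DIV or TABLE layout and then counts whole-word, case-insensitive keyword hits.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def fetch_html(url, timeout=30):
    """Download one page and decode it (hypothetical helper, not called here)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Inline sample so the sketch runs without network access.
    sample = "<div><p>Risk and safety.</p><table><tr><td>Risk again.</td></tr></table></div>"
    print(count_keywords(html_to_text(sample), ["risk", "safety"]))
```

For a real site you would pass the result of `fetch_html(your_url)` to `html_to_text`, then write the counts to a CSV file that SAS can read back in. Pages that build their content with JavaScript, or that expect a query in the URL, would still need extra handling that this sketch does not attempt.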