Hi everyone, I am trying to scrape a website that requires a subscription. I am using PROC HTTP to retrieve the page's HTML source and then SAS character functions to extract the information I need. However, the HTML returned for this particular website contains an "email not verified" message. The PROC HTTP code I am using is below.
filename dest "location";
proc http
url = "https://www.pressreader.com/catalog"
out = dest
method = "GET"
webusername="XXXX"
webpassword="XXXX"
auth_basic;
run;
Some sites are designed to be interactive and deliver their content only to a browser that runs JavaScript, loading it while a user is browsing the page.
Also, this site supports "native" accounts as well as social sign-in. If you used a social account like Facebook or Google, then those credentials would likely not work from a script like PROC HTTP or cURL.
It appears that PressReader.com offers an API. This would be a much more reliable method for pulling data from the site. It requires an API account to get a token, but I'm not sure whether there is a cost. For information about using PROC HTTP with APIs like this, see this Ask the Expert session.
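As a rough sketch of what a token-based API call usually looks like in PROC HTTP: the endpoint URL and the token below are placeholders, not PressReader's actual API, so take the real values from the provider's API documentation. It also assumes the API returns JSON and that you are on SAS 9.4M5 or later.

```sas
/* Placeholder values (assumptions): replace with the endpoint and token from the API provider */
%let api_url = https://api.example.com/v1/catalog;
%let token   = XXXX;

/* Temporary fileref to hold the response body */
filename resp temp;

proc http
  url    = "&api_url."
  method = "GET"
  out    = resp;
  /* Send the token as a bearer token instead of a user name and password */
  headers
    "Authorization" = "Bearer &token."
    "Accept"        = "application/json";
run;

/* Status code of the last PROC HTTP call (SAS 9.4M5 and later) */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE.;

/* If the response is JSON, the JSON libname engine maps it to data sets */
libname api json fileref=resp;

/* List the data sets the engine created from the response */
proc datasets lib=api;
quit;
```

Reading a JSON response this way is generally less fragile than parsing HTML with character functions, because the structure does not change with the page layout.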
Hi Chris,
Thanks for your help. I will definitely check the API. In the code I used native account credentials, so maybe the website is interactive and requires a user browsing the page, as you suggested.
Could you please confirm whether the code I used is the correct way to use PROC HTTP for websites that require login credentials?
Yes, that's the correct method for basic authentication (user name and password). However, many websites use other types of authentication, such as OAuth or some other token. In that case, providing your user name and password is just a way to get that token, which the website then manages automatically for further browsing and requests.
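One way to see how the site actually responded is to capture the response headers and the status code along with the body. This is only a diagnostic sketch based on the code from the original post; the `hdrs` fileref is a name introduced here for illustration, and the status macro variables assume SAS 9.4M5 or later.

```sas
/* Temporary filerefs for the response body and the response headers */
filename dest temp;
filename hdrs temp;

proc http
  url         = "https://www.pressreader.com/catalog"
  method      = "GET"
  out         = dest
  headerout   = hdrs
  webusername = "XXXX"
  webpassword = "XXXX"
  auth_basic;
run;

/* Status code and phrase of the last PROC HTTP call (SAS 9.4M5 and later) */
%put NOTE: status = &SYS_PROCHTTP_STATUS_CODE. &SYS_PROCHTTP_STATUS_PHRASE.;

/* Write the response headers to the log for inspection */
data _null_;
  infile hdrs;
  input;
  put _infile_;
run;
```

If the status is 200 but the body is still a sign-in or "not verified" page, that would suggest the site is not using basic authentication for its content and expects a browser session or token instead.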