Posted 07-15-2021 09:54 AM
Hi everyone,
I am trying to scrape the website "https://www.corruptionwatch.org.za/news-views/". However, my access is being blocked.
Below, for your reference, are the PROC HTTP code I used to fetch the page and the HTML that was returned.
Could you please suggest a way to resolve this problem?
filename extract "Location/corruptionwatch.txt";
proc http
  method="GET"
  out=extract
  url="https://www.corruptionwatch.org.za/news-views/";
run;
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.sucuri.net/sucuri-firewall-block.css" />
<section class="center clearfix">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sucuri WebSite Firewall - Access Denied</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css">
</head>
<body>
<div id="main-container">
<header class="app-header clearfix">
<div class="wrap">
<a href="https://www.sucuri.net/?utm_source=firewall_block" class="logo"></a>
<span class="logo-neartext">Website Firewall</span>
<a href="https://sucuri.net/?utm_source=firewall_block" class="site-link">Back to sucuri.net</a>
</div>
</header>
<section class="app-content access-denied clearfix"><div class="box center width-max-940"><h1 class="brand-font font-size-xtra no-margin"><i class="icon-circle-red"></i>Access Denied - Sucuri Website Firewall</h1>
<p class="medium-text code-snippet">If you are the site owner (or you manage this site), please whitelist your IP or if you think this block is an error please <a href="https://support.sucuri.net/?utm_source=firewall_block" class="color-green underline">open a support ticket</a> and make sure to include the block details (displayed in the box below), so we can assist you in troubleshooting the issue. </p><h2>Block details:</h1>
<table class="property-table overflow-break-all line-height-16">
<tr>
<td>Your IP:</td>
<td><span>XX.XXX.XXX.XXX</span></td>
</tr>
<tr><td>URL:</td>
<td><span>www.corruptionwatch.org.za/news-views/</span></td>
</tr>
<tr>
<td>Your Browser: </td>
<td><span>SAS/9</span></td>
</tr>
<tr><td>Block ID:</td>
<td><span>BNP004</span></td>
</tr>
<tr>
<td>Block reason:</td>
<td><span>Bad bot access attempt.</span></td>
</tr>
<tr>
<td>Time:</td>
<td><span>2021-07-15 09:21:34</span></td>
</tr>
<tr>
<td>Server ID:</td>
<td><span>18006</span></td></tr>
</table>
</div>
</section>
<footer>
<span>© 2019 Sucuri Inc. All rights reserved.</span>
<span id="privacy-policy"><a href="https://sucuri.net/privacy-policy?utm_source=firewall_block" target="_blank" rel="nofollow noopener">Privacy</a></span>
</footer>
</div>
</body>
</html>
1 REPLY 1
Apparently they don't want automated processes "scraping" that site. Judging by the block reason in the returned body ("Bad bot access attempt."), their site thinks you are a bot of some sort.
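If it helps, here is a rough sketch of how you could detect this kind of block in your program instead of opening the output file by hand. It assumes SAS 9.4M5 or later, where PROC HTTP sets the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables. The fileref resp is just a name picked for illustration, and the 'Access Denied' search string is taken from the page title in the HTML you posted. This only reports the block; it does not get around it. The block page itself suggests asking the site owner to whitelist your IP.

```sas
/* Hypothetical fileref; TEMP avoids assuming a real path */
filename resp temp;

proc http
  method="GET"
  url="https://www.corruptionwatch.org.za/news-views/"
  out=resp;
run;

/* Set by PROC HTTP in SAS 9.4M5 and later (assumed release) */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;

/* Scan the response body for the firewall's "Access Denied" text */
data _null_;
  infile resp lrecl=32767 truncover end=eof;
  input line $char32767.;
  retain blocked 0;
  if find(line, 'Access Denied', 'i') then blocked = 1;
  if eof then do;
    if blocked then put 'WARNING: Response looks like a firewall block page.';
    else put 'NOTE: No block page detected.';
  end;
run;
```

Checking the status code first is usually the quicker test, since a blocked request generally comes back with a non-2xx status. The text scan is a fallback in case the firewall returns the block page with a success status.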