kaziumair
Quartz | Level 8

Hi everyone,

I am trying to scrape the website "https://www.corruptionwatch.org.za/news-views/", but my access is being blocked.

For reference, the PROC HTTP code I used to fetch the page and the HTML that came back are included below.

Could you suggest a way to resolve this problem?

filename extract "Location/corruptionwatch.txt";

proc http
	method="GET"
	out=extract
	url="https://www.corruptionwatch.org.za/news-views/";
run;

HTML returned by the request:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.sucuri.net/sucuri-firewall-block.css" />
<section class="center clearfix">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sucuri WebSite Firewall - Access Denied</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css">
</head>
<body>
<div id="main-container">
<header class="app-header clearfix">
<div class="wrap">
<a href="https://www.sucuri.net/?utm_source=firewall_block" class="logo"></a>
<span class="logo-neartext">Website Firewall</span>
<a href="https://sucuri.net/?utm_source=firewall_block" class="site-link">Back to sucuri.net</a>
</div>
</header>

<section class="app-content access-denied clearfix"><div class="box center width-max-940"><h1 class="brand-font font-size-xtra no-margin"><i class="icon-circle-red"></i>Access Denied - Sucuri Website Firewall</h1>
<p class="medium-text code-snippet">If you are the site owner (or you manage this site), please whitelist your IP or if you think this block is an error please <a href="https://support.sucuri.net/?utm_source=firewall_block" class="color-green underline">open a support ticket</a> and make sure to include the block details (displayed in the box below), so we can assist you in troubleshooting the issue. </p><h2>Block details:</h1>
<table class="property-table overflow-break-all line-height-16">
<tr>
<td>Your IP:</td>
<td><span>XX.XXX.XXX.XXX</span></td>
</tr>
<tr><td>URL:</td>
<td><span>www.corruptionwatch.org.za/news-views/</span></td>
</tr>
<tr>
<td>Your Browser: </td>
<td><span>SAS/9</span></td>
</tr>
<tr><td>Block ID:</td>
<td><span>BNP004</span></td>
</tr>
<tr>
<td>Block reason:</td>
<td><span>Bad bot access attempt.</span></td>
</tr>
<tr>
<td>Time:</td>
<td><span>2021-07-15 09:21:34</span></td>
</tr>
<tr>
<td>Server ID:</td>
<td><span>18006</span></td></tr>
</table>
</div>
</section>

<footer>
<span>&copy; 2019 Sucuri Inc. All rights reserved.</span>
<span id="privacy-policy"><a href="https://sucuri.net/privacy-policy?utm_source=firewall_block" target="_blank" rel="nofollow noopener">Privacy</a></span>
</footer>
</div>
</body>
</html>

 

1 REPLY
ballardw
Super User

Apparently they don't want automated processes "scraping" that site. The block reason in the response body ("Bad bot access attempt.") indicates that their site thinks you are a bot of some sort.
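
One way to confirm this from the SAS side is to look at the HTTP status and scan the saved response for the firewall's block text before trying to parse it. The following is a minimal sketch for detecting the block, not a way around it. It assumes SAS 9.4M5 or later (for the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables) and reuses the placeholder path from the question. The hdrs fileref and the "Access Denied" marker string (taken from the block page's title) are illustrative choices.

```sas
/* Sketch: detect a firewall block page before parsing the response. */
/* "Location/corruptionwatch.txt" is the placeholder path from the question. */
filename extract "Location/corruptionwatch.txt";
filename hdrs temp;   /* hypothetical fileref for the response headers */

proc http
	method="GET"
	url="https://www.corruptionwatch.org.za/news-views/"
	out=extract
	headerout=hdrs;
run;

/* Assumes SAS 9.4M5+, where PROC HTTP sets these macro variables */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;

/* Scan the saved body for the block marker seen in the page title */
data _null_;
	retain blocked 0;
	infile extract lrecl=32767 truncover end=done;
	input line $char32767.;
	if find(line, "Access Denied", "i") then blocked = 1;
	if done then do;
		if blocked then put "WARNING: Response looks like a firewall block page, not site content.";
		else put "NOTE: No block marker found in the response.";
	end;
run;
```

If the check reports a block, the page's own advice applies: the site owner would need to whitelist your IP address.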
