kaziumair
Quartz | Level 8

Hi everyone,

I am trying to scrape the website "https://www.corruptionwatch.org.za/news-views/". However, my access is being blocked.

For your reference, I have included below the PROC HTTP code I used to fetch the page and the HTML that was returned.

Could you please suggest a way to resolve this problem?

filename extract "Location/corruptionwatch.txt";

proc http
	method="GET"
	out=extract
	url="https://www.corruptionwatch.org.za/news-views/";
run;
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.sucuri.net/sucuri-firewall-block.css" />
<section class="center clearfix">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sucuri WebSite Firewall - Access Denied</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css">
</head>
<body>
<div id="main-container">
<header class="app-header clearfix">
<div class="wrap">
<a href="https://www.sucuri.net/?utm_source=firewall_block" class="logo"></a>
<span class="logo-neartext">Website Firewall</span>
<a href="https://sucuri.net/?utm_source=firewall_block" class="site-link">Back to sucuri.net</a>
</div>
</header>

<section class="app-content access-denied clearfix"><div class="box center width-max-940"><h1 class="brand-font font-size-xtra no-margin"><i class="icon-circle-red"></i>Access Denied - Sucuri Website Firewall</h1>
<p class="medium-text code-snippet">If you are the site owner (or you manage this site), please whitelist your IP or if you think this block is an error please <a href="https://support.sucuri.net/?utm_source=firewall_block" class="color-green underline">open a support ticket</a> and make sure to include the block details (displayed in the box below), so we can assist you in troubleshooting the issue. </p><h2>Block details:</h1>
<table class="property-table overflow-break-all line-height-16">
<tr>
<td>Your IP:</td>
<td><span>XX.XXX.XXX.XXX</span></td>
</tr>
<tr><td>URL:</td>
<td><span>www.corruptionwatch.org.za/news-views/</span></td>
</tr>
<tr>
<td>Your Browser: </td>
<td><span>SAS/9</span></td>
</tr>
<tr><td>Block ID:</td>
<td><span>BNP004</span></td>
</tr>
<tr>
<td>Block reason:</td>
<td><span>Bad bot access attempt.</span></td>
</tr>
<tr>
<td>Time:</td>
<td><span>2021-07-15 09:21:34</span></td>
</tr>
<tr>
<td>Server ID:</td>
<td><span>18006</span></td></tr>
</table>
</div>
</section>

<footer>
<span>&copy; 2019 Sucuri Inc. All rights reserved.</span>
<span id="privacy-policy"><a href="https://sucuri.net/privacy-policy?utm_source=firewall_block" target="_blank" rel="nofollow noopener">Privacy</a></span>
</footer>
</div>
</body>
</html>

 

1 REPLY
ballardw
Super User

Apparently they don't want automated processes "scraping" that site. The block reason in the response body ("Bad bot access attempt.") suggests their site thinks you are a bot of some sort.
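
One way to catch this kind of block inside the program, rather than by opening the output file, is to check the status that PROC HTTP reports and to look for the block page's title in the saved response. The following is only a minimal sketch, assuming SAS 9.4M5 or later (where PROC HTTP sets the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables). The macro variable name blocked is made up for illustration, and the file path is the placeholder from the question.

```sas
/* Placeholder path taken from the original post */
filename extract "Location/corruptionwatch.txt";

proc http
	method="GET"
	out=extract
	url="https://www.corruptionwatch.org.za/news-views/";
run;

/* Status macro variables are set by PROC HTTP (assumes SAS 9.4M5 or later) */
%put NOTE: HTTP status: &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;

/* Hypothetical flag: 1 if the saved response is the firewall block page */
%let blocked = 0;

data _null_;
	infile extract lrecl=32767 truncover;
	input line $char32767.;
	/* The title text below is copied from the block page shown in the question */
	if find(line, "Sucuri WebSite Firewall - Access Denied", "i") then do;
		call symputx("blocked", 1);
		stop;
	end;
run;

%put NOTE: Blocked by firewall (1=yes, 0=no): &blocked;
```

This only detects the block; it does not get around it. If the site owner does not want automated access, the usual next step is to ask them for permission or for a data feed.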
