SAS Programming

kaziumair · Posted 07-15-2021 09:54 AM

Hi everyone,

I am trying to scrape the website "https://www.corruptionwatch.org.za/news-views/". However, my access is being blocked.

Below, for your reference, are the PROC HTTP code I used to fetch the page and the HTML that came back.

Could you please suggest a solution to this problem?

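/* Fileref for the file that receives the response body */
filename extract "Location/corruptionwatch.txt";

/* Request the page and write the response to the fileref */
proc http
	method="GET"
	out=extract
	url="https://www.corruptionwatch.org.za/news-views/";
run;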

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.sucuri.net/sucuri-firewall-block.css" />
<section class="center clearfix">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sucuri WebSite Firewall - Access Denied</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css">
</head>
<body>
<div id="main-container">
<header class="app-header clearfix">
<div class="wrap">
<a href="https://www.sucuri.net/?utm_source=firewall_block" class="logo"></a>
<span class="logo-neartext">Website Firewall</span>
<a href="https://sucuri.net/?utm_source=firewall_block" class="site-link">Back to sucuri.net</a>
</div>
</header>

<section class="app-content access-denied clearfix"><div class="box center width-max-940"><h1 class="brand-font font-size-xtra no-margin"><i class="icon-circle-red"></i>Access Denied - Sucuri Website Firewall</h1>
<p class="medium-text code-snippet">If you are the site owner (or you manage this site), please whitelist your IP or if you think this block is an error please <a href="https://support.sucuri.net/?utm_source=firewall_block" class="color-green underline">open a support ticket</a> and make sure to include the block details (displayed in the box below), so we can assist you in troubleshooting the issue. </p><h2>Block details:</h1>
<table class="property-table overflow-break-all line-height-16">
<tr>
<td>Your IP:</td>
<td><span>XX.XXX.XXX.XXX</span></td>
</tr>
<tr><td>URL:</td>
<td><span>www.corruptionwatch.org.za/news-views/</span></td>
</tr>
<tr>
<td>Your Browser: </td>
<td><span>SAS/9</span></td>
</tr>
<tr><td>Block ID:</td>
<td><span>BNP004</span></td>
</tr>
<tr>
<td>Block reason:</td>
<td><span>Bad bot access attempt.</span></td>
</tr>
<tr>
<td>Time:</td>
<td><span>2021-07-15 09:21:34</span></td>
</tr>
<tr>
<td>Server ID:</td>
<td><span>18006</span></td></tr>
</table>
</div>
</section>

<footer>
<span>&copy; 2019 Sucuri Inc. All rights reserved.</span>
<span id="privacy-policy"><a href="https://sucuri.net/privacy-policy?utm_source=firewall_block" target="_blank" rel="nofollow noopener">Privacy</a></span>
</footer>
</div>
</body>
</html>

ballardw · Posted 07-15-2021 10:17 AM

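Apparently they don't want automated processes "scraping" that site. The block details in the response body list your browser as "SAS/9" and give the reason as "Bad bot access attempt", so it looks like their site thinks you are a bot of some sort.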
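To confirm from within SAS that the request was blocked rather than served, you can check the HTTP status and scan the saved body for the firewall page. This is only a rough sketch: the `hdrs` fileref is a name I made up, the file path is the placeholder from the original post, and the `SYS_PROCHTTP_STATUS_CODE` and `SYS_PROCHTTP_STATUS_PHRASE` macro variables are, as far as I know, only set by PROC HTTP in SAS 9.4M5 and later. The search string is the page title from the HTML posted above.

```sas
/* Placeholder path from the original post; point it at a real location */
filename extract "Location/corruptionwatch.txt";
/* Hypothetical temporary fileref to hold the response headers */
filename hdrs temp;

proc http
	method="GET"
	url="https://www.corruptionwatch.org.za/news-views/"
	out=extract
	headerout=hdrs;
run;

/* Assumes SAS 9.4M5 or later, where PROC HTTP sets these macro variables */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;

/* Scan the saved body for the title of the firewall block page */
data _null_;
	infile extract lrecl=32767 truncover end=eof;
	input line $char32767.;
	retain blocked 0;
	if index(line, "Sucuri WebSite Firewall - Access Denied") then blocked = 1;
	if eof then do;
		if blocked then put "WARNING: Response is the firewall block page, not the requested content.";
		else put "NOTE: No block page detected in the response.";
	end;
run;
```

A firewall block would typically come back with a 4xx status rather than 200. If you have a legitimate need for the data, the block page itself points to the fix: ask the site owner to whitelist your IP.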
