SAS Programming

DATA Step, Macro, Functions and more
kaziumair
Quartz | Level 8

Hi everyone,

I am trying to scrape the website "https://www.corruptionwatch.org.za/news-views/", but my access is being blocked.

Below are the PROC HTTP code I used to fetch the page and the HTML that was returned, for your reference.

Could you suggest a way to resolve this problem?

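/* Fileref for the file that receives the response body */
filename extract "Location/corruptionwatch.txt";

/* Request the page and write the response to the EXTRACT fileref */
proc http
	method="GET"
	out=extract
	url="https://www.corruptionwatch.org.za/news-views/";
run;

Extracted HTML:
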
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.sucuri.net/sucuri-firewall-block.css" />
<section class="center clearfix">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Sucuri WebSite Firewall - Access Denied</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css">
</head>
<body>
<div id="main-container">
<header class="app-header clearfix">
<div class="wrap">
<a href="https://www.sucuri.net/?utm_source=firewall_block" class="logo"></a>
<span class="logo-neartext">Website Firewall</span>
<a href="https://sucuri.net/?utm_source=firewall_block" class="site-link">Back to sucuri.net</a>
</div>
</header>

<section class="app-content access-denied clearfix"><div class="box center width-max-940"><h1 class="brand-font font-size-xtra no-margin"><i class="icon-circle-red"></i>Access Denied - Sucuri Website Firewall</h1>
<p class="medium-text code-snippet">If you are the site owner (or you manage this site), please whitelist your IP or if you think this block is an error please <a href="https://support.sucuri.net/?utm_source=firewall_block" class="color-green underline">open a support ticket</a> and make sure to include the block details (displayed in the box below), so we can assist you in troubleshooting the issue. </p><h2>Block details:</h1>
<table class="property-table overflow-break-all line-height-16">
<tr>
<td>Your IP:</td>
<td><span>XX.XXX.XXX.XXX</span></td>
</tr>
<tr><td>URL:</td>
<td><span>www.corruptionwatch.org.za/news-views/</span></td>
</tr>
<tr>
<td>Your Browser: </td>
<td><span>SAS/9</span></td>
</tr>
<tr><td>Block ID:</td>
<td><span>BNP004</span></td>
</tr>
<tr>
<td>Block reason:</td>
<td><span>Bad bot access attempt.</span></td>
</tr>
<tr>
<td>Time:</td>
<td><span>2021-07-15 09:21:34</span></td>
</tr>
<tr>
<td>Server ID:</td>
<td><span>18006</span></td></tr>
</table>
</div>
</section>

<footer>
<span>&copy; 2019 Sucuri Inc. All rights reserved.</span>
<span id="privacy-policy"><a href="https://sucuri.net/privacy-policy?utm_source=firewall_block" target="_blank" rel="nofollow noopener">Privacy</a></span>
</footer>
</div>
</body>
</html>

 

ballardw
Super User

Apparently they don't want automated processes "scraping" that site. The block reason in the response body ("Bad bot access attempt.") suggests that their site thinks you are a bot of some sort.
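
To confirm from within SAS that a request was refused rather than served, something like the following might help. It is only a sketch: the filerefs resp and hdrs are made-up names, the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables assume SAS 9.4M5 or later, and the 'Access Denied' search string is simply taken from the block page shown in the question. It does not get around the block; it only reports it. The block page itself suggests contacting the site owner to have an IP whitelisted.

```sas
/* Hypothetical temporary filerefs for the response body and headers */
filename resp temp;
filename hdrs temp;

proc http
    method="GET"
    url="https://www.corruptionwatch.org.za/news-views/"
    out=resp
    headerout=hdrs;
run;

/* Status code and phrase set by PROC HTTP (assumes SAS 9.4M5 or later) */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;

/* Write the response headers to the log */
data _null_;
    infile hdrs truncover;
    input;
    put _infile_;
run;

/* Flag the response if it contains the firewall block text */
data _null_;
    infile resp lrecl=32767 truncover end=last;
    input line $char32767.;
    retain blocked 0;
    if find(line, 'Access Denied', 'i') > 0 then blocked = 1;
    if last and blocked then
        put 'WARNING: The response looks like a firewall block page, not the requested content.';
run;
```

Checking the status and the body before parsing avoids treating the firewall page as if it were the news listing.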
