Hi, I'm trying to scrape some data via PROC HTTP from a page that requires authentication, but am stuck at the login phase, where I get a 403 response for a CSRF error:
CSRF verification failed. Request aborted. You are seeing this message because this HTTPS site requires a Referer header to be sent by your web browser but none was sent ... ...
I'm very much a novice to this, so hopefully it is an easy fix. Below is my code plus the output from PROC HTTP debug:
filename src temp;

proc http
   method = "POST"
   url = "https://www.awebsite.com/accounts/login/?next=/accounts/login/"
   out = src
   webusername = "user"
   webpassword = "pass";
   headers "Connection"="keep-alive";
   debug level=2;
run;

data test; *from a SAS blog post;
   infile src length = len lrecl = 32767;
   input line $varying32767. len;
   line = strip(line);
   if len > 0;
run;
And the debug output from my log:
> POST /accounts/login/?next=/accounts/login/ HTTP/1.1
> User-Agent: SAS/9
> Host: www.awebsite.com
> Accept: */*
> Content-Length: 0
> Cookie: csrftoken=oMSSRKxD3Z5e58DWDhd8cMhPkZ8IHqBFePBaZycwyAzxD3jFVKG2ZG8LFnUMRQyh
> Connection: keep-alive
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 403 Forbidden
< Date: Fri, 05 Nov 2021 15:11:34 GMT
< Server: Apache/2.4.34 (Red Hat) OpenSSL/1.0.2k-fips mod_wsgi/4.6.8 Python/3.8
< X-Frame-Options: SAMEORIGIN
< Content-Length: 1889
< Keep-Alive: timeout=5, max=100
< Connection: Keep-Alive
< Content-Type: text/html; charset=UTF-8
I'm not finding much on the CSRF error for SAS, mainly discussions on other platforms.
I assume with the keep-alive connection, once I'm able to successfully authenticate then I'll be able to pull the data from other pages, so this appears to be my main roadblock.
Thanks for any help!
I assume it has to do with the HTTPS call, and https://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003286920.htm has some instructions on how to deal with that, but I just don't really understand how to implement them.
Edit to add: This site (https://medium.com/@codebyamir/the-java-developers-guide-to-ssl-certificates-b78142b3a0fc) also seems to offer some hints on where the JRE certificates may be stored, as well as the default password. However, pasting that path and password into the code from the SAS support doc above and running it from the Windows command line does not change the site's response.
I also tried manually editing SASV9.cfg to add in the certificate path and password, and still get no change in the page's response.
If this site requires authentication then your approach might depend on a couple of things.
First, it might be simple and require a cookie-based session like this example.
But it could be more complex. If by logging into the site you receive a token, you may need to pass that token in the URL or headers for any subsequent calls.
A quick search on the URL pattern in your example suggests the site may be built with Django, a Python web framework. If the site offers an API, or serves up data that you can get to in another way (a database or another API closer to the source), then that might be a more fruitful approach.
Otherwise you're left using PROC HTTP to mimic the interactions of a browser, logging in and capturing session data that can be passed into subsequent calls. Sometimes you can learn more by opening your browser developer console and observing the Network tab, so you can see where browser-to-site network calls go as the site serves up data.
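As a rough illustration of that browser-mimicking flow, here is a minimal sketch. It assumes the site really is Django with its default CSRF protection, which expects a Referer header plus a csrfmiddlewaretoken form field that matches the csrftoken cookie. The form field names (username, password), the credentials, and the /some/data/page/ path are hypothetical, so check the actual login form in your browser's Network tab. Note that the debug output above shows Content-Length: 0 and no Referer header: WEBUSERNAME= and WEBPASSWORD= supply HTTP authentication credentials rather than filling in a login form, so the form fields have to go in the request body.

```sas
/* Sketch only. Hypothetical values: the form field names, credentials, */
/* and the /some/data/page/ path are assumptions, not taken from the site. */
%let login_url  = https://www.awebsite.com/accounts/login/;
%let csrf_token = ;

filename hdrs temp;
filename page temp;
filename body temp;
filename src  temp;

/* Step 1: GET the login page so the server sets the csrftoken cookie. */
proc http
   method    = "GET"
   url       = "&login_url"
   out       = page
   headerout = hdrs;
run;

/* Step 2: pull the token value out of the Set-Cookie response header. */
data _null_;
   infile hdrs length = len lrecl = 32767;
   input line $varying32767. len;
   length token $ 200;
   pos = find(line, 'csrftoken=', 'i');
   if index(upcase(line), 'SET-COOKIE:') = 1 and pos > 0 then do;
      token = scan(substr(line, pos + 10), 1, ';');
      call symputx('csrf_token', token);
   end;
run;

/* Step 3: write the URL-encoded form body, with no trailing newline. */
data _null_;
   file body recfm = n;
   length payload $ 2000;
   payload = cats('csrfmiddlewaretoken=', urlencode("&csrf_token"),
                  '&username=', urlencode('user'),
                  '&password=', urlencode('pass'));
   n = length(payload);
   put payload $varying2000. n;
run;

/* Step 4: POST the form with the Referer header the error message asks for. */
proc http
   method = "POST"
   url    = "&login_url"
   in     = body
   ct     = "application/x-www-form-urlencoded"
   out    = src;
   headers "Referer" = "&login_url";
   debug level = 2;
run;

/* The status macro variable is available in SAS 9.4M5 and later. */
%put NOTE: login request returned status &SYS_PROCHTTP_STATUS_CODE;

/* Step 5: request a protected page; cookies cached by PROC HTTP are re-sent. */
proc http
   method = "GET"
   url    = "https://www.awebsite.com/some/data/page/"
   out    = src;
run;
```

Recent SAS 9.4 maintenance releases cache cookies across PROC HTTP steps within the same SAS session, which is why the debug output above already shows a csrftoken cookie being sent. It is that session cookie, rather than the keep-alive connection, that should carry the login over to later requests. If the login still fails, compare the headers and form fields in the DEBUG output with what the browser sends.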