In my previous post, Using FILENAME URL to Access Internet Information, I noted that I prefer PROC HTTP for accessing internet resources because it is more flexible and powerful. So I decided that next, I'd write a series of articles about using PROC HTTP in Base SAS to access internet resources and troubleshoot common problems. But before we discuss PROC HTTP, it's useful to review how HTTP (HyperText Transfer Protocol) works and the syntax it requires, and to walk through a few examples using the commonly available command-line tool cURL (client for URL). cURL is available on Windows and Linux, so the examples here should work from the command line no matter which operating system you are using. We'll see more of cURL in a future post about troubleshooting a PROC HTTP step that is not working as expected. Knowing the basics of cURL can help you determine whether a problem lies with PROC HTTP or with another part of the system.
HTTP (HyperText Transfer Protocol) is a foundational technology for communicating with remote servers and transferring data over the Internet. You use this protocol, without even having to think about it, every time you open your web browser. The protocol defines how messages are formatted and transmitted, and how web servers and browsers should respond to various commands and requests. It is stateless, meaning that each request from a client to a server is completely independent: the server doesn’t have to remember anything from any previous request. HTTP relies on TCP/IP (Transmission Control Protocol/Internet Protocol) to do the actual data transfer. With plain HTTP, the information passed between machines is not encrypted, so it is vulnerable to interception. This is a definite security concern.
HTTPS (HyperText Transfer Protocol Secure) is HTTP with an added layer of security. It uses TLS (Transport Layer Security) to encrypt the data to be exchanged before using TCP/IP to transmit the data between the client and server. TLS encryption obscures data being transferred, keeping it safe even if intercepted by a third party. It also provides authentication, to ensure that the parties exchanging information are who they claim to be, and data integrity, by verifying that the data you receive hasn’t been forged or tampered with in any way. HTTPS is crucial for protecting sensitive information like login credentials and payment details, and in modern systems is the default.
In summary, TCP/IP handles the data transmission. TLS adds a security layer to ensure that data is transmitted securely, remains confidential, and retains its integrity. HTTP ensures that messages and data are structured in a format that web servers and browsers can handle when requesting and receiving data.
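To make the TLS layer a little more concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the cURL examples later in this post). It inspects the default client-side TLS settings created by Python's standard `ssl` module, which reflect the authentication step described above: the server's certificate must validate, and it must match the host name being requested. No network connection is opened.

```python
import ssl

# Build the default client-side TLS settings (no connection is opened).
context = ssl.create_default_context()

# The server must present a certificate that validates against trusted CAs.
cert_required = context.verify_mode == ssl.CERT_REQUIRED

# The certificate must also match the host name being requested.
hostname_checked = context.check_hostname

print("Certificate required:", cert_required)
print("Host name checked:", hostname_checked)
```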
HTTP uses headers to provide essential information about the request or response, such as the type of data being sent, the encoding used, and instructions for caching. I like to think of headers as providing metadata about your HTTP request. Headers consist of key-value pairs sent between the client and server in addition to the HTTP request or response. They ensure that the client and server both understand how to process the data being exchanged.
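As a rough illustration of headers as key-value pairs, the Python sketch below assembles the text of a simple HTTP/1.1 request and then reads the headers back out of it. The helper names `build_request` and `parse_headers` are hypothetical, written just for this example, and the header values are borrowed from the cURL examples later in this post. Nothing is sent over the network.

```python
def build_request(method, path, host, headers):
    """Return the text of an HTTP/1.1 request that has no body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_headers(message):
    """Split a message into its first line and a dict of headers."""
    head = message.split("\r\n\r\n", 1)[0]
    first_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return first_line, headers


request_text = build_request(
    "GET", "/xml", "httpbin.org",
    {"Accept": "application/xml", "User-Agent": "MyCustomAgent"},
)
print(request_text)

first_line, headers = parse_headers(request_text)
print(first_line)
print(headers["accept"])
```

The first line says what is being requested; every line after it, up to the blank line, is one header.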
A URI (Uniform Resource Identifier) identifies a specific resource you want to access on the internet. It consists of several components:
Here's an example of a full URI:
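A URI is easier to understand once it is split apart. The sketch below uses Python's standard `urllib.parse` module on a made-up URI; the host, port, path, query, and fragment values are illustrative assumptions, not a real service.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical URI, used only to show the individual components.
uri = "https://api.example.com:8443/v1/reports?year=2024&format=json#summary"

parts = urlsplit(uri)
query_params = parse_qs(parts.query)

print("scheme:  ", parts.scheme)     # https
print("host:    ", parts.hostname)   # api.example.com
print("port:    ", parts.port)       # 8443
print("path:    ", parts.path)       # /v1/reports
print("query:   ", parts.query)      # year=2024&format=json
print("fragment:", parts.fragment)   # summary
print("params:  ", query_params)     # {'year': ['2024'], 'format': ['json']}
```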
Here are some of the most common authentication schemes:
Basic authentication is the simplest scheme. It involves sending the username and password, encoded in Base64, to the server in the Authorization header. It’s easy to implement, but not secure unless used over HTTPS: Base64 is an encoding, not encryption, so without TLS your credentials pass over the network in a form that anyone can trivially decode.
API keys are unique identifiers issued to users by an API administrator to control and monitor access. The key is sent with each request, either in the query string, as a request header, or as a cookie. While this method is pretty straightforward, again it should only be used with HTTPS to ensure the key is not exposed to third parties.
OAuth 2.0 is a secure, flexible method, frequently used for third-party access to resources. You obtain an access token from an authorization server, then use it to access the API. The process for obtaining an access token can sometimes be challenging.
Bearer tokens are included in the Authorization header to authenticate to the server. The token grants access to the API without needing to send your credentials with each request. While getting this initially set up can be intimidating, once you have your token it’s very easy to use.
When using mutual TLS, both the client and the server must authenticate to each other using certificates. This method is highly secure and often used where very strong authentication is required. It’s a bit trickier to set up and use than the other methods.
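For the header-based schemes above, the request headers can be sketched in a few lines of Python. The helper names are hypothetical, and the `X-API-Key` header name is only a common convention assumed for illustration; each API documents its own. The credentials and token are the sample values used in the cURL examples later in this post. The last step shows why Base64 offers no protection by itself: decoding recovers the original credentials.

```python
import base64


def basic_auth_header(user, password):
    """Basic scheme: Base64-encode 'user:password'."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(token):
    """Bearer scheme: send the token as-is."""
    return {"Authorization": f"Bearer {token}"}


def api_key_header(key, name="X-API-Key"):
    """API key sent as a request header (header name is an assumption)."""
    return {name: key}


basic = basic_auth_header("UserID", "Pa55w0rd")
bearer = bearer_auth_header("gakdfdadfkae213913")
api_key = api_key_header("my-demo-key")

# Anyone who sees the Basic header can reverse the encoding.
encoded = basic["Authorization"].split(" ", 1)[1]
recovered = base64.b64decode(encoded).decode("utf-8")

print(basic)
print(bearer)
print(api_key)
print("Recovered credentials:", recovered)
```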
A proxy server acts as an intermediary between a client and a backend service, such as an API. A client API request is sent to the proxy server, which forwards it to the API server. The API server response is sent back to the proxy server, which then forwards it to the client. This setup offers several benefits:
To send an API request through a proxy server that requires authentication, add the Proxy-Authorization header to your HTTP request. The header usually includes the authentication scheme (for example, Basic or Bearer) followed by the credentials. For the Basic scheme, the credentials are encoded in Base64.
Example Proxy Authorization Header:
GET /api/resource HTTP/1.1
Host: api.example.com
Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
In this example, dXNlcm5hbWU6cGFzc3dvcmQ= is the Base64-encoded string containing username:password.
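One quick way to confirm what a Proxy-Authorization value carries is to decode it. This small Python sketch takes the header value from the example above, separates the scheme from the credentials, and reverses the Base64 encoding.

```python
import base64

# Header value copied from the example request above.
header_value = "Basic dXNlcm5hbWU6cGFzc3dvcmQ="

# The scheme and the encoded credentials are separated by a single space.
scheme, encoded = header_value.split(" ", 1)

# Decoding yields 'username:password'; split at the first colon.
decoded = base64.b64decode(encoded).decode("utf-8")
user, _, password = decoded.partition(":")

print("Scheme:  ", scheme)
print("User:    ", user)
print("Password:", password)
```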
Bad – requests a non-existent resource:
curl -X GET "https://httpbin.org/bad"
Good – requests a known resource:
curl -X GET http://httpbin.org/get
curl -X GET "httpbin.org/xml" -H "Accept:application/xml" -H "Content-Type:application/json" -H "User-Agent:MyCustomAgent"
Bad – incorrect userID and password combination:
curl -X GET "https://httpbin.org/basic-auth/UserID/Pa55w0rd" -H "accept: application/json" -u "UserID:Password"
Good – correct userID and password combination:
curl -X GET "https://httpbin.org/basic-auth/UserID/Pa55w0rd" -H "accept: application/json" -u "UserID:Pa55w0rd"
Bad – bearer token not provided:
curl -X GET "https://httpbin.org/bearer" -H "accept: application/json"
Good – proper bearer token provided:
curl -X GET "https://httpbin.org/bearer" -H "accept: application/json" -H "Authorization: Bearer gakdfdadfkae213913"
curl -X DELETE "https://httpbin.org/delete?file=badfile.html" -H "accept: application/json"
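The difference between the "Bad" and "Good" requests above shows up in the status code of the response, which you can see by adding cURL's -i option to print the status line and response headers. As a hedged sketch, assuming the service answers the non-existent resource with 404 and the failed authentication attempts with 401, the Python snippet below maps status codes to their standard phrases and a broad category. The helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard phrase and a broad category for a status code."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {status.phrase} ({kind})"


for code in (200, 401, 404):
    print(describe_status(code))
```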
I hope you found this information useful. My next post in this series will cover how to get the same results as these cURL examples using PROC HTTP in SAS.
Until next time, may the SAS be with you!
Mark
PS: Download a ZIP file containing a PDF of this article and the cURL code in a text file. You can copy and paste those commands onto the command line to see them in action. Get it here: https://bit.ly/SASJediPROCHTTP
Find more articles from SAS Global Enablement and Learning here.