Using PROC HTTP 1 - Review of the HTTP Protocol


Introduction

In my previous post, Using FILENAME URL to Access Internet Information, I noted that I prefer PROC HTTP for accessing internet resources because it is more flexible and powerful. So I decided to write a series of articles about using PROC HTTP in Base SAS to access internet resources and troubleshoot common problems. But before we discuss PROC HTTP, it is useful to review how HTTP (Hypertext Transfer Protocol) works and what syntax it requires, and to see a few examples using the widely available command-line tool cURL (client for URL). cURL is available for Windows, macOS, and Linux, so the examples here should work from the command line no matter which operating system you are using. We'll see more of cURL in a future post about troubleshooting a PROC HTTP step that is not working as expected. Knowing the basics of cURL can help you determine whether a problem lies with PROC HTTP or with another part of the system.

HTTP and HTTPS

HTTP (Hypertext Transfer Protocol) is a foundational technology for communicating with remote servers and transferring data over the Internet. You use this protocol, without even having to think about it, every time you open your web browser. The protocol defines how messages are formatted and transmitted, and how web servers and browsers should respond to various commands and requests. It is stateless: each request from a client to a server is handled independently, and the server is not required to remember anything from previous requests. HTTP traditionally relies on TCP/IP (Transmission Control Protocol/Internet Protocol) to do the actual data transfer. With plain HTTP, the information passed between machines is not encrypted, so it is vulnerable to interception. This is a definite security concern.

HTTPS (Hypertext Transfer Protocol Secure) is HTTP with an added layer of security. It uses TLS (Transport Layer Security) to encrypt the data before TCP/IP transmits it between the client and server. TLS encryption obscures the data being transferred, keeping it safe even if it is intercepted by a third party. It also provides authentication, to ensure that the parties exchanging information are who they claim to be, and data integrity, by verifying that the data you receive hasn’t been forged or tampered with in any way. HTTPS is crucial for protecting sensitive information like login credentials and payment details, and it is the default in modern systems.

In summary, TCP/IP handles the data transmission. TLS adds a security layer so that data is transmitted securely, remains confidential, and retains its integrity. HTTP defines how requests and responses are formatted so that web servers and browsers can understand each other when requesting and receiving data.

HTTP Headers

HTTP uses headers to provide essential information about a request or response, such as the type of data being sent, the encoding used, and instructions for caching. I like to think of headers as metadata about your HTTP request. Headers are key-value pairs that travel between the client and server as part of each request or response, ahead of any message body. They ensure that the client and server both understand how to process the data being exchanged.

Common HTTP request headers:

Host: The domain name of the server (e.g., Host: httpbin.org).
User-Agent: The client software making the request (e.g., User-Agent: Mozilla/5.0).
Accept: The media types the client can process in the response (e.g., Accept: application/json).
Content-Type: The media type of the request body (e.g., Content-Type: application/json).
Authorization: Credentials for authenticating the client to the server (e.g., Authorization: Bearer <token>).

Common HTTP response headers:

Content-Type: The media type of the response body (e.g., Content-Type: application/json).
Content-Length: The size of the response body in bytes (e.g., Content-Length: 999).
Set-Cookie: Sends cookies from the server to the client (e.g., Set-Cookie: sessionId=abc123).
Cache-Control: Provides caching directives (e.g., Cache-Control: no-cache).
Server: The server software handling the request (e.g., Server: Apache/2.4.1).
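
To make the key-value structure concrete, here is a minimal Python sketch that parses a block of response headers like the ones listed above. The header values are the sample values from this list, not output from a real server, and the standard-library function http.client.parse_headers is just one convenient way to read them.

```python
import io
from http.client import parse_headers

# Sample response headers (illustrative values from the list above, not real server output).
# Each header is one "Name: value" line; a blank line marks the end of the header block.
raw_headers = (
    b"Content-Type: application/json\r\n"
    b"Content-Length: 999\r\n"
    b"Set-Cookie: sessionId=abc123\r\n"
    b"Cache-Control: no-cache\r\n"
    b"Server: Apache/2.4.1\r\n"
    b"\r\n"
)

headers = parse_headers(io.BytesIO(raw_headers))

# Header names are case-insensitive, so either spelling finds the same value.
content_type = headers["content-type"]
content_length = int(headers["Content-Length"])

# Print every header as a key-value pair.
for name, value in headers.items():
    print(f"{name} = {value}")
```

A client typically inspects Content-Type first to decide how to interpret the body, which is why that header appears in both lists.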

Commonly Used HTTP Methods

GET: Requests data from a specified resource on the server. GET requests are read-only and do not alter the state of resources. Example: fetching information from an API.
POST: Sends data to a server, typically to create a new resource. Example: submitting a form to create a new user.
PUT: Updates an existing resource by replacing it with new data or, if the resource does not yet exist, creates it. Example: updating user information.
DELETE: Deletes a specified resource. Example: removing a user from a database.
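
As a rough illustration of how a client chooses among these methods, the Python sketch below builds one request object per method using the standard urllib.request module. The httpbin.org endpoints are the same ones used in the cURL examples in this article. Nothing is sent over the network here; the objects only describe the requests.

```python
from urllib.request import Request

# Building a Request object does not contact the server; nothing is sent
# until the request is passed to urllib.request.urlopen().
get_req = Request("https://httpbin.org/get")

# With no explicit method, urllib uses GET when there is no body and POST when there is one.
post_req = Request("https://httpbin.org/post", data=b"Posted text.")

# PUT and DELETE must be named explicitly.
put_req = Request("https://httpbin.org/put", data=b"Updated text.", method="PUT")
delete_req = Request("https://httpbin.org/delete", method="DELETE")

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method(), req.full_url)
```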

Components of a URI

A URI (Uniform Resource Identifier) identifies a specific resource you want to access on the internet. It consists of several components:

Scheme: Indicates the protocol to be used (e.g., http, https, ftp).
Host: The domain name or IP address of the server (e.g., httpbin.org).
Port: The port number on the server (optional, HTTP default is 80, HTTPS default is 443).
Path: The specific resource within the host (e.g., /users).
Query: A string of key-value pairs used to pass data to the server (e.g., ?id=123).
Fragment: A reference to a specific part of the resource (optional, e.g., #section1).

Here's an example of a full URI, assembled from the sample values above:

https://httpbin.org:443/users?id=123#section1
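
If you want to see the components separated programmatically, here is a small Python sketch using the standard urllib.parse module. The URI is an assumption for illustration, assembled from the sample values in the component list; it is not a documented endpoint.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative URI built from the sample values in the component list.
uri = "https://httpbin.org:443/users?id=123#section1"

parts = urlsplit(uri)

print("Scheme:  ", parts.scheme)
print("Host:    ", parts.hostname)
print("Port:    ", parts.port)
print("Path:    ", parts.path)
print("Query:   ", parts.query)
print("Fragment:", parts.fragment)

# The query string is itself a set of key-value pairs.
query_params = parse_qs(parts.query)
print("Query parameters:", query_params)
```

Note that the fragment is handled by the client and is normally not sent to the server as part of the request.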

Common Authentication Methods

Here are some of the most common authentication schemes:

HTTP Basic

This is the simplest form of authentication. The username and password are joined with a colon, encoded in Base64, and sent to the server in the Authorization header. It’s easy to implement, but it is not secure unless used over HTTPS: Base64 is an encoding, not encryption, so anyone who intercepts the header can decode your credentials as easily as if they had been sent as clear text.
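
The following Python sketch shows roughly how a client builds the Basic Authorization header value. The username and password are placeholders chosen for illustration.

```python
import base64

# Placeholder credentials, for illustration only.
username = "username"
password = "password"

# Join with a colon, then Base64-encode the resulting bytes.
credentials = f"{username}:{password}".encode("utf-8")
token = base64.b64encode(credentials).decode("ascii")

auth_header = f"Basic {token}"
print("Authorization:", auth_header)
```

This is essentially what the -u option does for you in cURL when the server uses Basic authentication.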

API Keys

API keys are unique identifiers issued to users by an API administrator to control and monitor access. The key is sent with each request, either in the query string, as a request header, or as a cookie. While this method is pretty straightforward, again it should only be used with HTTPS to ensure the key is not exposed to third parties.
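
Here is a hedged Python sketch of the three places an API key can travel. The key value, the api_key parameter name, and the X-API-Key header name are all hypothetical; each API documents its own names, so check the provider's documentation before relying on any of them.

```python
from urllib.parse import urlencode

api_key = "my-secret-key"  # hypothetical key, not a real credential
base_url = "https://api.example.com/api/resource"

# Option 1: in the query string (parameter name "api_key" is an assumption).
url_with_key = base_url + "?" + urlencode({"api_key": api_key})

# Option 2: as a request header (header name "X-API-Key" is an assumption).
header_with_key = {"X-API-Key": api_key}

# Option 3: as a cookie (cookie name "api_key" is an assumption).
cookie_with_key = {"Cookie": f"api_key={api_key}"}

print(url_with_key)
print(header_with_key)
print(cookie_with_key)
```

Sending the key in a header rather than the query string keeps it out of URLs, which are more likely to end up in logs and browser history.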

OAuth 2.0

OAuth 2.0 is a flexible authorization framework, frequently used to give third-party applications access to resources without sharing your password with them. You obtain an access token from an authorization server, then use it to access the API. The process for obtaining an access token can sometimes be challenging.

Bearer Tokens

Bearer tokens are included in the Authorization header to authenticate to the server. The token grants access to the API without needing to send your credentials with each request. While getting this initially set up can be intimidating, once you have your token it’s very easy to use.
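
As a minimal sketch, this is how a bearer token might be attached to a request in Python. The token is the sample value from this article's cURL examples, not a real credential, and the request object is only built here, not sent.

```python
from urllib.request import Request

# Sample token from the cURL examples in this article; not a real credential.
token = "gakdfdadfkae213913"

req = Request(
    "https://httpbin.org/bearer",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    },
)

# Show the header that would accompany the request.
print("Authorization:", req.get_header("Authorization"))
```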

Mutual TLS (mTLS)

When using mutual TLS, both the client and the server must authenticate to each other using certificates. This method is highly secure and often used where very strong authentication is required. It’s a bit trickier to set up and use than the other methods.

Proxy Servers

A proxy server acts as an intermediary between a client and a backend service, such as an API. A client API request is sent to the proxy server, which forwards it to the API server. The API server response is sent back to the proxy server, which then forwards it to the client. This setup offers several benefits:

Security: Proxy servers add an extra layer of security by hiding the backend server's actual IP address and filtering out malicious traffic.
Load Balancing: Proxy servers can distribute incoming requests across multiple backend servers, balancing the load to improve performance.
Caching: Proxy servers can cache responses from the backend server, reducing the load on the backend and improving response times for repeated requests.
Rate Limiting: Rate limits can be enforced to prevent abuse and ensure fair usage of the API.
Logging and Monitoring: Proxy request and response logs can provide valuable insights to the system administrator for monitoring and debugging.

Proxy Server Authentication

To send an API request through a proxy server that requires authentication, add the Proxy-Authorization header to your HTTP request. The header value consists of the authentication scheme (e.g., Basic or Bearer) followed by the credentials. For the Basic scheme, the credentials are the username:password pair encoded in Base64.

Example Proxy-Authorization header:

GET /api/resource HTTP/1.1
Host: api.example.com
Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

In this example, dXNlcm5hbWU6cGFzc3dvcmQ= is the Base64-encoded string containing username:password.
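
You can confirm this yourself by decoding the value. The short Python sketch below splits the example header into its scheme and credentials and reverses the Base64 encoding; the variable names are mine, chosen for illustration.

```python
import base64

header_line = "Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ="

# Separate the header name from its value, then the scheme from the credentials.
name, _, value = header_line.partition(": ")
scheme, _, encoded = value.partition(" ")

# Base64 is reversible by anyone, which is why HTTPS matters here.
decoded = base64.b64decode(encoded).decode("utf-8")
user, _, secret = decoded.partition(":")

print("Header:  ", name)
print("Scheme:  ", scheme)
print("Decoded: ", decoded)
```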

Some cURL Examples

Simple GET request:

Bad – requests a non-existent resource:

curl -X GET "https://httpbin.org/bad"

Good – requests a known resource:

curl -X GET http://httpbin.org/get

GET request with custom header information:

curl -X GET "https://httpbin.org/xml" -H "Accept: application/xml" -H "Content-Type: application/json" -H "User-Agent: MyCustomAgent"

GET request with basic authorization:

Bad – incorrect userID and password combination:

curl -X GET "https://httpbin.org/basic-auth/UserID/Pa55w0rd" -H "accept: application/json" -u "UserID:Password"

Good – correct userID and password combination:

curl -X GET "https://httpbin.org/basic-auth/UserID/Pa55w0rd" -H "accept: application/json" -u "UserID:Pa55w0rd"

GET request with bearer token authorization:

Bad – bearer token not provided:

curl -X GET "https://httpbin.org/bearer" -H "accept: application/json"

Good – proper bearer token provided:

curl -X GET "https://httpbin.org/bearer" -H "accept: application/json" -H "Authorization: Bearer gakdfdadfkae213913"

POST request:

curl -X POST "https://httpbin.org/post" -H "accept: application/json" -d "Posted text."

PUT request:

curl -X PUT "https://httpbin.org/put" -H "accept: application/json" -d "Updated text."

DELETE request:

curl -X DELETE "https://httpbin.org/delete?file=badfile.html" -H "accept: application/json"

I hope you found this information useful. My next post in this series will cover how to get the same results as these cURL examples using PROC HTTP in SAS.

Until next time, may the SAS be with you!

Mark

PS: Download a ZIP file containing a PDF of this article and the cURL code in a text file. You can copy and paste those commands onto the command line to see them in action. Get it here: https://bit.ly/SASJediPROCHTTP

Find more articles from SAS Global Enablement and Learning here.