Editor's note: The SAS Analytics Cloud is a new Software as a Service (SaaS) offering from SAS using containerized technology on the SAS Cloud. You can find out more or take advantage of a SAS Analytics Cloud free trial.
This is one of several related articles the Analytics Cloud team has put together while operating in this new digital realm. These articles address enhancements made to a production Kubernetes cluster by the support team in order to meet customers' application needs. They also provide guidance through a couple of technical issues encountered and the solutions the team developed to solve these issues.
How to secure, pre-seed and speed up Kubernetes deployments
Implementing Domain Based Network Access Rules in Kubernetes
Detect and manage idle applications in Kubernetes (current article)
Extending change management into Kubernetes
While developing SAS Analytics Cloud, there was an ongoing challenge to balance application performance against efficient hardware usage. Under certain conditions, we observed artificial resource starvation in our Kubernetes clusters: with the system executing no real work, Kubernetes was still unable to schedule more tasks. To address this problem, SAS Analytics Cloud has adopted a repeatable pattern that detects idle applications reserving resources and shuts them down to release those resources.
With the multi-fold advantages of cloud computing, it's no wonder that many businesses are migrating applications en masse. Many companies have started the movement by adopting the lift-and-shift approach to free up old infrastructure. While this doesn't magically make an application cloud-native, it allows developers to iteratively improve the applications, making use of cloud-native features and techniques.
As Kubernetes cluster operators, we welcome developers to join our utopian world of self-healing, maintenance during business hours, and efficient use of resources. However, we must balance the stability of our clusters with applications not yet fully cloud native. In this article, we’ll describe a pattern we've developed to automatically start and stop applications running in Kubernetes, without requiring modification to the application code.
For one of our primary platforms, we provision per-user copies of an application with a substantial footprint. The resource usage is quite low when the application is not active. However, user work causes substantial spikes in CPU and/or memory usage. To ensure performance and stability for each user, we use Kubernetes’ resource requests mechanism with high values. While this ensures stability, it may impede Kubernetes’ ability to schedule a new pod in a cluster doing very little.
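As a concrete illustration of the high request values described above (the numbers here are invented for the sketch, not taken from SAS Analytics Cloud), a per-user pod spec might reserve resources like this:

```yaml
# Hypothetical container spec fragment; values are illustrative only.
resources:
  requests:
    cpu: "2"        # reserved on the node even while the pod sits idle
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
```

Because requests are reserved whether or not the pod is busy, many idle pods with specs like this can exhaust a node's schedulable capacity while the cluster does almost no real work.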
Unfortunately, most cloud-native approaches to resource management were a poor fit. Lowering the request values runs the risk of poor performance on crowded nodes (i.e. noisy neighbors). Horizontal Pod Autoscaling requires a fully cloud-native application, which is beyond the control of cluster operators. Vertical Pod Autoscaling requires pod restarts which can be disruptive to user workloads and may not work for some resource usage patterns.
With no upstream solutions identified, we developed a pattern to detect idle pods and shut them down. We further detect when traffic streams to a non-existent application and start it dynamically.
To effectively solve the problem, we first enumerate relevant properties of the components and our operational environment.
By disallowing code modification, we target applications in early phases of lift-and-shift, allowing developers to establish their own timelines for updating the applications. This requirement also lets us target third-party applications, where the code is forever beyond reach. Requiring additional layers may be avoidable, but it affords a desirable level of observation and control to the operations teams. Requiring an Ingress heavily informed our approach and is unavoidable in the solution described here.
The problem bisects into two independent tasks: shutting down idle applications and starting applications as needed.
The first step to determining idleness is adding metrics about the application. Our targeted application uses Apache, which we used as a hook for metrics generation. In a new Dockerfile, we start with the provided application image as a base, adding /etc/httpd/conf.d/metrics.conf along with a custom script. The configuration file specifies a logging module that pipes formatted content to our script. This approach easily adapts to other web servers and proxies. An alternate approach is tailing existing logs from any source, though this offers less control over the log format and its interpretation.
As the script runs, it parses the logs and serves pertinent Prometheus metrics. After filtering out some noise, a simple gauge-type metric for “seconds since last hit” fit our needs perfectly. Other applications may require more complex metrics.
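A minimal sketch of such a metrics script, using only the Python standard library (the metric name, port, and "every log line counts as a hit" rule are assumptions of this sketch, not the production implementation):

```python
import sys
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

_last_hit = time.time()

def record_hit(ts=None):
    """Record that a request just arrived (each piped log line is a hit)."""
    global _last_hit
    _last_hit = time.time() if ts is None else ts

def seconds_since_last_hit(now=None):
    """Gauge value: how long the application has gone without a request."""
    now = time.time() if now is None else now
    return now - _last_hit

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus text exposition format, a single gauge sample.
        body = ('# TYPE seconds_since_last_hit gauge\n'
                'seconds_since_last_hit %f\n' % seconds_since_last_hit())
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; version=0.0.4')
        self.end_headers()
        self.wfile.write(body.encode())

def watch_log(stream):
    # Apache's CustomLog pipe delivers formatted lines on our stdin.
    for _ in stream:
        record_hit()

def main():
    threading.Thread(target=watch_log, args=(sys.stdin,), daemon=True).start()
    HTTPServer(('', 9100), MetricsHandler).serve_forever()
```

In production this would also filter out health-check and monitoring "noise" before calling record_hit, as described above.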
With raw metrics available, we need a component to consume them and apply some logic. In addition to the large application we are targeting, we provide a core services pod in the namespace. An additional service is added, polling the metrics endpoint and comparing the response against a configurable threshold. If the "seconds since last hit" is greater than "seconds until idle", the service scales the target application to 0 replicas. You could adapt this technique to run as a sidecar or in the metrics-creating script.
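A sketch of one polling cycle of that idle-checker service. The metric name, threshold, and use of the official kubernetes Python client (with RBAC that permits scaling deployments) are assumptions:

```python
import re
import urllib.request

SECONDS_UNTIL_IDLE = 3600.0  # the configurable "seconds until idle" threshold

def parse_gauge(metrics_text, name='seconds_since_last_hit'):
    """Pull one gauge value out of Prometheus text-format output."""
    match = re.search(r'^%s\s+([-+0-9.eE]+)' % re.escape(name),
                      metrics_text, re.MULTILINE)
    if match is None:
        raise ValueError('metric %r not found' % name)
    return float(match.group(1))

def is_idle(metrics_text, threshold=SECONDS_UNTIL_IDLE):
    """True when "seconds since last hit" exceeds "seconds until idle"."""
    return parse_gauge(metrics_text) > threshold

def scale_to_zero(namespace, deployment_name):
    # Assumes the official kubernetes Python client is installed and the
    # service account may patch deployment scales in this namespace.
    from kubernetes import client, config
    config.load_incluster_config()
    client.AppsV1Api().patch_namespaced_deployment_scale(
        name=deployment_name, namespace=namespace,
        body={'spec': {'replicas': 0}})

def check_once(metrics_url, namespace, deployment_name):
    """One polling cycle: fetch metrics, scale to zero if idle."""
    with urllib.request.urlopen(metrics_url) as resp:
        text = resp.read().decode()
    if is_idle(text):
        scale_to_zero(namespace, deployment_name)
```

Running check_once on a timer (or in a simple sleep loop) gives the behavior described above.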
Our solution relies on NGINX Ingress Controllers managing application Ingresses. The configuration for the Ingress Controller requires two specific CLI flags: --default-backend-service and --configmap. The first flag specifies a service that functions as a catch-all for traffic that can't get to its destination. The ConfigMap uses the custom-http-errors property to determine the HTTP response codes to intercept. For this solution, 503 must be in that list of response codes.
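Sketched out, the two pieces of controller configuration might look like this (the namespace and service names are assumptions, not the actual SAS Analytics Cloud layout):

```yaml
# Controller args on the ingress-nginx Deployment (assumed names):
#   --default-backend-service=core-services/default-backend
#   --configmap=ingress-nginx/nginx-configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # Intercept 503s so traffic for a scaled-down app reaches the backend.
  custom-http-errors: "503"
```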
When ingress traffic streams toward a shut-down application, NGINX determines the backing service has no available endpoints and would typically respond with a 503. However, with the configuration specified above, NGINX instead adds informational headers to the request and forwards it to the custom default backend. Using the headers, the backend infers the targeted application and begins a workflow to start the application and redirect traffic to a holding page. The holding page polls the back-end service periodically until it is available, and finally redirects the user to the page.
For applications that start quickly, it is possible to handle the redirect without the holding page. While not yet explored, it might also be possible to reassemble the original request and let the newly launched application handle it as normal. Both scenarios are highly dependent on the nature of the application.
The following is a simplified version of the default backend endpoint, within a Python Flask application. For clarity in the code snippet below, I've included parameterized elements in-line.
import threading

from flask import Flask, redirect, request
from kubernetes import client
from kubernetes.client.rest import ApiException

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    headers = request.headers
    # Only handle "Service Unavailable" scenarios
    if 'X-CODE' in headers and headers['X-CODE'] == '503':
        try:
            core_api = client.CoreV1Api()
            apps_api = client.AppsV1Api()
            # Verify we should be working on this namespace
            namespace = core_api.read_namespace(headers['X-Namespace'])
            if namespace.metadata.labels.get('sas.com/acloud-type', None) == \
                    'acloud-project':
                # Check for target deployment.
                deployment = apps_api.read_namespaced_deployment(
                    name='acloud-application-name',
                    namespace=namespace.metadata.name)
                if deployment.spec.replicas == 0:
                    t = threading.Thread(
                        target=launch_environment,
                        kwargs={'namespace': namespace,
                                'deployment': deployment})
                    t.start()
                # Redirect user to a holding page
                url = 'https://{}.{}/startup?app={}'.format(
                    SUBDOMAIN,
                    DOMAIN,
                    namespace.metadata.labels['sas.com/acloud-name'])
                return redirect(url)
        except ApiException:
            pass  # fall through to the default error response
    return 'Service Unavailable', 503
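The body of launch_environment is not shown above; a minimal sketch of its likely responsibility, scaling the Deployment back to one replica so Kubernetes recreates the pod, could look like this (assuming the official kubernetes Python client; the apps_api parameter is injectable purely for testing):

```python
def launch_environment(namespace, deployment, apps_api=None):
    """Scale the target Deployment back up to one replica.

    This is a guess at the function's minimal responsibility; the real
    workflow may do more (status tracking, notifications, and so on).
    """
    if apps_api is None:
        from kubernetes import client  # assumes in-cluster config is loaded
        apps_api = client.AppsV1Api()
    return apps_api.patch_namespaced_deployment_scale(
        name=deployment.metadata.name,
        namespace=namespace.metadata.name,
        body={'spec': {'replicas': 1}})
```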
By implementing this idling pattern, SAS Analytics Cloud ensures a minimum performance level without excessive risk of artificial resource starvation. The flexibility of the pattern has allowed us to apply similar logic to multiple applications. The ability to automatically relaunch an application when traffic is detected has minimized the visible customer impact. Overall, the technique has proven useful and will continue to be applied to new applications as they become available in SAS Analytics Cloud.