
How to secure, pre-seed and speed up Kubernetes deployments


Editor's note: The SAS Analytics Cloud is a new Software as a Service (SaaS) offering from SAS using containerized technology on the SAS Cloud. You can find out more or take advantage of a SAS Analytics Cloud free trial.


This is one of several related articles the Analytics Cloud team has put together while operating in this new digital realm. These articles address enhancements the support team made to a production Kubernetes cluster in order to meet customers' application needs. They also walk through a couple of technical issues the team encountered and the solutions developed to resolve them.


Articles in the sequence:

How to secure, pre-seed and speed up Kubernetes deployments (current article)

Implementing Domain Based Network Access Rules in Kubernetes

Detect and manage idle applications in Kubernetes

Extending change management into Kubernetes

 

How to secure, pre-seed and speed up Kubernetes deployments

This article addresses two key areas of need for a Kubernetes deployment:

 

  1. How to keep the application launch responsive and snappy
  2. How to keep customer data secure 

 

Managing the deployed application

 

Background

In theory, any containerized SAS application is portable to a Kubernetes environment. Unlike traditional hosting methods, the Analytics Cloud is a true, on-demand environment. Customers are not only constantly logging in and out, but also purchasing, consuming, and decommissioning projects without the intervention of a systems administrator. Online ordering, approvals, and automation take the place of contracts, procurement, and virtual machine creation. When operating near full subscription, the environment can be very busy, so it is important to make efficient use of every resource. 

  

Therefore, one requirement for placing an application into the Analytics Cloud marketplace is the ability to idle out and spin the application down. In this state the application consumes no resources beyond small ingress watchers listening for user sessions to begin.

 

Since applications spin down regularly, they must also spin back up rapidly and often. Ideally, the end user validates their login, the Analytics Cloud watcher and launcher applications start the end user's applications, and all of this occurs before the user even notices the applications were ever idled down. 

 

Continuous delivery and constant change 

Continuous development means these application images are subject to constant revision and deployment.  Each new deployment results in an image with a new tag.  Different customers might be at different revision levels, further contributing to image sprawl.  The images exist in a rather busy central repository (in most cases that repository is https://harbor.unx.sas.com).

  

Examining a typical Kubernetes worker node and listing the docker images in its cache, we typically find hundreds of images like the ones below: 

 

Image                                  Tag       Last Update    Size
harbor.unx.sas.com/infra-dev/adx/poac  v1.1.20   8 days ago     11.2GB
harbor.unx.sas.com/infra-dev/adx/poac  Prod      11 mins ago    10.1GB

Figure 1: Example images and their tags from a worker node.  Note the large size of the containerized application.

 

The full output would list hundreds of images, ranging in size from a few GB to very large images in the double digits of GB.
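
To see this for yourself on a node, you can ask the docker daemon directly; the format string below is just one convenient layout and is purely illustrative:

# List the images sitting in a worker node's docker cache (illustrative)
docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.CreatedSince}}\t{{.Size}}'

# Count how many distinct images are cached
docker images -q | sort -u | wc -l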

 

Keep it expedient

Problems arise quickly if a node receives the call to start a Programming Only Analytics Container (POAC) or similar containerized application and the application cannot be rapidly retrieved from cache. If it’s the first launch on this node since a version release, this requires image retrieval from the central repository before launching.  Additionally, when dynamically adding host capacity to address load, these new nodes start with no application images in cache at all.

 

Before this project, even with the state-of-the-art hardware available in our production Kubernetes environment, the user still received a dialog stating, “Please wait while the environment is prepared…” In addition, we incurred wait times of around two to five minutes while the image was copied from the repository and launched. The UI and launcher code render the wait message while checking in the background for the environment's availability, and then launch the environment when services respond.  
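
Conceptually, the background check the launcher performs boils down to polling a readiness endpoint until the application's services respond. The sketch below illustrates the idea only; the endpoint and timings are made up:

# Illustration of a readiness poll; the URL and timings are hypothetical
until /usr/bin/curl --silent --fail --connect-timeout 3 https://tenant.example.sas.com/ready > /dev/null
do
  echo 'Please wait while the environment is prepared...'
  sleep 5
done
echo 'Environment ready, launching.'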

 

To the team, it was clear we would need a solution that would work in the background to scan for images that did not yet exist on the node. Since the program would be pre-seeding the images on the nodes, or making sure the image exists before launch, it seemed only fitting to start a code branch for project "Cedar."

 

Components and solutions

 

The workflow for Cedar is as follows:

 

  1. Establish a privileged container with access to the Docker socket on each node. 
  2. Set up the container for docker use. 
  3. Retrieve a list of utilized images. 
  4. Update the nodes’ cache on a schedule. 
  5. Clean up orphaned images.

Create a privileged container

 

We deploy the Cedar application as a DaemonSet, meaning it will run on all the worker nodes. Since the utility and control nodes do not run customer images, we do not need to worry about the docker cache there. 
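
As an aside, if a cluster's utility and control nodes are not already excluded by taints, a nodeSelector on the DaemonSet keeps its pods on worker nodes only. This is a minimal sketch, not part of our deployment, and the node label is a hypothetical example:

# Hypothetical pod-template snippet for a DaemonSet; the worker label is an assumption
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"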

  

Cedar runs as a privileged container because it needs the ability to update the docker cache by connecting to the docker socket. We connect to the docker socket by mounting it into the container as a volume. 

 

The code below depicts the Kubernetes deployment file (YAML format); note in particular the privileged security context and the docker socket volume mount.

 

apiVersion: v1 
kind: ConfigMap 
metadata: 
  name: sas-adxr-cedar 
  namespace: sas-adxr-system 
data: 
--- 
apiVersion: extensions/v1beta1 
kind: DaemonSet 
metadata: 
  name: sas-adxr-cedar 
  namespace: sas-adxr-system 
  labels: 
    tier: node 
    k8s-app: sas-adxr-cedar 
spec: 
  template: 
    metadata: 
      labels: 
        tier: node 
        k8s-app: sas-adxr-cedar 
    spec: 
      hostNetwork: true 
      serviceAccount: sas-adxr-system 
      containers: 
        - image: registry.unx.sas.com/infra-dev/adx/cedar:prod 
          name: sas-adxr-cedar 
          imagePullPolicy: Always 
          securityContext: 
            privileged: true            # privileged so Cedar can manage the node's docker image cache
          volumeMounts: 
            - mountPath: /var/run/docker.sock 
              name: dockersock          # the node's docker socket, defined as a hostPath volume below
            - mountPath: /lib/modules 
              name: modules 
              readOnly: true 
            - mountPath: /etc/secret 
              name: imagesecret 
      volumes: 
        - name: modules 
          hostPath: 
            path: /lib/modules 
        - name: imagesecret 
          secret: 
            secretName: image-pull 
        - name: dockersock 
          hostPath: 
            path: /var/run/docker.sock  # carries the host's docker socket into the container

Figure 2: Cedar Deployment File (YAML format)

 

Set up the container for docker use 

Our image registry, Harbor, requires a set of login credentials, which we supplied via the secret volume mount above. We create a symbolic link to this information for use with the docker CLI. We also initialize a daytimer counter to zero; after 20 runs, we will clear out orphaned (unreferenced) images.

 

#!/bin/bash

#Create the SymLinks and configuration directory for docker.  The symlink will be created from our transported harbor-creds (see the deployment yaml)

mkdir /root/.docker
ln -s /etc/secret/.dockerconfigjson /root/.docker/config.json

daytimer=0

Code 1: Initialization of the docker configuration to allow docker to run in the container.

 

Decide which images to cache

Ironically, the hardest part of Cedar's execution is performed in just a few simple lines.  The kubectl command gives us access to the ReplicaSets (RS) and Deployments for any running images.  By scanning the output for anything after the image: text, we retrieve a list of images and their tags. 

 

We then use the bash mapfile utility to create an array containing the values for the images, sort it, and de-duplicate it with simple Linux commands.

 


while true
do


# Step 1, retrieve the manifests in use.  This has changed quite a bit, but this seems to be the best way for now:

mapfile -t image_array1 < <(kubectl get rs --all-namespaces -o yaml | grep 'harbor.unx.sas.com' | grep -oP '(?<=image: ).*' )
mapfile -t image_array2 < <(kubectl get rs --all-namespaces -o yaml | grep 'docker.sas.com' | grep -oP '(?<=image: ).*' )
mapfile -t image_array3 < <(kubectl get deployment --all-namespaces -o yaml | grep 'harbor.unx.sas.com' | grep -oP '(?<=image: ).*' )

image_array=( "${image_array1[@]}" "${image_array2[@]}" "${image_array3[@]}" )

# Step 2, sort this highly redundant array

sorted_image_array=($(echo "${image_array[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

# Now we have our target array, we can go to work.

echo Printing Current Image List:
echo ${sorted_image_array[@]}

Code 2: Creating a sorted image array from the existing replica sets and deployments

 

The sorted_image_array variable now contains a de-duplicated list of every image referenced by the cluster's ReplicaSets and Deployments.

 

Update the nodes’ cache on a schedule

After sorting and de-duplicating the array, we make a quick check to make sure the repositories are up. Failing to check could result in some extensive timeouts, as each image pull would have to fail first and potentially throw off the execution cycle. 

  

A privileged docker process then retrieves and/or updates each node’s images. We evaluated checking the image on the server first to see whether a pull was even necessary, but the HTTP GET checks proved to be just as heavy as letting the docker command determine the version was already up to date.
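
For reference, the kind of pre-pull check we evaluated looks roughly like the sketch below: compare the digest the registry advertises for a tag with the digest docker already holds locally, and pull only on a mismatch. This is an illustration rather than the code we shipped, and it assumes the registry's v2 manifest endpoint is reachable (authentication omitted for brevity):

# Illustrative only: decide whether a pull is needed by comparing digests
image="harbor.unx.sas.com/infra-dev/adx/poac:v1.1.20"   # example image from Figure 1
repo="${image%:*}"; tag="${image##*:}"

# Digest the registry reports for this tag (docker registry v2 API)
remote=$(curl --silent --head \
  -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  "https://${repo%%/*}/v2/${repo#*/}/manifests/${tag}" \
  | awk 'tolower($1)=="docker-content-digest:" {print $2}' | tr -d '\r')

# Digest of the copy already in the local cache, if any
cached=$(docker image inspect --format '{{index .RepoDigests 0}}' "$image" 2>/dev/null | cut -d@ -f2)

if [ -n "$remote" ] && [ "$remote" = "$cached" ]; then
  echo "Image $image is already up to date"
else
  docker pull "$image"
fi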

  

After retrieving all the images, we sleep for 60 minutes and increment our daily timer to indicate at least another hour has ticked by (sleep time + execution time).

 

# test for repo availability before we try and pull anything down
/usr/bin/curl --silent --connect-timeout 3 https://registry.unx.sas.com > /dev/null
if [ $? -eq 0 ]; then
  echo 'Repo registry.unx.sas.com is available'
  REGISTRY_REACH=TRUE
else
  echo 'unable to connect to registry.unx.sas.com'
  REGISTRY_REACH=FALSE
fi

/usr/bin/curl --silent --connect-timeout 3 https://harbor.unx.sas.com > /dev/null
if [ $? -eq 0 ]; then
  echo 'Repo harbor.unx.sas.com is available'
  HARBOR_REACH=TRUE
else
  echo 'unable to connect to harbor.unx.sas.com'
  HARBOR_REACH=FALSE
fi


# Step 3, if the repos are up, let's go out and do our fetching.  After 20 runs we'll do a system prune.

if [ "$REGISTRY_REACH" = "TRUE" ] && [ "$HARBOR_REACH" = "TRUE" ]; 
 then
   echo 'Repos are Available, Beginning Control Loop'
   echo 'Attempting Docker Pull from: harbor.unx.sas.com'
   for i in "${sorted_image_array[@]}"
   do
    echo "Attempting Pull from $i"
    docker pull "$i" 
   done
  
   echo 'Sleeping for 60 minutes'
   sleep 3600
   echo "Total Daily Run Time = $daytimer"
   daytimer=$((daytimer+1)) 

Code 3: Pre-seeding the nodes using the docker daemon on each node

 

Finally, clean up stale and orphaned images

After 20 passes, roughly a day’s worth of sleep and processing time, we run the docker system prune command, filtered on the maintainer label (indicating our team manages the image and it is thus an Analytics Cloud image). This prevents us from removing system-level and infrastructure images necessary for core functions. With the 72-hour filter, we only remove images that are at least three days old and no longer referenced.

 

# After roughly a day's worth of runs, do a deep system prune 

   if [ "$daytimer" -ge 20 ]; then 
      echo "RunTimer has reached $daytimer, beginning system prune" 
      # --force skips the interactive confirmation prompt; the label filter keeps the prune to our own images
      docker system prune --force --filter "until=72h" --filter "label=maintainer=PDT <pdt@wnt.sas.com>" 
      echo "Prune Complete, resetting..." 
      daytimer=0 
   fi 
fi 
done 

Code 4: Daily cleanup of images

 

Codebase: You can find the complete code repository for project Cedar at https://gitlab.sas.com/adx/cedar. Feel free to drop the code owner an email at chris.johnson@sas.com with any questions.

 

Securing customer data

 

The next challenge for our team was how to secure all the customer data used by the application images referenced above.  In the shared Kubernetes environment, the application images are common to many tenants, but the customer data is dynamic and unique to each tenant. Because the data is stored on a shared network filer, the Kubernetes team decided to take every reasonable precaution to secure the data the applications receive via NFS mounts. 

 

Enter SELinux

Security Enhanced Linux (SELinux) was originally developed by the NSA and merged into the mainline Linux kernel in 2003. Today, nearly every major Linux contributor supports and develops SELinux as an enhancement to the security inherent in the Linux operating system.

 

SELinux comes enabled by default on many Linux distributions, so chances are good you’ve already encountered it in your daily administration and use without realizing it.  If you've ever had read/write permission on a file but found yourself unable to edit it, chances are good SELinux was acting as the gatekeeper.
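
A few quick commands will tell you whether SELinux is in play on a given host (the audit log location below assumes a RHEL/CentOS-style install):

# Is SELinux enforcing on this host?
getenforce

# Show the SELinux label alongside the normal permissions
ls -lZ /etc/passwd

# Look for recent denials (assumes auditd and the default RHEL/CentOS log path)
ausearch -m avc -ts recent
grep 'denied' /var/log/audit/audit.log | tail -n 5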

 

SELinux expands the normal read/write/execute permissions available for files and resources by allowing resources and processes to be labeled with a certain security context. The Analytics Cloud team uses a type of SELinux known as Multi-Category Security (MCaS). There are four types of SELinux enforcement, but the other three are beyond the scope of this article.  Multi-Category Security is a hybrid adaptation of the other three types.  MCaS allows us to define a resource with the extended attributes of intended user, intended role, type, and security level. In addition, the security level has both a category and a subcategory. 

  

Before SELinux, in order to read a file, you only needed to be the owner of that file or a member of a group with read access to it. With SELinux enabled, you not only need the proper file-level permissions, you also need the proper user, role, type/label, and context.

  

For instance, when we look at one of the most sensitive files on the system, /etc/passwd, we’ll see something like:

 

[centos@adx-us-p1-cmp06 etc]$ ls -alZ /etc/passwd 
-rw-r--r--. root root system_u:object_r:passwd_file_t:s0 passwd 

Figure 3: Permissions for the /etc/passwd file

 

In addition to our normal rw permissions, we see the file is owned by the system user (system_u) with the object role (object_r), is of type passwd_file_t, and carries the security level s0. 

  

When we examine the SELinux permissions of a process such as httpd, a common web server and a frequent target of attacks, we note a few extended attributes.

 

[centos @adx-us-p1-cmp06 etc]$ ps -axZ | grep httpd 
system_u:system_r:httpd_t        3234         Ss     0:00 /usr/sbin/httpd 

Figure 4:  Permissions on the httpd process

 

We note this process also runs as the system user (system_u) with the system role (system_r); however, it has a type of httpd_t. With SELinux enforcing, this process would be denied from modifying the passwd file, and the security event would be logged. This is precisely the behavior we want.

 

Analytics Cloud and SELinux 

Here on the AC team (Customer Zero now), we use all the defaults that ship with SELinux, which are pretty good straight out of the box. But we wrote a custom policy file that allows us to extend this same SELinux context to the NFS mounts for customer data, the tenant namespace, and all the relevant processes in that namespace.

 

If you aren’t familiar with the concept of a Kubernetes namespace, that’s okay for this article; just know that it’s a great way to separate customer resources and processes from each other. You may also refer to the Kubernetes documentation on namespaces.

 

This custom policy currently defines nine options, and all of them pertain to the way we mount and use the NFS data. 

  

What’s key for us is that when we mount the NFS share via a persistent volume claim (PVC), we assign the PVC the system-level user and role, present it as a “container” type, and assign it the same pseudo-random category and subcategory pair that is unique to the customer’s namespace and processes.
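
In Kubernetes terms, one way to express that assignment is through seLinuxOptions in the pod security context. The sketch below is illustrative rather than a copy of our deployment; the volume and claim names are made up, and the level matches the category pair shown later in Figure 6:

# Illustrative pod-spec snippet; the volume/claim names are hypothetical
spec:
  securityContext:
    seLinuxOptions:
      user: system_u
      role: system_r
      type: container_t
      level: "s0:c593,c693"        # tenant-unique category/subcategory pair
  volumes:
    - name: tenant-data
      persistentVolumeClaim:
        claimName: tenant-data-pvc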

  

When we look at a volume mount, we’ll see something like this:

Figure 5: Typical NFS Data Mount for a container

 

And when we examine the customer namespace, we’ll see a matching security context (Figure 6). 

 

centos@adx-us-p1-utl02  [mgmt:kube-system] $ k describe ns sas-adxc-t3000134 
Name:    sas-adxc-t3000134 
Labels:   adx.sas.com/expiration-date=1571529599 
              adx.sas.com/global=true 
              adx.sas.com/id=abb1f6a1-ccb7-4261-a567-b44078e58ae9 
              adx.sas.com/order-number=t3000134 
              adx.sas.com/region=us-p1 
              adx.sas.com/selinux-context=593-693 
              adx.sas.com/tenant=t3000134 

centos@adx-us-p1-utl02  [mgmt:kube-system] $ ps  -auxZ | grep container | grep 593 
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45988 0.0  0.0 7428 6112 ? S    Sep14   0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx 
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45989 0.0  0.0 7924 2092 ? S    Sep14   0:00 nginx: worker process 

Figure 6: Kubectl namespace description with SELinux context visible

 

Together, these two matching parameters mean that only the proper tenant container, and further, only the processes we want inside that container, can access the customer data. This is precisely the level of control we need. Should any attack attempt to change the security context, the change would in turn prevent access to the customer data. Watchdog programs would then detect the change in context, which would be logged and reported to our Global Information Security team.

 

Life cycle

 

When the client's license expires and cleanup begins, the namespace and its labels are deleted. This returns the context to the available pool for assignment. By combining a category and a subcategory, it is possible to create 1024 x 1024, or over one million, unique security contexts. Future enhancements or expansion of the SELinux concepts and options could allow for even stronger security. For instance, we could require the category to be an exact match while allowing the subcategory to grant access based on a greater-than-or-equal match instead of an exact one. Such use might prove useful for future applications that require multi-tiered access levels for data based on user, group, or even application type. SELinux's robust feature set and custom policy files make it an excellent fit and give this platform and our team plenty of room to grow.
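
To make the category arithmetic concrete, a pair can be drawn from the 0-1023 range and recorded on the namespace much like the adx.sas.com/selinux-context label shown in Figure 6. The snippet below is purely illustrative; a real allocator would also verify the pair is not already in use:

# Purely illustrative: pick a pseudo-random category/subcategory pair and record it on a namespace
cat1=$((RANDOM % 1024))
cat2=$((RANDOM % 1024))
echo "Assigning SELinux level s0:c${cat1},c${cat2}"
kubectl label namespace sas-adxc-t3000134 "adx.sas.com/selinux-context=${cat1}-${cat2}" --overwrite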

 

Conclusions

 

The SAS Analytics Cloud environment has proven to be cutting-edge, delivering functionality and compute power in a new and innovative way. The technology behind Kubernetes has allowed our development team to present SAS users with a platform that can be created in a matter of minutes. That platform, its images, and its applications are orchestrated by automation such as Cedar to provide the best user startup and initialization experience possible. Mature features like SELinux work behind the scenes to keep large amounts of data secure in a shared environment.

 

It truly is an exciting time to be in IT!

 

 

 
