
How to secure, pre-seed and speed up Kubernetes deployments

Started ‎11-27-2019 by
Modified ‎12-01-2019 by

Editor's note: The SAS Analytics Cloud is a new Software as a Service (SaaS) offering from SAS using containerized technology on the SAS Cloud. You can find out more or take advantage of a SAS Analytics Cloud free trial.

This is one of several related articles the Analytics Cloud team has put together while operating in this new digital realm. These articles address enhancements the support team made to a production Kubernetes cluster in order to meet customers' application needs. They also walk through a couple of technical issues the team encountered and the solutions developed to solve them.

Articles in the sequence:

How to secure, pre-seed and speed up Kubernetes deployments (current article)

Implementing Domain Based Network Access Rules in Kubernetes

Detect and manage idle applications in Kubernetes

Extending change management into Kubernetes


How to secure, pre-seed and speed up Kubernetes deployments

This article addresses two key areas of need for a Kubernetes deployment:


  1. How to keep the application launch responsive and snappy
  2. How to keep customer data secure 


Managing the deployed application



In theory, any containerized SAS application is portable to a Kubernetes environment. Unlike traditional hosting methods, the Analytics Cloud is a true, on-demand environment. Customers are not only consistently logging in and out, but also purchasing, consuming, and decommissioning projects without the intervention of a systems administrator. Online ordering, approvals, and automation take the place of contracts, procurement, and virtual machine creation. When operating near its optimum subscription level, the environment can be very busy, so it is important to make the best use of every resource.


Therefore, one requirement for placing an application into the Analytics Cloud marketplace is the ability to idle out and spin down. In this state the application consumes no resources beyond small ingress watchers listening for user sessions to begin.


Since applications spin down regularly, they must also spin up rapidly and regularly. Ideally, the end user validates their login, the Analytics Cloud watcher and launcher applications start the end user's applications, and all of this occurs before the user even notices the applications were idled down.


Continuous delivery and constant change 

Continuous development means these application images are subject to constant revision and deployment. Each new deployment results in an image with new tagging. Different customers might be at different revision levels, further contributing to image sprawl. The images exist in a rather busy central repository (in most cases, that repository is Harbor).


If we examine a typical Kubernetes worker node and list the Docker images in its cache, we typically find hundreds of images like the ones below:




[Image listing truncated: repository, tag, image ID, last-update, and size columns; entries last updated between 8 days and 11 minutes ago]


Figure 1: Example images and their tags from a worker node. Note the large size of the containerized application.


The full output would list hundreds of images, ranging in size from a few GB to very large images in the double digits of GB.


Keep it expedient

Problems arise quickly if a node receives the call to start a Programming Only, Analytics Container (POAC) or similar containerized application and the application cannot be retrieved rapidly from cache. If it’s the first launch on this node since a version release, the image must be retrieved from the central repository before launching. Additionally, any dynamically added host capacity brought on to address load starts with no application images in cache at all.


Before we adopted this project, even with the state-of-the-art hardware available in our production Kubernetes environment, the user still received a dialog stating, “Please wait while the environment is prepared…” and incurred wait times of around two to five minutes while the image copied from the repository and launched. The UI and launcher code render the wait message while checking in the background for the environment's availability, then launch the environment once services respond.


To the team, it was clear we would need a solution that would work in the background to scan for images which did not yet exist on the node. Since the program would be pre-seeding the images on the nodes or making sure the image exists preceding the launch, it seemed only fitting to launch a code-branch for project "Cedar."


Components and solutions


The workflow for Cedar is as follows:


  1. Establish a privileged container with access to the Docker socket on each node. 
  2. Set up the container for docker use. 
  3. Retrieve a list of utilized images. 
  4. Update the nodes’ cache on a schedule. 
  5. Clean up orphaned images.

Create a privileged container


We deploy the Cedar application as a DaemonSet, meaning it runs on all the worker nodes. Since the utility and control nodes do not run customer images, we do not need to worry about the docker cache there.


Cedar runs as a privileged container because it needs the ability to update the docker cache by connecting to the docker socket. We mount the docker socket into the container as a volume.


The code below depicts the Kubernetes deployment file (YAML format) for Cedar.


apiVersion: v1 
kind: ConfigMap 
metadata: 
  name: sas-adxr-cedar 
  namespace: sas-adxr-system 
--- 
apiVersion: extensions/v1beta1 
kind: DaemonSet 
metadata: 
  name: sas-adxr-cedar 
  namespace: sas-adxr-system 
  labels: 
    tier: node 
    k8s-app: sas-adxr-cedar 
spec: 
  template: 
    metadata: 
      labels: 
        tier: node 
        k8s-app: sas-adxr-cedar 
    spec: 
      hostNetwork: true 
      serviceAccount: sas-adxr-system 
      containers: 
        - image: 
          name: sas-adxr-cedar 
          imagePullPolicy: Always 
          securityContext: 
            privileged: true 
          volumeMounts: 
            - mountPath: /var/run/docker.sock 
              name: dockersock 
            - mountPath: /lib/modules 
              name: modules 
              readOnly: true 
            - mountPath: /etc/secret 
              name: imagesecret 
      volumes: 
        - name: modules 
          hostPath: 
            path: /lib/modules 
        - name: imagesecret 
          secret: 
            secretName: image-pull 
        - name: dockersock 
          hostPath: 
            path: /var/run/docker.sock

Figure 2: Cedar Deployment File (YAML format)


Set up the container for docker use

Our code repository, Harbor, uses a set of login credentials we defined in a volume mount above. We create a symbolic link to this information for use with the docker CLI. We also initialize a daytimer value to zero. Finally, after 20 runs, we clear out orphaned (dangling) images.



# Create the symlinks and configuration directory for docker. The symlink
# points at our transported harbor-creds (see the deployment YAML).

mkdir /root/.docker
ln -s /etc/secret/.dockerconfigjson /root/.docker/config.json

# Initialize the daily run timer
daytimer=0


Code 1: Initialization of the docker configuration to allow docker to run in the container.


Decide which images to cache

Ironically, the hardest part of Cedar's execution is performed in just a few simple lines. The kubectl command provides us access to the replica sets (RS) and deployments for any running images. By scanning the output for anything after the Image: text, we retrieve a list of images and their tags.


We then use the bash mapfile utility to create an array containing the values for the images, sort it, and de-duplicate it with simple Linux commands.


while true
do

# Step 1, retrieve the manifests in use. This has changed quite a bit, but this seems to be the best way for now:

mapfile -t image_array1 < <(kubectl get rs --all-namespaces -o yaml | grep -oP '(?<=image: ).*')
mapfile -t image_array2 < <(kubectl get deployment --all-namespaces -o yaml | grep -oP '(?<=image: ).*')

image_array=( "${image_array1[@]}" "${image_array2[@]}" )

# Step 2, sort this highly redundant array and remove duplicates

sorted_image_array=($(echo "${image_array[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

# Now we have our target array, we can go to work.

echo Printing Current Image List:
echo ${sorted_image_array[@]}

Code 2: Creating a sorted image array from the existing replica sets and deployments


The variable sorted_image_array now contains a list of all the running images on our worker node.


Update the nodes’ cache on a schedule

After sorting and de-duplicating the array, we make a quick check to make sure the repos are up. Failing to check could result in some extensive timeouts as each image would have to fail first and potentially throw off the execution cycle. 


A privileged docker process then retrieves and/or updates each node’s images. We evaluated checking the image on the server first to see whether a pull was even necessary, but the HTTP GET checks proved just as heavy as letting the docker command determine the version was already up to date.
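For reference, a registry-side check can be made without pulling any layers: the Docker Registry HTTP API v2 returns an image's digest in the Docker-Content-Digest header of a manifest HEAD request. The sketch below is illustrative only (the host and image names are hypothetical, and this is not Cedar's actual code); it builds the manifest URL from an image reference:

```shell
#!/bin/bash
# Sketch: build a Registry API v2 manifest URL from an image reference
# such as harbor.example.com/project/app:1.2.3 (hypothetical names).
# Note: references with a port in the registry host would need extra parsing.
manifest_url() {
  local ref="$1"
  local tag="${ref##*:}"      # text after the last ':' is the tag
  local path="${ref%:*}"      # registry host plus repository path
  local host="${path%%/*}"    # first path component is the registry host
  local repo="${path#*/}"     # remainder is the repository name
  echo "https://${host}/v2/${repo}/manifests/${tag}"
}

# A HEAD request against this URL (with the manifest Accept header)
# returns the digest without downloading any layers, for example:
#   curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
#     "$(manifest_url "harbor.example.com/project/app:1.2.3")"
manifest_url "harbor.example.com/project/app:1.2.3"
```

In our testing, even these lightweight checks cost about as much as simply letting docker pull decide the image was current, which is why Cedar takes the simpler route.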


After retrieving all the images, we sleep for 60 minutes and increment our daily timer to indicate at least another hour has ticked by (sleep time + execution time).


# Test for repo availability before we try and pull anything down
/usr/bin/curl --silent --connect-timeout 3 > /dev/null
if [ $? -eq 0 ]; then
  echo 'Repo is available'
else
  echo 'Unable to connect to repo'
fi

# Step 3, if the repos are up, let's go out and do our fetching. After 20 runs we'll do a system prune.

echo 'Repos are Available, Beginning Control Loop'
echo 'Attempting Docker Pull from:'
for i in "${sorted_image_array[@]}"
do
  echo "Attempting Pull from $i"
  docker pull "$i"
done

echo 'Sleeping for an hour'
sleep 3600
daytimer=$((daytimer + 1))
echo "Total Daily Run Time = $daytimer"

Code 3: Pre-seeding the nodes using the docker daemon on each node


Finally, clean up stale and orphaned images

After 20 passes, roughly a day’s worth of sleep and processing time, we run the docker system prune command, filtered on the maintainer label (indicating our team manages the image and it is thus an Analytics Cloud image). This prevents us from removing system-level and infrastructure images necessary for core functions. With the 72-hour filter, we remove images not accessed or referenced within three days.


# Increment day timer until reaching one day. Then, do a deep system prune

if [ "$daytimer" -ge 20 ]; then
  echo "RunTimer has reached $daytimer, beginning system prune"
  docker system prune --force --filter "until=72h" --filter "label=maintainer=PDT <>"
  echo "Prune Complete, resetting..."
  daytimer=0
fi

Code 4: Daily cleanup of images


Codebase: The complete code repository for project Cedar is available online. Feel free to drop the code owner an email with any questions.


Securing customer data


The next challenge for our team was how to secure all this customer data used by the application images referenced above.  In the shared Kubernetes environment, the application images are common to many tenants, but the customer data is dynamic and unique to each tenant. Due to the constraint of storing the data on a shared network filer, the Kubernetes team decided to take every reasonable precaution to secure the data the application received via NFS mounts. 


Enter SELinux

Security Enhanced Linux (SELinux) was originally developed by the NSA and released into the mainline Linux kernel in 2003. Today, nearly every major Linux contributor supports and develops SELinux as an enhancement to the security inherent in the Linux operating system.


SELinux comes enabled by default on many Linux distributions (including RHEL and CentOS), so chances are good you’ve encountered it in your daily administration and use, perhaps without realizing it. If you've ever had read/write permission on a file but found yourself unable to edit it, chances are SELinux was acting as the gatekeeper.


SELinux expands the normal read/write/execute permissions available for files and resources by allowing a resource and process to be labeled with a security context. The Analytics Cloud team uses a variant of SELinux known as Multi-Category Security (MCS). There are four types of SELinux enforcement, but the other three are beyond the scope of this article; Multi-Category Security is a hybrid adaptation of the other three. MCS allows us to define a resource with the extended attributes of intended user, intended role, type, and security level. In addition, the security level can carry a category and subcategory.


Before SELinux, in order to read a file you only needed to be the owner of that file or a member of a group with read access to it. With SELinux enabled, you not only need the proper file-level permissions, you also need the proper user role, system role, type/label, and context.


For instance, when we look at one of the most sensitive files on the system, /etc/passwd, we’ll see something like:


[centos@adx-us-p1-cmp06 etc]$ ls -alZ /etc/passwd 
-rw-r--r--. root root system_u:object_r:passwd_file_t:s0 passwd 

Figure 3: Permissions for the /etc/passwd file


In addition to the normal rw permissions, we see that in order to access the file you need to be running as a system user (system_u), and the object role (object_r) applies. The file is of type passwd_file_t, and it carries a sensitivity level of s0.
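The colon-separated layout of these context strings is easy to pull apart. The short sketch below (illustrative only, not part of the platform tooling) splits a context into its fields, using the container context format that appears later in this article:

```shell
#!/bin/bash
# Sketch: split an SELinux context string into user, role, type,
# sensitivity, and categories. The example values are illustrative.
context="system_u:system_r:container_t:s0:c593,c693"

# The fields are colon-separated; any category list lands in the last field.
IFS=':' read -r se_user se_role se_type se_sens se_cats <<< "$context"

echo "user=$se_user role=$se_role type=$se_type level=$se_sens:$se_cats"
```

Running this prints each field separately, which makes it easy to see where the category pair (the part MCS adds) lives in the label.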


When we examine the SELinux permissions of a process such as httpd, a common web server and frequent target of attacks, we note a few extended attributes.


[centos @adx-us-p1-cmp06 etc]$ ps -axZ | grep httpd 
system_u:system_r:httpd_t        3234         Ss     0:00 /usr/sbin/httpd 

Figure 4:  Permissions on the httpd process


We note this process also runs as a system user with a system role; however, it has a type of httpd_t. With SELinux MCS enabled, this process would be denied access to the password file, and the security event would be logged. This is precisely the behavior we want.


Analytics Cloud and SELinux 

Here on the AC team (now Customer Zero), we use the SELinux defaults, which are quite good straight out of the box. But we also wrote a custom policy file that extends the same SELinux context to the NFS mounts for customer data, the tenant namespace, and all the relevant processes in that namespace.


If you aren’t familiar with the concept of a Kubernetes namespace, that’s okay for this article; just know that it’s a great way to separate customer resources and processes from each other. You may also refer to the Kubernetes documentation on namespaces.


This custom policy currently defines nine options, and all of them pertain to the way we mount and use the NFS data. 


What’s key for us is that when we mount the NFS share via a persistent volume claim (PVC), we assign the PVC the system-level user and role. We present it as a “container” type, and we assign it the same pseudo-random category and subcategory that are unique to the customer’s namespace and processes.
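As a sketch of what such an assignment can look like in a pod spec (the names, namespace, and category pair below are hypothetical examples, and the actual Analytics Cloud policy differs), Kubernetes exposes SELinux labeling through seLinuxOptions in the pod securityContext:

```yaml
# Sketch: pin a pod's processes to an MCS level matching its tenant.
# Namespace, names, image, and categories are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: tenant-app
  namespace: sas-adxc-t3000134
spec:
  securityContext:
    seLinuxOptions:
      user: system_u
      role: system_r
      type: container_t
      level: "s0:c593,c693"   # unique category pair for this tenant
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - mountPath: /data
          name: customer-data
  volumes:
    - name: customer-data
      persistentVolumeClaim:
        claimName: tenant-data-pvc
```

When the volume carries the same category pair, only processes running at this level can read the customer data, which is the behavior described above.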


When we look at a volume mount, we’ll see something like this:


Figure 5: Typical NFS data mount for a container


And when we examine the customer namespace, we’ll see a matching security context (Figure 6). 


centos@adx-us-p1-utl02  [mgmt:kube-system] $ k describe ns sas-adxc-t3000134 
Name:    sas-adxc-t3000134 

centos@adx-us-p1-utl02  [mgmt:kube-system] $ ps  -auxZ | grep container | grep 593 
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45988 0.0  0.0 7428 6112 ? S    Sep14   0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx 
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45989 0.0  0.0 7924 2092 ? S    Sep14   0:00 nginx: worker process 

Figure 6: Kubectl namespace description with SELinux context visible


Together, these two matching parameters mean only the proper tenant container, and further – only the processes that we want inside the container, can access the customer data. This is precisely the level of control we need. Should any hack attempt change the security context, it would in turn prevent access to the customer data. Watchdog programs would then detect a change in context, which would be logged and reported to our Global Information Security team.
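The watchdog idea reduces to a simple comparison: read the mount's current categories and compare them against the pair expected for the tenant. The sketch below is hypothetical (the real watchdog and its alerting differ):

```shell
#!/bin/bash
# Sketch of a watchdog-style check (hypothetical; not the actual
# Analytics Cloud watchdog): extract the category pair from a file's
# SELinux context and compare it with the tenant's expected pair.
check_categories() {
  local context="$1" expected="$2"
  local cats="${context#*:*:*:*:}"   # strip user:role:type:sensitivity
  if [ "$cats" = "$expected" ]; then
    echo "OK"
  else
    echo "ALERT: category drift ($cats != $expected)"
  fi
}

# In production the context would come from, e.g.: ls -dZ /path/to/mount
check_categories "system_u:object_r:container_file_t:s0:c593,c693" "c593,c693"
```

Any mismatch would indicate a relabeled mount and is exactly the kind of event worth logging and escalating.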


Life cycle


When the client's license expires and cleanup begins, the namespace and its labels are deleted. This returns the context to the available pool for assignment. By using a category and a subcategory, it is possible to create 1024 x 1024, or over one million, unique security contexts. Future enhancements or expansion of the SELinux concepts and options could allow for even stronger security. For instance, we could require the category to be an exact match while allowing the subcategory to grant access based on a greater-than-or-equal match instead of an exact match. Such a scheme might prove useful for future applications requiring multi-tiered access levels for data based on user, group, or even application type. SELinux's robust feature set and custom policy files make it an excellent fit and give this platform and our team plenty of room to grow.




The SAS Analytics Cloud environment has proven to be cutting-edge, delivering functionality and compute power in a new and innovative way. The technology behind Kubernetes has allowed our development team to present SAS users with a platform that can be created in a matter of minutes. That platform, its images, and its applications are orchestrated by automatons such as Cedar to provide the best user startup and initialization experience possible. Mature features like SELinux work behind the scenes to keep large amounts of data secure in a leveraged environment.


It truly is an exciting time to be in IT!





