Editor's note: The SAS Analytics Cloud is a new Software as a Service (SaaS) offering from SAS using containerized technology on the SAS Cloud. You can find out more or take advantage of a SAS Analytics Cloud free trial.
This is one of several related articles the Analytics Cloud team has put together while operating in this new digital realm. These articles address enhancements the support team made to a production Kubernetes cluster in order to meet customers' application needs. They also walk through a couple of technical issues the team encountered and the solutions developed to resolve them.
How to secure, pre-seed and speed up Kubernetes deployment (current article)
Implementing Domain Based Network Access Rules in Kubernetes
Detect and manage idle applications in Kubernetes
Extending change management into Kubernetes
This article addresses two key areas of need for a Kubernetes deployment: speeding up application startup by pre-seeding container images onto the worker nodes, and securing per-tenant customer data with SELinux.
In theory, any containerized SAS application is portable to a Kubernetes environment. Unlike traditional hosting methods, the Analytics Cloud is a true, on-demand environment. Customers are not only consistently logging in and out, but also purchasing, consuming, and decommissioning projects without the intervention of a systems administrator. Online ordering, approvals, and automation take the place of contracts, procurement, and virtual machine creation. When operating near optimum subscription level the environment can be very busy, so it is important to make perfect use of every resource.
Therefore, one application requirement for placement of an application into the Analytics Cloud marketplace is the ability to idle out and spin down the application. In this state the application consumes no resources outside of small ingress watchers listening for user sessions to begin.
Since applications spin down regularly, they must also spin up rapidly and regularly as well. Ideally, the end user validates their login and the Analytics Cloud watcher and launcher applications start the end-user's applications. And ideally, this occurs before the user even notices the applications were ever idled down.
Continuous development means these application images are subject to constant revision and deployment. Each new deployment results in an image with a new tag. Different customers might be at different revision levels, further contributing to image sprawl. The images exist in a rather busy central repository (in most cases that repository is https://harbor.unx.sas.com).
Examining a typical Kubernetes worker node and listing the docker images in its cache, we typically find hundreds of images like the ones below:
Image                                 | Tag     | Last Update | Size
--------------------------------------+---------+-------------+-------
harbor.unx.sas.com/infra-dev/adx/poac | v1.1.20 | 8 days ago  | 11.2GB
harbor.unx.sas.com/infra-dev/adx/poac | Prod    | 11 mins ago | 10.1GB
Figure 1: Example images and their tags from a worker node. Note the large size of the containerized application.
The full list would show hundreds of images, ranging in size from a few GB to very large images in the double digits of GB.
Problems arise quickly if a node receives the call to start a Programming Only Analytics Container (POAC) or similar containerized application and the image cannot be retrieved rapidly from cache. If it’s the first launch on this node since a version release, the image must be retrieved from the central repository before launching. Additionally, any host capacity added dynamically to address load starts with no application images in cache at all.
Before we adopted this project, even with the state-of-the-art hardware available in our production Kubernetes environment, the user still received a dialogue stating, “Please wait while the environment is prepared…” In addition, we incurred wait times of around two to five minutes while the image copied from the repository and launched. The UI and launcher code render the wait message while checking in the background for the environment's availability, and then launch the environment when services are responding.
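The background check the launcher performs can be sketched as a simple readiness poll. This is a minimal, self-contained sketch; the `check_ready` stand-in, retry count, and delay are hypothetical, not the actual launcher code:

```shell
#!/bin/bash
# Minimal readiness-poll sketch: retry a health check until it succeeds
# or a retry limit is reached. check_ready is a stand-in for something
# like `curl --silent --fail https://tenant.example.com/health`.
check_ready() {
    # Pretend the environment comes up on the third attempt.
    [ "$1" -ge 3 ]
}

attempt=0
ready=false
while [ "$attempt" -lt 10 ]; do
    attempt=$((attempt + 1))
    if check_ready "$attempt"; then
        ready=true
        break
    fi
    sleep 0.1   # the real launcher would wait longer between polls
done
echo "ready=$ready after $attempt attempts"
```

The same loop shape works whether the wait is a few seconds (image already cached) or several minutes (cold pull from the repository), which is why shrinking the pull time directly shrinks what the user sees.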
To the team, it was clear we would need a solution that would work in the background to scan for images which did not yet exist on the node. Since the program would be pre-seeding the images on the nodes or making sure the image exists preceding the launch, it seemed only fitting to launch a code-branch for project "Cedar."
The workflow for Cedar is as follows:
We deploy the Cedar application as a DaemonSet, meaning it will run on all the worker nodes. Since the utility and control nodes do not run customer images, we do not need to worry about the docker cache there.
Cedar runs as a privileged container, because it will need the ability to update the docker cache by connecting to the docker socket. We must connect to this docker socket by carrying it into the container as a volume.
The code below depicts the Cedar DaemonSet deployment file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sas-adxr-cedar
  namespace: sas-adxr-system
data:
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: sas-adxr-cedar
  namespace: sas-adxr-system
  labels:
    tier: node
    k8s-app: sas-adxr-cedar
spec:
  template:
    metadata:
      labels:
        tier: node
        k8s-app: sas-adxr-cedar
    spec:
      hostNetwork: true
      serviceAccount: sas-adxr-system
      containers:
      - image: registry.unx.sas.com/infra-dev/adx/cedar:prod
        name: sas-adxr-cedar
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: dockersock
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /etc/secret
          name: imagesecret
      volumes:
      - name: modules
        hostPath:
          path: /lib/modules
      - name: imagesecret
        secret:
          secretName: image-pull
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
Figure 2: Cedar Deployment File (YAML format)
Our code repository, Harbor, uses a set of login credentials we carried in via the secret volume mount above. We create a symbolic link to this information for use with the docker CLI. We also initialize a daytimer value to zero. And finally, after 20 runs, we clear out orphaned (unreferenced) images.
#!/bin/bash
#Create the SymLinks and configuration directory for docker. The symlink will be created from our transported harbor-creds (see the deployment yaml)
mkdir /root/.docker
ln -s /etc/secret/.dockerconfigjson /root/.docker/config.json
daytimer=0
Code 1: Initialization of the docker configuration to allow docker to run in the container.
Interestingly, the hardest part of Cedar's execution is performed in just a few simple lines. The kubectl command gives us access to the ReplicaSets (RS) and Deployments for any running images. By scanning the output for anything after the Image: text, we retrieve a list of images and their tags.
We then use the bash mapfile utility to create an array containing the values for the images, sort it, and de-duplicate it with simple Linux commands.
while true
do
    # Step 1, retrieve the manifests in use. This has changed quite a bit, but this seems to be the best way for now:
    mapfile -t image_array1 < <(kubectl get rs --all-namespaces -o yaml | grep 'harbor.unx.sas.com' | grep -oP '(?<=image: ).*')
    mapfile -t image_array2 < <(kubectl get rs --all-namespaces -o yaml | grep 'docker.sas.com' | grep -oP '(?<=image: ).*')
    mapfile -t image_array3 < <(kubectl get deployment --all-namespaces -o yaml | grep 'harbor.unx.sas.com' | grep -oP '(?<=image: ).*')
    image_array=( "${image_array1[@]}" "${image_array2[@]}" "${image_array3[@]}" )

    # Step 2, sort this highly redundant array
    sorted_image_array=($(echo "${image_array[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

    # Now we have our target array, we can go to work.
    echo "Printing Current Image List:"
    echo "${sorted_image_array[@]}"
Code 2: Creating a sorted image array from the existing replica sets and deployments
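The mapfile-plus-`sort -u` de-duplication step can be demonstrated in isolation. This is a self-contained sketch; the image names are hypothetical stand-ins for real kubectl output:

```shell
#!/bin/bash
# Hypothetical manifest output standing in for `kubectl get rs -o yaml`,
# deliberately containing a duplicate entry.
sample_manifest_output='image: harbor.unx.sas.com/infra-dev/adx/poac:v1.1.20
image: harbor.unx.sas.com/infra-dev/adx/poac:prod
image: harbor.unx.sas.com/infra-dev/adx/poac:v1.1.20'

# mapfile reads each line into an array element; grep -oP keeps only
# the text after "image: " (PCRE lookbehind).
mapfile -t image_array < <(echo "$sample_manifest_output" | grep -oP '(?<=image: ).*')

# sort -u collapses the duplicates into a unique, sorted list.
sorted_image_array=($(echo "${image_array[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

echo "${#image_array[@]} raw entries, ${#sorted_image_array[@]} unique"
```

Running this prints `3 raw entries, 2 unique`, matching the behavior described above.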
The sorted_image_array variable now contains a de-duplicated list of every image in use across the cluster.
After sorting and de-duplicating the array, we make a quick check to ensure the repos are up. Failing to check could result in extensive timeouts, as each image pull would have to fail in turn and potentially throw off the execution cycle.
A privileged docker process then retrieves and/or updates each node’s images. We evaluated checking the image on the server to see whether a pull was even necessary, but the HTTP GET checks proved to be just as heavy as letting the docker command determine the version was already up to date.
After retrieving all the images, we sleep for 60 minutes and increment our daily timer to indicate at least another hour has ticked by (sleep time + execution time).
    # test for repo availability before we try and pull anything down
    /usr/bin/curl --silent --connect-timeout 3 https://registry.unx.sas.com > /dev/null
    if [ $? -eq 0 ]; then
        echo 'Repo registry.unx.sas.com is available'
        REGISTRY_REACH=TRUE
    else
        echo 'unable to connect to registry.unx.sas.com'
        REGISTRY_REACH=FALSE
    fi

    /usr/bin/curl --silent --connect-timeout 3 https://harbor.unx.sas.com > /dev/null
    if [ $? -eq 0 ]; then
        echo 'Repo harbor.unx.sas.com is available'
        HARBOR_REACH=TRUE
    else
        echo 'unable to connect to harbor.unx.sas.com'
        HARBOR_REACH=FALSE
    fi
    # Step 3, if the repos are up, let's go out and do our fetching. After 20 runs we'll do a system prune.
    # Note: test both flags explicitly; a bare [ $VAR -a $VAR ] is true for any non-empty string, including FALSE.
    if [ "$REGISTRY_REACH" = "TRUE" ] && [ "$HARBOR_REACH" = "TRUE" ]
    then
        echo 'Repos are Available, Beginning Control Loop'
        echo 'Attempting Docker Pull from: harbor.unx.sas.com'
        for i in "${sorted_image_array[@]}"
        do
            echo "Attempting Pull from $i"
            docker pull "$i"
        done
        echo 'Sleeping for one hour'
        sleep 3600
        echo "Total Daily Run Time = $daytimer"
        daytimer=$((daytimer+1))
Code 3: Pre-seeding the nodes using the docker daemon on each node
After 20 passes, roughly a day’s worth of sleep and processing time, we run the docker system prune command and filter it on the maintainer label (which indicates our team manages the image, and that it is thus an Analytics Cloud image). This prevents us from removing system-level and infrastructure images necessary for core functions. With the 72-hour filter, we remove images not accessed or referenced within three days.
        # Increment day timer until reaching one day. Then, do a deep system prune.
        # --force skips the interactive confirmation prompt, which would hang a non-interactive container.
        if [ "$daytimer" -ge 20 ]; then
            echo "RunTimer has reached $daytimer, beginning system prune"
            docker system prune --force --filter "until=72h" --filter "label=maintainer=PDT <pdt@wnt.sas.com>"
            echo "Prune Complete, resetting..."
            daytimer=0
        fi
    fi
done
Code 4: Daily cleanup of images
The next challenge for our team was how to secure all this customer data used by the application images referenced above. In the shared Kubernetes environment, the application images are common to many tenants, but the customer data is dynamic and unique to each tenant. Due to the constraint of storing the data on a shared network filer, the Kubernetes team decided to take every reasonable precaution to secure the data the application received via NFS mounts.
Security-Enhanced Linux (SELinux) was originally developed by the NSA and merged into the mainline Linux kernel in 2003. Today, nearly every major Linux contributor supports and develops SELinux as an enhancement to the security inherent in the Linux operating system.
SELinux comes enabled by default on many Linux distributions (the Red Hat family in particular), so chances are good you’ve encountered it in your daily administration and use, perhaps without realizing it. If you've ever had read/write permission on a file but found yourself unable to edit it, chances are good SELinux was acting as the gatekeeper.
SELinux expands the normal read/write/execute permissions available for files and resources by labeling each resource and process with a security context. The Analytics Cloud team uses a flavor of SELinux known as Multi-Category Security (MCS). There are four SELinux security models; the other three are beyond the scope of this article, but MCS can be thought of as a hybrid adaptation of them. MCS allows us to define a resource with the extended attributes of intended user, intended role, type, and security level. In addition, the security level has both a category and a subcategory.
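Putting those pieces together, a full MCS security context string breaks down as follows (the category pair shown here is illustrative, not a real tenant's assignment):

```
system_u : object_r : container_file_t : s0    : c593,c693
(user)     (role)     (type)             (level) (categories)
```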
Before SELinux, in order to read a file you only needed to be the owner of that file or a member of a group with read access to it. With SELinux enabled, you not only need the proper file-level permissions, you also need the proper user role, system role, type/label, and context.
For instance, when we look at one of the most sensitive files on the system, /etc/passwd, we’ll see something like:
[centos@adx-us-p1-cmp06 etc]$ ls -alZ /etc/passwd
-rw-r--r--. root root system_u:object_r:passwd_file_t:s0 passwd
Figure 3: Permissions for the /etc/passwd file
In addition to the normal rw permissions, we see that in order to access the file you need to be running as a system user (system_u), and that the object role (object_r) applies. The file is of type passwd_file_t, and it carries the default sensitivity level s0.
When we examine the SELinux permissions of a process such as httpd, a common web server and a frequent target of attacks, we note a few extended attributes.
[centos@adx-us-p1-cmp06 etc]$ ps -axZ | grep httpd
system_u:system_r:httpd_t 3234 Ss 0:00 /usr/sbin/httpd
Figure 4: Permissions on the httpd process
We note this process also runs as a system user with a system role; however, it has a type of httpd_t. With SELinux MCS enabled, this process would be denied access to the passwd file, and the security event would be logged. This is precisely the behavior we want.
Here on the AC team (Customer Zero now), we use all the defaults found with SELinux, which are pretty good straight out of the box. But we also wrote a custom policy file that lets us extend this same SELinux context to the NFS mounts for customer data, the tenant namespace, and all the relevant processes in that namespace.
If you aren’t familiar with the concept of a Kubernetes namespace, that’s okay for this article – just know that it’s a great way to separate customer resources and process from each other. You may also refer to the documentation on namespaces in Kubernetes.
This custom policy currently defines nine options, and all of them pertain to the way we mount and use the NFS data.
What’s key for us is that when we mount the NFS share via a persistent volume claim (PVC), we assign the PVC the system-level user and role, present it as a “container” type, and assign it the same pseudo-random category and subcategory that are unique to the customer’s namespace and processes.
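As a sketch of how that labeling can reach the NFS data, a PersistentVolume can carry the context down via the `context` mount option. All names, the filer address, and the category pair below are hypothetical; the article does not show the actual manifest:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tenant-data-t3000134        # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  # The context mount option labels everything under the mount with the
  # tenant's unique category pair (here c593,c693).
  mountOptions:
  - context="system_u:object_r:container_file_t:s0:c593,c693"
  nfs:
    server: nfs.example.com         # hypothetical filer address
    path: /exports/t3000134
```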
When we look at a volume mount, we’ll see something like this:
Figure 5: Typical NFS data mount for a container
And when we examine the customer namespace, we’ll see a matching security context (Figure 6).
centos@adx-us-p1-utl02 [mgmt:kube-system] $ k describe ns sas-adxc-t3000134
Name: sas-adxc-t3000134
Labels: adx.sas.com/expiration-date=1571529599
adx.sas.com/global=true
adx.sas.com/id=abb1f6a1-ccb7-4261-a567-b44078e58ae9
adx.sas.com/order-number=t3000134
adx.sas.com/region=us-p1
adx.sas.com/selinux-context=593-693
adx.sas.com/tenant=t3000134
centos@adx-us-p1-utl02 [mgmt:kube-system] $ ps -auxZ | grep container | grep 593
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45988 0.0 0.0 7428 6112 ? S Sep14 0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
system_u:system_r:container_t:s0:c593,c693 nfsnobo+ 45989 0.0 0.0 7924 2092 ? S Sep14 0:00 nginx: worker process
Figure 6: Kubectl namespace description with SELinux context visible
Together, these two matching parameters mean only the proper tenant container, and further, only the processes we want inside that container, can access the customer data. This is precisely the level of control we need. Should an attack change the security context, that change would in turn prevent access to the customer data. Watchdog programs would then detect the change in context, which would be logged and reported to our Global Information Security team.
When the client's license expires and cleanup begins, the namespace and its labels are deleted. This returns the context to the available pool for assignment. By using a category and a subcategory, it is possible to create 1024 x 1024, or over 1 million, unique security contexts. Future enhancements or expansion of the SELinux concepts and options could allow for even more security. For instance, we could require the category to be an exact match while using the subcategory to allow multiple access based on a greater-than-or-equal-to match instead of an exact one. Such a scheme might prove useful for future applications that require multi-tiered access levels for data based on user, group, or even application type. SELinux's robust feature set and custom policy files make it an excellent fit and give this platform and our team plenty of room to grow.
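The size of the context pool, and the kind of pseudo-random pair assignment a namespace label like `selinux-context=593-693` implies, can be sketched in a few lines. The assignment shown is a hypothetical illustration, not the Analytics Cloud's actual allocator:

```shell
#!/bin/bash
# MCS categories and subcategories each range over c0..c1023, so the
# pool of unique (category, subcategory) pairs is:
pool=$((1024 * 1024))
echo "$pool unique contexts available"   # 1048576, i.e. over 1 million

# Hypothetical pseudo-random assignment for a new tenant namespace
# (a real allocator would also have to skip pairs already in use):
category=$((RANDOM % 1024))
subcategory=$((RANDOM % 1024))
echo "selinux-context=${category}-${subcategory}"
```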
The SAS Analytics Cloud environment has proven to be cutting-edge, delivering functionality and compute power in a new and innovative way. The technology behind Kubernetes has allowed our development team to present SAS users with a platform that can be created in a matter of literal minutes. That platform, images, and its applications are orchestrated by automatons such as Cedar to provide the best user startup and initialization experience possible. Mature features like SELinux work behind the scenes to keep large amounts of data secure in a leveraged environment.
It truly is an exciting time to be in IT!