After reading a comprehensive series from my colleague @RobCollum, in which he explains AWS Fargate and its role in serverless computing, I wanted to experiment with the idea of deploying SAS scoring runtime containers through AWS Fargate. By encapsulating SAS analytical models within lightweight, ephemeral containers and orchestrating their execution through Fargate, I envisioned a scenario where computational resources could be provisioned dynamically, on an as-needed basis, without the overhead of managing underlying infrastructure.
Let me guide you through a demo. It starts with publishing a SAS analytical model to an AWS Elastic Container Registry (AWS ECR), which comes down to publishing the SAS model as a SAS Container Runtime (SCR) image. Next, I used the SCR image as the source image in a Kubernetes deployment/service yaml that was applied on the same AWS EKS Kubernetes cluster where Viya is running, but in a dedicated Kubernetes namespace named "sas-modelops-deployments". One reason for using that namespace is that I configured it so that pods running there launch Kubernetes nodes through AWS Fargate. These nodes spin up quickly, and once the pod is terminated the Fargate node is terminated within seconds.
I will launch the scoring pod and call the REST API endpoint with a "scoring.sh" script that is executed by launching a container with AWS Elastic Container Service (AWS ECS). By making use of AWS ECS, I'm copying the technique Rob used in his AWS serverless series. When you read his articles you will see that an AWS Batch Job and an AWS Batch Queue are created. You will also recognize another use of AWS Fargate, as the AWS Batch Job runs the Docker container in an AWS Fargate Batch compute environment. The data to be scored comes from an AWS S3 bucket, and the scoring output is copied back to that same bucket. Finally, the scoring process can be monitored with AWS CloudWatch, as every event originating from the scoring job is sent to a specific log stream that we can tail to see the latest updates.
Here's the AWS architecture that was used for the demo. In the rest of the article I will go into a bit more detail on each step of the process.
A SAS Gradient Boosting model was registered in Model Manager and published to a target destination named "AWS Demo".
Checking the details of the publishing destination, one sees it's an AWS ECR container repository. How to create one is explained here.
Once published, the model is ready to be consumed as a container image. Copy the image URL so you can reuse it in step 3 when deploying the SCR image as a pod/service on the Kubernetes cluster.
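If you want to verify the published image outside of Model Manager, you can pull it straight from ECR. Below is a minimal sketch; the account ID 123456789012 is a placeholder (substitute your own), the region and repository name match this demo, and the actual login/pull commands are commented out because they need real AWS credentials.

```shell
#!/bin/bash
# Hypothetical account ID -- replace with your own.
ACCOUNT_ID=123456789012
REGION=eu-west-2
REPO=hmeqtestgradientboosting
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
IMAGE_URI="${REGISTRY}/${REPO}:latest"
echo "${IMAGE_URI}"

# Authenticate Docker against ECR and pull the SCR image:
# aws ecr get-login-password --region "${REGION}" \
#   | docker login --username AWS --password-stdin "${REGISTRY}"
# docker pull "${IMAGE_URI}"
```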
On the EKS Kubernetes cluster a namespace "sas-modelops-deployments" is created.
As mentioned before, EKS can run pods on AWS Fargate by integrating with the upstream Kubernetes APIs; at that point you are running serverless pods on AWS Fargate. Before you can do so, you have to extend your EKS cluster with AWS Fargate. I followed the instructions from this AWS EKS getting started with Fargate link and this article. If everything went well, you can inspect the Compute tab of your EKS cluster and see the Fargate profile.
Checking the details of that EKS Fargate profile, note that I've associated it with my newly created modelops namespace. Optionally, I could have added specific Kubernetes labels to the selector, but here I decided to run every pod from that namespace on AWS Fargate.
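For reference, such a profile can also be created from the AWS CLI instead of the console. A sketch, assuming the cluster name used later in this demo and a hypothetical profile name; the create call itself is commented out because it needs a real pod execution role ARN and your private subnet IDs.

```shell
#!/bin/bash
CLUSTER=summit-london-eks            # cluster name used later in scoring.sh
PROFILE=sas-modelops-fargate         # hypothetical profile name
NAMESPACE=sas-modelops-deployments   # every pod in this namespace goes to Fargate
echo "Fargate profile ${PROFILE} on ${CLUSTER} selects namespace ${NAMESPACE}"

# aws eks create-fargate-profile \
#   --cluster-name "${CLUSTER}" \
#   --fargate-profile-name "${PROFILE}" \
#   --pod-execution-role-arn arn:aws:iam::<account>:role/<pod-execution-role> \
#   --subnets subnet-xxx subnet-yyy \
#   --selectors namespace="${NAMESPACE}"
```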
Based on a yaml file that I received from my colleague Hans-Joachim Edert, I created and applied the similar yaml file below on the AWS EKS cluster. Please take special note of the SAS_SCR_REST_API_TYPE environment variable: it's set to BATCH. That's a feature that isn't documented yet but should be soon; it allows you to score a batch payload. Another point of attention is the AWS load balancer annotation "service.beta.kubernetes.io/aws-load-balancer-scheme". Make sure it's set to internal, as explained here. That way we request an internal AWS load balancer, so the DNS name associated with the load balancer resolves to an internal IP. In a later step we will make sure that the incoming scoring request comes from a container running in the same VPC. Pay attention as well to "service.beta.kubernetes.io/aws-load-balancer-subnets": I used the same subnets that are reserved for Kubernetes pods. The last thing I had to do was add the IP CIDR range of the VPC to the loadBalancerSourceRanges of the Kubernetes service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hmeqtestgradientboosting
  namespace: sas-modelops-deployments
  labels:
    app-owner: viya_admin
    app.kubernetes.io/name: hmeqtestgradientboosting
  annotations:
    CapacityProvisioned: 1vCPU 2GB
spec:
  replicas: 1
  selector:
    matchLabels:
      app-owner: viya_admin
      app.kubernetes.io/name: hmeqtestgradientboosting
  template:
    metadata:
      labels:
        app-owner: viya_admin
        app.kubernetes.io/name: hmeqtestgradientboosting
    spec:
      containers:
      - name: hmeqtestgradientboosting
        image: xxxxxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/hmeqtestgradientboosting:latest
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        imagePullPolicy: IfNotPresent
        env:
        # - name: "SAS_SCR_LOG_LEVEL_App.tk.MAS"
        #   value: "INFO"
        # - name: "SAS_SCR_LOG_LEVEL_App.TableServices.DS2.Runtime.SQL"
        #   value: "INFO"
        # - name: "SAS_SCR_LOG_LEVEL_App.TableServices.DS2.Runtime.Log"
        #   value: "INFO"
        - name: "SAS_SCR_APP_PATH"
          value: "/score"
        - name: "SAS_SCR_REST_API_TYPE"
          value: "BATCH"
        securityContext:
          capabilities:
            drop:
            - ALL
          privileged: false
          runAsUser: 1001
          runAsNonRoot: true
          allowPrivilegeEscalation: false
      restartPolicy: Always
      securityContext: {}
      imagePullSecrets:
      - name: ecr-secret
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: NotIn
                values:
                - system
---
apiVersion: v1
kind: Service
metadata:
  name: hmeqtestgradientboosting
  namespace: sas-modelops-deployments
  labels:
    app-owner: viya_admin
    app.kubernetes.io/name: hmeqtestgradientboosting
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: internal
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxx, subnet-yy
spec:
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  selector:
    app-owner: viya_admin
    app.kubernetes.io/name: hmeqtestgradientboosting
  type: LoadBalancer
  loadBalancerSourceRanges:
  - 192.168.0.0/16
So after the yaml is applied to your Kubernetes cluster, you should see that an internal AWS load balancer is created. When pinging the DNS name, check that it resolves to an internal IP address.
You can also double-check by inspecting the Kubernetes service in the sas-modelops-deployments namespace. Its "external IP" should point to the DNS name of the AWS load balancer.
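A small helper can sanity-check that the resolved address really is private. The RFC 1918 range checks below are plain shell; the kubectl/DNS lookups are shown commented out since they need access to the cluster, and the sample IP is just an example address inside the demo's 192.168.0.0/16 VPC CIDR.

```shell
#!/bin/bash
# Return success only for RFC 1918 (private/VPC) addresses.
is_private_ip() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch the load balancer hostname and resolve it, e.g.:
# LB=$(kubectl -n sas-modelops-deployments get svc hmeqtestgradientboosting \
#        -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# IP=$(getent hosts "$LB" | awk '{print $1; exit}')
IP=192.168.34.7   # example address inside the demo VPC CIDR
if is_private_ip "$IP"; then echo "internal"; else echo "public"; fi
```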
Finally, take a detailed look at where the scoring pod is running: there's a new node with the prefix "fargate" in its name. With that we confirm that the pod is running on Fargate.
In the next step a scoring.sh script was created. It copies a payload.json from an AWS S3 bucket and downloads the kubeconfig of the EKS cluster. With that it can connect to the EKS cluster and scale up the scoring deployment. We then wait until the scoring pod is pulled and running. Once it is, we submit the scoring request by executing a curl command that sends the payload to the AWS load balancer endpoint created when the deployment was applied in step 3. Finally, the results are sent to the same AWS S3 bucket, and the scoring deployment is scaled back to 0.
#!/bin/bash
# Copy the scoring payload from S3
aws s3 ls
aws s3 cp s3://xxx/aws-summit-london/sample_scr_batch_payload.json /tmp/sample_scr_batch_payload.json
# Download the kubeconfig so we can talk to the EKS cluster
aws eks --region eu-west-2 update-kubeconfig --name summit-london-eks
kubectl get nodes
ls -ltr /tmp
# Scale up the scoring deployment and wait until the pod is ready
kubectl scale deployment -n sas-modelops-deployments hmeqtestgradientboosting --replicas=1
kubectl wait pods -n sas-modelops-deployments -l app.kubernetes.io/name=hmeqtestgradientboosting --for condition=Ready --timeout=120s
echo $(curl ident.me)
# Give the SCR scoring application some extra time to start up
echo "50sec sleep start"
sleep 50
echo "sleep over"
cd /tmp
timestamp=$(date +"%d%m%y_%H%M%S")
echo $timestamp
cat /etc/hosts
# Send the payload to the internal AWS load balancer endpoint
curl -X POST -H "Content-Type: application/json" -m 10 -d @sample_scr_batch_payload.json http://k8s-sasmodel-hmeqtest-xxxx-yyyy.elb.eu-west-2.amazonaws.com/score -o /tmp/out.json
cat out.json | jq '.' > out_$timestamp.json
# Copy the scoring result back to S3 and scale the deployment down again
aws s3 cp /tmp/out_$timestamp.json s3://xxx/aws-summit-london/out_$timestamp.json
cat /tmp/out_$timestamp.json
kubectl scale deployment -n sas-modelops-deployments hmeqtestgradientboosting --replicas=0
#sleep 120
The scoring script is baked into a container image with the following Dockerfile.
# Build this container to run in AWS Batch
# Start with the latest AWS-CLI v2 image
FROM public.ecr.aws/aws-cli/aws-cli:latest
# Consider using "--platform=linux/amd64" if building on Apple Silicon

# Install additional software as needed
RUN yum update -y && yum install -y wget unzip jq
WORKDIR /tmp
RUN curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.20.4/2021-04-12/bin/linux/amd64/kubectl
RUN chmod 755 ./kubectl
RUN mv ./kubectl /usr/local/bin/kubectl
RUN kubectl version --short --client

# Copy your project code into the container
COPY scoring.sh /

# Default execution
ENTRYPOINT ["/bin/bash","/scoring.sh"]
# Default parameters
CMD ["aws"]
You can build the container with this command:
docker build -t scoring .
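Before AWS Batch can run it, the image has to live in a registry it can reach. Here's a sketch of tagging and pushing the freshly built image to ECR; the account ID is again a placeholder, and the repository creation and push commands are commented out since they need real credentials.

```shell
#!/bin/bash
ACCOUNT_ID=123456789012   # placeholder -- use your own account ID
REGION=eu-west-2
TARGET="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/scoring:latest"
echo "${TARGET}"

# aws ecr create-repository --repository-name scoring --region "${REGION}"
# aws ecr get-login-password --region "${REGION}" \
#   | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
# docker tag scoring:latest "${TARGET}"
# docker push "${TARGET}"
```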
This step is more or less exactly what I've learnt from @RobCollum's Serverless Operations in AWS, part 2: Deploy and Run.
If you followed Rob's series, you will have ended up with an AWS Batch-hello-aws-fargate AWS Elastic Container Service. If you then repeat all his steps using the scoring Docker image created in step 4, you will have a second AWS ECS, like I have:
And with that, when checking AWS Batch, you should see an extra scoring compute environment and an extra scoring queue.
Finally, an extra AWS Job Definition was created which refers to the scoring Docker container image that was pushed to the correct AWS ECR:
It's time to launch the AWS Batch job and see the scoring in action. You can submit the job from a terminal window where the AWS CLI is installed.
PROJECT=scoring
JOBDEF_NAME="$PROJECT-jobdef"
QUEUE_NAME="$PROJECT-queue"
NOW=`date +"%H%M%S"` # current time as HHMMSS
JOB_NAME="$PROJECT-$NOW"
echo -e "\naws batch submit-job --job-name ${JOB_NAME} \
--job-definition ${JOBDEF_NAME} --job-queue ${QUEUE_NAME}"
aws batch submit-job --job-name scoring-110406 \
--job-definition scoring-jobdef --job-queue scoring-queue
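submit-job answers with a small JSON document containing the job's name and ID. Here's a sketch of extracting that ID with sed so you can poll the job status afterwards; the jobId value below is made up, and the describe-jobs polling call is commented out.

```shell
#!/bin/bash
# Example response; a real one comes from the aws batch submit-job call above.
RESPONSE='{"jobName":"scoring-110406","jobId":"e4f62071-1234-5678-9abc-def012345678"}'
JOB_ID=$(echo "$RESPONSE" | sed -n 's/.*"jobId":[[:space:]]*"\([^"]*\)".*/\1/p')
echo "${JOB_ID}"

# Poll until the job reaches SUCCEEDED or FAILED:
# aws batch describe-jobs --jobs "${JOB_ID}" \
#   --query 'jobs[0].status' --output text
```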
If you then check the AWS Batch Dashboard in the AWS Console, you should see that a new job is starting:
Going to AWS CloudWatch allows you to follow all output events from the job launched in step 6. AWS Fargate launches a container with an AWS role that has permission to interact with EKS. Remember that in the scoring.sh script we download the Kubernetes kubeconfig, which lets us interact with the EKS Kubernetes cluster and scale the hmeqtestgradientboosting deployment to 1. That's also the moment an AWS Fargate EKS node starts, just like when we deployed the SCR pod in step 3.
Once the scoring pod is running, the incoming data can be scored. Note that I added a sleep of 50 seconds to give the SCR pod some time to start up its scoring application.
Once scoring is done, the dedicated Kubernetes AWS Fargate node disappears again.
Thanks for reading.