Serverless Operations in AWS, part 2: Deploy and Run

In Part 1 of this series, "Prep and Build", we decided to deploy some custom project code to run "serverless" in AWS. To that end, we've already built a container for our project called "hello-aws" and tested its operations.

Now in Part 2, "Deploy and Run", we'll register the "hello-aws" container repository in AWS ECR and push our image to it, so that the AWS Batch service can reference it to run jobs. Then, we'll define a compute environment in Fargate and register a job definition that references our project container. At that point we'll be able to submit ad-hoc jobs and specify overriding parameters to the container so it produces the output desired.

Serverless architecture in AWS

You should recognize this illustration from the last post:

So far, we've accomplished that part at the very bottom of the image: creating a Docker container image. Let's get that pushed up to ECR where it can be called for running jobs in Fargate.

Register the container repository in ECR

Now we need to push the "hello-aws" container image to the Amazon Elastic Container Registry (ECR).

> IAM permissions for ECR

Configuring IAM appropriately is a deep and important topic in its own right. As needed, I'll call out critical IAM documentation in this series to help point you in the right direction.

Picking up where we left off from Part 1, "Prep and Build", we have a container that's tested and ready to go. So now we will create a repository for "hello-aws" in ECR, tag the image properly, and then direct Docker to push it up to ECR.

On your PC (or wherever your build environment is), renew your logon with the AWS CLI (using "aws sso login", or "aws configure sso" if you haven't set up an SSO profile yet). Then return to the working directory that contains the "Dockerfile" and "hello-aws.sh" files and execute the following commands:

AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
REGION="us-east-1"
MY_IMAGE="hello-aws"

# login ECR
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com

# create new repo in ECR
aws ecr create-repository --repository-name $MY_IMAGE --region $REGION

# tag the image so it finds its way home
docker tag ${MY_IMAGE}:latest $AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$MY_IMAGE

# push image to ECR 
docker push $AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$MY_IMAGE

When naming images for a private registry, you're expected to tag the image with the registry's host name so that the "docker push" command can get it to its intended destination. Here, the host name includes your AWS account ID and your choice of region, so take care to get both right to ensure that everything lands in ECR where you want it to.
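To make the naming rule concrete, here is a minimal sketch that assembles the full image reference from its parts. The function name "ecr_image_uri" and the sample account ID are hypothetical, made up for illustration; the host name pattern is the one used in the commands above.

```shell
# Hypothetical helper (not part of the AWS CLI or Docker):
# build the full ECR image reference from account, region, repo, and tag.
ecr_image_uri() {
  # $1: AWS account ID, $2: region, $3: repository name, $4: tag (defaults to "latest")
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "${4:-latest}"
}

# Example with a made-up account ID:
ecr_image_uri 123456789012 us-east-1 hello-aws
# prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-aws:latest
```

Spelling out the tag, rather than relying on Docker's implicit "latest", makes it easier to see exactly which image a later push or job definition refers to.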

With the "hello-aws" repo in ECR, it'll be easy to reference for the jobs we'll define in AWS Batch.

Let's use the AWS CLI to confirm:

aws ecr describe-images --repository-name hello-aws

{
    "imageDetails": [
        {
            "registryId": "182999999954",
            "repositoryName": "hello-aws",
            "imageDigest": "sha256:766a6a4665ab3579d6bb0b0e3e7b3ca7baa743aa71caac52d0ac9f7d630a55c1",
            "imageTags": [
                "latest"
            ],
            "imageSizeInBytes": 246976214,
            "imagePushedAt": "2024-02-22T17:21:03-05:00",
            "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "artifactMediaType": "application/vnd.docker.container.image.v1+json"
        }
    ]
}

If you prefer the point-and-click approach, then log on to the AWS Console > Elastic Container Registry and click on "hello-aws" in the list of private repositories. Click around a bit and you'll find that this is one case where you must employ the AWS CLI to get the task done: that's the only way to get set up to push the Docker image up to ECR. When you're in the ECR Console page looking at your "hello-aws" image repo, there's a button at the top-right to "View push commands". That's what informs the CLI commands given above.

Note: This will be the case for most of the work we perform in this exercise: that there's a point-and-click UI approach that closely - but often not perfectly - mirrors the CLI approach. This post will focus on the CLI with the expectation that most values shown in JSON should map to UI elements in the AWS Console (when they're specified). If there's a major difference between the Console UI and AWS CLI, I'll try to point that out.

Your network access in AWS

Controlling network access into and out of your AWS environment is important to get right. For this post, however, I have no way of knowing how your environment is configured. So I'm going to use a lowest-common-denominator approach that I think will work for most people: referencing your account's default VPC, default subnet, and default security group.

# get the default VPC
MY_VPC=$(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" --query 'Vpcs[*].VpcId' --output text)

# get the default subnet(s)
MY_SUBNET=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$MY_VPC" --query 'Subnets[*].SubnetId' --output json | grep -v '\[\|\]')

# get the default security group
MY_SG=$(aws ec2 describe-security-groups --filters "Name=vpc-id,Values=$MY_VPC" "Name=group-name,Values=default" --query 'SecurityGroups[0].GroupId' --output text)

# Announce what we've got:
echo -e "Default VPC\t\t= $MY_VPC \nDefault Subnet(s)\t\t= $MY_SUBNET \nDefault Security Group\t= $MY_SG \n"

With results similar to:

Default VPC		        = vpc-f4999990
Default Subnet(s)       = "subnet-054bdfffffff45768"
Default Security Group	= sg-c999999b

We'll reference those environment variables in later scripts, so keep them handy. And if you're following along using the AWS Console's web-based UI, you should expect to see "default" included in the names for these items when selecting them in prompts (even if you don't know what their IDs are).
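The grep in the subnet lookup depends on how the CLI lays out its JSON output. As an alternative sketch, you could request "--output text" and build the quoted, comma-separated list yourself. The helper name "subnets_to_json" is hypothetical, and the subnet IDs in the example are made up.

```shell
# Hypothetical helper: turn whitespace-separated subnet IDs
# (as returned by --output text) into quoted, comma-separated JSON array items.
subnets_to_json() {
  # $1: subnet IDs separated by spaces, tabs, or newlines
  sn_list=""
  for sn_id in $1; do
    if [ -n "$sn_list" ]; then sn_list="$sn_list, "; fi
    sn_list="$sn_list\"$sn_id\""
  done
  printf '%s\n' "$sn_list"
}

# Example with made-up IDs:
subnets_to_json "subnet-aaa subnet-bbb"
# prints: "subnet-aaa", "subnet-bbb"

# Possible use with the lookup above (left commented out because it calls AWS):
# MY_SUBNET="$(subnets_to_json "$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$MY_VPC" --query 'Subnets[*].SubnetId' --output text)")"
```

The result can be dropped into a JSON array in the same way as the original variable.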

AWS Batch > Create Fargate compute environment

AWS Batch offers a service for managing the execution of your containerized workload. It lets you choose the compute back end, including EKS, EC2, and Fargate. We've elected to go with Fargate because, as a "serverless" offering, it automatically handles the infrastructure for us. Besides needing minimal administration on our part, it's low-cost yet scalable, and it gets jobs from submitted to running in very little time (seconds vs. minutes).

> IAM permissions for AWS Batch

Let's get ready to create our Fargate cluster:

PROJECT="hello-aws"
CE_NAME="$PROJECT-fargate"
JSON_FILE="createCE-$CE_NAME.json"

cat << EOF > $JSON_FILE
{
  "computeEnvironmentName": "$CE_NAME",
  "state": "ENABLED",
  "type": "MANAGED",
  "computeResources": {
    "type": "FARGATE_SPOT",
    "maxvCpus": 16,
    "subnets": [
      $MY_SUBNET
    ],
    "securityGroupIds": [
      "$MY_SG"
    ]
  }
}

EOF

echo -e "\nVerify JSON file [$JSON_FILE], then exec this command:"
echo -e "aws batch create-compute-environment --cli-input-json file://$JSON_FILE\n"

The result of this will be the creation of a file named "createCE-hello-aws-fargate.json" in your current directory. You're also instructed to review the contents of the JSON file, and if it looks okay, then execute the command to create the Fargate cluster in AWS. Be sure to confirm that the subnet and security group variables were resolved correctly.

When you execute the given AWS CLI command:

aws batch create-compute-environment --cli-input-json file://createCE-hello-aws-fargate.json

...then a successful result should look similar to:

{
    "computeEnvironmentName": "hello-aws-fargate",
    "computeEnvironmentArn": "arn:aws:batch:us-east-1:182999999954:compute-environment/hello-aws-fargate"
}

Now we've got a place to run our "hello-aws" container. It won't cost any money until we actually use it.
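The create call returns while the compute environment is still being set up, so it can be worth checking its status before moving on to the job queue. Here is a minimal sketch; "ce_is_valid" is a hypothetical helper name, and it assumes the AWS CLI is configured for the same account and region.

```shell
# Hypothetical helper: report a compute environment's status and
# succeed only when AWS Batch says it is VALID.
ce_is_valid() {
  # $1: compute environment name or ARN
  ce_state="$(aws batch describe-compute-environments \
    --compute-environments "$1" \
    --query 'computeEnvironments[0].status' \
    --output text)"
  echo "$1: $ce_state"
  [ "$ce_state" = "VALID" ]
}

# Usage (left commented out because it calls AWS):
# ce_is_valid "hello-aws-fargate" || echo "Not ready yet, try again shortly"
```

A status of CREATING usually just means you need to wait a few moments; INVALID suggests revisiting the JSON file.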

AWS Batch > Create job queue

AWS Batch provides queues to organize, prioritize, and direct workload. We'll need one to run our job in Fargate, too.

To create a job queue:

PROJECT="hello-aws"
QUEUE_NAME="$PROJECT-queue"
JSON_FILE="createJQ-$QUEUE_NAME.json"

CE_NAME="$PROJECT-fargate"
CE_ARN=$(aws batch describe-compute-environments --compute-environments "$CE_NAME" --query 'computeEnvironments[0].computeEnvironmentArn' --output text)

cat << EOF > $JSON_FILE
{
    "jobQueueName": "$QUEUE_NAME",
    "state": "ENABLED",
    "priority": 1,
    "computeEnvironmentOrder": [
        {
            "order": 1,
            "computeEnvironment": "$CE_ARN"
        }
    ],
    "tags": {
        "project": "hello-aws",
        "owner": "your.email@example.com"
    }
}
EOF

echo -e "\nVerify JSON file [$JSON_FILE], then exec this command:"
echo -e "aws batch create-job-queue --cli-input-json file://$JSON_FILE\n"

Note that we've elected to create tags for this resource - one specifically to identify your email address as the "owner" - so edit the JSON as appropriate. It's always a good idea to tag resources, especially if your IT team has standardized on the practice.

After that, executing the given command to create the job queue should return a result similar to:

{
    "jobQueueName": "hello-aws-queue",
    "jobQueueArn": "arn:aws:batch:us-east-1:182999999954:job-queue/hello-aws-queue"
}

This gives us a queue that will direct jobs to run in our Fargate cluster.

AWS Batch > Register job definition

The next step is to explain to AWS Batch how to run the code in our "hello-aws" project's container by registering a job definition.

> IAM role: ecsTaskExecutionRole

The AWS Console > Batch UI for registering a job definition defaults to the "ecsTaskExecutionRole" - so we'll use that in this example as it should be available to everyone. Ensure that it has the necessary permissions, policies, and trust relationships to run your project.

To register a job definition:

PROJECT="hello-aws"
REPO_URI="$(aws ecr describe-repositories --repository-name $PROJECT --query 'repositories[0].repositoryUri' --output text)"
ROLE_ARN="$(aws iam get-role --role-name ecsTaskExecutionRole --query 'Role.Arn' --output text)"
JSON_FILE="registerJobDef-$PROJECT.json"

cat << EOF > $JSON_FILE
{
  "jobDefinitionName": "$PROJECT-jobdef",
  "type": "container",
  "containerProperties": {
    "image": "$REPO_URI",
    "command": [
      "echo","hello, job definition in AWS Batch"
    ],
    "jobRoleArn": "$ROLE_ARN",
    "executionRoleArn": "$ROLE_ARN",
    "resourceRequirements": [
      {
        "value": "1.0",
        "type": "VCPU"
      },
      {
        "value": "2048",
        "type": "MEMORY"
      }
    ],
    "networkConfiguration": {
      "assignPublicIp": "ENABLED"
    },
    "fargatePlatformConfiguration": {
      "platformVersion": "LATEST"
    },
    "runtimePlatform": {
      "operatingSystemFamily": "LINUX",
      "cpuArchitecture": "X86_64"
    },
    "ephemeralStorage": {
      "sizeInGiB": 21
    },
    "environment": [],
    "secrets": [],
    "linuxParameters": {},
    "mountPoints": [],
    "logConfiguration": {
      "options": {},
      "logDriver": "awslogs",
      "secretOptions": []
    }
  },
  "platformCapabilities": [
    "FARGATE"
  ],
  "timeout": {
    "attemptDurationSeconds": 7200
  },
  "retryStrategy": {
    "attempts": 1
  },
  "parameters": {}
}
EOF

echo -e "\nVerify JSON file [$JSON_FILE], then exec this command:"
echo -e "aws batch register-job-definition --cli-input-json file://$JSON_FILE"

There's a lot going on here... so I'll call out some important items for you to confirm in the JSON:

The REPO_URI provided for the "image" should look similar to:
182999999954.dkr.ecr.us-east-1.amazonaws.com/hello-aws
The "jobRoleArn" and "executionRoleArn" should look similar to:
arn:aws:iam::182999999954:role/ecsTaskExecutionRole

Configuring these two parameters to specify the same role is a choice I made to keep things simple. The "executionRoleArn" is what gives AWS Batch the permission for the execution environment to pull images, create log streams, etc. The "jobRoleArn" is what gives the application code the permissions it needs to interact with AWS services.

"assignPublicIp" is "ENABLED". Remember earlier where we gathered your default VPC, subnet, and security group identifiers? A Fargate task launched in a default (public) subnet can only reach the Internet if it has a public IP address, and without that route AWS Batch can't pull your container image from ECR. Enabling a public IP for the job definition works around that.
"ephemeralStorage" = 21 GiB. The "hello-aws" project doesn't use any storage, however this is a required parameter when registering a job definition using the AWS CLI and the minimum allowed is 21 GiB.
We provided an override for the "hello-aws" container's default CMD. The job definition specifies echoing "hello, job definition in AWS Batch" instead.

If the JSON looks okay, then execute the given command to register the job definition with results similar to:

{
    "jobDefinitionName": "hello-aws-jobdef",
    "jobDefinitionArn": "arn:aws:batch:us-east-1:182999999954:job-definition/hello-aws-jobdef:1",
    "revision": 1
}

If you change this job definition and then re-register it with the same name, AWS Batch will automatically increase the revision number, defaulting to using the latest unless otherwise specified in the ARN reference.
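If you'd rather pin a job to a specific revision than rely on the default, you first need to know which revisions exist. The following is a minimal sketch that asks AWS Batch for the active revisions and prints the highest one; "latest_revision" is a hypothetical helper name.

```shell
# Hypothetical helper: print the highest ACTIVE revision number
# of a job definition (prints 0 if none are found).
latest_revision() {
  # $1: job definition name
  latest=0
  for rev in $(aws batch describe-job-definitions \
      --job-definition-name "$1" --status ACTIVE \
      --query 'jobDefinitions[].revision' --output text); do
    if [ "$rev" -gt "$latest" ]; then latest="$rev"; fi
  done
  echo "$latest"
}

# Usage (left commented out because it calls AWS):
# JOBDEF_NAME="hello-aws-jobdef:$(latest_revision hello-aws-jobdef)"
```

Pinning the revision this way means a job keeps using the definition you tested, even if someone registers a newer one later.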

Our job has a definition now - and importantly, we've demonstrated one of the override mechanisms to pass arguments into the container to change what it does.

AWS Batch > Submit a job

Finally, we made it! It's time to actually submit a job using our "hello-aws" container to run in Fargate. No more ado - let's go:

PROJECT="hello-aws"
QUEUE_NAME="$PROJECT-queue"
JOBDEF_NAME="$PROJECT-jobdef"  # defaults to latest jobDef revision
#JOBDEF_NAME="$PROJECT-jobdef:1"  # including revision number is optional

NOW=`date +"%M%S"`             # current time showing only MMSS. 
JOB_NAME="$PROJECT-$NOW"

echo -e "\nSubmit job using:"
echo -e "\naws batch submit-job --job-name ${JOB_NAME} \
         --job-definition ${JOBDEF_NAME} --job-queue ${QUEUE_NAME}"

If you approve of how the "submit-job" command is structured, then execute it. You should get back a response similar to:

{
    "jobArn": "arn:aws:batch:us-east-1:182999999954:job/80436b30-c392-4306-8705-c9046fbdf8b3",
    "jobName": "hello-aws-2532",
    "jobId": "80436b30-c392-4306-8705-c9046fbdf8b3"
}

At this point, it's likely easiest to monitor your job and review its log from the AWS Console > Batch > Jobs interface. But the AWS CLI does offer interesting info, such as:

Get the job status:

aws batch describe-jobs --jobs YOUR_JOB_ID_HERE --query 'jobs[].status' --output text

Get the job's log stream name:

aws batch describe-jobs --jobs YOUR_JOB_ID_HERE --query 'jobs[].container.logStreamName' --output text

View the content of the log stream:

aws logs get-log-events --log-group-name /aws/batch/job --log-stream-name YOUR_LOGSTREAM_NAME_HERE
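If you'd like to stay in the terminal, the status query can be wrapped in a small polling loop. This is a sketch rather than a polished tool: "wait_for_job" is a hypothetical helper name, and the default of 60 polls at 10-second intervals is an arbitrary choice.

```shell
# Hypothetical helper: poll a job until it reaches SUCCEEDED or FAILED.
# Returns 0 on SUCCEEDED, 1 on FAILED, 2 if it gives up waiting.
wait_for_job() {
  # $1: job ID, $2: max polls (default 60), $3: seconds between polls (default 10)
  polls_left="${2:-60}"
  while [ "$polls_left" -gt 0 ]; do
    job_status="$(aws batch describe-jobs --jobs "$1" \
      --query 'jobs[0].status' --output text)"
    echo "status: $job_status"
    case "$job_status" in
      SUCCEEDED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    polls_left=$((polls_left - 1))
    sleep "${3:-10}"
  done
  echo "Gave up waiting for job $1" >&2
  return 2
}

# Usage (left commented out because it calls AWS):
# wait_for_job YOUR_JOB_ID_HERE && echo "Job finished OK"
```

While it waits, you'll see the job move through states such as RUNNABLE, STARTING, and RUNNING.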

AWS Batch > Submit a job with custom override

We're not done quite yet. Remember, we configured the "hello-aws" container to accept runtime parameters. And when we defined the job definition earlier, we specified an override there so the container would echo "hello, job definition in AWS Batch" instead of the original "hello, aws".

We can also specify an override to the CMD parameters when we submit the job, too:

NOW=`date +"%M%S"`           # current time showing only MMSS. 
JOB_NAME="$PROJECT-$NOW"

echo -e "\nSubmit job using:"
echo -e "\naws batch submit-job --job-name ${JOB_NAME} \
         --job-definition ${JOBDEF_NAME} --job-queue ${QUEUE_NAME} \
         --container-overrides '{\"command\":[\"aws\",\"ecr list-images\",\"--repository-name $PROJECT\"]}'"

You might recall that our "hello-aws" project basically does two things: 1) echo out a string or 2) execute the "aws" command you provide.

The latter is what we've attempted with this override. Looking at the log stream (and grep'ing out the most interesting bits), we should see it works:

aws logs get-log-events --log-group-name /aws/batch/job --log-stream-name hello-aws/default/2c55df205c504eeb8ec60d99d701e640 | grep message

            "message": "---",
            "message": "aws ecr list-images --repository-name hello-aws",
            "message": "---",
            "message": "{",
            "message": "    \"imageIds\": [",
            "message": "        {",
            "message": "            \"imageDigest\": \"sha256:b7caf09b5d526cdebc2d1b193f30fd20568c98223050f61745a5b697a6fa1b90\",",
            "message": "            \"imageTag\": \"latest\"",
            "message": "        },",
            "message": "        {",
            "message": "            \"imageDigest\": \"sha256:766a6a4665ab3579d6bb0b0e3e7b3ca7baa743aa71caac52d0ac9f7d630a55c1\"",
            "message": "        }",
            "message": "    ]",
            "message": "}",

That's the AWS CLI running inside the "hello-aws" container in AWS Fargate and showing the list of images from our repo in ECR. It can see our repo because the job runs with the IAM role named by "jobRoleArn" in the jobDef, and that role is what grants the container its AWS permissions.
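Hand-escaping the quotes for "--container-overrides" gets tedious quickly. As a sketch, the JSON can be assembled from ordinary shell arguments instead. The helper name "command_override_json" is hypothetical, and it assumes the arguments contain no double quotes or backslashes (it does no JSON escaping).

```shell
# Hypothetical helper: build the --container-overrides JSON from arguments.
# Assumption: arguments contain no double quotes or backslashes.
command_override_json() {
  cmd_items=""
  for cmd_arg in "$@"; do
    if [ -n "$cmd_items" ]; then cmd_items="$cmd_items,"; fi
    cmd_items="$cmd_items\"$cmd_arg\""
  done
  printf '{"command":[%s]}\n' "$cmd_items"
}

# Reproduces the override used above:
command_override_json aws "ecr list-images" "--repository-name hello-aws"
# prints: {"command":["aws","ecr list-images","--repository-name hello-aws"]}

# Usage (left commented out because it calls AWS):
# aws batch submit-job --job-name "$JOB_NAME" --job-definition "$JOBDEF_NAME" \
#   --job-queue "$QUEUE_NAME" \
#   --container-overrides "$(command_override_json aws "ecr list-images" "--repository-name hello-aws")"
```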

Troubleshooting

Since this is the first time you've run your project code in AWS, you might discover a problem with it. Maybe the output doesn't look right, or perhaps it's simply failing to run.

> Problem cause: Specifying the wrong container build platform

Symptoms:

- Job status: FAILED

- Log error message: exec /bin/bash: exec format error

As a Mac user, I'm running on a machine which uses Apple Silicon technology for the Apple M2 chip as the main processor. As far as chip design goes, this is an ARM processor which is fundamentally different than the x86_64 architecture processors used by Fargate in AWS (and as specified in the job definition above).

So, to fix this error, when I use Docker to build my container, I need to specify the target architecture. In my Dockerfile, I must include the "--platform" parameter on my image's FROM instruction. Like so:

# Start with the latest AWS-CLI v2 image
FROM --platform=linux/amd64 public.ecr.aws/aws-cli/aws-cli:latest

After changing the Dockerfile, then:

Re-build the container image for your project
Re-push the image to the existing repo in ECR
Re-submit the job using the existing jobDef, queue, and compute env.
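A quick way to tell whether you're likely to hit this problem is to compare your build machine's architecture with the one in the job definition. The sketch below maps the output of "uname -m" to a Docker platform string; "docker_platform_for" is a hypothetical helper name, and the mapping only covers the two architectures discussed here.

```shell
# Hypothetical helper: map a machine architecture name (as printed by
# `uname -m`) to the matching Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# The job definition above asks for X86_64, i.e. linux/amd64.
host_platform="$(docker_platform_for "$(uname -m)")"
if [ "$host_platform" != "linux/amd64" ]; then
  echo "Host builds $host_platform by default; build with --platform=linux/amd64"
fi
```

As an alternative to editing the Dockerfile, "docker build" also accepts a "--platform linux/amd64" option on the command line.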

> Problem cause: Lacking IAM permissions

Symptoms:

- Job status: SUCCEEDED

- Log error message: Unable to locate credentials. You can configure credentials by running 'aws configure'.

There are many possible scenarios here. One that I ran into is when the job says it ran okay, but when looking at the log stream, I saw the familiar error message above.

You might remember that from Part 1, "Prep and Build", where we were trying to run the AWS CLI inside the container and it gave that same message. We fixed it there by bind mounting the AWS credentials saved on your host PC to the container so they'd be available to the AWS CLI inside. That won't work here. Instead, we need to ensure that the "hello-aws" container is running with the appropriate IAM permissions. In AWS Batch, those come from the role named by "jobRoleArn" in the job definition, not from the credentials of whoever submits the job.

In my case, I had neglected to include the "jobRoleArn" directive in the job definition. Without it, the application code was unable to inherit the permissions needed. Beyond that, simply specifying a valid IAM role might not be sufficient if it's not associated with the appropriate permissions and policies that the application requires.

After you fix it, then:

Re-register the job definition to include the "jobRoleArn" directive (and/or correct the IAM permissions behind it).
Re-submit the job using the updated jobDef, and existing queue and compute env.
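To spot this problem sooner, the container's start-up script could log whether role credentials were handed to it. On ECS and Fargate, a task with a role normally receives them through the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable, so the sketch below simply checks for it. "has_task_role_credentials" is a hypothetical helper name, and adding it to "hello-aws.sh" is a suggestion, not something from Part 1.

```shell
# Hypothetical helper: succeed if the container appears to have been
# given task role credentials by ECS/Fargate.
has_task_role_credentials() {
  [ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]
}

if has_task_role_credentials; then
  echo "Task role credentials are available"
else
  echo "No task role credentials found; check jobRoleArn in the job definition" >&2
fi
```

Treat the result as a hint only: a role can be present and still lack the policies your code needs.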

> Problem cause: Code behaving badly

Symptom: You'll hafta figure that out for yourself - it's your code.

After you fix it, then:

Re-build the container image for your project
Re-push the image to the existing repo in ECR
Re-submit the job using the existing jobDef, queue, and compute env.

Debrief

We've really made a lot of progress here. We've gotten our project code deployed into AWS as a container repository in ECR. From there, we can use AWS Batch to submit jobs that run it in AWS Fargate. Even better, we know how to send overrides to the container so we can change its behavior on the fly.

Look for Part 3 of this series, "Schedule and Monitor", where we will establish a schedule to run the job on a regular basis. Then we can monitor its activity using tools from CloudWatch and CloudTrail.

For more on this topic, @FrederikV takes these concepts and applies them to a SAS topic in Running SAS Scoring Runtime Containers through AWS Fargate.

SAS Communities Library