SAS Event Stream Processing 2020 and later is now cloud-native and works in a Kubernetes environment like the rest of SAS Viya. While this type of major deployment change should be transparent to the end users, there are a few things that an ESP project developer needs to know to fully leverage this new environment. Let’s review some of them in this post.
Previously, in SAS ESP 6.2 and earlier, at least one ESP server had to be started before developers could comfortably design an ESP project: discover connector parameters dynamically, query ASTORE file inputs and outputs, and so on. The same server was then used to test the project.
Letting users start their own ESP server on a Linux machine was probably a hurdle at some customer sites. Sites that didn't want users to connect directly to the Linux machines may have implemented ESP servers as daemons, provisioning multiple ESP servers shared between users. While that does the job, it offers less flexibility in terms of configuration options for those ESP servers (logging, Python context, etc.).
With ESP 2020 on Kubernetes, an ESP developer doesn't need to think about starting an ESP server to design a project. They just connect to the ESP cluster provided by default (named after the Kubernetes namespace).
Behind the scenes, a continuously running ESP server in a dedicated pod (named sas-event-stream-processing-client-config-server, if you are curious) handles all the tasks the “factory” server used to perform: discovering connector properties, the parameters of online analytical algorithms, and ASTORE input and output variables.
When it comes to testing a project, the ESP developer will observe a new behavior. When the user triggers a test, a brand-new ESP server is started on demand in a new Kubernetes pod. Thus, the ESP server used to design the project and the ESP server used to test it are two separate servers. If a user tests five ESP projects at the same time, ESP launches five additional Kubernetes pods, each running a single ESP server.
Since developers might have less control over the way ESP servers are started in Kubernetes, new customization capabilities have been added to the user interface of both ESP Studio and Event Stream Manager, so that users can specify ESP options or environment variables in their settings before testing or running a project. Here is how it looks:
With ESP 2020 on Kubernetes comes a new execution model: an ESP server runs one and only one ESP project. Whether you are testing an ESP project or running one from ESP Studio or Event Stream Manager (ESM), a new Kubernetes pod is instantiated with a single ESP server that executes this ESP project. ESP engines are no longer used.
That’s one of the big impacts of Kubernetes. Accessing files in a physical directory on the host is no longer “easy”. All the paths you see inside the ESP environment are virtual paths: they don’t exist outside the cluster. So, you’ll need to work with your Kubernetes administrator to set up Kubernetes mechanisms like Persistent Volumes (PV) and Persistent Volume Claims (PVC) to make files visible to the ESP pods and containers.
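As an illustration, here is a minimal PersistentVolumeClaim sketch that a Kubernetes administrator might create so that ESP project pods can mount shared files. The name, namespace, size, and storage class below are placeholders to adapt to your cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: esp-input-data         # placeholder name
  namespace: sasviya           # your Viya/ESP namespace
spec:
  accessModes:
    - ReadWriteMany            # several ESP pods may need the same files
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-client # placeholder storage class
```

Once bound, such a claim can be mounted into the ESP server pods so that project paths resolve to real files.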
Why would you want to access files outside the Kubernetes cluster?
ESP adapters are no longer available outside of Kubernetes. Most of them have a corresponding connector; it is thus recommended to use the connector instead.
For adapters that don’t have a corresponding connector, you will have to use the Adapter Connector which allows you to call an adapter from a connector configuration.
It is still possible to interact with an ESP server using the REST API. However, finding the right endpoint is trickier than before.
Before, you had an ESP server listening on a machine on an HTTP port (5702, for example). The endpoint was independent of the projects running in that ESP server and looked like:
Now, with Kubernetes, you need the project name to build the endpoint. Indeed, the project name is used as the Kubernetes service name for that particular ESP server:
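As a sketch, the difference between the two endpoint styles could look like this. The host names, the namespace, and the in-cluster service DNS form are assumptions to adapt to your deployment (/SASESP is the ESP REST API base path):

```python
# Sketch: how the REST endpoint changes on Kubernetes.
project = "myproject"    # the project name is also the Kubernetes service name
namespace = "sasviya"    # your ESP/Viya namespace (assumption)

# Before: one fixed server, endpoint independent of the running projects
old_endpoint = "http://espserver.example.com:5702/SASESP/projects"

# Now: one service per project, so the project name is part of the host
new_endpoint = f"http://{project}.{namespace}.svc.cluster.local/SASESP/projects"
```

The `<service>.<namespace>.svc.cluster.local` form is the standard Kubernetes service DNS name; from outside the cluster you would instead go through the ingress exposed by your deployment.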
Nothing insurmountable, except when you start to use ESP project names that are not compliant with DNS subdomain naming: names including capital letters, underscores, etc. In this case, the project name is transformed to build a compliant service name. You can find this service name in the ESP Studio user interface by looking at that “host” name:
Project Name: my_IoT_Project
Service Name: my-5f-49o-54-5f-50roject
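Out of curiosity, the transformation in the example above looks like a simple hex encoding of the characters that are not valid in a lowercase DNS label. Here is a small Python sketch that reproduces the example; it is an educated guess based on that single example, not the documented algorithm:

```python
def guess_service_name(project_name):
    """Educated guess at the ESP project-name-to-service-name mangling:
    lowercase letters and digits are kept, every other character is
    replaced by '-' followed by its ASCII code in hex."""
    parts = []
    for ch in project_name:
        if ch.islower() or ch.isdigit():
            parts.append(ch)                # valid DNS label character, kept
        else:
            parts.append("-%x" % ord(ch))   # e.g. '_' (0x5f) becomes '-5f'
    return "".join(parts)

print(guess_service_name("my_IoT_Project"))  # -> my-5f-49o-54-5f-50roject
```

In practice, sticking to lowercase letters, digits, and hyphens in project names avoids the issue entirely.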
As with the REST API, using Python and ESPPy to communicate with ESP on Kubernetes is slightly different. To connect to an existing ESP server or to dynamically start a new ESP server in Kubernetes, you need an additional component: a kubectl proxy that acts as an interface between Python and ESP. Once you have it, the connection to ESP looks like this (assuming the kubectl proxy is listening on a given machine on port 8001):
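A minimal sketch of such a connection, assuming the kubectl proxy runs on proxy-host:8001; the API group and version in the URL (iot.sas.com/v1alpha1) and the namespace are assumptions to verify against your deployment:

```python
def esp_k8s_url(proxy_host, proxy_port, namespace, project):
    # Assumed URL pattern for reaching a project-level ESP server
    # through kubectl proxy; adapt the API group/version to your release.
    return (f"http://{proxy_host}:{proxy_port}/apis/iot.sas.com/v1alpha1"
            f"/namespaces/{namespace}/projects/{project}")

url = esp_k8s_url("proxy-host", 8001, "sasviya", "myproject")

# With ESPPy installed, the connection itself would then be:
# import esppy
# esp = esppy.ESP(url)
```

The kubectl proxy handles authentication against the Kubernetes API, which is why plain HTTP to localhost is enough on the Python side.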
This Python statement connects to an existing “myproject” server or creates it on the fly.
ESP can be deployed in two ways:
When deployed without Viya, two main capabilities are not available:
While this was probably already an issue when ESP developers didn’t have access to the Linux boxes to read the full ESP log, it is even harder now that the ESP logs reside in containers, inside pods, in the Kubernetes cluster.
Good news: with recent versions of ESP Studio, the ESP developer can now access the ESP server log directly in ESP Studio.
Like in ESP Studio, a user doesn’t need to add ESP servers explicitly when they are instantiated in the Kubernetes cluster. ESP Streamviewer auto-discovers ESP servers.
In previous versions of ESP, developers might have used some of the ESP utilities provided out-of-the-box like the following ones:
While dfesp_xml_client can easily be replaced by the curl utility, and dfesp_analytics’ role is handled dynamically by the ESP Studio UI, the other two have no equivalent in the new SAS Viya.
One cool thing about Kubernetes is that it provides built-in mechanisms for autoscaling, and ESP leverages them when you run ESP projects with Event Stream Manager. You can specify how many replicas are started initially, and the maximum number of replicas the project can scale to when it reaches certain thresholds in terms of CPU usage or memory consumption. This works well for ESP projects whose replicas have no inter-dependencies.
Here is how you set this up in Event Stream Manager:
With ESP on Kubernetes, there are a few adjustments a user should be aware of to take full advantage of this new architecture. This article briefly highlighted some of them. If you think of other challenges, feel free to comment.
Thanks for reading.