In this detailed "how to" video, Joydeep Bhattacharya explains how SAS Event Stream Processing (ESP) aggregates streaming data. The presentation covers two main approaches: in-memory aggregation and aggregation using external data stores, each with its own benefits and use cases. The primary role of ESP is to enable real-time analysis of streaming data, a crucial part of modern data processing.
Key Themes and Ideas:
- Streaming Data Aggregation as a Core Operation:
- Streaming data aggregation is depicted as a vital step for nearly every streaming use case, adding significant value and enabling real-time insights.
- The fundamental concept involves collecting and processing streaming events to derive meaningful summaries or statistics, such as calculating the average price of a stock over time.
- In-Memory Aggregation with ESP:
- Process: ESP performs data aggregation in memory based on incoming events, using "aggregate windows." These windows group events according to a specified "key field" (e.g., stock symbol) and then apply aggregation functions (e.g., average price) to each group.
- Performance: Because the data is held in memory, aggregation runs with high throughput and low latency.
- Stateful Nature: Aggregate windows are stateful, meaning they retain incoming events and continuously update the aggregations.
- State Management with Retention: Because aggregate windows are stateful, ESP uses retention policies to limit the size of in-memory data. Retention can be:
- Time-based (sliding or jumping): Data is retained for a specified duration (e.g., "last 5 seconds").
- Count-based (sliding or jumping): Only a certain number of the most recent events are kept.
- Aggregation with ESP Using an External Data Store:
- Process: ESP uses StateDB windows, which write data to and read data from external in-memory data stores that can match ESP performance.
- Stateless Nature: The ESP project itself can be stateless, with all of the relevant data held in the external data store.
- State Management: A dedicated external data store can hold large volumes of data for extended periods, which keeps the ESP project performant and robust.
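The two approaches above can be contrasted with a minimal Python sketch. It is not SAS ESP code and does not use any ESP API: the class `SlidingAverageWindow`, the function `aggregate_with_store`, and the plain dictionary standing in for an external data store are all hypothetical names chosen for illustration. The sketch assumes events for each key arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingAverageWindow:
    """In-memory aggregation: average price per key over a sliding time window.

    Hypothetical illustration only; this is not the SAS ESP aggregate window API.
    """

    def __init__(self, retention_s):
        self.retention_s = retention_s
        # Stateful: the window itself retains events, grouped by key field.
        self.events = defaultdict(deque)  # key -> deque of (timestamp, price)

    def insert(self, key, timestamp, price):
        window = self.events[key]
        window.append((timestamp, price))
        # Time-based sliding retention: evict events that fall outside
        # the last `retention_s` seconds (assumes in-order timestamps).
        while window and window[0][0] <= timestamp - self.retention_s:
            window.popleft()
        return sum(p for _, p in window) / len(window)


def aggregate_with_store(store, key, price):
    """Stateless step: the running sum and count live in `store`.

    `store` is any dict-like object; here it stands in for an external
    data store, so the processing function keeps no state of its own.
    """
    total, count = store.get(key, (0.0, 0))
    total, count = total + price, count + 1
    store[key] = (total, count)
    return total / count


if __name__ == "__main__":
    window = SlidingAverageWindow(retention_s=5)
    print(window.insert("IBM", 0, 100.0))  # 100.0
    print(window.insert("IBM", 2, 110.0))  # 105.0
    print(window.insert("IBM", 6, 120.0))  # 115.0 (event at t=0 has expired)

    external_store = {}  # assumption: a dict stands in for the external store
    print(aggregate_with_store(external_store, "IBM", 100.0))  # 100.0
    print(aggregate_with_store(external_store, "IBM", 110.0))  # 105.0
```

In the first case the window's memory use is bounded by the retention period; in the second, the function could be restarted or replicated without losing the aggregate, because the state survives in the store.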
SAS ESP provides a flexible platform for real-time streaming data aggregation, capable of accommodating various requirements through either in-memory or external data storage options. The choice between these approaches depends on the specific needs of the use case. ESP empowers users to process streaming data efficiently and in real time.