This article illustrates how you can perform a multi-way lookup in SAS Event Stream Processing (ESP).
It is well known that SAS ESP is an advanced streaming analytics tool that allows users to perform processing, management and analysis of data in motion. But what makes it that fast? Well, one of the reasons is having all incoming data automatically indexed using high-performance indexing mechanism (data indexing is a core requirement for SAS ESP). This essentially uniquely identifies every data entry incoming to SAS ESP and allows for a faster way of searching for data held in memory by ESP's engine for multi-thread processing of it later. Although this indexing mechanism allows you to perform operations with greater speed, it brings some limitations.
Imagine you have to correlate (join) data coming from two separate streams. No matter what type of join you perform, LEFT or RIGHT you will always have to use all indices from the dimensions table in a join condition statement. This is required in order to avoid breaking the uniqueness of the data entries held in ESP. Because of that, one can realise straight away that many-to-many (M-2-M) joins are very hard to achieve using out of the box functionality of SAS ESP.
Having said that, a valid question might pop up to one's mind: Is there a need for having M-2-M joins among ESP capabilities? Well, some might say that sometimes you need it. You might want to have such capabilities in case when you want to link a specific IP address from a network device to a set of previousely noted activities in the same region for exmple without having an exact match with regards to join keys (for example, in case you perform Cyber threat or network activity abnormalities analysis), or if you want to map (without exact match on key fields) purposefully amended financial transactions, made by a sender to avoid falling into sanctioned list (also known as Payment Stripping), to a list of those transactions which are identified as a potentially fraudulent during a Fraud analysis.
So how can we achieve this? Well, one way to do this is through utilising SAS ESP's procedural window capabilities i.e. a plug-in handler function.
You can find the complete source code for the plugin here. There you can also find instructions on how to build it for a target version of ESP.
So let us have a look at the following example. Let's assume that one has 4 streams of data: 1 fact stream (a stream that contains constantly updated data) and 3 dimension streams (streams that contain rarely updated or static reference data).
So in order to correlate (join) data in ESP, out of the box, one will use ESP join window and will build a directed graph (continuous query or ESP model) as follows:
Remark: As one can see the result of the join of 3 dimension streams is cascaded flow (diagram). Also worth mentioning that in case number of dimensions is significant the diagram may become less practical or easy in terms of the further modification. Also as was mentioned this does not support M-2-M joins.
Note: read more here on out-of-the-box stream correlation (joining) capabilities of SAS ESP.
Now, let us have a look at how one can achieve the same thing with the "multi-way lookup plugin" mentioned above. In this case, one will end up with the vertical-like flow (diagram).
Let's agree that fact stream data is being received by fact window. 3 dimension streams data is being received by dim1,dim2, and dim3 windows respectively. Data joining is performed by mwl procedural window. Let's have a closer look at the mwl procedural window.
mwl is a procedural window that is connected to (edged with) the factwindow. In order to enable multi-way lookup capabilities, one needs to register an input handler in form of a plug-in called lookup from the plugin library called proc-mwl-x.y (depends on the version of the ESP engine being used, in our case x.y = 5.1). One also needs to enable a context plugin parameter for mwl from proc-mwl-x.y library called initMWLcontext. In order to define join conditions and selection one should also use a procedural property map for mwl in the following way:
for join conditions: parameter nameis jc4w_(++), where (++) represents an incremental condition counter e.g. jc4w_1, jc4w_2 etc.; parameter value should comply with the following pattern:
esp_project/esp_query/esp_dimension_window : fact_join_field == dimension_join_field e.g. if dimension data resides in a window called dim1which is located in query mwl of the project test_project, fact data contains a field called data1 and dimension data contains a field called key the value of the parameter should look like this: test_project/mwl/dim1 : data1 == key;
for join selections: parameter nameis js4w_(++), where (++) represents an incremental selection counter e.g. js4w_1, js4w_2 etc.; parameter value should comply with the following pattern:
esp_project/esp_query/esp_dimension_window : dimension_selection_field, mwl_field e.g. if dimension data resides in a window called dim1which is located in query mwl of the project test_project, multi-way lookup window schema contains a field called data1_j1 and dimension data contains a field called data1 the value of the parameter should look like this: test_project/mwl/dim1 : data1, data1_j1;
in case one of the join conditions is not using a key from a dimensional data (M-2-M case) agenkeynameparameter needs to be defined (otherwise join will be returning first entry from the found match group) with parameter value pointing to a beforehand created key field in the multi-way lookup schema. e.g. if multi-way lookup window schema contains a beforehand defined field called a_key the value of the parameter should be a_key
Look at an example of ESP project (XML model) that uses multi-way lookup here.
... View more