Computing a new variable based on values coming from different rows of data is something very usual in traditional SQL-based data management. And SAS provides powerful and easy-to-use techniques to do that, among them RETAIN, LAG, etc. In real-time contexts, this need is probably less frequent but I found at least one use case where I needed to calculate a geographical distance between 2 GPS positions coming from 2 different events. And I will show you 2 methods to achieve this in the SAS Event Stream Processing engine. There are probably additional methods.
Analyzing an iPhone streaming data with SAS ESP, I wanted to go further and add some location data to the collected metrics. SensorLog (the iPhone app used to stream your iPhone real-time data to a socket or a file) allows you to do that. You can then run that app in your car while driving and simulate some “connected car” location data.
Well, let me get to the point. I have streaming data coming from my iPhone at a 30Hz speed. Each event contains the location (latitude and longitude) coordinates of the iPhone position. And of course I want to compute the distance between 2 consecutive points (meaning 2 consecutive events) as well as accumulate those distances into another field to derive the total distance travelled by my iPhone.
Here is an extract of the stream:
Here is my continuous query and the 2 main approaches I’ve used:
The first thing to do is to gather the 2 GPS coordinates coming from 2 different events into 1 event. To do that we will use special ESP aggregation functions:
Here is the Aggregate window XML code:
As a result, I will get locationLatitude, locationLongitude, previousLatitude, previousLongitude on the same event.
I can then do simple computations using the Compute window:
NB: geodistance_haversine is a new DataFlux Expression Language 2.7 function. It computes the distance between two geographical points. This formula provides greater accuracy, particularly for shorter distances. 1.609344 is used to convert miles to kilometers.
You can also use User Defined Function and accumulate distances in a separate variable.
You have to initialize some of your variables in an “Expression Engine Initialization Block”.
The second method is more familiar to SAS developers, because it uses ESP capabilities to run SAS DS2 code. Thus, you can rely on standard SAS capabilities to retain values from previous events.
DS2 code can be used in an ESP Procedural window using a DS2 Code Handler. DS2 is fully integrated into ESP, meaning that DS2 code is executed by the ESP engine. Always prefer using DS2 code instead of traditional Data Step code in a Procedural window. Traditional Data Step has to be executed in a SAS session outside of ESP, with a severe performance impact.
The following DS2 code enables us to:
Both methods are different not only in the design but also in the results. Computations results are different because of using different distance calculation functions (DataFlux vs SAS). But also in the number of output events. With the Aggregate window, we only maintain 1 event per device, which is continuously updated. With the Procedural window, we keep track of all the incoming events, assuming here that they are coming from only 1 device.