Hi everyone,
I'm using SAS ESP Studio on SAS Viya 4 to build an online model with the Train window, and I need some guidance on handling streaming data.
I have location data (latitude and longitude) for several vehicles, which ESP Studio reads a few rows at a time from a flat CSV file. I've assigned an opcode to each event: the first event for each vehicle has an opcode of 'I' (insert), and every subsequent event for that vehicle is marked 'U' (update). The goal is to keep only the latest location ping for each vehicle at any given time. Here is a snapshot of the CSV file:
The Source window handles the events correctly: it keeps a single record per vehicle by inserting the first event and replacing the previous record as each new event arrives. The following snapshot shows the Source window results filtered for vehicle 1:
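To make the keyed insert/update behavior described above concrete, here is a minimal Python sketch (not ESP code) of tagging the first ping per vehicle 'I' and later pings 'U', then applying those events to a table keyed on vehicle ID. The names `Ping`, `assign_opcodes`, and `apply_event` and the sample coordinates are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    vehicle_id: int
    lat: float
    lon: float


def assign_opcodes(pings):
    """Tag the first ping per vehicle 'I' (insert) and later ones 'U' (update)."""
    seen = set()
    tagged = []
    for p in pings:
        opcode = "U" if p.vehicle_id in seen else "I"
        seen.add(p.vehicle_id)
        tagged.append((opcode, p))
    return tagged


def apply_event(table, opcode, ping):
    """Apply one event to a dict keyed on vehicle_id (a stand-in for a keyed window)."""
    if opcode == "I":
        if ping.vehicle_id in table:
            raise KeyError(f"insert for existing key {ping.vehicle_id}")
        table[ping.vehicle_id] = ping
    elif opcode == "U":
        if ping.vehicle_id not in table:
            raise KeyError(f"update for missing key {ping.vehicle_id}")
        table[ping.vehicle_id] = ping  # replaces the previous row for this key
    elif opcode == "D":
        table.pop(ping.vehicle_id, None)
    else:
        raise ValueError(f"unknown opcode {opcode!r}")
    return table


# Hypothetical sample data, for illustration only.
pings = [Ping(1, 35.00, -78.00), Ping(2, 36.00, -79.00), Ping(1, 35.01, -78.02)]
table = {}
for op, p in assign_opcodes(pings):
    apply_event(table, op, p)
print(len(table), table[1].lat)  # 2 35.01
```

The table ends up with one row per vehicle, and the row for vehicle 1 holds only its latest ping, which matches what the Source window is reported to do.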
The opcode of the latest record for vehicle 1 (row_key 2) changes to 'UB' (Update Block). My aim is to use the most recent event for each vehicle while periodically training the online model on the fly. However, when I pass the output from the Copy window to the Train window, I get an error stating that the input window for the Train window must produce only inserts.
To work around this, I tried using a Remove State window to convert both Insert and Update events into Inserts. However, this produces every event for a vehicle rather than just the most recent one.
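The difference between relabeling every event as an insert and keeping only the latest row per key can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not an ESP API: `remove_state` mimics the behavior described in the post, and `latest_snapshot` is a hypothetical helper that collapses the stream to one row per vehicle and emits that snapshot as inserts only.

```python
def remove_state(events):
    """Relabel every event as 'I', as the post describes the Remove State window doing.
    With no key-based replacement downstream, every ping survives."""
    return [("I", row) for _op, row in events]


def latest_snapshot(events, key):
    """Hypothetical helper: keep the most recent row per key and emit the result
    as insert-only events (a sketch of the idea, not an ESP window)."""
    latest = {}
    for op, row in events:
        if op == "D":
            latest.pop(row[key], None)
        else:
            latest[row[key]] = row  # an 'I' or 'U' overwrites the previous row
    return [("I", row) for row in latest.values()]


# Hypothetical sample events, for illustration only.
events = [
    ("I", {"vehicle": 1, "lat": 35.00, "lon": -78.00}),
    ("I", {"vehicle": 2, "lat": 36.00, "lon": -79.00}),
    ("U", {"vehicle": 1, "lat": 35.01, "lon": -78.02}),
]
print(len(remove_state(events)))                # 3: full history
print(len(latest_snapshot(events, "vehicle")))  # 2: one row per vehicle
```

Note that if such a snapshot were sent to a trainer at intervals, each batch would still arrive as new inserts, so whether the model should see the same vehicle again depends on how the algorithm treats repeated observations.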
Results of the Remove State window filtered for vehicle 1:
How can I ensure that the Train window only receives the most recent event for each vehicle, so that my online model is trained correctly?
Any advice or suggestions would be greatly appreciated!
Hi,
Can you please specify which online algorithm you are using?
I am using K-Means algorithm.
OK, I am not an analytics expert, but does the K-Means algorithm account for changes to data it has already seen? Won't it treat the updated data as new data?
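The question above can be illustrated with a generic sequential (MacQueen-style) online k-means update, where each incoming point moves its nearest centroid by a step of 1/n. This is a textbook sketch, not necessarily how the SAS ESP streaming K-Means implementation works; the function names are hypothetical, and starting each count at 1 (treating the initial centroid as one seed point) is an assumption.

```python
import math


def nearest(centroids, point):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def online_kmeans_update(centroids, counts, point):
    """One sequential k-means step: move the nearest centroid toward `point`
    by 1/n, where n is the number of points that centroid has absorbed."""
    i = nearest(centroids, point)
    counts[i] += 1
    eta = 1.0 / counts[i]
    centroids[i] = tuple(c + eta * (x - c) for c, x in zip(centroids[i], point))
    return i


centroids = [(0.0, 0.0), (10.0, 10.0)]
counts = [1, 1]  # assumption: each initial centroid counts as one seed point
# Vehicle 1's first ping, followed by an updated ping for the same vehicle.
for ping in [(1.0, 1.0), (1.0, 2.0)]:
    online_kmeans_update(centroids, counts, ping)
print(counts)  # [3, 1]: the update was absorbed as a second, separate observation
```

In this simple form the algorithm has no notion of a key, so an updated location for the same vehicle is counted as an additional data point rather than replacing the earlier one.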
Could you explain your requirement in more detail? That may help me suggest a better approach. If you like, we can set up a call; my email address is joydeep.bhattacharya@sas.com