b1oldie
Calcite | Level 5

How/where do you store your real-time data? And how do you provide end users with access to it? We are probably going to store the real-time data in the data lake, but I am not sure how to give end users user-friendly access.

We use a relational database for the batch data so users can easily interact with it, and maybe it would be good to ingest the real-time data into this database as well, but I don't know. Isn't that a bit of an antipattern? Or is it a normal, usual way to do it? Or maybe keep the batch data in the RDBMS and use e.g. Trino to provide access to the real-time data in the data lake? But then again, users would have to query two data sources. Or maybe join both (RDBMS and data lake) in Trino? (We use PostgreSQL, with S3 as the data lake.)

Not really sure, appreciate any comment. Thank you.

cornelius2
SAS Employee

Thanks for your question.
Basically, real-time data can arrive in huge volumes. The question is what you want to do with it and how fast you need to generate results.

1. Data storage (SAS Viya or DBMS)
It certainly makes sense to store real-time data in a database, e.g. as the basis for building analytical models. Note, however, that databases can quickly become overwhelmed by the volume of data and the frequency of delivery. A process that aggregates, compresses or, if possible, samples the data in real time and then stores it in the database will help.
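As a rough illustration of the aggregation step, the sketch below collapses raw readings into one summary row per sensor per fixed time window before anything is written to the database. The `Reading` type, the field names and the 60-second window are assumptions made for this example, not part of any SAS product or of the original setup; in practice this logic would run inside whatever streaming engine you use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical raw event; field names are assumptions for this sketch."""
    sensor_id: str
    timestamp: float  # seconds since epoch
    value: float


def aggregate_tumbling(readings, window_seconds=60):
    """Collapse raw readings into one summary row per sensor per window."""
    buckets = defaultdict(list)
    for r in readings:
        # Align each timestamp to the start of its fixed-size window.
        window_start = int(r.timestamp // window_seconds) * window_seconds
        buckets[(r.sensor_id, window_start)].append(r.value)

    rows = []
    for (sensor_id, window_start), values in sorted(buckets.items()):
        rows.append({
            "sensor_id": sensor_id,
            "window_start": window_start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        })
    return rows
```

Only the returned rows would then be inserted into the database (PostgreSQL in the question), so the write rate depends on the number of sensors and the window length rather than on the raw event rate.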
2. Analyze data (Batch) (SAS Viya)
To work with the data, users need a system that offers at least the following features:
- Authorization concept
- Access interfaces to data sources
- Data management (Filter, Join, Calculations, Quality)
- Analysis
- Visualization
- for AI: Model creation, model management
3. Analyze data (Realtime) (SAS Event Stream Processing)
For real-time analyses, a system is also needed that provides the following capabilities in real time:
- Interfaces to real-time data
- Data management (Filter, Join, Calculations, Quality)
- for AI: many different algorithms (Scoring, Machine Learning, Image&Video, Audio, Text, etc.)
- Interfaces to visualization, Datalake

Conclusion: When analyzing real-time data, document the data flows and keep them maintainable. Data sources should be usable by different users for their individual questions.
Also, cover as many requirements as possible with as few different systems as possible, to avoid breaks between systems.

Note: SAS can fulfill all these requirements. I have entered the SAS solutions in parentheses as a guide.
