08-03-2017 05:47 AM
Good Afternoon Community Members,
I am seeking opinions on the use of the Data Builder for data preparation.
There seems to be 2 schools of thought on the matter, based on developers I have discussed this with recently. Below are their positions as well as the pros and cons as they see it.
08-03-2017 08:44 AM
both points of view are correct And if you give a closer look, it is the same question as with most tools which intend to provide high performance to the users (SAS and non-SAS). The first one can be related to technical teams (administrators, installers, architects) and the other one is closer to users and marketing.
The solution to your question is both and none, but "managing expectations" and "risk management". Let me explain.
SAS VA has come to the market as a quite innovative tool in the SAS scenario, because it provides high-performance but also easiness of usage. But of course, careful with it, because as in any other similar tool, what you will need is what I mentioned above by:
- Making a separation between what you can be in control (batch jobs/queries and interative sessions from power users), and what you cannot be in control that easily (interactive sessions from non-SAS-expert users). From the first group you can be incontrol of usage of resources and performance. From the second group, not that much and you would need to consider the risks of any self-service service.
- This separation can be done by splitting the SAS Application Servers (SASApps) and the LASR servers, and granting/denying permissions on the SAS metadata. On each of them you can separate who can access, how many resources, even logging options.
If you ask me, I am more into the first group. As much data preparation to be done upfront, filtering all the data-lake, DWH, all data origins, and loading the necessary data for the users and reports into LASR/memory. No more, no less. With this, which we can consider what is loaded into LASR as the VA datamart, just a view of all the data from your company, a small fraction.
Then, the users will want to load their own data, or create their own queries with self-service. That is OK, but my advice for you is to do it on a different LASR server, allocating and limiting resources, and communicating to all your users about the architecture and capabilities.
As long as all have what they want, you will have happy users
08-04-2017 10:00 AM - edited 08-14-2017 07:02 AM
Since both these alternatives are declared as absolute, I don't agree with any of them.
VA itself hosts several types of use cases. Primarily enterprise wide reporting (good ol' BI), and explorative analytics.
For the first use case, data should be fed from a trusted and quality assured source - a data warehouse. Then it makes sense to use a standard/the same ETL for the whole data lifecycle. Like DI Studio if you are a 100% SAS shop.
But for other use cases when there are more of ad hoc analysis, you don't want to force users to specify their requirements on beforehand. In this scenario, i can't see why a self service approach shouldn't be acceptable. But of course, the environment needs a dialogue between users and the maintenance team, and guidelines so that it can eveolve in a cotrolled manner.