We are working on a project that uses a SAS VA (7.1) distributed architecture on Hadoop. The stack also includes SAS DI. My understanding is that all data processing, transformation, and detailed table creation should happen in SAS DI, and that this is the recommended approach. The data is then loaded into the LASR server, where it is consumed by the dashboard developers.
Another approach is to do the processing, transformation, and detailed table creation in the SAS VA data builder and load the results into the LASR server. In that case, most of the DI jobs would instead be built in the SAS VA data builder.
1. Which approach is recommended, and what are the pros and cons?
2. Can DI jobs be created in the SAS VA data builder? If yes, are there additional procs specific to Hadoop that need to be used?
The answer to your question should come from comparing the listed capabilities/features of each of these products.
SAS DI : SAS Data Integration Studio
SAS VA Data Builder: https://support.sas.com/documentation/cdl/en/vaug/67500/PDF/default/vaug.pdf
The list of features is too long to put in one answer, but I would recommend DI Studio when you have to:
- Collaborate among multiple developers (change management: check-in/check-out)
- Schedule jobs
- Develop custom user transformations
Just my 2 cents.
I'd need to know the nature of your SAS/DW environment to give you an adequate answer.
But if you have a multi-level DW, I would recommend using DIS all the way to the data marts (star schemas).
Then you could use the data builder to load selected parts of the data mart into LASR.
There are some LASR loaders in DI; I haven't used them, so I can't really speak to the pros/cons of this last part.
One pro could be that you have a single point of metadata (DIS).
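As a rough sketch of that last step, loading a selected slice of a data mart table into LASR from code might look like the following. The host, port, tag, library paths, and table/column names here are all hypothetical placeholders, not values from your environment:

```sas
/* Sketch only: host, port, tag, paths, and names are placeholders. */
libname mart base "/data/marts/sales";            /* data mart built in DIS        */
libname valasr sasiola host="lasr.example.com"    /* SASIOLA engine writes straight */
               port=10010 tag="vapublic";         /* to the in-memory LASR server  */

/* Load only the columns and rows the reports actually need */
data valasr.sales_summary;
  set mart.sales_fact(keep=order_date region amount);
  where order_date >= '01JAN2015'd;
run;
```

The point of subsetting here is the same one made above: push the heavy transformation work into DIS, and keep the LASR load itself as a thin, cheap copy step.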
Data builder is a good tool for proof-of-concept work, but not the right tool for developing production processes. Specifically, there are two main drawbacks to using the VA data builder. First, data builder is not a full-featured ETL tool and is not nearly as powerful as DI. Second, the data builder produces a query that must be executed every time you want to load the table into memory. This puts load on your data warehouse and generates needless network traffic. You may wish to swap datasets in and out of memory as needed, and in that case you don't want to have to rebuild the table on every load if the underlying data hasn't changed (as with a month-end snapshot table).
For this reason, I recommend doing all data preparation in DI, culminating in the creation of a single "Analytical Base Table" (ABT) capable of supporting your analysis task. This table would be in a "VA-ready" state, registered in metadata, and could be lifted directly into memory without any additional processing. In this scenario you would use the "Administrator Load" approach outlined on page 14 of the SAS Analytics Server Administration Guide.
I hope this helps.
PS: DI jobs can't be created in the SAS VA data builder.
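For reference, a batch equivalent of lifting a pre-built, VA-ready ABT into memory is PROC LASR with the ADD operation. This is only a minimal sketch, assuming a running LASR server on port 10010; the library, table, and host names are made up:

```sas
/* Sketch: assumes mart.abt already exists in a VA-ready state and a  */
/* LASR server is running on port 10010. Names are placeholders.      */
proc lasr add data=mart.abt port=10010;
  performance host="lasr.example.com";  /* head node of the LASR grid */
run;
```

Because the ABT is already fully prepared, this step is a straight load with no additional processing, which is exactly what makes the swap-in/swap-out pattern cheap.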
I have the exact same question. I use SAS Enterprise Guide to generate data in a VA-ready state, but I would like to know a way to keep it dynamic. I've already figured out how to make EG run automatically so the data gets updated; the question is, how do I get the data into VA automatically every time it has been updated, or at a scheduled time?
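One pattern that could be scheduled (via your site's scheduler, alongside the EG refresh) is a small SAS program that unloads the old in-memory copy and reloads the fresh table through the SASIOLA engine. This is a sketch under assumptions: the host, port, tag, and table names are hypothetical, and your LASR server details will differ:

```sas
/* Sketch: refresh an in-memory table after the source has been updated. */
/* Host, port, tag, and table names below are placeholders.              */
libname valasr sasiola host="lasr.example.com" port=10010 tag="vapublic";

/* Drop the previous in-memory copy, if present */
proc datasets lib=valasr nolist;
  delete abt;
quit;

/* Reload the freshly generated VA-ready table */
data valasr.abt;
  set work.abt;   /* the table your EG project produces */
run;
```

Running this as the last step of the scheduled job keeps the in-memory table in sync with each refresh, so reports pick up the new data without manual reloads.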