10-13-2014 12:27 PM
I'm working with SAS VA 6.4, loading data from a DWH.
In the Visual Data Builder I make extensive use of joins between fact tables and dimension tables (a fact table joined with fewer than 10 dimension tables) to load star schemas into memory, and I apply transformation logic to some of the resulting fields to prepare the data for analysis/exploration.
I'm wondering what the pros and cons are of loading data from a join of several tables.
In fact, when I execute this kind of join I notice that it takes an extremely long time to complete.
That's the reason for my question.
Should I prepare the data in the DWH environment before loading it into memory as a denormalized table?
The other option would be to build a star schema (selecting this operation from the Data Builder menu), but this option limits which fields can be selected.
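For reference, a denormalized load of this shape boils down to a query like the following sketch (table, column, and libref names are invented for illustration; the real query is generated by the Data Builder and, where supported, pushed down to the database via implicit passthrough):

```sas
/* Hypothetical sketch of a star-schema join with a calculated field.
   All table and column names are made up for illustration. */
proc sql;
   create table work.sales_denorm as
   select f.sale_id,
          f.amount,
          c.customer_name,
          t.territory_name,
          p.product_name,
          /* transformation logic applied during the join */
          f.amount * 1.22 as amount_incl_vat
   from   dwh.fact_sales      f
          inner join
          dwh.dim_customer    c on f.customer_key  = c.customer_key
          inner join
          dwh.dim_territory   t on f.territory_key = t.territory_key
          inner join
          dwh.dim_product     p on f.product_key   = p.product_key;
quit;
```

With fewer than 10 dimension tables this is still a single wide result set, which is why the denormalized output can be large and the join slow if it is not pushed down to the DWH.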
Thanks in advance for your support.
10-16-2014 03:09 PM
Hi, yes, VA will use implicit passthrough for most databases (assuming one database) and attempt to push down the join and transformation logic (if supported in that database). In 6.4 if you use the self service importer (in exploration or reporting) the resulting table will be loaded via parallel load (for a distributed environment) for faster loading speed. Alternatively, you can transform and prune in your DWH, load the dimension and fact tables separately and do an in-memory join. Tradeoffs will depend on your specific situation. What kinds of transformation logic do you need to do?
10-16-2014 04:36 PM
Thanks, I-Kong, for the answer.
By transformation logic I mean the ability to define calculated fields, based on the initial ones, using the functions that SAS provides.
The scenario is the following.
I am loading data from a Teradata DWH.
My users want to analyze the facts (the sales) from different points of view: the Customer, the Territory, the Product, etc.
They want to attach and detach, on the fly, one or more of these dimension tables to the fact table to produce several analyses of the data.
The dimension tables are shared across the entire enterprise and can be attached to several other fact tables.
To reduce the data redundancy that comes from fully denormalized tables (obtained by pushing the join and transformation logic down into Teradata), I'm thinking of loading the fact and dimension tables into LASR memory separately and then performing the join afterwards, for example in the Visual Data Builder.
In this way each Data Steward, by deciding which in-memory joins to execute, can define the appropriate star schema, possibly with calculated fields, to publish to the data analysts and report builders.
This is the most flexible approach I have found for data analysis and exploration.
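The separate-load step of this approach can be sketched roughly as follows (server names, port, tag, and credentials are all invented placeholders; in practice the loads would normally be defined through the Data Builder rather than hand-coded):

```sas
/* Hypothetical sketch: load fact and dimension tables into LASR separately.
   Connection details and table names are invented for illustration. */
libname td teradata server="tdprod" database=dwh
        user=myuser password=XXXXXXXX;            /* placeholder credentials */
libname lasr sasiola host="lasrhead.example.com"
        port=10010 tag=sales;                     /* invented LASR settings  */

data lasr.fact_sales;      /* load the fact table into memory */
   set td.fact_sales;
run;

data lasr.dim_customer;    /* load one shared dimension table into memory */
   set td.dim_customer;
run;
```

Because each dimension table is loaded once and reused against several fact tables, memory holds the normalized tables instead of repeating the dimension columns on every row of a wide denormalized table.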
The other option would be the construction of a Star Schema in the Data Builder. But this option limits the selection of fields, forcing the building of calculated fields into the report or exploration phase.
What is your opinion on the pros and cons of these two scenarios?
Which is the more efficient way in terms of speed of analysis?
.....and in terms of memory allocated on the TD720 nodes?