Can someone please tell what is the difference as well as advantage and disadvantage between DI and EG in term of doing ETL? I see EG can do ETL also, but I really wonder the advantage using DI to EG in this matter. Thanks a bunch!!
Here's what I tell people (and I'm in R&D remember, not marketing...):
Data Integration (or Master Data Management, or ETL, or whatever you call it...) is more than just using tools to manage data -- it's a discipline. The DataFlux and Data Integration Studio solutions from SAS support that discipline as part of their workflow.
EG supports many of the same activities associated with ETL, but doesn't facilitate the workflow often required for large scale data management.
For ETL activities, especially those on a larger scale, I prefer the rigour that DIS brings. It's a complete metadata-driven solution.
On a practical level, I have found the following differences when using EG4.1 for ETL tasks:
- EG does not have a loop construct
- EG does not have pre-built auditing, pre and post processing, data validation, lookup, and data loader transformations.
- Creating and using indexes in DIS is a couple of mouse clicks......more difficult in EG.
- Deploying code is more cumbersome in EG. For each job, an EG project must first be run, exported, and then deployed in SAS management console (if using Platform scheduling). DIS allows multiple jobs to be deployed with a single 'click'.
- Modifying EG projects can be a little tricky. An EG project can't be edited until it has first been run in the current session (except for custom code). If custom code is being edited, the project must be re-run to ensure the correct code is exported. An easy step to forget.
- DIS SQL transformation is more advanced than the EG query window. eg for performance tuning, multi-column outer joins (changed under EG4.3?)
- No proc append transform in EG
I find EG preferable to DIS for speed of development and debugging, although DIS4.2 appears much improved in this regard (along with some other nice enhancements).
For most of the differences I have listed, there are alternative solutions or workarounds. As Chris mentions above, the scale of your ETL tasks and whether or not you choose to adopt the 'discipline' are probably more important factors.
A very fundamental difference is that EG is based on the physical structures of the atrifacts you're working with while DIS is based on their registered metadata. This means DIS is better equiped for a formal workflow of design-build-test while EG is more targeted at ad-hoc, analytical and end-user work.
What we often do is to prototype our ideas and concepts in EG and then implement them in DIS for formal development.