10-01-2015 09:10 AM
Does data need to be ingested through SAS Data Loader in order to use all the features listed here (browse data, DQ profiling, etc.)?
Or can it just be integrated with an existing Hadoop platform that already has data in HDFS? We already have an ingestion/ETL tool for all our data in and out of Hadoop, so we are not looking for another one.
10-01-2015 09:43 AM
Short answer: yes.
More elaborately: ETL is usually a tool for a centrally managed repository, such as a data warehouse: a place where everything is integrated and can be audited. Big Data, on the other hand, is a freer environment, where you should be able to quickly analyze different kinds of data. That means ad hoc analysis and perhaps even ad hoc loading of data. So this is where I think Data Loader is positioned. It's not positioned as an ETL tool (though technically it kind of is one).
10-02-2015 11:50 AM
Assuming your Hadoop environment meets the specifications for SAS Data Loader, you don't actually need to ingest data through SAS Data Loader to use its features. SAS Data Loader can work with data that has been put in Hadoop by any tool, as long as the data has been registered in Hive. More specifically, you need to have the data layout described in HCatalog in Hive. If you have data in HDFS that has not been described in Hive, you can use tools like Hue to describe it.
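As a rough illustration of what "describing the data in Hive" can look like, here is a minimal Python sketch that builds a HiveQL CREATE EXTERNAL TABLE statement for delimited text files already sitting in HDFS. The table name, column list, and HDFS path are all hypothetical, and the sketch assumes comma-delimited text; adjust the row format and storage clause to match your files. The helper only produces the statement; you would still run it yourself in Hue, Beeline, or whatever Hive client you use.

```python
def external_table_ddl(table, columns, hdfs_dir, delimiter=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for delimited text files.

    An external table leaves the files where they are in HDFS; Hive only
    records the layout in its metastore.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


# Hypothetical example: table name, columns, and path are made up.
ddl = external_table_ddl(
    "sales_raw",
    [("order_id", "BIGINT"), ("customer", "STRING"), ("amount", "DOUBLE")],
    "/data/landing/sales",
)
print(ddl)
```

Because the table is external, dropping it later removes only the Hive metadata and leaves the files in HDFS untouched, which fits the case where another tool owns ingestion.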