Hi all,
My company is thinking of implementing a data warehouse for BI report generation with advanced analytics in mind.
We are thinking of using SAS as the data warehouse because of the analytics it provides and the support available (especially for security).
However, Hadoop seems to be the usual choice for a data warehouse/lake. I am worried about its security, though, considering it's still maturing software and we don't have a dedicated Hadoop/server developer yet.
So my question is: would SAS be okay to use as a data warehouse? We don't have that much data, a few hundred GB, and it is quite structured. But we are looking to bring in unstructured data in the future (social media).
From my rather limited knowledge, best practice seems to be to use Hadoop as the data lake and SAS to process the data. But I don't know if this is correct.
If you could share your knowledge and experience to help this lost one out, that would be much appreciated!
Hadoop is really designed for managing large amounts of data - terabytes not gigabytes. IMO if your data requirements are modest, say in the gigabyte range, then Hadoop isn't really required for good performance. You would be adding extra complexity with Hadoop for small performance improvements.
In my company we manage a small datamart of 2-3 terabytes in total. We just use SAS, including VA, and performance is still great. If we grew to over 5 terabytes, then Hadoop might be worth considering.
From the link below: "When you use SAS with Hadoop, you combine the power of analytics with the key strengths of Hadoop".
http://support.sas.com/documentation/cdl/en/hadoopov/68100/PDF/default/hadoopov.pdf
You've got the perfect question for contacting your local SAS office. I'm sure they'll be more than happy to support you in your decision process.
Many thanks, guys, for your replies.
My main stumbling block in making a decision was whether not using Hadoop would mean a relatively inefficient implementation. I had a feeling that data of the size my company is dealing with would not benefit much from Hadoop (especially considering ROI, as we would need to recruit the necessary IT people for it).
Thanks, SASKiwi, for sharing your experience. It helps put my mind at ease for when the questions come flying in about why we aren't using Hadoop.
@Patrick: Thank you for the link. I will be sure to read up on that as well! Although it seems to just explain the benefits of using Hadoop and SAS together...