09-30-2017 06:56 PM
I have to work with SAS in an environment with very large datasets, and we are considering different options to get good performance.
1) SAS Grid: we think this is a good option because we don't have to rewrite our SAS programs and the performance is good.
2) SAS and Hadoop: in this environment we can execute SAS programs in a distributed computing environment like Hadoop, but we are not sure if our SAS programs will need to be rewritten. Does it depend on the kind of data steps or procs?
Another question: can proc ds2 and proc fedsql be used in both environments?
Last question, about costs: we currently have SAS EG and SAS Base licences. For 1) we have to buy SAS Grid, and for 2) we have to buy SAS ACCESS to Hadoop. In a first estimate we find option 2) more economical. Are we right? Has anybody compared both?
Any advice will be greatly appreciated.
Thanks in advance
09-30-2017 09:17 PM
You're asking a few big questions here. The answer to what's right for you will very much depend on your current SAS usage, what problem you want to solve, and your strategy for future SAS usage.
SAS Grid and Hadoop are not mutually exclusive alternatives; they have different use cases and can very well both be part of the same architecture (I know of such topologies).
I'd suggest you contact your local SAS office and ask for their guidance/a proposal.
10-01-2017 04:40 AM
My main doubt is whether my current SAS processes need to be rewritten if I choose SAS Access to Hadoop. My processes are data management and analytical (data steps, common procedures and statistical procedures). My volume of data is going to grow and will probably keep growing in the future.
Also I want to know if SAS Grid can be as scalable as Hadoop.
Before contacting the SAS office I am gathering information. Has anyone had a similar experience?
If anybody can help me I will be grateful (sorry for my bad English)
10-01-2017 04:51 AM - edited 10-01-2017 04:52 AM
I would like to know the different use cases of both architectures that you mentioned. Can you explain a little more?
10-01-2017 06:15 AM - edited 10-01-2017 07:47 AM
How I see it:
A SAS Grid is about scalability, high availability and workload balancing; Hadoop is more about data storage/management and data lakes.
If you're considering architectural changes to your current SAS platform, which will also have license implications, then you should really start talking to your local SAS office so that you can make an informed decision and end up with something that's right for you.
10-04-2017 03:35 PM
I think it would be helpful to provide more details on the problems you are having processing large datasets in your current environment. How big are these and how long is it taking to process them? What improvement are you aiming for? Have you investigated other options for speeding up processing like dataset compression, SPDE, SPD Server, storage hardware and so on?
I think you may be limiting your options by just focusing on SAS Grid and Hadoop.
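As a rough illustration of two of the options mentioned above (dataset compression and SPDE), here is a minimal settings-style sketch. The path /data/spde, the libref big and the table work.sales are hypothetical placeholders, and whether either option helps depends on your data and storage, so treat it as a starting point for testing rather than a recommendation.

```sas
/* Assumption: work.sales is a hypothetical existing table. */

/* Compress every SAS data set created from here on in this session. */
options compress=yes;

/* Or compress a single output data set only. */
data work.sales_compressed (compress=yes);
  set work.sales;
run;

/* Assumption: /data/spde is a hypothetical directory you can write to. */
/* Assign a library that uses the SPD Engine instead of the Base engine. */
libname big spde '/data/spde';

data big.sales;
  set work.sales;
run;
```

The SAS log reports the size change after compression for each data set, which gives a quick way to judge whether compression is worthwhile before looking at larger architectural changes.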
10-01-2017 09:24 AM
10-01-2017 10:25 AM
This is a very big question, with major cost and performance implications.
Reaching out to find others who have gone down these roads is a very good step. However, you probably won't find anybody who can either explain the difference in a way that is relevant to your requirement, or who has implemented either of these alternatives in a way that will completely shed light on your situation.
As others have suggested, talking to your SAS office is a good idea, as they have the customer knowledge to find the most relevant comparables. But be careful that any comparison that comes up is a true apples-to-apples comparison. This will be very difficult.
If SAS is in a position to make some of their resources available for benchmarking, using either your data or synthetic test data that simulates your environment, that may help towards making a decision.
Once you've reached the end of these steps, if you don't have a clear indication of which alternative is superior, you may have to undertake a proof of concept yourself. It will be difficult, but the consequences of selecting an option that won't meet your ongoing needs are worse!