juanvg1972
Pyrite | Level 9

Hi,

 

I have to work with SAS in an environment with very large datasets, and we are considering different options to get good performance.

 

1) SAS Grid: we think it is a good option because we would not have to rewrite our SAS programs and the performance is good.

2) SAS and Hadoop: in this environment we can execute SAS programs in a distributed computing environment like Hadoop, but we are not sure if our SAS programs will need to be rewritten. Does it depend on the kind of data steps or procs?

 

Another question: can PROC DS2 and PROC FEDSQL be used in both environments?

 

Last question, about costs: we currently have SAS EG and Base SAS licences. For 1) we would have to buy SAS Grid, and for 2) we would have to buy SAS/ACCESS to Hadoop. In a first estimate we find option 2) more economical. Are we right? Has anybody compared both?

 

Any advice will be greatly appreciated. 

 

Thanks in advance

 

8 REPLIES
Patrick
Opal | Level 21

@juanvg1972

You're asking a few big questions here. The answer to what's right for you will depend very much on your current SAS usage, the problem you want to solve, and your strategy for future SAS usage.

 

SAS Grid and Hadoop are not mutually exclusive alternatives; they have different use cases and can very well both be part of the same architecture (I know of such topologies).

 

I'd suggest you contact your local SAS office and ask for their guidance/a proposal.

juanvg1972
Pyrite | Level 9

Thanks Patrick,

 

My main doubt is whether my current SAS processes would need to be rewritten if I choose SAS/ACCESS to Hadoop. My processes cover data management and analytics (data steps, common procedures and statistical procedures). My data volume is going to grow and will probably keep growing in the future.

 

I also want to know whether SAS Grid can be as scalable as Hadoop.

 

Before contacting the SAS office I am gathering information. Has anybody had a similar experience?

 

If anybody can help me I will be grateful (sorry for my bad English).

juanvg1972
Pyrite | Level 9

@Patrick

Patrick,

 

I would like to know the different use cases of the two architectures that you mentioned. Can you explain a little more?

 

Thanks 

Patrick
Opal | Level 21

@juanvg1972

How I see it:

A SAS Grid is about scalability, high availability and workload balancing; Hadoop is more about data storage/management and the data lake.
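
To make the Hadoop side a bit more concrete: with SAS/ACCESS Interface to Hadoop, existing code usually reaches Hive tables through a LIBNAME statement, so a program may only need a different libref rather than a full rewrite. Below is a minimal sketch under stated assumptions. The server, port, schema, credentials and table names are placeholders made up for illustration, and how much work is actually pushed down to Hadoop depends on the step and on your configuration.

```sas
/* Hypothetical connection details: replace server, port, schema,   */
/* user and password with values for your own Hive server.          */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=myuser password="XXXXXXXX";

/* An existing DATA step can read a Hive table through the libref.  */
/* The DATA step logic itself runs in the SAS session, so the rows  */
/* it reads are transferred from Hadoop to SAS.                     */
data work.big_customers;
   set hdp.customers;        /* hypothetical table name */
   where balance > 10000;    /* hypothetical column     */
run;

/* PROC SQL uses implicit pass-through where it can, so a simple    */
/* aggregation like this may be executed by Hive, with only the     */
/* summary rows returned to SAS.                                    */
proc sql;
   create table work.by_region as
   select region, count(*) as n_customers
   from hdp.customers
   group by region;
quit;
```

The design point is that the libref hides where the data lives, but it does not decide where the processing happens. That is the part worth testing with your own programs.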

 

If you're considering architectural changes to your current SAS platform which will also have license implications, then you should really start talking to your local SAS office so that you can make an informed decision and end up with something that's right for you.

SASKiwi
PROC Star

I think it would be helpful to provide more details on the problems you are having processing large datasets in your current environment. How big are these and how long is it taking to process them? What improvement are you aiming for? Have you investigated other options for speeding up processing like dataset compression, SPDE, SPD Server, storage hardware and so on?

 

I think you may be limiting your options by just focusing on SAS Grid and Hadoop.
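
As a rough illustration of two of the options mentioned above (dataset compression and the SPD Engine), here is a minimal sketch. The directory paths and dataset names are hypothetical placeholders, and whether either option helps depends on your data and storage.

```sas
/* Option 1: compress every data set created in this session.       */
options compress=yes;

/* Or compress a single data set via a data set option.             */
data work.sales_compressed (compress=binary);
   set work.sales;           /* hypothetical input data set */
run;

/* Option 2: an SPD Engine library that spreads data and index      */
/* files over several file systems (paths are placeholders).        */
libname spdelib spde '/sasdata/spde/meta'
        datapath=('/disk1/spde' '/disk2/spde')
        indexpath=('/disk3/spde');

/* Existing code then only needs to point at the new libref.        */
data spdelib.sales;
   set work.sales;
run;
```

Both changes leave the program logic untouched, which makes them cheap to benchmark before looking at larger architectural changes.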

LinusH
Tourmaline | Level 20
Just want to add that Grid is not necessarily good at handling large data sets. It can be considered if you have quite a lot of queries and wish to scale up (and down?) easily. So if your use case is a few jobs that are heavy on resource consumption, Grid might not be your fit.
Also, Grid is available with Hadoop by using YARN. Best of both worlds? I don't know, since I haven't seen any in-depth use case exposition.
Data never sleeps
TomKari
Onyx | Level 15

This is a very big question, with major cost and performance implications.

 

Reaching out to find others who have gone down these roads is a very good step. However, you probably won't find anybody who can either explain the difference in a way that is relevant to your requirement, or who has implemented either of these alternatives in a way that will completely shed light on your situation.

 

As others have suggested, talking to your SAS office is a good idea, as they have the customer knowledge to find the most relevant comparables. But be careful that in any comparisons that come up, you're doing a true apples to apples comparison. This will be very difficult.

 

If SAS is in a position to make some of their resources available for benchmarking, using either your data or synthetic test data that simulates your environment, that may help towards making a decision.

 

Once you've reached the end of these steps, if you don't have a clear indication of which alternative is superior, you may have to undertake a proof of concept yourself. It will be difficult, but the consequences of selecting an option that won't meet your ongoing needs are worse!

 

Tom

AndyWilliams86
Fluorite | Level 6

Hi.
I've worked on two sites now where the customer already had SAS Grid for Hadoop installed. If the question is whether your company should be using SAS Grid on Hadoop with YARN or the classic SAS Grid on LSF, then if you are going to have 20+ users, use the LSF one; otherwise you won't get any resource utilisation out of the servers and they will not be able to accept any more jobs.
