In 2016, I wrote about the integration between CAS and YARN. This was for the first version of Viya (3.1).
As we are seeing more and more customers with large Hadoop clusters adopting Viya and CAS, it is a good time to revisit this topic, bust a few myths, and discuss some real use cases and architecture considerations.
The focus of this article is the co-located scenario: since SAS and Hadoop processing take place on the same set of shared host machines, it is important to consider the implications when the computing resources are used by both applications (see the diagram below).
|example of CAS and Hadoop YARN shared environment|
In this article we will discuss CAS and YARN integration. To be clear, it is also possible to configure CAS successfully with co-located Hadoop used only for storage (HDFS), without any YARN integration. That setup is not discussed further here, as it is assumed that the reader wants to understand how CAS and Hadoop resource management can be controlled using one approach, namely YARN. If CAS is not co-located with a Hadoop cluster, then you can stop reading now 🙂
Okay, let's start by reviewing the key points from my previous post that are still relevant in the latest Viya releases:
CAS can tell YARN to reserve computing resources for the CAS processing on each node. But to be clear, CAS should not be considered a YARN application.
The Apache Hadoop documentation provides a pretty good explanation and architecture overview for YARN (which stands for Yet Another Resource Negotiator). In a YARN cluster, you have two types of roles for the machines: NodeManager or ResourceManager.
The NodeManager is the machine where the task will be executed, inside a YARN Container.
The ResourceManager does not run tasks; instead, it has two main components with different roles: the Scheduler and the ApplicationsManager.
|YARN Architecture - Source : Apache documentation|
So the Scheduler is where you can define queues (or resource pools) to which applications are assigned, with specific rules like a maximum percentage of cluster usage for the jobs in a queue or minimum/maximum container sizes (memory or CPU). The ApplicationsManager, for its part, keeps track of what is running where.
There are different kinds of schedulers available in YARN; the most well-known are the Fair Scheduler (default with Cloudera) and the Capacity Scheduler (default with Hortonworks HDP), and they work in different ways.
|The YARN scheduler web interface|
By default, both the Capacity Scheduler and the Fair Scheduler take only memory into account in their calculations (CPU requirements are ignored). All the allocation math comes down to comparing the memory required by resource requests with the memory available on the node being examined during a given scheduling cycle.
Since Viya 3.1 you can configure CAS with the cas.USEYARN configuration parameter set to "true". Here is what happens when you start the CAS service:
The CAS Controller log provides messages to confirm (for example with cas.MEMORYSIZE set to 4g):
|YARN Application for the CAS Server|
|YARN Containers for the CAS Server|
Note: Even if you limit the CAS memory usage with cas.MEMORYSIZE, it does not prevent you from loading more data than this limit. Thanks to the files memory-mapped in the CAS Disk Cache, when the limit is exceeded, CAS can "page out" and "page in" the required data blocks that are not already available in active or cached memory.
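As an aside, the underlying mechanism here is ordinary OS memory-mapping: the file on disk backs the address space, and the operating system pages blocks in and out on demand, so the data touched can exceed the memory limit. A minimal Python sketch of that general mechanism (illustrative only, not CAS internals):

```python
# Sketch of memory-mapping: a file on disk backs the mapped region,
# and the OS pages data in on access and writes changes back lazily.
import mmap
import os
import tempfile

# Create a small backing file (stands in for a CAS Disk Cache block file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"block-0!" * 1024)          # 8 KiB of data on disk
    path = f.name

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)        # map the whole file into memory
    first = mm[:8]                        # touching a range faults the page in
    mm[:8] = b"block-X!"                 # writes flow back to the file via the page cache
    mm.close()

print(first)  # b'block-0!'
os.unlink(path)
```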
YARN works with the concept of vcores (short for virtual cores).
A vcore is a share of a host CPU that the YARN NodeManager allocates, so that all available resources can be used in the most efficient way possible.
The number of vcores that a NodeManager can use is defined in yarn.nodemanager.resource.cpu-vcores. It is generally set to the number of physical cores on the host, but administrators can increase or decrease it if they wish to accommodate more or fewer containers on the nodes. It is very important to find the proper value for this parameter.
In our research environment, we identified performance issues with Hadoop: only 2 or 3 out of 20 concurrent users were able to complete their data management activities (serial and parallel loading from Hive) in the same period of time.
This initially surprised me, since the users were simply running a Hive query against a small table. Similarly, the initial performance observed for parallel loads of data with the Embedded Process was not encouraging... Given that the Hadoop cluster contained 3 workers (each with 2 cores and 16 GB of RAM), I was expecting better performance!
This sub-optimal performance was actually caused by the default YARN settings: the default values meant that only a very small number of containers were able to start…
Each container requires a minimum of 1 vcore, and with yarn.nodemanager.resource.cpu-vcores set to "2", it was only possible to start a maximum of 6 containers (2 vcores × 3 NodeManagers) across the whole cluster. Once all 6 slots were taken, the remaining jobs failed or stayed pending.
Given that 2 containers were required to start the MapReduce job generated by each Hive query, and 4 for each SAS EP MapReduce job (in addition to the ApplicationMaster container), it became clear where the bottleneck was…
Changing the value of yarn.nodemanager.resource.cpu-vcores to "8" instantly made 24 vcores available to start containers in the cluster, and all of a sudden our 20 users were able to complete their hands-on work in a reasonable time 🙂
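The capacity math above can be reproduced in a few lines (a sketch using the figures from this article, and assuming the ApplicationMaster counts as one extra container per job):

```python
# Container-slot arithmetic for the cluster described above:
# 3 NodeManagers, 1 vcore per container, so the vcore setting caps
# how many containers (and therefore concurrent jobs) can run.
nodes = 3
hive_job = 2 + 1            # MapReduce containers + ApplicationMaster
ep_job = 4 + 1              # SAS EP containers + ApplicationMaster

for vcores_per_node in (2, 8):   # yarn.nodemanager.resource.cpu-vcores
    slots = nodes * vcores_per_node
    print(slots, slots // hive_job, slots // ep_job)
# 6 slots  -> 2 concurrent Hive queries, or 1 EP load
# 24 slots -> 8 concurrent Hive queries, or 4 EP loads
```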
There are many other parameters like this one that you can tune in YARN, either at the NodeManager level or at the cluster level (for example, maximum and minimum container sizes in terms of vcores and memory), and they can impact container resource allocation, whatever the client application requests. (See the excellent SAS® Analytics on Your Hadoop Cluster Managed by YARN for detailed explanations of those parameters in the context of LASR/HPA usage.) When a client application requests resources from YARN, it generally provides not only a memory amount but also a number of vcores.
When CAS starts, to enforce the license restriction, it sets a CPU mask on the threads it creates, telling them that they can only run on certain cores. So no matter how YARN allocates the vcores on the CAS nodes, the CAS process will only be able to use the licensed physical cores of the server.
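To illustrate the idea (this is the general mechanism, not CAS internals): a CPU affinity mask is just a bitmask of allowed cores, which on Linux would be applied to a thread with sched_setaffinity():

```python
# A CPU affinity mask: bit k set means "this thread may run on core k".
def licensed_cpu_mask(licensed_cores: int) -> int:
    # Allow only the first `licensed_cores` cores (cores 0..licensed_cores-1).
    return (1 << licensed_cores) - 1

mask = licensed_cpu_mask(4)
print(bin(mask))  # 0b1111 -> threads restricted to cores 0-3,
                  # regardless of how many vcores YARN thinks are in play
```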
With cas.USEYARN enabled in Viya 3.3, when CAS starts up, the number of vcores requested from YARN is fixed at 1 (see the log extract above). In Viya 3.4, there is a new parameter called env.CAS_ADDITIONAL_YARN_OPTIONS which, as the name suggests, allows additional parameters to be passed to YARN when starting the CAS server. For example:
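The exact options supported are described in the SAS documentation for your release; the entry below is purely a hypothetical illustration (the option names and values shown are assumptions) of what such a setting could look like in casconfig_usermods.lua:

```lua
-- Hypothetical illustration: pass extra options (number of vcores, queue)
-- to YARN when the CAS placeholder containers are requested.
env.CAS_ADDITIONAL_YARN_OPTIONS = '-vcores 8 -queue cas'
```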
Finally, although it is not enabled by default, your Hadoop administrator might also have configured YARN to use cgroups to limit the resource usage of containers.
When this is activated in YARN, the cumulative CPU usage of all containers on each NodeManager can be limited to a defined percentage of the machine.
If you have managed to read the first half of the article and are wondering why all of this is important 😊: it serves as a reminder of how setting technical expectations early, during the technical architecture design of a Viya implementation, can benefit the discussions with the various project stakeholders (it saves time, can speed up implementation, and avoids a poor end-user experience).
This is a key consideration and one that must be clearly understood: once you have started CAS with the cas.USEYARN configuration parameter set to "true", that amount of capacity in YARN is reserved for the duration of the CAS server process. Even if zero users are utilizing CAS, a large part of the memory cannot be used by Hadoop-related processing.
While this is true, it is important to understand that CAS is a service, not another Hadoop batch job that runs, terminates, and releases its resources. The CAS server needs to be up and running so it can provide a 24/7 analytics service to the end-user community.
In Hadoop vocabulary, CAS is known as a "long-running service", and the YARN community, aware of the need to run such services in Hadoop, is working to improve their support (e.g. see this page).
When CAS starts up, it needs to start a container on each YARN NodeManager where CAS is installed.
For Hadoop administrators who are less familiar with SAS in-memory processing requirements, the request to ring-fence resources for CAS (a "long-running service") will require an assessment prior to implementation.
The third point seems obvious, but on very large and busy Hadoop clusters there can be situations where, across the whole cluster, there is more than enough memory to satisfy the CAS memory request, yet some of the NodeManagers are already using all of their host machine's RAM/CPU capacity. In such a case, CAS will not start, because it needs to start a container on each node where CAS is deployed.
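That per-node check can be sketched as follows (the node names and figures are invented for illustration):

```python
# Aggregate capacity is not enough: CAS must place one container on
# every node it is deployed on, so each node is checked individually.
free_mem_per_node = {"node1": 64, "node2": 4, "node3": 64}  # GB free in YARN
cas_container_gb = 16                                        # per-node CAS request

total_free = sum(free_mem_per_node.values())   # 132 GB >= 3 x 16 GB overall...
can_start = all(free >= cas_container_gb       # ...but node2 cannot host its container
                for free in free_mem_per_node.values())
print(total_free, can_start)  # 132 False -> CAS fails to start despite spare capacity
```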
With Viya 3.3, only the CAS memory usage information is passed to YARN (there is no information about the CPU); the number of cores is hard-coded to 1 per container. However, keep in mind that CAS will not be allowed to use more cores than what is defined in the license.
In addition, SAS Viya 3.4 introduces a new parameter, cas.CPUSHARES, to specify cgroup shares for the CAS server as a whole. This new parameter can be used in conjunction with YARN.
For example, YARN jobs could be put in a cgroup with X shares, and CAS started with Y shares: YARN jobs would get X/(X+Y) of the CPU and CAS would get Y/(X+Y) (assuming there are no other top-level CPU cgroups on the host).
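The share arithmetic works out as follows (cgroup CPU shares are relative weights, only enforced under contention; the values chosen for X and Y below are illustrative):

```python
# cgroup cpu.shares arithmetic: each top-level group gets
# share / (sum of shares) of the CPU when both are busy.
def cpu_fraction(own_shares: int, other_shares: int) -> float:
    return own_shares / (own_shares + other_shares)

yarn_shares, cas_shares = 1024, 512   # illustrative values for X and Y
print(round(cpu_fraction(yarn_shares, cas_shares), 3))  # YARN jobs: 0.667
print(round(cpu_fraction(cas_shares, yarn_shares), 3))  # CAS:       0.333
```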
Finally, with Viya 3.4, thanks to the new CAS environment variable CAS_ADDITIONAL_YARN_OPTIONS, it is possible to specify the number of cores, the queue, etc. for our "placeholder" containers.
There are many knobs that a Hadoop administrator can tune in YARN and in the Scheduler configuration, using queues, node labels, placement rules, etc., depending on the distribution and the associated Scheduler.
Also, as already discussed, with Viya 3.4 it is possible to pass more parameters to YARN at CAS server startup.
Well... what can be done from the CAS side is documented and hopefully understood with the previous explanations.
The unknown, though, is what can be done on the YARN side, and whether it will be supported with CAS. The YARN capabilities depend on the YARN version and on the YARN community continuing to develop advanced resource management features, such as "node labeling" and "preemption". (See this additional explanation of node labels.)
YARN preemption can be configured at the scheduler level:
|YARN Preemption configuration|
Depending on your Hadoop distribution and version, you will be running a more or less recent version of Apache Hadoop, and the YARN capabilities depend on it.
For example, according to this ticket, non-exclusive node labeling (and preemption for non-exclusive node labels) works starting from Apache Hadoop 2.8.0, while according to the Hortonworks documentation, the latest Hortonworks HDP platform uses Apache Hadoop 2.7.3.
Also, YARN is currently not able to preempt containers correctly when the requested containers are tied to specific nodes (which is what we do when we start the CAS server) - see this open ticket.
So clearly, "your mileage will vary" and there is no default support for such features.
However, it is important to remember that even though CAS notifies YARN with "placeholder" containers, CAS follows its own rules for resource consumption: if it is not limited through the license to a subset of cores, it will try to use all the available CPU resources, whatever has been communicated to YARN 🙂
First I'd like to thank Steve Krueger, James Kochuba, Stefan Park and Simon Williams for their help and contribution during the write-up of this article.
As explained in this post, CAS does NOT run inside a YARN container, and there are still some limitations that the SAS and Hadoop technical teams working on a Viya implementation project need to take into account.
So make sure that the correct expectations are set when discussing this integration and the coexistence challenges of CAS and Hadoop Applications inside the same YARN cluster.
The technical teams must clearly understand that on each host machine where CAS is installed, they will need to give up part of the Hadoop cluster's memory for CAS. This amount remains reserved for CAS (whatever the actual usage) and CAS will not use more than this amount of memory. This amount is NOT dynamic depending on the CAS activity.
Finally, YARN does not manage everything running on a Hadoop node: HDFS, Hive server processes and other servers run outside of it. That is why YARN is not given the full OS resources, but is generally configured to use 60-70% of the resources on each NodeManager.
Although this article focuses on YARN and workload management of CAS and Hadoop processing, the other important dimension when considering a co-located environment is the storage.
CAS does not only require memory and CPU; it also needs disk space for the CAS Disk Cache. So it is very important to make sure every Hadoop node can provide enough space on a local standard file system for the CAS Disk Cache, where CAS will "memory-map" its tables. When Hadoop is deployed on a vendor appliance with highly performant storage and many local drives attached (like the Oracle BDA), it is very likely that 95% of this storage has already been formatted for HDFS.
So with CAS, it will be necessary to find storage that can be exposed as a local directory for the CAS Disk Cache.
For all these reasons, technical teams working to deliver SAS implementations need to discuss if requirements can be best met by co-locating CAS with an existing Hadoop environment, or whether it is better to place CAS and Hadoop on different host machines.
Keeping the CAS and Hadoop environments separate from one another (aka a "remote" deployment) may prove more appropriate when the existing Hadoop environment is consistently using large amounts of CPU and RAM. This "remote deployment" configuration is made possible by the Data Connect Accelerator and the Embedded Process, which allow effective movement and processing of data. When the Embedded Process is deployed, it can also help to have some dedicated networking infrastructure between CAS and Hadoop.
So, to sum up and I'll quote one of my favorite songs by The Offspring, the best practice is "unless you have no choice, you got to keep 'em separated."
Thanks for reading! Your questions and comments are welcome!