Red Hat’s GFS2 is a popular shared file system for SAS Grid Manager. If you have recently implemented, or are planning, an architecture that requires individual nodes to sustain roughly 800 MB/sec or more of throughput against a single shared file system (e.g. each SASWORK or SASDATA), you need to be aware of a newly identified GFS2 issue. Recent throughput testing has uncovered throttled WRITE performance caused by a spin lock that controls journal writes to a given GFS2 file system. The spin lock can become overwhelmed when it handles journal locking for competing high-throughput processes. The write ceiling for a single node against a single SASWORK or SASDATA file system has been shown to be around 800 MB/sec.
Short Facts:
- This does not affect READ Performance.
- This issue exists in all versions of GFS2, in both RHEL6 and RHEL7.
- Red Hat is currently working on the issue. Unfortunately, the fix requires changes to several non-trivial items in this process chain, and it is not yet known how much each individual change will affect the fix plan and its effectiveness. No estimated date for the changes has been published, and the change plan will differ between RHEL6 and RHEL7.
- Any SAS Grid host node trying to push more than 800 MB/sec to any single file system (SASWORK, SASDATA) will find write throughput throttled at that rate.
- Higher core counts (e.g. 8 cores and above) and heavy application write activity on a node can drive throughput to this rate and may cause the issue to appear.
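To judge whether a node is getting close to this ceiling, you can compare its measured write rate per file system against the 800 MB/sec figure. The following is only a rough sketch in Python, not a SAS or Red Hat tool. The function names, the 90% warning margin, the sample byte counts, and the choice of 1 MB = 1,000,000 bytes are all assumptions made for illustration. Substitute byte counters from whatever monitoring you already run.

```python
# Ceiling reported above for one node writing to one GFS2 file system.
GFS2_WRITE_CEILING_MB_S = 800.0


def write_rate_mb_s(bytes_start, bytes_end, seconds):
    """Average write rate in MB/sec between two byte-counter samples.

    Assumes 1 MB = 1,000,000 bytes (an assumption, not stated in the post).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_end - bytes_start) / seconds / 1_000_000


def near_ceiling(rates_mb_s, margin=0.9):
    """Return the names of file systems writing at or above margin * ceiling.

    The 0.9 default margin is a hypothetical warning level.
    """
    threshold = GFS2_WRITE_CEILING_MB_S * margin
    return sorted(name for name, rate in rates_mb_s.items() if rate >= threshold)


if __name__ == "__main__":
    # Hypothetical samples: bytes written by one node over a 60-second window.
    rates = {
        "SASWORK": write_rate_mb_s(0, 45_000_000_000, 60),
        "SASDATA": write_rate_mb_s(0, 12_000_000_000, 60),
    }
    print(near_ceiling(rates))
```

With these made-up samples, SASWORK averages 750 MB/sec and is flagged, while SASDATA at 200 MB/sec is not. The rate must be measured per node and per file system, since the ceiling described here applies to a single node writing to a single GFS2 file system.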
More information will be shared as soon as it is available.