shirishkamath
Obsidian | Level 7

Hi All,

 

We use SAS Grid (interactive and batch) and IBM LSF (independent of SAS) in our enterprise. Consider one node in the grid with 376 GB of RAM. Every process that is spawned is allocated 1 GB of memory at startup, and the SAS launch script sets a memory limit of 7 GB. In the LSF application/queue settings, we have restricted the maximum memory limit to 10 GB.

 

We are facing a resource crunch that causes one of the nodes to go into closed_busy status. On checking the indices for that node, memory utilization appears to be the plausible cause of the closure. However, the output of 'free -g' is as follows:

 

free -g
              total        used        free      shared  buff/cache   available
Mem:            376          18           1           1         356          59
Swap:            15           3          12

 

I think this means there is enough memory available on the server (buff/cache column = 356). Also, on checking RTM, the physical memory utilization of all processes on that node seems reasonable (~1-2 GB for each of the 15-20 processes, so a total of ~15-50 GB). However, in RTM I also see the following for these processes:

Max V.Memory Size: 7.565G

Is there any relation between virtual memory and physical memory utilization on a node from the LSF/SAS perspective? I thought LSF only looks at the physical memory utilization on the node and closes the node if it is too high. In our case, the physical memory utilization is not high. Virtual memory utilization is high, but only for a few seconds. So why does LSF close the node?
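For reference, the columns of the 'free -g' output above can be separated with a short Python sketch. The function name parse_free is hypothetical, and the figures are copied from the post. Note that 'available' (the kernel's estimate of memory that new processes can obtain without swapping) is a separate column from 'buff/cache'. Which of these figures LSF's own memory index corresponds to is not stated in this thread, so the sketch only exposes the numbers.

```python
# Figures copied from the 'free -g' output in the post (values in GB).
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:            376          18           1           1         356          59
Swap:            15           3          12
"""


def parse_free(text):
    """Return {'Mem': {...}, 'Swap': {...}} keyed by column name (hypothetical helper)."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has only three values; zip pairs them with total/used/free.
        rows[label.rstrip(":")] = dict(zip(headers, map(int, values)))
    return rows


if __name__ == "__main__":
    mem = parse_free(FREE_OUTPUT)["Mem"]
    print("buff/cache:", mem["buff/cache"], "GB")
    print("available: ", mem["available"], "GB")
```

Here the page cache holds 356 GB, while the kernel reports 59 GB as available to new processes.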
1 REPLY 1
doug_sas
SAS Employee

The host status 'closed_busy' usually indicates that one of the host's load indices has crossed the threshold specified in the lsf.cluster.<cluster_name> file in the LSF configuration directory. An example would be:

 

Begin   Host
HOSTNAME  model    type    server  r1m  mem  swp  RESOURCES    #Keywords
apple     Sparc5S  SUNSOL  1       3.5  1    2    (sparc bsd)  #Example

In this example, if the 'r1m' load index exceeded 3.5 or the available physical memory fell below 1 MB, the host 'apple' would show as closed_busy.
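The comparison described above can be sketched in a few lines of Python. This is only an illustration of the threshold logic, not LSF code: the function name busy_indices and the sample load values are hypothetical, and the thresholds are taken from the 'apple' example. It assumes 'r1m' trips when the value rises above its threshold, while 'mem' and 'swp' trip when the value falls below theirs.

```python
# Thresholds from the 'apple' row of the example Host section.
THRESHOLDS = {"r1m": 3.5, "mem": 1, "swp": 2}

# Assumption: r1m is busy when it rises above its threshold;
# mem and swp are busy when they fall below theirs.
BUSY_WHEN_ABOVE = {"r1m"}
BUSY_WHEN_BELOW = {"mem", "swp"}


def busy_indices(load, thresholds=THRESHOLDS):
    """Return the names of load indices that have crossed their thresholds."""
    crossed = []
    for name, limit in thresholds.items():
        value = load[name]
        if name in BUSY_WHEN_ABOVE and value > limit:
            crossed.append(name)
        elif name in BUSY_WHEN_BELOW and value < limit:
            crossed.append(name)
    return crossed


if __name__ == "__main__":
    # Hypothetical load readings for illustration only.
    sample = {"r1m": 4.2, "mem": 500, "swp": 1200}
    crossed = busy_indices(sample)
    print("closed_busy" if crossed else "ok", crossed)
```

Comparing a host's current load readings against its own configured thresholds in this way shows which index is responsible for the closed_busy state.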

Load index definitions can be found in the LSF Administration Reference (for example, https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/load_indices.html).

 

 
