MG18
Lapis Lazuli | Level 10

Hi team,

 

The Grid Control Server (node 26) has failed due to a physical hardware, network, or power failure. The rest of the installation, including the shared file system, is up and running.

The common answer is that the other nodes would take over. In our case, however, when we try to make another node the controller, it does not take over the role.

 

The LSF services on the nodes are running fine, but when we try to log on with egosh, it fails with the error below.

 

Error:

[sasinst@lxxxxt27 ~]$ egosh user logon -u Admin -x Admin

Cannot contact the master host. If you cannot start the cluster
successfully, refer to VEMKD and LIM log files for troubleshooting
information

I have started the lsadmin and badmin services on nodes 27, 28, and 29 (the remaining nodes) and then tried to log on with egosh again, but it gives the same error.

1 REPLY
doug_sas
SAS Employee

The lsf.conf and ego.conf files each have a parameter that lists the master hosts (LSF_MASTER_LIST and EGO_MASTER_LIST, respectively). The value of each parameter is a space-separated list of host names in order of priority.

 

If the dead master machine is the only host listed in these parameters, LSF has no other candidate to fail over to.

 

For example, if you have EGO_MASTER_LIST="myhost1.mydomain.com", then when myhost1 goes down, LSF does not know which host to make the master. If, on the other hand, you have EGO_MASTER_LIST="myhost1.mydomain.com myhost2.mydomain.com", then when myhost1 goes down, myhost2 becomes the master.
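
As a quick way to check this on your own cluster, here is a minimal Python sketch that reads the master list from the text of a config file and shows which hosts would remain as failover candidates if one host is down. It assumes the parameter appears on a single line in the form KEY="host1 host2"; the function names master_hosts and failover_candidates are made up for illustration and are not part of LSF.

```python
import re

def master_hosts(conf_text, key):
    """Return the hosts listed for `key` (e.g. EGO_MASTER_LIST), in priority order.

    Assumes a single-line KEY="host1 host2" or KEY=host1 entry; lines starting
    with '#' are treated as comments and skipped.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        m = re.match(rf'{re.escape(key)}\s*=\s*"?([^"#]*)"?', line)
        if m:
            return m.group(1).split()
    return []

def failover_candidates(conf_text, key, dead_host):
    """Hosts from the master list that remain once `dead_host` is excluded."""
    return [h for h in master_hosts(conf_text, key) if h != dead_host]

# Sample text mirroring the example above; in practice, read the contents of
# your own ego.conf or lsf.conf instead.
sample = 'EGO_MASTER_LIST="myhost1.mydomain.com myhost2.mydomain.com"\n'
print(failover_candidates(sample, "EGO_MASTER_LIST", "myhost1.mydomain.com"))
```

If the returned list is empty for the failed host, the master list contains no other candidate, which matches the situation described above. Run the same check with LSF_MASTER_LIST against the contents of lsf.conf.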
