JuanS_OCS
Amethyst | Level 16

@doug_sas That was indeed helpful. OK, I think we are getting closer.

 

2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: name=sas_obj_spwn_utf8_ha_svc, numNeeded=3, numRunning=0
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: host AAAAAAAAA is not FREE, it is INVALID
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: host BBBBBBBBB is not FREE, it is INVALID
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: host CCCCCCCCC is not FREE, it is INVALID
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: no hosts available
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessService: Exit
2020-03-02T13:51:55Z TRACE [00000015] sasinst - haManagerProcessServices: service sas_obj_spwn_utf8_ha_svc has 0 running, 0 launchErrors, 0 runErrors, state=ADDED

I think this explains a little bit more. Do you know why it is recognizing the hosts as "not FREE" and "INVALID"? Or how I can get more information about how SWO gets those messages?

 

 

 

 

doug_sas
SAS Employee

This is where you need to send the entire log to tech support to get it analyzed. 

JuanS_OCS
Amethyst | Level 16

Yes, that is done, although there is not really much more than that.

doug_sas
SAS Employee

You may need to set the App.Grid.SGMG.Log.Master logger to trace to get the status of the hosts.

JuanS_OCS
Amethyst | Level 16

Hi @doug_sas ,

 

the hosts are OPEN-OK.

 

Could you please explain the message "host is not FREE, it is INVALID", and what FREE/not FREE and INVALID mean? I think they are undocumented.

 

Thank you in advance.

 

doug_sas
SAS Employee

FREE means the host is available to be used for that service.

INVALID means that the service failed to start on that host so many times it has been taken out of consideration for running the service.
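To see at a glance which hosts have been ruled out, the trace lines quoted earlier in this thread can be filtered. This is only a rough sketch: `list_invalid_hosts` is a hypothetical helper name, and it assumes the log lines keep exactly the "host X is not FREE, it is INVALID" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SWO): print each host that the trace log
# reports as INVALID, once per host.
# Assumes lines of the form quoted in this thread:
#   ... haManagerProcessService: host <name> is not FREE, it is INVALID
list_invalid_hosts() {
    sed -n 's/.*haManagerProcessService: host \([^ ]*\) is not FREE, it is INVALID.*/\1/p' "$1" |
        sort -u
}
```

Called as `list_invalid_hosts /path/to/daemon.log`, where the log path is whatever your own deployment uses.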

 

The real problem is that the service fails to run on the daemons. Hopefully the reason why will be in the daemon log files when you run with the App.Grid.SGMG.Log.HA logger set to trace on the daemons.

JuanS_OCS
Amethyst | Level 16

Thanks so much, this helps me to better understand, certainly.

 

Just to be sure: when you refer to making the changes on the daemons and setting trace on the daemons, you mean not changing this setting in the SWO GUI, but in config/Lev1/Grid/logconfig.xml or config/Lev1/Grid/logconfig.trace.xml, and the daemon would be sgmg.sh?

doug_sas
SAS Employee
Yes. Use <config>/<LevX>/Grid/logconfig.trace.xml on all daemons as the logging configuration.
JuanS_OCS
Amethyst | Level 16

Got you. Thanks, will do.

JuanS_OCS
Amethyst | Level 16

Hi @doug_sas , everyone,

 

an update and good news. I managed to get this working for one ObjectSpawner. 


The issue in SWO was a typo that is hard to spot unless you set up the logs on the daemons as you mentioned.

 

 

User=>sasinst <
haManagerInit: Cannot authenticate user.

As this is really hard to see in the SWO GUI, I updated the configuration through JSON with Notepad++, making sure no stray characters were included.
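A quick way to catch this kind of stray whitespace before importing the JSON could look like the sketch below. It is only a heuristic under assumptions: `find_padded_values` is a hypothetical helper, the layout of the exported JSON is not shown in this thread, and the pattern simply flags any `"key": "value"` pair whose value starts or ends with whitespace (it can give false positives for values that themselves end in a colon).

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every line of a JSON file
# that contains a string value with leading or trailing whitespace,
# for example "sasinst " instead of "sasinst".
# Heuristic only: it looks for a colon, optional whitespace, and a quoted
# string that begins or ends with a whitespace character.
find_padded_values() {
    grep -nE ':[[:space:]]*"([[:space:]][^"]*|[^"]*[[:space:]])"' "$1"
}
```

If the function prints nothing (and returns a non-zero status, as `grep` does when there is no match), no padded values were found by this check.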

 

 

Regarding the logs, I really like the fact that the strings are delimited.

 

@doug_sas Regarding the GUI, I would suggest some improvements. In the front end, a simple JavaScript check would help, along with some further information (a state of "ADDED" with no error needs further description, IMHO). In the back end, trimming the input and validating that the user can authenticate would prevent a lot of headaches and troubleshooting in the future. In the logs, "Cannot authenticate user" should definitely be an error, not INFO.

 

If you could pass this to the responsible team, that would be great. If you want, I could create an entry in the SASBallot ideas, here in the communities.

 

I will now implement the same for the rest of the Object Spawners, and then for the WIP database, and I will post an update here for the knowledge base.

 

For now, a summary:

 

  • Implement the provided sample script, customized to capture the PID of the Object Spawner. Beware: the script must never write to the PID file generated by the Object Spawner itself. It only needs to read that file once, so that it can write the value into its own PID file, as the sample script proposes.
  nohup $command > $log_filename 2>&1 &
  # Modified to pick up the PID written by ObjectSpawner.sh itself - Juan Sanchez
  #pid=$!
  #echo $pid > $pid_filename
  sleep 1   # give the Object Spawner a moment to write its own PID file
  spwn_pid_filename=/sas_application/sasconfig/comp/config/Lev1/ObjectSpawnerUTF8/server.${thisHost}.pid
  spwn_pid=$(cat "$spwn_pid_filename")
  echo "$spwn_pid" > "$pid_filename"
  #
  echo "${now} ${script}: Service ${name} (pid $spwn_pid) is started"
}
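The fixed `sleep 1` above can be fragile if the Object Spawner is slow to write its PID file. One possible alternative, as a sketch only, is to poll for the file for a few seconds. `wait_for_pid_file` is a hypothetical helper that is not part of the SAS sample script, and the 10-second default is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SAS sample script).
# Waits until the PID file $1 exists and is non-empty, checking once per
# second for up to $2 seconds (default 10, an arbitrary choice).
# Prints the file contents and returns 0 on success, returns 1 on timeout.
# It only reads the file; it never writes to it.
wait_for_pid_file() {
    file=$1
    timeout=${2:-10}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -s "$file" ]; then
            cat "$file"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

In the snippet above, the `sleep 1` and `cat` lines could then become something like `spwn_pid=$(wait_for_pid_file "$spwn_pid_filename") || exit 1`, so the script fails instead of recording an empty PID.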

 

  • Test from the CLI that the sample script works for stop, start, status and restart. If the script does not work there, it won't work in SWO either, of course.
  • Configure SWO according to whether the HA/failover will be Active-Active (such as Object Spawners) or Active-Passive (such as the WIP database).
    • For the first case, "Number of instances" must be the number of your Grid nodes (or a lower value if you don't want or need to use all the nodes).
    • For the second case, a value of 1.
  • Once the HA service is configured, it is highly advisable to double-check all the values, as the JavaScript validation is limited at the moment.
  • If something behaves unexpectedly (troubleshooting), you will need a deeper understanding of the SWO mechanics. You can get it from SAS Technical Support or on your own with the tips below:
    • (Optional) Disable the SWO HA service you created
    • Stop the daemons (sgmg.sh) on every node: /path/config/Lev1/Grid/sgmg.sh stop
    • (Optional) Preferably, clean/archive the current SWO logs
    • Back up logconfig.trace.xml and logconfig.xml
    • Add a line to logconfig.trace.xml under the "Grid Debug Loggers" block:
    • <logger name="App.Grid.SGMG.Log.HA"        additivity="false"> <level value="trace"/> <appender-ref ref="LOG"/></logger>
    • Overwrite logconfig.xml with logconfig.trace.xml
    • Start SWO: /path/config/Lev1/Grid/sgmg.sh start
    • Enable the HA service and wait for a moment
    • In the logs, look for entries matching "App.Grid.SGMG.Log.HA".
      • Example: User=>sasinst < and haManagerInit: Cannot authenticate user

Once all is done, roll back the changes, and repeat if needed for further troubleshooting.
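The backup, overwrite and rollback steps for the logging configuration could be scripted roughly as follows. This is a sketch under assumptions: `enable_trace_logging` and `restore_logging` are hypothetical helper names, the `.orig` suffix is my own convention, and the directory argument stands for your own <config>/<LevX>/Grid path. The HA logger line still has to be added to logconfig.trace.xml by hand first, and the daemons should be stopped while the files are swapped.

```shell
#!/bin/sh
# Hypothetical helpers; $1 is the Grid configuration directory,
# that is <config>/<LevX>/Grid in your deployment.

# Back up both logging configs as *.orig, then make the trace config active.
# Refuses to run if a backup already exists, so it never overwrites one.
enable_trace_logging() {
    grid_dir=$1
    for f in logconfig.xml logconfig.trace.xml; do
        if [ -e "$grid_dir/$f.orig" ]; then
            echo "backup $grid_dir/$f.orig already exists" >&2
            return 1
        fi
    done
    for f in logconfig.xml logconfig.trace.xml; do
        cp -p "$grid_dir/$f" "$grid_dir/$f.orig" || return 1
    done
    cp "$grid_dir/logconfig.trace.xml" "$grid_dir/logconfig.xml"
}

# Roll back: put the *.orig backups back in place.
restore_logging() {
    grid_dir=$1
    for f in logconfig.xml logconfig.trace.xml; do
        mv "$grid_dir/$f.orig" "$grid_dir/$f" || return 1
    done
}
```

A typical round trip would be: stop the daemons on every node, run `enable_trace_logging /path/config/Lev1/Grid`, start the daemons, reproduce the problem, then stop again, run `restore_logging /path/config/Lev1/Grid` and restart.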

 

Best regards,

Juan

 

 

 

 

 
