Hi @doug_sas , everyone,
An update, and good news: I managed to get this working for one Object Spawner.
The issue in SWO was a typo that is hard to spot unless you enable logging in the daemons, as you suggested. Note the trailing space after the user name:
User=>sasinst <
haManagerInit: Cannot authenticate user.
Because this is really hard to see in the SWO GUI, I updated the value through the JSON instead, using Notepad++ to make sure no stray characters were included.
Regarding the logs, I really like the fact that the strings are delimited.
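For anyone who prefers to check the JSON from the command line rather than by eye, here is a minimal sketch. The function names are hypothetical, and it assumes one key/value pair per line with no escaped quotes inside the values, so treat it as a quick screen and not a real JSON validator.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SWO.

# List lines containing bytes that are neither printable ASCII nor whitespace
# (for example a non-breaking space pasted from a web page).
find_odd_chars() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# List lines where a quoted value after a colon starts or ends with whitespace,
# such as "user": "sasinst ".
find_padded_json_values() {
    grep -n -E ':[[:space:]]*"([[:space:]][^"]*|[^"]*[[:space:]])"' "$1"
}
```

Running both functions against the exported JSON file before loading it back should flag a value like "sasinst " with its line number.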
@doug_sas Regarding the GUI, I would suggest some improvements. In the front end, a simple JavaScript check on the input would help, as would more information in the result message ("ADDED" without an error needs further description, IMHO). In the back end, trimming the input and validating that the user can actually authenticate would prevent a lot of headaches and troubleshooting in the future. In the logs, "Cannot authenticate user" should definitely be an ERROR, not INFO.
If you could pass this on to the responsible team, that would be great. If you like, I can create an entry in the SASBallot ideas here in the communities.
I will now implement the same for the rest of the Object Spawners, then for the WIP database, and I will post an update to keep the Knowledge Base current.
For now, a summary:
Implement the provided sample script, with a custom change so that it captures the PID of the Object Spawner. Beware: the script must never write to the PID file generated by the Object Spawner itself. It only needs to read that file once, so that it can write the value into its own PID file, as the sample script proposes.
nohup $command > $log_filename 2>&1 &
# Modify to pick up the PID generated by ObjectSpawner.sh itself - Juan Sanchez
#pid=$!
#echo $pid > $pid_filename
sleep 1
spwn_pid_filename=/sas_application/sasconfig/comp/config/Lev1/ObjectSpawnerUTF8/server.${thisHost}.pid
spwn_pid=$(cat "$spwn_pid_filename")
echo "$spwn_pid" > "$pid_filename"
#
# Report the PID read from the Object Spawner's file ($pid is no longer set above)
echo "${now} ${script}: Service ${name} (pid $spwn_pid) is started"
}
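A fixed "sleep 1" may not be enough if the Object Spawner is slow to write its PID file. As a possible refinement, and not part of the sample script, here is a minimal sketch of a polling helper. The name wait_for_pid_file and the default of 10 seconds are my own assumptions.

```shell
#!/bin/sh
# Hypothetical helper: wait until a PID file exists and is non-empty,
# then print its first line with whitespace removed. It only reads the file.
# $1 = path to the PID file written by the Object Spawner
# $2 = maximum number of seconds to wait (default 10)
wait_for_pid_file() {
    file=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if [ -s "$file" ]; then
            head -n 1 "$file" | tr -d '[:space:]'
            echo
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    # The file never appeared or stayed empty
    return 1
}
```

In the start function this could replace the sleep and the cat, for example: spwn_pid=$(wait_for_pid_file "$spwn_pid_filename" 10) || echo "PID file not found". The return code lets the script report a failed start instead of writing an empty PID file.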
Test from the CLI that the sample script works for stop, start, status and restart. If the script does not work there, SWO won't work either, of course.
Configure SWO according to whether the HA/failover will be Active-Active (such as the Object Spawners) or Active-Passive (such as the WIP database).
For the first case, "Number of instances" must be the number of your Grid nodes (or a lower value if you don't want or need to use all the nodes).
For the second case, use a value of 1.
Once the HA service is configured, it is highly advisable to double-check all the values, as the JavaScript validation in the GUI is limited at the moment.
If something behaves unexpectedly (troubleshooting), you will need a deeper understanding of the SWO mechanics. You can get this from SAS Technical Support, or by yourself with the tips below:
(Optional) Disable the SWO HA service created
Stop the daemons (sgmg.sh) on every node: /path/config/Lev1/Grid/sgmg.sh stop
(Optional) Clean or archive the current SWO logs
Back up logconfig.trace.xml and logconfig.xml
Add a line to logconfig.trace.xml under the "Grid Debug Loggers" block:
<logger name="App.Grid.SGMG.Log.HA" additivity="false"> <level value="trace"/> <appender-ref ref="LOG"/></logger>
Overwrite logconfig.xml with logconfig.trace.xml
Start SWO: /path/config/Lev1/Grid/sgmg.sh start
Enable the HA service and wait for a moment
From the logs, look for entries matching "App.Grid.SGMG.Log.HA".
Example: User=>sasinst < and haManagerInit: Cannot authenticate user
Once everything is done, roll back the changes, and repeat if further troubleshooting is needed.
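The backup, log search and rollback steps above can be sketched as a few shell functions. This is only an illustration: the function names are hypothetical, the directories holding the logconfig files and the logs are passed in as arguments because they depend on your installation, and it assumes a grep that supports -r (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helpers; pass your own config and log directories.

# Copy both logging configs aside with a timestamp suffix; print the suffix.
backup_logconfig() {
    conf_dir=$1
    stamp=$(date +%Y%m%d%H%M%S)
    for f in logconfig.xml logconfig.trace.xml; do
        cp -p "$conf_dir/$f" "$conf_dir/$f.$stamp.bak" || return 1
    done
    echo "$stamp"
}

# Put the saved copies back (the rollback step).
restore_logconfig() {
    conf_dir=$1
    stamp=$2
    for f in logconfig.xml logconfig.trace.xml; do
        cp -p "$conf_dir/$f.$stamp.bak" "$conf_dir/$f" || return 1
    done
}

# List the HA logger entries found under a log directory.
find_ha_entries() {
    grep -rn -F 'App.Grid.SGMG.Log.HA' "$1"
}

# List "=>value<" fields whose value has leading or trailing whitespace,
# such as User=>sasinst <.
find_padded_values() {
    grep -rn -E '=>([[:space:]][^<]*|[^<]*[[:space:]])<' "$1"
}
```

Keep the suffix printed by backup_logconfig, since restore_logconfig needs it to find the right copies. Stopping and starting the daemons with sgmg.sh is left as a manual step between the two.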
Best regards,
Juan