We are running SAS 9.4M6 in a multi-tier architecture (3 metadata, 6 compute, and 2 web nodes, all on Linux). The Web Infrastructure Platform (WIP) services go down frequently. Because of High Availability, the system then tries to start the services on a different compute node (possibly because a network glitch keeps the original server from reporting the service's status correctly). When the WIP services start on another compute node with a new PID, the WIP database becomes locked or corrupted, because the new PID does not match the one recorded in the postmaster.pid file. As a result, we have to run pg_resetxlog every time to fix the problem.
Is there any way to fix this so that we do not have to reset the lock file every time to get the WIP services, and in turn the web services (SAS Studio), working again?
A WIP Data Server configured as a highly available service shouldn't fail over unless the original service goes down, but this does rely on successful communication between the hosts. One issue we sometimes see is that the server is started outside of the HA system (i.e. with sas.servers), so the HA system is unable to start the service itself and eventually fails over.
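One way to see whether a postmaster is already running before the HA system tries to start its own is to inspect the postmaster.pid lock file mentioned in the question. The following is only a minimal sketch, not a SAS-provided tool: the function names are made up for illustration, and the path to postmaster.pid is something you would have to supply for your own WIP Data Server data directory. It relies on the first line of postmaster.pid holding the postmaster's PID, and it can only test that PID on the host where it runs, so if the data directory is shared between nodes a "stale" result may simply mean the process lives on another node.

```python
import os
from pathlib import Path


def read_postmaster_pid(pid_file):
    """Return the PID from the first line of postmaster.pid, or None if the
    file is missing, empty, or does not start with a positive integer."""
    path = Path(pid_file)
    if not path.is_file():
        return None
    lines = path.read_text().splitlines()
    first = lines[0].strip() if lines else ""
    if not first.isdigit() or int(first) <= 0:
        return None
    return int(first)


def pid_is_running(pid):
    """Check whether a process with this PID exists on the local host."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def describe_lock(pid_file):
    """Classify the lock file as 'no-lock', 'live', or 'stale' (local view only)."""
    pid = read_postmaster_pid(pid_file)
    if pid is None:
        return "no-lock"
    return "live" if pid_is_running(pid) else "stale"
```

Calling `describe_lock()` with the full path to your postmaster.pid on each candidate node, before and after a failover, would at least show which node (if any) still holds a live postmaster for that data directory.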
-- Greg Wootton | Principal Systems Technical Support Engineer
That does not seem to be the case here. The services are started as per the process (i.e. with sgmg.sh, not sas.servers). In our case, the WIP services suddenly start on another node in the middle of a weekday. SAS Technical Support is still investigating the issue.