Hi,
We are using SAS 9.4M6 in a multi-tier architecture. We recently faced an issue where a few of the services on the compute nodes did not start automatically when we tried to start SAS Grid via sgmg.sh on those nodes. We later found that the NTP services on the (Linux-based) servers had not been turned back on after a server patching activity, so the clocks on the servers no longer matched. The same thing happened in our Validation (VAL) environment. Once NTP was turned on for the VAL servers, we restarted the SAS services and everything worked in VAL.
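For anyone wanting to check for this kind of drift after patching, NTP estimates how far a host's clock is from a reference using four timestamps from a single request/response exchange (the standard offset and delay formulas from the NTP specification, RFC 5905). Below is a minimal sketch of that calculation, plus a check that flags hosts outside a tolerance. The host names, the 1-second tolerance, and the helper names are hypothetical, and how the timestamps are collected is left out.

```python
from dataclasses import dataclass


@dataclass
class NtpSample:
    t1: float  # request sent (seconds, local host clock)
    t2: float  # request received (reference server clock)
    t3: float  # reply sent (reference server clock)
    t4: float  # reply received (local host clock)


def offset_and_delay(s: NtpSample) -> tuple[float, float]:
    """Return (offset, round-trip delay) in seconds, per the NTP formulas.

    A positive offset means the reference clock is ahead of the local host.
    """
    offset = ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2
    delay = (s.t4 - s.t1) - (s.t3 - s.t2)
    return offset, delay


def out_of_sync(samples: dict[str, NtpSample], tolerance_s: float = 1.0) -> list[str]:
    """List hosts whose estimated clock offset exceeds the (hypothetical) tolerance."""
    return sorted(
        host for host, s in samples.items()
        if abs(offset_and_delay(s)[0]) > tolerance_s
    )


# Hypothetical samples: compute01 runs about 5 s behind the reference, compute02 is in sync.
samples = {
    "compute01": NtpSample(100.0, 105.1, 105.2, 100.3),
    "compute02": NtpSample(100.0, 100.05, 100.06, 100.11),
}
print(out_of_sync(samples))
```

Averaging the outbound and return legs cancels a symmetric network delay, which is why the formula uses both server timestamps rather than a single reading.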
However, when we followed the same steps in the Production environment (turned NTP on for the PROD servers and then restarted the SAS services), the spawner services came back up on the compute nodes, but the web services started showing problems: none of the web applications were accessible. We then discovered that the WIP Data Server (SAS Web Infrastructure Platform Data Server) was not starting. A complete restart of the SAS services did not help. We receive the errors below every time we try to start the WIP Data Server.
Any idea how to resolve this issue? We have raised a high-priority call with SAS Institute but are still waiting for their response, so we thought we would ask the SAS Community for ideas.
2020-08-03 06:35:12.376 UTCLOG: database system was interrupted; last known up at 2020-08-02 20:28:08 UTC
2020-08-03 06:35:20.907 UTCLOG: record with incorrect prev-link 4F94/740FE085 at 8C/CF9BD098
2020-08-03 06:35:20.907 UTCLOG: invalid primary checkpoint record
2020-08-03 06:35:20.908 UTCLOG: invalid xl_info in secondary checkpoint record
2020-08-03 06:35:20.908 UTCPANIC: could not locate a valid checkpoint record
2020-08-03 06:35:22.489 UTCLOG: startup process (PID 107579) was terminated by signal 6: Aborted
2020-08-03 06:35:22.489 UTCLOG: aborting startup due to startup process failure
2020-08-03 06:35:31.134 UTCLOG: incomplete startup packet
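When the full log is long, the decisive entry is easier to spot by filtering for PostgreSQL's unrecoverable severity levels (the WIP Data Server is based on PostgreSQL). Below is a minimal sketch, assuming every line uses the timestamp-plus-"UTC" prefix shown above, with or without a space before the level. The function name `severe_entries` is hypothetical.

```python
import re

# Matches lines such as "2020-08-03 06:35:20.908 UTCPANIC: could not locate ...".
# The optional space after "UTC" also accepts prefixes written as "UTC PANIC:".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC ?"
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)

# PostgreSQL severity levels that indicate a failed statement, session, or server.
SEVERE = {"ERROR", "FATAL", "PANIC"}


def severe_entries(lines):
    """Return (timestamp, level, message) for every ERROR/FATAL/PANIC line."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") in SEVERE:
            hits.append((m.group("ts"), m.group("level"), m.group("msg")))
    return hits


sample = [
    "2020-08-03 06:35:20.908 UTCLOG: invalid xl_info in secondary checkpoint record",
    "2020-08-03 06:35:20.908 UTCPANIC: could not locate a valid checkpoint record",
    "2020-08-03 06:35:22.489 UTCLOG: aborting startup due to startup process failure",
]
for entry in severe_entries(sample):
    print(entry)
```

On the lines above, only the PANIC about the missing valid checkpoint record is reported. The surrounding LOG lines show the consequence, with the startup process aborting.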