About StaceyChan

StaceyChan · 09-01-2022

According to this document:
https://go.documentation.sas.com/doc/en/calcdc/3.5/calserverscas/n08000viyaservers000000admin.htm

env.CAS_HEARTBEAT_LOST_TIMEOUT='interval'

If a worker node does not respond within the timeout interval, the controller treats it as a lost worker.

We'd like to know: does "treat it as lost" mean the controller will switch to the redundant data blocks, or that it will kill the CAS process on that worker? Or is it possible that a zombie process remains on that worker and the controller keeps trying to get data from it?

Here is our scenario:

20:19 hardware error in /var/log/message
20:20 the controller logs: 'A connection to peer node wk01.com was lost due to socket communication error, with status 104 ()'

The next day, we found that some of the global tables (with COPY=1) were broken.

Do we need to write a script to detect this kind of situation and kill the CAS process on that worker? Would that help trigger the redundant data blocks to become active?

Many thanks, any suggestion is helpful.

Best,
Stacey
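For reference, the detection part of the script mentioned in the question above could be sketched roughly as follows. This is only a minimal sketch: the pattern is taken from the single controller log line quoted in the post, the function name `find_lost_workers` is hypothetical, and the real log format or location may differ. It only reports the affected node and does not attempt to kill any process.

```python
import re

# Pattern based on the one controller log line quoted in the post.
# Assumption: other lost-peer messages follow the same wording.
LOST_PEER = re.compile(
    r"A connection to peer node (?P<node>\S+) was lost due to socket "
    r"communication error, with status (?P<status>\d+)"
)


def find_lost_workers(log_lines):
    """Return (node, status) pairs for each lost-connection message found."""
    hits = []
    for line in log_lines:
        match = LOST_PEER.search(line)
        if match:
            hits.append((match.group("node"), int(match.group("status"))))
    return hits


if __name__ == "__main__":
    sample = [
        "A connection to peer node wk01.com was lost due to socket "
        "communication error, with status 104 ()",
        "some unrelated log line",
    ]
    for node, status in find_lost_workers(sample):
        print(f"lost worker: {node} (status {status})")
```

Whether acting on such a match (for example, stopping the CAS process on that worker) actually helps the redundant blocks take over is exactly the open question here.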

StaceyChan · 10-12-2021

Hi Rob,

Thanks for the information in this article. If we have an in-memory table on the CAS cluster and the cache on one worker in the cluster suddenly goes down (hardware issue), would the table still be accessible (read only, not write)? Or should we set up some DR for this situation?

Thanks!

Best,
Stacey

Question about env.CAS_HEARTBEAT_LOST_TIMEOUT

Re: Provisioning CAS_DISK_CACHE for SAS Viya
