No matter the SAS software version, there is an inevitable phenomenon you will routinely observe if you work as a SAS installation engineer: you launch a deployment process, and then you must wait… and wait, cross your fingers very hard (maybe pray if you are a believer), observe how things are going, and at some point check whether it worked (or not) 😊
The part I'd like to focus on in this blog is the "observe how things are going" step, in the context of a Viya 4 deployment.
A standard Viya 4 deployment should take around one hour to complete, sometimes more (depending on the type of order, the latency between the cluster nodes and the image registry, the CPU power of the cluster nodes, etc.).
During this "wait" phase, some colleagues will take the opportunity maybe to grab a coffee or stretch their legs outside of their office. But others, maybe a little bit more anxious (like myself 😊) will want to monitor as closely as possible if things are REALLY going well…
My colleague @ScottMcCauley explained in his article how to use the sas-readiness pod to determine whether the platform has reached a global "readiness" state, at which point you can tell your users to start connecting to the Viya platform to load and crunch their data 😊
But here, I'm more interested in looking at the sequence of events: making sure everything happens as expected between the various parts of the platform that need to collaborate, and identifying any problem as early as possible.
Most of the initial deployment or startup issues can be resolved without having to redeploy everything, so the sooner a problem is detected, the better 😊
Finally, closely monitoring what happens in the cluster while Viya is deployed (or started) really helps you understand how the Viya platform works and how its various components relate to each other.
For the moment, there is no defined order in the startup of the various components. All the pods (which contain the Viya services) are submitted to the Kubernetes system at once when their definitions (manifests) are applied with the `kubectl apply` command.
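As a reminder, that apply step looks roughly like this (a sketch only: the file name `site.yaml` and the namespace `viya` are assumptions; in practice the manifest is typically generated with kustomize from your deployment assets):

```shell
# Hypothetical file and namespace names -- adjust to your environment.
# Build the full manifest from the deployment assets, then submit every
# pod/service definition to Kubernetes at once:
kustomize build . > site.yaml
kubectl apply -n viya -f site.yaml
```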
Then all the pods will follow a similar process moving from the "pending" to the "running" (and sometimes "completed") state.
You can check the official Kubernetes documentation for details, but basically, pods first wait to be scheduled by K8s on a given node, then each container image of the pod must be pulled (all of which happens during the "Pending" phase), and then the containers are created and started on the node ("Running").
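If you prefer a terminal to Lens, you can follow those state transitions with a simple watch on the pods (the namespace `viya` is an assumption; use your own):

```shell
# Stream pod status changes as they happen; you will see pods move from
# Pending to Running (or Completed for one-shot jobs):
kubectl -n viya get pods --watch

# Or list only the pods still waiting to be scheduled or pulled:
kubectl -n viya get pods --field-selector=status.phase=Pending
```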
Even though there are hard dependencies between components (for example, many microservices cannot become ready until the sas-logon pod is ready, which in turn requires PostgreSQL to be fully up), there is no enforced startup order, because the vast majority of the services can simply wait and retry until the services they depend on become ready.
Note: although there is no startup order at the moment, it could be useful to have one, to avoid wasting all the time the services spend waiting. Also, with the current random start and limited capacity, stateless pods could consume resources needed by critical infrastructure services (such as Crunchy PostgreSQL), preventing them from starting and the platform from reaching a working state.
While there is no pre-determined order in the way pods are started, we know that some things need to be in place before others can come up. We'll use this knowledge to decide what to look at first when closely monitoring a Viya deployment (or startup).
In case you were not able to watch the whole video, or maybe got tired of listening to the French accent 😉, I have provided the key points to take away below.
What to look at in Lens:
We often use a handy command to monitor the sas-readiness log in real time:
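The command looks roughly like this (a sketch: the namespace `viya` and the `app=sas-readiness` label selector are assumptions; check the actual labels in your cluster with `kubectl get pods --show-labels`):

```shell
# Follow (-f) the sas-readiness log in real time; -l selects the pod by
# label so we don't need to know its generated pod name:
kubectl -n viya logs -f -l app=sas-readiness
```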
It will display different kinds of messages depending on the deployment stage.
At the very beginning it just says that the sas-readiness pod is still initializing…
Once the sas-readiness image has been pulled and the container started, the log reports that sas-logon is not up yet.
Then it tells us how many endpoints are still unavailable (74 in this example).
Finally, the number of unavailable endpoints should slowly decrease, and at the end, when all the endpoints respond positively, the log informs us that all checks have passed and for how long it has been testing the endpoints (41 minutes in the example below).
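If you want to track just that countdown, a small grep/awk pipeline does the job. This is a sketch: the log lines below are illustrative placeholders, NOT verbatim sas-readiness output, so adapt the grep pattern to the exact message format you see in your own log.

```shell
# Simulated sample of the countdown messages (placeholders, not real output):
printf '%s\n' \
  '74 endpoints are not available' \
  '12 endpoints are not available' \
  'all checks passed' > /tmp/readiness-sample.log

# Keep only the countdown lines and extract the number of endpoints
# still unavailable:
grep 'endpoints are not available' /tmp/readiness-sample.log | awk '{print $1}'
```

In real life you would pipe the live `kubectl logs -f` stream into the same grep instead of a sample file.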
6. Pods with "Error" : Sometimes we can see some Pod's "Error" status as below. But they are not necessary a problem. For example in the screenshot below, it corresponds to several attempts for the sas-import-data-loader job to run with success. However we see that the last attempt was successful.
The two main challenges when monitoring a deployment (or a restart of the platform services) are: 1) distinguishing real errors from errors caused by services waiting on each other, and 2) the fact that duration is relative (depending on your combination of hardware and licensed software, a task taking more than 5 minutes could be either normal or a sign of a problem).
What should NOT worry you, and could be perfectly normal during the deployment or startup phase:
What should worry you:
Thanks for reading!
Find more articles from SAS Global Enablement and Learning here.