
Monitor your SAS Viya deployment with Lens

Started ‎03-11-2022 by
Modified ‎03-11-2022 by

 

No matter the SAS software version, there is one inevitable phenomenon that you will routinely observe if you work as an installation engineer for SAS: you launch a deployment process, and then you must wait… and wait, cross your fingers very hard (maybe pray if you are a believer), observe how things are going, and then at some point check whether it worked (or not) 😊

 

The part I’d like to talk about in this post is the "observe how things are going" step, in the context of a Viya 4 deployment.

 

Spot a deployment issue as soon as it occurs

 

A standard Viya 4 deployment should take around one hour to complete, sometimes more (depending on the type of order, the latency between the cluster nodes and the image registry, the CPU power of the cluster nodes, etc.).

 

During this "wait" phase, some colleagues will take the opportunity to grab a coffee or stretch their legs outside the office. But others, maybe a little more anxious (like myself 😊), will want to monitor as closely as possible whether things are REALLY going well…

 

My colleague @ScottMcCauley explained in his article how to assess the sas-readiness pod to determine whether the platform has reached a global "readiness" state, where you can tell your users to start connecting to the Viya platform to load and crunch their data 😊

 

But here, I’m more interested in looking at the sequence of events: making sure everything happens as expected between the various parts of the platform that need to collaborate, and identifying any problem as early as possible.

 

Most of the initial deployment or startup issues can be resolved without having to redeploy everything, so the sooner a problem is detected, the better 😊

 

Finally, closely monitoring what happens in the cluster when Viya is deployed (or started) really helps to understand how the Viya platform works and how its various components relate to each other.

 

The sequence

 

For the moment, there is no defined order for the startup of the various components. All the pods (which contain the Viya services) are submitted to the K8s system at once when their definition (manifest) is applied with the "kubectl apply" commands.

 

Then all the pods will follow a similar process moving from the "pending" to the "running" (and sometimes "completed") state.

 

You can check the official Kubernetes documentation for details, but basically, pods first wait to be scheduled by K8s on a given node, then each container image of the pod must be pulled (all of that happens during the "pending" phase), and then the containers are created and started on the node ("running").
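If you prefer the command line to Lens, the progression from "pending" to "running" can be followed with a small summary of the STATUS column of a pod listing. This is only a minimal sketch: the function name summarize_pod_status is mine, and the namespace in the usage comment is simply the one used in the watch command later in this article.

```shell
#!/bin/sh
# Count pods per STATUS value.
# Input (stdin): the output of `kubectl get pods --no-headers`,
# whose columns are NAME READY STATUS RESTARTS AGE.
summarize_pod_status() {
  awk '{ count[$3]++ } END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}

# Usage against a live cluster (adjust the namespace to your deployment):
#   kubectl -n viya4gcp-auto get pods --no-headers | summarize_pod_status
```

Running it every few minutes should show the "Pending" count shrinking while the "Running" (and "Completed") counts grow.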

 

Even though there are hard dependencies between components (for example, many microservices cannot become ready until the sas-logon pod is ready, which in turn requires postgres to be fully up before reaching readiness), there is no startup order, because the vast majority of the services can wait and loop until the service they depend on becomes ready.
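To picture this "wait and loop" behaviour, here is a generic polling helper. It is an illustration of the pattern only, not how the Viya services actually implement it, and the name wait_until is hypothetical.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds, or give up
# (status 1) once about TIMEOUT seconds have been spent waiting.
# The parenthesised body runs in a subshell, so the variables stay local.
wait_until() (
  timeout=$1
  interval=$2
  shift 2
  waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
)

# Usage against a live cluster: keep retrying until a sas-readiness pod is Ready.
# (kubectl wait fails immediately if no pod matches yet, hence the outer loop.)
#   wait_until 3600 20 kubectl -n viya4gcp-auto wait pod \
#       --for=condition=Ready --selector=app.kubernetes.io/name=sas-readiness --timeout=5s
```

The same idea applies to a service that depends on sas-logon or postgres: it keeps probing its dependency and only reports itself ready once the probe succeeds.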

 

Note: Although there is no order at the moment, it could be interesting to have one, to avoid losing all the time that services spend waiting. Also, with the current random start and limited capacity, stateless pods could prevent critical infrastructure services (like Crunchy) from starting, which would prevent the platform from reaching a working state.

 

While there is no predetermined order in the way pods are started, we know that some things need to be there to allow other things to come up.

 

We'll use this knowledge to know what to look at first, when we want to closely monitor a Viya deployment (or startup).

 

Demo with Lens (20 min)

 

 

In case you were not able to watch the whole video, or maybe got tired of listening to the French accent 😉, I have provided the key takeaways below.

 

Recap

 

What to look at in Lens:

      1. Stateful sets (01:47-3:31): if something is not working there, your deployment will never complete successfully.
      2. sas-logon and Crunchy (3:31-8:16): sas-logon will complain about a missing JDBC connection until the postgres instances are up and running.
      3. Storage (8:16-10:32): make sure all the Persistent Volumes are "bound". If the Viya volumes are not there, your deployment will never complete successfully.
      4. CAS pods status (17:13-18:02): CAS is a core component of the platform and it can fail for various reasons, so it is always a good idea to do a sanity check of the CAS pods status as soon as possible.
      5. sas-readiness pod log (13:34-14:49 and end of the video): it will tell you when the users can start to log in.

     

    We often use this handy command to monitor the sas-readiness log in real time:

    watch -c -n 20 'kubectl -n viya4gcp-auto logs \
         --selector=app.kubernetes.io/name=sas-readiness \
          | tail -n 1  '

     

    It displays different kinds of messages depending on the deployment stage.

    At the very beginning, it just says that the sas-readiness pod is still initializing…

    rp_1_readiness-1.png

     

    Once the sas-readiness image has been pulled and the container has started, the log says that sas-logon is not up yet.

    rp_2_readiness-2.png

    Then it tells us how many endpoints are still unavailable (74 in this example).

    rp_3_readiness-3.png

     

    Finally, the number of unavailable endpoints should slowly decrease and, at the end, when all the endpoints respond positively, the log informs us that all checks have passed and for how long it has been testing the endpoints (41 minutes in the example below).

    ro_4_readiness-4.png

     

      6. Pods with "Error": sometimes we can see some pods with an "Error" status, as below. But they are not necessarily a problem. For example, in the screenshot below, they correspond to several attempts to run the sas-import-data-loader job. However, we can see that the last attempt was successful.

    rp_5_Errors.png
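Items 1 and 3 of this recap (stateful sets and storage) can also be checked without Lens. The two filters below are a minimal sketch: the function names are mine, and I check the persistent volume claims of the namespace rather than the cluster-wide volumes, which is an assumption about what is most convenient.

```shell
#!/bin/sh
# Both helpers read `kubectl get ... --no-headers` output on stdin.

# Print "NAME READY" for stateful sets whose ready count differs from
# the desired count. Expected columns: NAME READY AGE, with READY
# displayed as "ready/desired" (for example 0/3).
unready_statefulsets() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}

# Print "NAME STATUS" for persistent volume claims that are not Bound.
# Expected columns: NAME STATUS VOLUME ...
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# Usage against a live cluster (adjust the namespace to your deployment):
#   kubectl -n viya4gcp-auto get statefulsets --no-headers | unready_statefulsets
#   kubectl -n viya4gcp-auto get pvc --no-headers | unbound_pvcs
```

An empty output from both commands means there is nothing to report for these two items.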
Conclusion

    The two main challenges when monitoring a deployment (or a restart of the platform services) are 1) telling real errors apart from transient errors caused by services still synchronizing with each other, and 2) the fact that durations are relative (depending on your combination of hardware and licensed software, a specific task taking more than 5 minutes could be either normal or a sign of a problem).

    What should NOT worry you, and could be perfectly normal during the deployment or startup phase:

    • Startup probe failed
    • Errors in the containers' logs
    • sas-logon pod taking time to start up
    • sas-logon complaining that no postgres database is found (...unless there is a real issue with the postgres database)
    • A lot of pods that were in the "ready" state now moving to the "pending" state (it means that they are being restarted on other nodes, but thanks to the "RollingUpdate" strategy of the Viya deployments, the service remains available and the initial pod is only terminated when the new instance is ready).

     

    What should worry you:

    • Error messages in the containers' logs that do not go away after a little while!
    • Kubernetes "CrashLoopBackOff" or "Error" status that never goes away.
    • Kubernetes pods that stay in the "Pending" state forever.
    • Kubernetes error statuses ("CreateContainerConfigError", "CreateContainerError", etc.)
    • Unbound persistent volumes
    • A deployment that takes more than 2 or 3 hours (unless it is explained by long image pull times caused by a slow network or high latency between the cluster and the registry)
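The pod statuses from the list above can be pulled out of a pod listing with a short filter. This is a sketch under my own assumptions: the function name worrying_pods is hypothetical, and only the statuses named in this article are matched.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print
# "NAME STATUS AGE" for pods whose STATUS is one of the values named
# in the list above. AGE is always the last column, even when the
# RESTARTS column contains extra words such as "5 (2m ago)".
worrying_pods() {
  awk '$3 ~ /^(CrashLoopBackOff|Error|Pending|CreateContainerConfigError|CreateContainerError)$/ { print $1, $3, $NF }'
}

# Usage against a live cluster (adjust the namespace to your deployment):
#   kubectl -n viya4gcp-auto get pods --no-headers | worrying_pods
```

A single snapshot is not enough to conclude anything: what matters is whether the same pods keep showing up run after run. Pods in "Error" left behind by a job that eventually succeeded, as in the sas-import-data-loader example, will also be listed and can be ignored.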

     

     


    Thanks for reading!

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎03-11-2022 07:05 AM