A SAS Viya deployment comprises many individual processes, and making sure all of these processes are up and responding is one of the primary concerns for SAS administrators. There are many approaches to this problem, most of which lean heavily on the SAS Configuration Server to assess process availability because each service is supposed to register itself in the SAS Configuration Server when it starts. Relying on the SAS Configuration Server to tell us about the services in our deployment works well as long as each service starts well enough to register itself and no service is stopped in such a way that it unregisters itself. If either of these events occurs, the SAS Configuration Server has no knowledge of the service, so it is effectively ignored when reporting process availability status. This is why administrators have to be especially careful when reviewing the Availability portlet on the SAS Environment Manager dashboard page.
The root problem is that the SAS Configuration Server has no knowledge of the processes that should be present but aren't. With SAS Viya 3.5, the sas-admin healthcheck plug-in provides administrators with the ability to check on processes that should be present without relying on the SAS Configuration Server. Let's take a look at this new administration feature.
This is a key difference to understand. The complex mode not only reports on microservices but also includes checks for applications such as
For this post, I am going to focus on complex mode as it is the mode with which administrators can control the services examined by feeding it a list of services to check.
Let's see how the healthcheck plug-in can help an administrator assess system process availability. After authenticating myself to the sas-admin framework, I am going to issue a sas-admin healthcheck command in complex mode to look at the processes for a healthy, fully functional Viya 3.5 deployment. The results indicate that my deployment has 95 processes that were tested and of those 95, 1 reports as down.
./sas-admin healthcheck system-health check-status complex
Searching for services, applications, and infrastructure applications.
[Validating Health] .............................................................................................................

Services                  Endpoint                 Status  HTTP Status  Time of Call              Duration
Discovery Table Provider  /discoveryTableProvider  down    503          2020-02-25T11:36:59.075Z  2358

"1" of "95" health validations failed.

The following errors were generated during the execution of this program:
The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
    * The server failed to fulfill an apparently valid request.
The one service that reports a 'down' status, discoveryTableProvider, does so because of a licensing issue that prevents the service from responding; the lack of a valid response is interpreted as a 'down' state. Therefore, I can safely ignore that service when assessing my system health.
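If you script around the healthcheck output, an expected failure like this one can be filtered out automatically. The following is only a minimal Python sketch: it assumes the output text has already been captured as a string, and the names KNOWN_DOWN, failed_endpoints, and unexpected_failures are hypothetical helpers, not part of sas-admin. It relies solely on the error wording shown in the output above.

```python
import re

# Assumption: endpoints we already expect to fail in this deployment
# (here, the licensing issue described above).
KNOWN_DOWN = {"/discoveryTableProvider"}


def failed_endpoints(output):
    """Return the set of endpoints named in the 'endpoint call to "..."' error lines."""
    return set(re.findall(r'making an endpoint call to "([^"]+)"', output))


def unexpected_failures(output, known_down=KNOWN_DOWN):
    """Return failed endpoints that are not on the known-down list."""
    return failed_endpoints(output) - set(known_down)


# Sample text copied from the healthcheck output shown in this post.
sample = (
    '"1" of "95" health validations failed.\n'
    "The following error was encountered when making an endpoint call to "
    '"/discoveryTableProvider": 1 error occurred:\n'
    "* The server failed to fulfill an apparently valid request.\n"
)

print(unexpected_failures(sample))  # prints set()
```

An empty result means that every failure reported was one you had already decided to ignore.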
So far so good. Let's add the --display-results option to display details for each of the services that were tested. As you can see from the following output, each service has an indication of the endpoint that is used to test process availability and the results show the HTTP status of each endpoint test. If you take a second to scroll through the information you will see that there are three sections of processes: microservices are listed first, followed by SAS web applications, which are followed by infrastructure processes such as CAS, the SAS Infrastructure Data Server, and the SAS Message Broker.
./sas-admin healthcheck system-health check-status complex --display-results
Searching for services, applications, and infrastructure applications.
[Validating Health] .............................................................................................................

Services  Endpoint  Status  HTTP Status  Time of Call  Duration
annotations  /annotations  up  200  2020-02-28T13:21:26.272Z  69
Application Registry service  /appRegistry  up  200  2020-02-28T13:21:26.341Z  96
Audit service  /audit  up  200  2020-02-28T13:21:26.438Z  70
Authorization service  /authorization  up  200  2020-02-28T13:21:26.508Z  12
backup-agent  /backup-agent  up  404  2020-02-28T13:21:26.521Z  25
Cache Locator service  /cachelocator  up  200  2020-02-28T13:21:26.546Z  21
Cache Server service  /cacheserver  up  200  2020-02-28T13:21:26.568Z  21
casAccessManagement  /casAccessManagement  up  200  2020-02-28T13:21:27.474Z  70
casFormats  /casFormats  up  200  2020-02-28T13:21:27.544Z  74
CAS Management service  /casManagement  up  200  2020-02-28T13:21:27.618Z  22
CAS Proxy service  /casProxy  up  200  2020-02-28T13:21:27.641Z  21
CAS Row Sets service  /casRowSets  up  200  2020-02-28T13:21:27.662Z  73
Catalog service  /catalog  up  200  2020-02-28T13:21:27.735Z  72
codeDebugger  /codeDebugger  up  200  2020-02-28T13:21:27.807Z  16
Comments service  /comments  up  200  2020-02-28T13:21:27.824Z  84
compute  /compute  up  200  2020-02-28T13:21:27.909Z  52
Configuration service  /configuration  up  200  2020-02-28T13:21:27.961Z  72
credentials  /credentials  up  200  2020-02-28T13:21:28.034Z  70
Cross-domain Proxy service  /crossdomainproxy  up  400  2020-02-28T13:21:28.105Z  19
Data Discovery service  /dataDiscovery  up  200  2020-02-28T13:21:28.125Z  35
Data Plans service  /dataPlans  up  200  2020-02-28T13:21:28.161Z  793
Profile Results service  /dataProfiles  up  200  2020-02-28T13:21:28.954Z  921
Data Sources service  /dataSources  up  200  2020-02-28T13:21:29.876Z  88
Data Tables service  /dataTables  up  200  2020-02-28T13:21:29.964Z  602
Backup service  /deploymentBackup  up  200  2020-02-28T13:21:30.567Z  423
Device Management service  /deviceManagement  up  200  2020-02-28T13:21:30.990Z  94
Discovery Table Provider  /discoveryTableProvider  down  503  2020-02-28T13:21:31.085Z  1450
Files service  /files  up  200  2020-02-28T13:21:32.535Z  31
Folders service  /folders  up  200  2020-02-28T13:21:32.567Z  129
Fonts service  /fonts  up  200  2020-02-28T13:21:32.696Z  88
Geo Enrichment service  /geoEnrichment  up  200  2020-02-28T13:21:32.785Z  73
Identities service  /identities  up  200  2020-02-28T13:21:32.859Z  58
import9  /import9  up  200  2020-02-28T13:21:32.917Z  50
Job Flow Scheduling service  /jobFlowScheduling  up  200  2020-02-28T13:21:32.968Z  22
Launcher service  /launcher  up  200  2020-02-28T13:21:32.990Z  54
licenses  /licenses  up  200  2020-02-28T13:21:33.044Z  23
Links service  /links  up  200  2020-02-28T13:21:33.068Z  72
Mail service  /mail  up  200  2020-02-28T13:21:33.141Z  77
Maps service  /maps  up  200  2020-02-28T13:21:33.219Z  74
Micro Analytic Score service  /microanalyticScore  up  200  2020-02-28T13:21:33.293Z  62
Model Management service  /modelManagement  up  200  2020-02-28T13:21:33.356Z  80
Model Publish service  /modelPublish  up  200  2020-02-28T13:21:33.436Z  66
Model Repository service  /modelRepository  up  200  2020-02-28T13:21:33.502Z  51
monitoring  /monitoring  up  200  2020-02-28T13:21:33.554Z  16
Natural Language Generation service  /naturalLanguageGeneration  up  200  2020-02-28T13:21:33.570Z  75
Natural Language Understanding service  /naturalLanguageUnderstanding  up  200  2020-02-28T13:21:33.646Z  90
Notifications service  /notifications  up  200  2020-02-28T13:21:33.736Z  63
Preferences service  /preferences  up  200  2020-02-28T13:21:33.918Z  56
Projects service  /projects  up  200  2020-02-28T13:21:33.975Z  83
Relationships service  /relationships  up  200  2020-02-28T13:21:34.212Z  68
Report Alerts service  /reportAlerts  up  200  2020-02-28T13:21:34.280Z  44
Report Data service  /reportData  up  200  2020-02-28T13:21:34.325Z  76
Report Distribution service  /reportDistribution  up  200  2020-02-28T13:21:34.401Z  16
Report Images service  /reportImages  up  200  2020-02-28T13:21:34.417Z  228
Report Packages service  /reportPackages  up  200  2020-02-28T13:21:34.646Z  123
Report Renderer service  /reportRenderer  up  200  2020-02-28T13:21:34.770Z  136
Report Templates service  /reportTemplates  up  200  2020-02-28T13:21:34.907Z  52
Report Transforms service  /reportTransforms  up  200  2020-02-28T13:21:34.959Z  44
reportViewerNaturalLanguageUnderstanding  /reportViewerNaturalLanguageUnderstanding  up  200  2020-02-28T13:21:35.004Z  138
Reports Persistence service  /reports  up  200  2020-02-28T13:21:35.143Z  467
Row Sets service  /rowSets  up  200  2020-02-28T13:21:35.610Z  351
Schedule service  /scheduler  up  200  2020-02-28T13:21:35.961Z  63
Score Definition service  /scoreDefinitions  up  200  2020-02-28T13:21:36.025Z  885
Score Execution service  /scoreExecution  up  200  2020-02-28T13:21:36.910Z  87
Search service  /search  up  200  2020-02-28T13:21:36.998Z  57
Search Index service  /searchIndex  up  200  2020-02-28T13:21:37.056Z  547
templates  /templates  up  200  2020-02-28T13:21:37.603Z  130
Tenant service  /tenant  up  200  2020-02-28T13:21:37.734Z  202
themeContent  /themeContent  up  200  2020-02-28T13:21:37.936Z  143
Themes service  /themes  up  200  2020-02-28T13:21:38.079Z  48
Thumbnails service  /thumbnails  up  200  2020-02-28T13:21:38.127Z  153
Transfer service  /transfer  up  200  2020-02-28T13:21:38.281Z  110
Transformations service  /transformations  up  200  2020-02-28T13:21:38.391Z  54
types  /types  up  200  2020-02-28T13:21:38.446Z  59
Web Data Access service  /webDataAccess  up  200  2020-02-28T13:21:38.506Z  85
Workflow service  /workflow  up  200  2020-02-28T13:21:38.591Z  62
Workflow Definition service  /workflowDefinition  up  200  2020-02-28T13:21:38.653Z  68
Workflow History service  /workflowHistory  up  200  2020-02-28T13:21:38.721Z  48

Applications  Endpoint  Status  HTTP Status  Time of Call  Duration
SASBackupManager  /SASBackupManager  up  200  2020-02-28T13:21:20.608Z  192
SASCodeDebugger  /SASCodeDebugger  up  200  2020-02-28T13:21:20.800Z  1115
SAS Data Explorer  /SASDataExplorer  up  200  2020-02-28T13:21:21.916Z  937
SAS Data Studio  /SASDataStudio  up  200  2020-02-28T13:21:22.853Z  72
SAS Drive  /SASDrive  up  200  2020-02-28T13:21:22.925Z  1101
SAS Environment Manager  /SASEnvironmentManager  up  200  2020-02-28T13:21:24.027Z  173
SAS Graph Builder  /SASGraphBuilder  up  200  2020-02-28T13:21:24.200Z  152
SAS Job Execution  /SASJobExecution  up  200  2020-02-28T13:21:24.353Z  880
SAS Lineage  /SASLineage  up  200  2020-02-28T13:21:25.233Z  200
SAS Logon Manager  /SASLogon  up  200  2020-02-28T13:21:25.434Z  51
SAS Model Manager  /SASModelManager  up  200  2020-02-28T13:21:25.486Z  184
SASStudio  /SASStudio  up  200  2020-02-28T13:21:25.671Z  26
SAS Studio Viya  /SASStudioV  up  200  2020-02-28T13:21:25.697Z  72
SAS Theme Designer  /SASThemeDesigner  up  200  2020-02-28T13:21:25.769Z  183
SAS Visual Analytics  /SASVisualAnalytics  up  200  2020-02-28T13:21:25.952Z  156
SAS Workflow Manager  /SASWorkflowManager  up  200  2020-02-28T13:21:26.109Z  162

Infrastructure Applications  Status  Time of Call  Duration
cas-shared-default  up  2020-02-28T13:21:26.589Z  760
cas-shared-default-http  up  2020-02-28T13:21:27.350Z  123
SAS Infrastructure Data Server  up  2020-02-28T13:21:33.799Z  118
SAS Message Broker  up  2020-02-28T13:21:34.058Z  117

"1" of "95" health validations failed.
The items with statuses in yellow are functioning properly, but cannot be directly verified due to their nature.

The following errors were generated during the execution of this program:
The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
    * The server failed to fulfill an apparently valid request.
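With this much detail on screen, it can help to pick out the rows that deserve a second look. The sketch below is an illustration only: it assumes the terminal output prints one service per line with whitespace-separated columns, and parse_rows and up_but_not_200 are hypothetical helper names. It flags rows whose status is 'up' although the HTTP status is not 200 (backup-agent and the Cross-domain Proxy service in the run above), which are presumably the items the tool displays in yellow.

```python
import re

# One table row: name, endpoint, status, HTTP status, timestamp, duration.
# Infrastructure rows have no endpoint or HTTP status, so they are skipped.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<endpoint>/\S+)\s+(?P<status>up|down)\s+"
    r"(?P<http>\d{3})\s+(?P<time>\S+Z)\s+(?P<duration>\d+)$"
)


def parse_rows(output):
    """Parse service/application rows from --display-results style output."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["http"] = int(row["http"])
            row["duration"] = int(row["duration"])
            rows.append(row)
    return rows


def up_but_not_200(rows):
    """Names of rows reported as 'up' whose HTTP status is not 200."""
    return [r["name"] for r in rows if r["status"] == "up" and r["http"] != 200]


# A few rows copied from the output in this post.
sample = """\
Services Endpoint Status HTTP Status Time of Call Duration
Authorization service /authorization up 200 2020-02-28T13:21:26.508Z 12
backup-agent /backup-agent up 404 2020-02-28T13:21:26.521Z 25
Cross-domain Proxy service /crossdomainproxy up 400 2020-02-28T13:21:28.105Z 19
Discovery Table Provider /discoveryTableProvider down 503 2020-02-28T13:21:31.085Z 1450
"""

print(up_but_not_200(parse_rows(sample)))
# prints ['backup-agent', 'Cross-domain Proxy service']
```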
While we have a healthy system in which every process that needs to be up is up, let's re-run the healthcheck command with the --create-yaml option, which creates an output file (here named complexCheck.yml) containing details for all of the tested processes.
./sas-admin healthcheck system-health check-status complex --create-yaml complexCheck.yml
Searching for services, applications, and infrastructure applications.
[Validating Health] ..............................................................................................................

Services                  Endpoint                 Status  HTTP Status  Time of Call              Duration
Discovery Table Provider  /discoveryTableProvider  down    503          2020-02-25T11:49:09.689Z  1804

"1" of "95" health validations failed.
Here is a look at the complexCheck.yml file. As the comments explain, for each service or application listed, the endpoint to call to assess health is obtained from the SAS Configuration Server (Consul). The interesting thing, though, is that we now have a list of every service that should be present in a healthy deployment. We can modify this yaml file to suit our purposes, such as splitting out separate checks for the web applications or the infrastructure components. We also have the option to add service checks, provided we know the endpoint to call, or to specify an exact endpoint to use for any of the existing services.
####################################################################################################################
# A YAML file can be specified for the "complex" command.
# At the minimum, a list of service or application names must be specified.
# For every service or application listed, the program goes to Consul to find the correct head endpoint to use.
# You can specify additional endpoints to test in the YAML file. If Consul finds a match for that service or application,
# calls are made to the head endpoint in addition to making calls to all the endpoints specified in the YAML file.
# If you specify a service or application name in the file that Consul does not recognize, the program tests only
# the endpoints specified in the YAML file. If there are no endpoints, then that item is skipped.
# Pasted below is a sample format for a complex check configuration file:
#- Name: Files
#- Name: Advanced Analytics Components service
#  Endpoints:
#  - /analyticsComponents/commons/health
#  - /analyticsComponents/components
#  - /analyticsComponents/templates
#  - /files/files
#- Name: Advanced Analytics Data Segmentation service
#  Endpoints:
#  - /analyticsDataSegmentation/commons/health
#  - /analyticsDataSegmentation/plans
#- Name: Analytics Events service
####################################################################################################################
# Applications:
- Name: SASBackupManager
  Endpoints:
  - /SASBackupManager
- Name: SASCodeDebugger
  Endpoints:
  - /SASCodeDebugger
- Name: SAS Data Explorer
  Endpoints:
  - /SASDataExplorer
- Name: SAS Data Studio
  Endpoints:
  - /SASDataStudio
- Name: SAS Drive
  Endpoints:
  - /SASDrive
- Name: SAS Environment Manager
  Endpoints:
  - /SASEnvironmentManager
- Name: SAS Graph Builder
  Endpoints:
  - /SASGraphBuilder
- Name: SAS Job Execution
  Endpoints:
  - /SASJobExecution
- Name: SAS Lineage
  Endpoints:
  - /SASLineage
- Name: SAS Logon Manager
  Endpoints:
  - /SASLogon
- Name: SAS Model Manager
  Endpoints:
  - /SASModelManager
- Name: SASStudio
  Endpoints:
  - /SASStudio
- Name: SAS Studio Viya
  Endpoints:
  - /SASStudioV
- Name: SAS Theme Designer
  Endpoints:
  - /SASThemeDesigner
- Name: SAS Visual Analytics
  Endpoints:
  - /SASVisualAnalytics
- Name: SAS Workflow Manager
  Endpoints:
  - /SASWorkflowManager
# Services:
- Name: annotations
  Endpoints:
  - /annotations
- Name: Application Registry service
  Endpoints:
  - /appRegistry
- Name: Audit service
  Endpoints:
  - /audit
- Name: Authorization service
  Endpoints:
  - /authorization
- Name: backup-agent
  Endpoints:
  - /backup-agent
- Name: Cache Locator service
  Endpoints:
  - /cachelocator
- Name: Cache Server service
  Endpoints:
  - /cacheserver
- Name: casAccessManagement
  Endpoints:
  - /casAccessManagement
- Name: casFormats
  Endpoints:
  - /casFormats
- Name: CAS Management service
  Endpoints:
  - /casManagement
- Name: CAS Proxy service
  Endpoints:
  - /casProxy
- Name: CAS Row Sets service
  Endpoints:
  - /casRowSets
- Name: Catalog service
  Endpoints:
  - /catalog
- Name: codeDebugger
  Endpoints:
  - /codeDebugger
- Name: Comments service
  Endpoints:
  - /comments
- Name: compute
  Endpoints:
  - /compute
- Name: Configuration service
  Endpoints:
  - /configuration
- Name: credentials
  Endpoints:
  - /credentials
- Name: Cross-domain Proxy service
  Endpoints:
  - /crossdomainproxy
- Name: Data Discovery service
  Endpoints:
  - /dataDiscovery
- Name: Data Plans service
  Endpoints:
  - /dataPlans
- Name: Profile Results service
  Endpoints:
  - /dataProfiles
- Name: Data Sources service
  Endpoints:
  - /dataSources
- Name: Data Tables service
  Endpoints:
  - /dataTables
- Name: Backup service
  Endpoints:
  - /deploymentBackup
- Name: Device Management service
  Endpoints:
  - /deviceManagement
- Name: Discovery Table Provider
  Endpoints:
  - /discoveryTableProvider
- Name: Files service
  Endpoints:
  - /files
- Name: Folders service
  Endpoints:
  - /folders
- Name: Fonts service
  Endpoints:
  - /fonts
- Name: Geo Enrichment service
  Endpoints:
  - /geoEnrichment
- Name: Graph Template Service
  Endpoints:
  - /graphTemplates
- Name: Identities service
  Endpoints:
  - /identities
- Name: import9
  Endpoints:
  - /import9
- Name: Job Flow Scheduling service
  Endpoints:
  - /jobFlowScheduling
- Name: Launcher service
  Endpoints:
  - /launcher
- Name: licenses
  Endpoints:
  - /licenses
- Name: Links service
  Endpoints:
  - /links
- Name: Mail service
  Endpoints:
  - /mail
- Name: Maps service
  Endpoints:
  - /maps
- Name: Micro Analytic Score service
  Endpoints:
  - /microanalyticScore
- Name: Model Management service
  Endpoints:
  - /modelManagement
- Name: Model Publish service
  Endpoints:
  - /modelPublish
- Name: Model Repository service
  Endpoints:
  - /modelRepository
- Name: monitoring
  Endpoints:
  - /monitoring
- Name: Natural Language Generation service
  Endpoints:
  - /naturalLanguageGeneration
- Name: Natural Language Understanding service
  Endpoints:
  - /naturalLanguageUnderstanding
- Name: Notifications service
  Endpoints:
  - /notifications
- Name: Preferences service
  Endpoints:
  - /preferences
- Name: Projects service
  Endpoints:
  - /projects
- Name: Relationships service
  Endpoints:
  - /relationships
- Name: Report Alerts service
  Endpoints:
  - /reportAlerts
- Name: Report Data service
  Endpoints:
  - /reportData
- Name: Report Distribution service
  Endpoints:
  - /reportDistribution
- Name: Report Images service
  Endpoints:
  - /reportImages
- Name: Report Packages service
  Endpoints:
  - /reportPackages
- Name: Report Renderer service
  Endpoints:
  - /reportRenderer
- Name: Report Templates service
  Endpoints:
  - /reportTemplates
- Name: Report Transforms service
  Endpoints:
  - /reportTransforms
- Name: reportViewerNaturalLanguageUnderstanding
  Endpoints:
  - /reportViewerNaturalLanguageUnderstanding
- Name: Reports Persistence service
  Endpoints:
  - /reports
- Name: Row Sets service
  Endpoints:
  - /rowSets
- Name: Schedule service
  Endpoints:
  - /scheduler
- Name: Score Definition service
  Endpoints:
  - /scoreDefinitions
- Name: Score Execution service
  Endpoints:
  - /scoreExecution
- Name: Search service
  Endpoints:
  - /search
- Name: Search Index service
  Endpoints:
  - /searchIndex
- Name: templates
  Endpoints:
  - /templates
- Name: Tenant service
  Endpoints:
  - /tenant
- Name: themeContent
  Endpoints:
  - /themeContent
- Name: Themes service
  Endpoints:
  - /themes
- Name: Thumbnails service
  Endpoints:
  - /thumbnails
- Name: Transfer service
  Endpoints:
  - /transfer
- Name: Transformations service
  Endpoints:
  - /transformations
- Name: types
  Endpoints:
  - /types
- Name: Web Data Access service
  Endpoints:
  - /webDataAccess
- Name: Workflow service
  Endpoints:
  - /workflow
- Name: Workflow Definition service
  Endpoints:
  - /workflowDefinition
- Name: Workflow History service
  Endpoints:
  - /workflowHistory
# Infrastructure Applications:
- Name: cas-shared-default
  Endpoints:
  - /cas-shared-default
- Name: cas-shared-default-http
  Endpoints:
  - /cas-shared-default-http
- Name: SAS Infrastructure Data Server
  Endpoints:
  - /postgres
- Name: SAS Message Broker
  Endpoints:
  - /rabbitmq
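Splitting the generated file into separate checks, as mentioned above, can be done by hand, but it is also easy to script. Here is a minimal Python sketch, assuming the generated file marks its three groups with the comment lines "# Applications:", "# Services:", and "# Infrastructure Applications:" as seen in the listing, and that entries are indented the way the sample format in the file header suggests. The function name split_sections and the section keys are hypothetical.

```python
# Comment lines that start each group in the generated file (as seen above),
# mapped to hypothetical short section names.
SECTION_MARKERS = {
    "# Applications:": "applications",
    "# Services:": "services",
    "# Infrastructure Applications:": "infrastructure",
}


def split_sections(yaml_text):
    """Split the generated YAML text into one text block per section.

    Anything before the first section marker (the header comment box)
    is dropped, and blank lines are ignored.
    """
    sections = {}
    current = None
    for line in yaml_text.splitlines():
        key = SECTION_MARKERS.get(line.strip())
        if key is not None:
            current = key
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in sections.items()}


# A shortened stand-in for complexCheck.yml (indentation is an assumption).
sample = """\
# A YAML file can be specified for the "complex" command.
# Applications:
- Name: SAS Drive
  Endpoints:
  - /SASDrive
# Services:
- Name: Files service
  Endpoints:
  - /files
# Infrastructure Applications:
- Name: SAS Message Broker
  Endpoints:
  - /rabbitmq
"""

for name, text in split_sections(sample).items():
    print(name)
    print(text)
```

Each block could then be written to its own file and passed to the healthcheck command with --source-location, so that web applications, microservices, and infrastructure are checked separately.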
Let's see how having this list of services can be especially useful to administrators.
Suppose one of our services is either accidentally stopped or perhaps did not start sufficiently to register itself in the SAS Configuration Server. I'm going to simulate this by stopping the graph templates service.
Let's see how this is reported by the healthcheck using the default method of obtaining the list of available services from the SAS Configuration Server.
./sas-admin healthcheck system-health check-status complex
Searching for services, applications, and infrastructure applications.
[Validating Health] .............................................................................................................

Services                  Endpoint                 Status  HTTP Status  Time of Call              Duration
Discovery Table Provider  /discoveryTableProvider  down    503          2020-02-25T11:36:59.075Z  2358

"1" of "94" health validations failed.

The following errors were generated during the execution of this program:
The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
    * The server failed to fulfill an apparently valid request.
Interestingly, the results still indicate that only the expected discoveryTableProvider service is down. If you are a very observant administrator, you might notice that the total number of health validations is now 94 instead of the 95 we had earlier with a healthy system. If you were in a hurry and not paying close attention, you might miss that small difference and assume that everything is ok when in fact there is a missing service.
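Rather than relying on a sharp eye, the drop from 95 to 94 can be caught by comparing the total in the summary line against a known baseline. This is a minimal Python sketch under the assumption that the summary line always has the form shown in the output above; parse_summary and missing_validations are hypothetical helper names, and the baseline of 95 is simply the healthy count from this deployment.

```python
import re

# Matches the summary line, e.g.: "1" of "95" health validations failed.
SUMMARY = re.compile(r'"(\d+)" of "(\d+)" health validations failed')


def parse_summary(output):
    """Return (failed, total) from the healthcheck summary line."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no summary line found in healthcheck output")
    return int(match.group(1)), int(match.group(2))


def missing_validations(output, expected_total):
    """How many validations are absent compared with the healthy baseline."""
    _, total = parse_summary(output)
    return max(expected_total - total, 0)


healthy = '"1" of "95" health validations failed.'
degraded = '"1" of "94" health validations failed.'

print(missing_validations(healthy, 95))   # prints 0
print(missing_validations(degraded, 95))  # prints 1
```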
Let's re-run the healthcheck but this time pass it our yaml file as the list of services we want checked by adding the --source-location option to the command.
./sas-admin healthcheck system-health check-status complex --source-location complexCheck.yml
[Validating Health] ...................................................................................................

Services                  Endpoint                 Status  HTTP Status  Time of Call              Duration
Discovery Table Provider  /discoveryTableProvider  down    503          2020-02-25T11:57:23.201Z  1499

Infrastructure Applications  Status  Time of Call              Duration
Graph Template Service       down    2020-02-25T11:57:25.047Z  1

"2" of "95" health validations failed.

The following errors were generated during the execution of this program:
The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
    * The server failed to fulfill an apparently valid request.
The following error was encountered when making an endpoint call to "/graphTemplates": 1 error occurred:
    * The requested resource could not be found.
Aha! Now we can clearly see that there are two services down and the total number of health validations is back up to 95. So forcing healthcheck to look for all of the services we expect to have in our deployment appears to provide a bit of protection for administrators in cases where services do not get registered into the SAS Configuration Server.
So as an administrator, I can now proactively monitor processes that I know should be in my deployment without having to rely completely on the SAS Configuration Server. This should enable me to more reliably detect process issues and help maintain more robust system health.
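For proactive monitoring, the same command can be launched from a scheduled script. The sketch below is only an illustration: it assumes sas-admin sits in the current directory and that the session has already been authenticated, and build_command and run_healthcheck are hypothetical helper names. It uses only the subcommands and the --source-location option shown in this post, and it returns the output text rather than relying on the exit code, since the exit code behavior is not covered here.

```python
import subprocess


def build_command(yaml_path, cli="./sas-admin"):
    """Build the complex-mode healthcheck command for a given YAML file."""
    return [
        cli, "healthcheck", "system-health", "check-status", "complex",
        "--source-location", yaml_path,
    ]


def run_healthcheck(yaml_path, cli="./sas-admin"):
    """Run the healthcheck and return its combined stdout and stderr text.

    Assumes the CLI exists at `cli` and that authentication has
    already been done in this environment.
    """
    result = subprocess.run(
        build_command(yaml_path, cli),
        capture_output=True,
        text=True,
        check=False,  # inspect the output text instead of the exit code
    )
    return result.stdout + result.stderr


print(" ".join(build_command("complexCheck.yml")))
```

The returned text can then be scanned for the summary line or for unexpected endpoints before deciding whether to raise an alert.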
There is much more to the healthcheck plug-in than I have touched on in this post, so please take a look at the documentation for a more comprehensive understanding of the many options.
The sas-admin command line interface is one of, if not the best, administration tools for Viya. If you are not yet familiar with sas-admin I recommend that you read Gerry Nelson's posts SAS Viya command-line interfaces for Administration and Keeping the SAS Administration Command-Line interfaces up-to-date to learn more.