
SAS Viya process monitoring with the healthcheck plug-in

Started ‎03-31-2020 by
Modified ‎03-31-2020 by


A SAS Viya deployment consists of many individual processes, and making sure all of them are up and responding is one of the primary concerns for SAS administrators. There are many approaches to this problem, most of which lean heavily on the SAS Configuration Server to assess process availability, because each service is supposed to register itself there when it starts. Relying on the SAS Configuration Server to tell us about the services in our deployment works well as long as each service starts well enough to register itself and no service is stopped in such a way that it unregisters itself. If either of these events occurs, the SAS Configuration Server has no knowledge of the service, so the service is effectively ignored when process availability is reported. This is why administrators have to be especially careful when reviewing the Availability portlet on the SAS Environment Manager dashboard page.


The root problem is that the SAS Configuration Server has no knowledge of the processes that should be present but aren't. With SAS Viya 3.5, the sas-admin healthcheck plug-in provides administrators with the ability to check on processes that should be present without relying on the SAS Configuration Server. Let's take a look at this new administration feature.


The healthcheck plug-in has several options, but in this post I am going to focus on the system-health check-status options. There are two modes for assessing system health:

  • basic - which checks the health of the microservices
  • complex - which checks the health of microservices and applications.

This is a key difference to understand.  The complex mode not only reports on microservices but also includes checks for applications such as

  • CAS
  • SAS Message Broker
  • SAS Configuration Server
  • SAS Studio
  • SAS Drive
  • SAS Visual Analytics
  • and all of the other SAS web applications that users interact with.

For this post, I am going to focus on complex mode as it is the mode with which administrators can control the services examined by feeding it a list of services to check.


Let's see how the healthcheck plug-in can help an administrator assess system process availability.  After authenticating myself to the sas-admin framework, I am going to issue a sas-admin healthcheck command in complex mode to look at the processes for a healthy, fully functional Viya 3.5 deployment.  The results indicate that my deployment has 95 processes that were tested and of those 95, 1 reports as down.


./sas-admin healthcheck system-health check-status complex
Searching for services, applications, and infrastructure applications.

[Validating Health] .............................................................................................................

Services                   Endpoint                  Status   HTTP Status   Time of Call               Duration
Discovery Table Provider   /discoveryTableProvider   down     503           2020-02-25T11:36:59.075Z   2358

"1" of "95" health validations failed.

The following errors were generated during the execution of this program:

The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
  * The server failed to fulfill an apparently valid request.


The one service that reports a 'down' status, discoveryTableProvider, does so because of a licensing issue that prevents it from responding, and the lack of a valid response is interpreted as a 'down' state. Therefore, I can safely ignore that service when assessing my system health.
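If you want to script the "ignore the service I already know about" step, here is a minimal Python sketch. It assumes the report text has been captured from the command shown above and that columns are separated by two or more spaces, as they appear in this post. The KNOWN_DOWN set and the function names are my own hypothetical helpers, not part of sas-admin.

```python
import re

# Sample report text copied from the healthcheck output shown above.
SAMPLE = '''\
Services                   Endpoint                  Status   HTTP Status   Time of Call               Duration
Discovery Table Provider   /discoveryTableProvider   down     503           2020-02-25T11:36:59.075Z   2358

"1" of "95" health validations failed.
'''

# Hypothetical allow-list: services known to report "down" for a benign reason.
KNOWN_DOWN = {"Discovery Table Provider"}


def down_services(report):
    """Return the names of rows whose Status column reads 'down'."""
    names = []
    for line in report.splitlines():
        # Assumption: columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 2 and "down" in fields[1:]:
            names.append(fields[0])
    return names


def unexpected_down(report, known=KNOWN_DOWN):
    """Return the down services that are not on the allow-list."""
    return [name for name in down_services(report) if name not in known]


print(unexpected_down(SAMPLE))
```

An empty list means that everything reported as down is already accounted for.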


So far so good.  Let's add the --display-results option to display details for each of the services that were tested.  As you can see from the following output, each service has an indication of the endpoint that is used to test process availability and the results show the HTTP status of each endpoint test.  If you take a second to scroll through the information you will see that there are three sections of processes:  microservices are listed first, followed by SAS web applications, which are followed by infrastructure processes such as CAS, the SAS Infrastructure Data Server, and the SAS Message Broker.


./sas-admin healthcheck system-health check-status complex  --display-results                            
Searching for services, applications, and infrastructure applications.

[Validating Health] .............................................................................................................

Services                                               Endpoint                                              Status   HTTP Status   Time of Call               Duration
annotations                                            /annotations                                          up       200           2020-02-28T13:21:26.272Z   69
Application Registry service                           /appRegistry                                          up       200           2020-02-28T13:21:26.341Z   96
Audit service                                          /audit                                                up       200           2020-02-28T13:21:26.438Z   70
Authorization service                                  /authorization                                        up       200           2020-02-28T13:21:26.508Z   12
backup-agent                                           /backup-agent                                         up       404           2020-02-28T13:21:26.521Z   25
Cache Locator service                                  /cachelocator                                         up       200           2020-02-28T13:21:26.546Z   21
Cache Server service                                   /cacheserver                                          up       200           2020-02-28T13:21:26.568Z   21
casAccessManagement                                    /casAccessManagement                                  up       200           2020-02-28T13:21:27.474Z   70
casFormats                                             /casFormats                                           up       200           2020-02-28T13:21:27.544Z   74
CAS Management service                                 /casManagement                                        up       200           2020-02-28T13:21:27.618Z   22
CAS Proxy service                                      /casProxy                                             up       200           2020-02-28T13:21:27.641Z   21
CAS Row Sets service                                   /casRowSets                                           up       200           2020-02-28T13:21:27.662Z   73
Catalog service                                        /catalog                                              up       200           2020-02-28T13:21:27.735Z   72
codeDebugger                                           /codeDebugger                                         up       200           2020-02-28T13:21:27.807Z   16
Comments service                                       /comments                                             up       200           2020-02-28T13:21:27.824Z   84
compute                                                /compute                                              up       200           2020-02-28T13:21:27.909Z   52
Configuration service                                  /configuration                                        up       200           2020-02-28T13:21:27.961Z   72
credentials                                            /credentials                                          up       200           2020-02-28T13:21:28.034Z   70
Cross-domain Proxy service                             /crossdomainproxy                                     up       400           2020-02-28T13:21:28.105Z   19
Data Discovery service                                 /dataDiscovery                                        up       200           2020-02-28T13:21:28.125Z   35
Data Plans service                                     /dataPlans                                            up       200           2020-02-28T13:21:28.161Z   793
Profile Results service                                /dataProfiles                                         up       200           2020-02-28T13:21:28.954Z   921
Data Sources service                                   /dataSources                                          up       200           2020-02-28T13:21:29.876Z   88
Data Tables service                                    /dataTables                                           up       200           2020-02-28T13:21:29.964Z   602
Backup service                                         /deploymentBackup                                     up       200           2020-02-28T13:21:30.567Z   423
Device Management service                              /deviceManagement                                     up       200           2020-02-28T13:21:30.990Z   94
Discovery Table Provider                               /discoveryTableProvider                               down     503           2020-02-28T13:21:31.085Z   1450
Files service                                          /files                                                up       200           2020-02-28T13:21:32.535Z   31
Folders service                                        /folders                                              up       200           2020-02-28T13:21:32.567Z   129
Fonts service                                          /fonts                                                up       200           2020-02-28T13:21:32.696Z   88
Geo Enrichment service                                 /geoEnrichment                                        up       200           2020-02-28T13:21:32.785Z   73
Identities service                                     /identities                                           up       200           2020-02-28T13:21:32.859Z   58
import9                                                /import9                                              up       200           2020-02-28T13:21:32.917Z   50
Job Flow Scheduling service                            /jobFlowScheduling                                    up       200           2020-02-28T13:21:32.968Z   22
Launcher service                                       /launcher                                             up       200           2020-02-28T13:21:32.990Z   54
licenses                                               /licenses                                             up       200           2020-02-28T13:21:33.044Z   23
Links service                                          /links                                                up       200           2020-02-28T13:21:33.068Z   72
Mail service                                           /mail                                                 up       200           2020-02-28T13:21:33.141Z   77
Maps service                                           /maps                                                 up       200           2020-02-28T13:21:33.219Z   74
Micro Analytic Score service                           /microanalyticScore                                   up       200           2020-02-28T13:21:33.293Z   62
Model Management service                               /modelManagement                                      up       200           2020-02-28T13:21:33.356Z   80
Model Publish service                                  /modelPublish                                         up       200           2020-02-28T13:21:33.436Z   66
Model Repository service                               /modelRepository                                      up       200           2020-02-28T13:21:33.502Z   51
monitoring                                             /monitoring                                           up       200           2020-02-28T13:21:33.554Z   16
Natural Language Generation service                    /naturalLanguageGeneration                            up       200           2020-02-28T13:21:33.570Z   75
Natural Language Understanding service                 /naturalLanguageUnderstanding                         up       200           2020-02-28T13:21:33.646Z   90
Notifications service                                  /notifications                                        up       200           2020-02-28T13:21:33.736Z   63
Preferences service                                    /preferences                                          up       200           2020-02-28T13:21:33.918Z   56
Projects service                                       /projects                                             up       200           2020-02-28T13:21:33.975Z   83
Relationships service                                  /relationships                                        up       200           2020-02-28T13:21:34.212Z   68
Report Alerts service                                  /reportAlerts                                         up       200           2020-02-28T13:21:34.280Z   44
Report Data service                                    /reportData                                           up       200           2020-02-28T13:21:34.325Z   76
Report Distribution service                            /reportDistribution                                   up       200           2020-02-28T13:21:34.401Z   16
Report Images service                                  /reportImages                                         up       200           2020-02-28T13:21:34.417Z   228
Report Packages service                                /reportPackages                                       up       200           2020-02-28T13:21:34.646Z   123
Report Renderer service                                /reportRenderer                                       up       200           2020-02-28T13:21:34.770Z   136
Report Templates service                               /reportTemplates                                      up       200           2020-02-28T13:21:34.907Z   52
Report Transforms service                              /reportTransforms                                     up       200           2020-02-28T13:21:34.959Z   44
reportViewerNaturalLanguageUnderstanding               /reportViewerNaturalLanguageUnderstanding             up       200           2020-02-28T13:21:35.004Z   138
Reports Persistence service                            /reports                                              up       200           2020-02-28T13:21:35.143Z   467
Row Sets service                                       /rowSets                                              up       200           2020-02-28T13:21:35.610Z   351
Schedule service                                       /scheduler                                            up       200           2020-02-28T13:21:35.961Z   63
Score Definition service                               /scoreDefinitions                                     up       200           2020-02-28T13:21:36.025Z   885
Score Execution service                                /scoreExecution                                       up       200           2020-02-28T13:21:36.910Z   87
Search service                                         /search                                               up       200           2020-02-28T13:21:36.998Z   57
Search Index service                                   /searchIndex                                          up       200           2020-02-28T13:21:37.056Z   547
templates                                              /templates                                            up       200           2020-02-28T13:21:37.603Z   130
Tenant service                                         /tenant                                               up       200           2020-02-28T13:21:37.734Z   202
themeContent                                           /themeContent                                         up       200           2020-02-28T13:21:37.936Z   143
Themes service                                         /themes                                               up       200           2020-02-28T13:21:38.079Z   48
Thumbnails service                                     /thumbnails                                           up       200           2020-02-28T13:21:38.127Z   153
Transfer service                                       /transfer                                             up       200           2020-02-28T13:21:38.281Z   110
Transformations service                                /transformations                                      up       200           2020-02-28T13:21:38.391Z   54
types                                                  /types                                                up       200           2020-02-28T13:21:38.446Z   59
Web Data Access service                                /webDataAccess                                        up       200           2020-02-28T13:21:38.506Z   85
Workflow service                                       /workflow                                             up       200           2020-02-28T13:21:38.591Z   62
Workflow Definition service                            /workflowDefinition                                   up       200           2020-02-28T13:21:38.653Z   68
Workflow History service                               /workflowHistory                                      up       200           2020-02-28T13:21:38.721Z   48

Applications                                           Endpoint                                              Status   HTTP Status   Time of Call               Duration
SASBackupManager                                       /SASBackupManager                                     up       200           2020-02-28T13:21:20.608Z   192
SASCodeDebugger                                        /SASCodeDebugger                                      up       200           2020-02-28T13:21:20.800Z   1115
SAS Data Explorer                                      /SASDataExplorer                                      up       200           2020-02-28T13:21:21.916Z   937
SAS Data Studio                                        /SASDataStudio                                        up       200           2020-02-28T13:21:22.853Z   72
SAS Drive                                              /SASDrive                                             up       200           2020-02-28T13:21:22.925Z   1101
SAS Environment Manager                                /SASEnvironmentManager                                up       200           2020-02-28T13:21:24.027Z   173
SAS Graph Builder                                      /SASGraphBuilder                                      up       200           2020-02-28T13:21:24.200Z   152
SAS Job Execution                                      /SASJobExecution                                      up       200           2020-02-28T13:21:24.353Z   880
SAS Lineage                                            /SASLineage                                           up       200           2020-02-28T13:21:25.233Z   200
SAS Logon Manager                                      /SASLogon                                             up       200           2020-02-28T13:21:25.434Z   51
SAS Model Manager                                      /SASModelManager                                      up       200           2020-02-28T13:21:25.486Z   184
SASStudio                                              /SASStudio                                            up       200           2020-02-28T13:21:25.671Z   26
SAS Studio Viya                                        /SASStudioV                                           up       200           2020-02-28T13:21:25.697Z   72
SAS Theme Designer                                     /SASThemeDesigner                                     up       200           2020-02-28T13:21:25.769Z   183
SAS Visual Analytics                                   /SASVisualAnalytics                                   up       200           2020-02-28T13:21:25.952Z   156
SAS Workflow Manager                                   /SASWorkflowManager                                   up       200           2020-02-28T13:21:26.109Z   162

Infrastructure Applications                                                                                  Status                 Time of Call               Duration
cas-shared-default                                                                                           up                     2020-02-28T13:21:26.589Z   760
cas-shared-default-http                                                                                      up                     2020-02-28T13:21:27.350Z   123
SAS Infrastructure Data Server                                                                               up                     2020-02-28T13:21:33.799Z   118
SAS Message Broker                                                                                           up                     2020-02-28T13:21:34.058Z   117

"1" of "95" health validations failed.

The items with statuses in yellow are functioning properly, but cannot be directly verified due to their nature.

The following errors were generated during the execution of this program:

The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
  * The server failed to fulfill an apparently valid request.
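
The detailed listing is long, so it can help to pull out the rows worth a second look. The following Python sketch parses rows in the six-column layout shown above and lists services reported as up whose HTTP status is something other than 200, such as backup-agent (404) and Cross-domain Proxy service (400). The sample rows are abbreviated from the output above; the function names and the two-or-more-spaces column rule are my assumptions, not documented behavior of the plug-in.

```python
import re

# A few rows abbreviated from the --display-results output shown above.
SAMPLE = '''\
Services                      Endpoint            Status   HTTP Status   Time of Call               Duration
Audit service                 /audit              up       200           2020-02-28T13:21:26.438Z   70
backup-agent                  /backup-agent       up       404           2020-02-28T13:21:26.521Z   25
Cross-domain Proxy service    /crossdomainproxy   up       400           2020-02-28T13:21:28.105Z   19
SAS Message Broker                                up                     2020-02-28T13:21:34.058Z   117
'''


def parse_rows(report):
    """Parse six-column service/application rows into dictionaries.

    Rows without an endpoint and HTTP status (the infrastructure
    section) and header lines are skipped.
    """
    rows = []
    for line in report.splitlines():
        # Assumption: columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) == 6 and fields[2] in ("up", "down"):
            name, endpoint, status, http, when, duration = fields
            rows.append({
                "name": name,
                "endpoint": endpoint,
                "status": status,
                "http": int(http),
                "time": when,
                "duration": int(duration),
            })
    return rows


def up_but_not_200(rows):
    """Return (name, http) pairs for rows that are 'up' with a non-200 code."""
    return [(r["name"], r["http"]) for r in rows
            if r["status"] == "up" and r["http"] != 200]


print(up_but_not_200(parse_rows(SAMPLE)))
```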


While we have a healthy system with every required process up, let's re-run the healthcheck command with the --create-yaml option, which creates an output file (named complexCheck.yml here) containing details for all of the tested processes.


./sas-admin healthcheck system-health check-status complex --create-yaml complexCheck.yml
Searching for services, applications, and infrastructure applications.

[Validating Health] ..............................................................................................................

Services                   Endpoint                  Status   HTTP Status   Time of Call               Duration
Discovery Table Provider   /discoveryTableProvider   down     503           2020-02-25T11:49:09.689Z   1804

"1" of "95" health validations failed.


Here is a look at the complexCheck.yml file. As the comments explain, for each service listed, the healthcheck obtains the endpoint to call from the SAS Configuration Server (referred to as Consul in the comments). The interesting thing, though, is that we now have a list of every service that should be present in a healthy deployment. We can modify this YAML file to suit our purposes, such as splitting out separate checks for the web applications or the infrastructure components. We can also add service checks, provided we know the endpoint to call, or specify an exact endpoint to use for any of the existing services.


# A YAML file can be specified for the "complex" command.

# At the minimum, a list of service or application names must be specified.
# For every service or application listed, the program goes to Consul to find the correct head endpoint to use.
# You can specify additional endpoints to test in the YAML file. If Consul finds a match for that service or application,
# calls are made to the head endpoint in addition to making calls to all the endpoints specified in the YAML file.
# If you specify a service or application name in the file that Consul does not recognize, the program tests only
# the endpoints specified in the YAML file. If there are no endpoints, then that item is skipped.

# Pasted below is a sample format for a complex check configuration file:

#- Name: Files
#- Name: Advanced Analytics Components service
# Endpoints:
# - /analyticsComponents/commons/health
# - /analyticsComponents/components
# - /analyticsComponents/templates
# - /files/files
#- Name: Advanced Analytics Data Segmentation service
# Endpoints:
# - /analyticsDataSegmentation/commons/health
# - /analyticsDataSegmentation/plans
#- Name: Analytics Events service

# Applications:
- Name: SASBackupManager
- /SASBackupManager
- Name: SASCodeDebugger
- /SASCodeDebugger
- Name: SAS Data Explorer
- /SASDataExplorer
- Name: SAS Data Studio
- /SASDataStudio
- Name: SAS Drive
- /SASDrive
- Name: SAS Environment Manager
- /SASEnvironmentManager
- Name: SAS Graph Builder
- /SASGraphBuilder
- Name: SAS Job Execution
- /SASJobExecution
- Name: SAS Lineage
- /SASLineage
- Name: SAS Logon Manager
- /SASLogon
- Name: SAS Model Manager
- /SASModelManager
- Name: SASStudio
- /SASStudio
- Name: SAS Studio Viya
- /SASStudioV
- Name: SAS Theme Designer
- /SASThemeDesigner
- Name: SAS Visual Analytics
- /SASVisualAnalytics
- Name: SAS Workflow Manager
- /SASWorkflowManager

# Services:
- Name: annotations
- /annotations
- Name: Application Registry service
- /appRegistry
- Name: Audit service
- /audit
- Name: Authorization service
- /authorization
- Name: backup-agent
- /backup-agent
- Name: Cache Locator service
- /cachelocator
- Name: Cache Server service
- /cacheserver
- Name: casAccessManagement
- /casAccessManagement
- Name: casFormats
- /casFormats
- Name: CAS Management service
- /casManagement
- Name: CAS Proxy service
- /casProxy
- Name: CAS Row Sets service
- /casRowSets
- Name: Catalog service
- /catalog
- Name: codeDebugger
- /codeDebugger
- Name: Comments service
- /comments
- Name: compute
- /compute
- Name: Configuration service
- /configuration
- Name: credentials
- /credentials
- Name: Cross-domain Proxy service
- /crossdomainproxy
- Name: Data Discovery service
- /dataDiscovery
- Name: Data Plans service
- /dataPlans
- Name: Profile Results service
- /dataProfiles
- Name: Data Sources service
- /dataSources
- Name: Data Tables service
- /dataTables
- Name: Backup service
- /deploymentBackup
- Name: Device Management service
- /deviceManagement
- Name: Discovery Table Provider
- /discoveryTableProvider
- Name: Files service
- /files
- Name: Folders service
- /folders
- Name: Fonts service
- /fonts
- Name: Geo Enrichment service
- /geoEnrichment
- Name: Graph Template Service
- /graphTemplates
- Name: Identities service
- /identities
- Name: import9
- /import9
- Name: Job Flow Scheduling service
- /jobFlowScheduling
- Name: Launcher service
- /launcher
- Name: licenses
- /licenses
- Name: Links service
- /links
- Name: Mail service
- /mail
- Name: Maps service
- /maps
- Name: Micro Analytic Score service
- /microanalyticScore
- Name: Model Management service
- /modelManagement
- Name: Model Publish service
- /modelPublish
- Name: Model Repository service
- /modelRepository
- Name: monitoring
- /monitoring
- Name: Natural Language Generation service
- /naturalLanguageGeneration
- Name: Natural Language Understanding service
- /naturalLanguageUnderstanding
- Name: Notifications service
- /notifications
- Name: Preferences service
- /preferences
- Name: Projects service
- /projects
- Name: Relationships service
- /relationships
- Name: Report Alerts service
- /reportAlerts
- Name: Report Data service
- /reportData
- Name: Report Distribution service
- /reportDistribution
- Name: Report Images service
- /reportImages
- Name: Report Packages service
- /reportPackages
- Name: Report Renderer service
- /reportRenderer
- Name: Report Templates service
- /reportTemplates
- Name: Report Transforms service
- /reportTransforms
- Name: reportViewerNaturalLanguageUnderstanding
- /reportViewerNaturalLanguageUnderstanding
- Name: Reports Persistence service
- /reports
- Name: Row Sets service
- /rowSets
- Name: Schedule service
- /scheduler
- Name: Score Definition service
- /scoreDefinitions
- Name: Score Execution service
- /scoreExecution
- Name: Search service
- /search
- Name: Search Index service
- /searchIndex
- Name: templates
- /templates
- Name: Tenant service
- /tenant
- Name: themeContent
- /themeContent
- Name: Themes service
- /themes
- Name: Thumbnails service
- /thumbnails
- Name: Transfer service
- /transfer
- Name: Transformations service
- /transformations
- Name: types
- /types
- Name: Web Data Access service
- /webDataAccess
- Name: Workflow service
- /workflow
- Name: Workflow Definition service
- /workflowDefinition
- Name: Workflow History service
- /workflowHistory

# Infrastructure Applications:
- Name: cas-shared-default
- /cas-shared-default
- Name: cas-shared-default-http
- /cas-shared-default-http
- Name: SAS Infrastructure Data Server
- /postgres
- Name: SAS Message Broker
- /rabbitmq
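
Since the generated file groups its entries under the "# Applications:", "# Services:", and "# Infrastructure Applications:" comment lines, splitting it into separate check lists can be done with plain text handling. The Python sketch below is one way to do that, under the assumption that those three comment lines appear exactly as in the listing above. It copies entry lines through unchanged and does not interpret the YAML. The function name and the abbreviated sample are hypothetical.

```python
# Section comment lines as they appear in the generated file shown above.
SECTIONS = ("# Applications:", "# Services:", "# Infrastructure Applications:")

# Abbreviated sample in the same shape as the listing above.
SAMPLE = '''\
# A YAML file can be specified for the "complex" command.

# Applications:
- Name: SAS Drive
- /SASDrive

# Services:
- Name: Files service
- /files
- Name: Folders service
- /folders

# Infrastructure Applications:
- Name: SAS Message Broker
- /rabbitmq
'''


def split_sections(text):
    """Group the lines of a generated check file by section comment.

    Lines before the first section comment (the explanatory header)
    and blank lines are dropped; entry lines are kept verbatim.
    """
    groups = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTIONS:
            current = stripped[2:-1]  # "# Services:" -> "Services"
            groups[current] = []
        elif current is not None and stripped:
            groups[current].append(line)
    return groups


for section, lines in split_sections(SAMPLE).items():
    print(section, len(lines))
```

Each group could then be written to its own file and passed to a separate healthcheck run.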


Let's see how having this list of services can be especially useful to administrators.


Suppose one of our services is either accidentally stopped or perhaps did not start sufficiently to register itself in the SAS Configuration Server.  I'm going to simulate this by stopping the graph templates service.


Let's see how this is reported by the healthcheck using the default method of obtaining the list of available services from the SAS Configuration Server.


./sas-admin healthcheck system-health check-status complex
Searching for services, applications, and infrastructure applications.

[Validating Health] .............................................................................................................

Services                   Endpoint                  Status   HTTP Status   Time of Call               Duration
Discovery Table Provider   /discoveryTableProvider   down     503           2020-02-25T11:36:59.075Z   2358

"1" of "94" health validations failed.

The following errors were generated during the execution of this program:

The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
  * The server failed to fulfill an apparently valid request.


Interestingly, the results still indicate that only the expected discoveryTableProvider service is down. If you are a very observant administrator, you might notice that the total number of health validations is now 94 instead of the 95 we had earlier with a healthy system. If you were in a hurry and not paying close attention, you might miss that small difference and assume that everything is OK when in fact a service is missing.
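
Spotting 94 versus 95 by eye is easy to miss, so the comparison is a good candidate for a small script. This Python sketch reads the summary line in the form shown in the output above and compares the total against a baseline recorded from a healthy run. The function names are hypothetical, and the baseline of 95 comes from my deployment, so yours will differ.

```python
import re

# Summary line format as it appears in the healthcheck output above.
SUMMARY = re.compile(r'"(\d+)" of "(\d+)" health validations failed')

HEALTHY = '"1" of "95" health validations failed.'
AFTER_STOP = '"1" of "94" health validations failed.'


def parse_summary(report):
    """Return (failed, total) from the summary line of a report."""
    match = SUMMARY.search(report)
    if match is None:
        raise ValueError("no summary line found in report")
    return int(match.group(1)), int(match.group(2))


def missing_checks(report, expected_total):
    """Return how many validations are missing compared to the baseline."""
    _failed, total = parse_summary(report)
    return expected_total - total


# Baseline taken from the healthy run; hypothetical for any other deployment.
baseline = parse_summary(HEALTHY)[1]
print(missing_checks(AFTER_STOP, expected_total=baseline))
```

A non-zero result says that something stopped being checked, but not which service it was.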


Let's re-run the healthcheck but this time pass it our yaml file as the list of services we want checked by adding the --source-location option to the command.


./sas-admin healthcheck system-health check-status complex --source-location complexCheck.yml
[Validating Health] ...................................................................................................

Services                        Endpoint                    Status   HTTP Status   Time of Call               Duration
Discovery Table Provider        /discoveryTableProvider     down     503           2020-02-25T11:57:23.201Z   1499

Infrastructure Applications                                 Status                 Time of Call               Duration
Graph Template Service                                      down                   2020-02-25T11:57:25.047Z   1

"2" of "95" health validations failed.

The following errors were generated during the execution of this program:

The following error was encountered when making an endpoint call to "/discoveryTableProvider": 1 error occurred:
  * The server failed to fulfill an apparently valid request.
The following error was encountered when making an endpoint call to "/graphTemplates": 1 error occurred:
  * The requested resource could not be found.

Aha!  Now we can clearly see that there are two services down and the total number of health validations is back up to 95.  So forcing healthcheck to look for all of the services we expect to have in our deployment appears to provide a bit of protection for administrators in cases where services do not get registered into the SAS Configuration Server.


So as an administrator, I can now proactively monitor processes that I know should be in my deployment without having to rely completely on the SAS Configuration Server.  This should enable me to more reliably detect process issues and help maintain more robust system health.
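
For proactive monitoring, the same command can be wrapped in a script and run on a schedule. The Python sketch below builds the exact command line used in this post and decides pass or fail from the summary line. It assumes that the sas-admin session is already authenticated and that the summary is written to standard output, neither of which I have verified for every environment. All function names and the allowed_failures parameter are hypothetical.

```python
import re
import subprocess


def build_command(yaml_path, cli="./sas-admin"):
    """Build the healthcheck command line used in this post."""
    return [cli, "healthcheck", "system-health", "check-status",
            "complex", "--source-location", yaml_path]


def failed_and_total(output):
    """Return (failed, total) from the summary line, or None if absent."""
    match = re.search(r'"(\d+)" of "(\d+)" health validations failed', output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run_check(yaml_path, allowed_failures=0, run=subprocess.run):
    """Run the check and return True if failures stay within the limit.

    Assumption: the summary line appears on standard output. A missing
    summary line is treated as a failed check.
    """
    result = run(build_command(yaml_path), capture_output=True, text=True)
    counts = failed_and_total(result.stdout)
    if counts is None:
        return False
    failed, _total = counts
    return failed <= allowed_failures
```

Calling run_check("complexCheck.yml", allowed_failures=1) would tolerate the known discoveryTableProvider failure and flag anything beyond it. The run parameter exists only so the function can be exercised without a live deployment.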


There is much more to the healthcheck plug-in than I have touched on in this post, so please take a look at the documentation for a more comprehensive understanding of the many options.


The sas-admin command-line interface is one of the best administration tools for Viya, if not the best. If you are not yet familiar with sas-admin, I recommend that you read Gerry Nelson's posts "SAS Viya command-line interfaces for Administration" and "Keeping the SAS Administration Command-Line interfaces up-to-date" to learn more.


Hi Scott


Useful blog as always.


I presume the simple way to get around the warning message for the /discoveryTableProvider endpoint is to comment it out from the generated yaml file?


Will this be fixed in a future release, as it's a useful addition to the admin toolkit?





Hi Alan, thanks for reading. Yes, if you know there is a service that responds inappropriately or is not important to your monitoring needs it can be eliminated from the yaml file.
I do not believe the licensing issue would be something you would be likely to encounter. That is fortunately a problem only here at SAS where there is sometimes a lag between when new services are added and the license file we use is updated. It is, of course, possible you could see this if your company decides not to renew a SAS product in future years. In that case, the 503 error might be observable but your original solution could be used to ignore the service once the reason is known.

Thanks Scott


Actually, after upgrading my Viya install from v3.4 to v3.5 & updating the yaml file used by the healthcheck process, this was the only endpoint which returned an error  - exactly as per your example (all was well in Environment Manager though)

Something for Tech Support, I'd say?



Interesting...I'd say Tech Support is a good next step.

Hi Scott


I tried this for just this endpoint with verbose logging & as you say, the error relates to a license:


"message":"The product license was not found.",
"details":["traceId: a4dae0be1d56cf7e","path: /discoveryTableProvider/"],

