In my previous post, I briefly listed some of the more significant updates that have been made to the log and metric monitoring stacks in the SAS Viya Monitoring for Kubernetes project in the past year.
In this post, we'll look at some features specific to the log monitoring stack of the project and explain how to use them: the new logadm user, the refreshed Kibana interface, how to change the log retention period, and how to change internal user passwords in Elasticsearch/Kibana.
SAS Viya Monitoring for Kubernetes version 1.1.3 was released on January 14th, 2022. It introduced a new Kibana user, logadm. As the release notes say, this user is intended to be the primary day-to-day Kibana user. You can read more about how its permissions differ from those of the admin user here. The release notes contain useful information about this user that is worth highlighting:
To set the password for the logadm user, set the LOG_LOGADM_PASSWD environment variable in the same file [user.env]. If no password has been set for the logadm user, but a password has been set for the admin user (by using the ES_ADMIN_PASSWD environment variable), that password is used for the logadm user also. If neither password has been set, random passwords are generated for these accounts. These random passwords are displayed in the log messages generated during the deployment process.
If you have version 1.1.3 or later of this project, why not get into the habit of logging in to Kibana as logadm rather than admin, unless you know you need to log in as a different user (e.g. to do real administrative work, or to work on Kibana tenant space-specific resources such as dashboards and visualizations).
From version 1.1.0 of the project, the first time you sign into Kibana, you are prompted to 'Select your tenant':
This is because, since release 1.1.0 of the SAS Viya 4 Monitoring for Kubernetes project, Elasticsearch and Kibana have been configured to support application multi-tenancy.
Until you set up additional Kibana tenant spaces (I will cover this in a future post here), the only available option is 'cluster_admins', under 'Choose from custom'. Notice the checkbox highlighted in red above, labelled 'Remember my selection next time I log in from this device.' Check (tick) this checkbox so that you are not prompted to select your tenant the next time you sign in. Then just click 'Confirm'.
Note that Kibana tenant spaces are separate from SAS Viya tenants, though it would be normal to create a Kibana tenant space for each SAS Viya tenant. I'll cover these two related but distinct concepts in another post.
We upgraded to a much newer version of Open Distro for Elasticsearch in SAS Viya Monitoring for Kubernetes release 1.1.0. The release notes provide important information about Kibana in this release, which, among other changes, now groups its pages into a collapsible navigation menu.
In that navigation menu, below the Kibana heading, you can find pages that will be familiar if you used earlier releases of Kibana, such as Discover and Dashboard. These pages have not changed very much: the menu buttons across the top have moved from the left to the right side of the window and the Reporting button is new.
Old Kibana menu bar in the Discover page
New Kibana menu bar in the Discover page
Keeping the navigation menu collapsed most of the time frees up screen space to see more data - it's an improvement.
Log messages (and their contextual data) are stored in Elasticsearch indices. Each index holds data from one UTC day and one service, so there are many separate indices, and they need to be deleted after a time to avoid filling the storage space allocated to Elasticsearch data.
Two index policies are defined during the project's initial installation. Those policies are then applied to indices to tell Elasticsearch how long to retain each index, and to delete the indexed log data when that retention period is over.
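If you're curious how many of these daily indices exist in your own deployment, you can list them with the standard Elasticsearch _cat/indices API. Here is a minimal sketch; the namespace, service name and credentials are assumptions, so check them against your environment (for example with kubectl -n <logging-namespace> get svc):

# Port-forward the Elasticsearch HTTP port from the logging namespace (the service name is an assumption)
kubectl -n v4mlog port-forward svc/v4m-es-client-service 9200:9200 &

# List the daily viya_logs-* indices, with document counts and on-disk sizes
curl -k -u admin:yourPasswordHere "https://localhost:9200/_cat/indices/viya_logs-*?v&h=index,docs.count,store.size&s=index"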
Tip: The logadm user does not have permission to see index management settings, so this is one situation where you will need to log into Kibana as admin.
Select 'Index Management' in the Kibana navigation menu to see this:
Let's look at these two index policies in the reverse of the order in which they are listed. Out of the box, viya_logs_idxmgmt_policy manages the viya_logs-* indices (log messages from SAS Viya deployments) and retains them for 3 days, and viya_ops_idxmgmt_policy manages the indices holding log messages from the logging and monitoring pods themselves and retains them for 1 day.
Click on each of the policies in turn to see its definition in the form of a JSON document. (I don't know why they don't provide a nicer user interface for this!). Here is the default JSON document for the viya_logs_idxmgmt_policy in full:
{
"policy_id": "viya_logs_idxmgmt_policy",
"description": "An index management policy for log messages from SAS Viya deployments.",
"last_updated_time": 1645112966733,
"schema_version": 1,
"error_notification": null,
"default_state": "hot",
"states": [
{
"name": "hot",
"actions": [],
"transitions": [
{
"state_name": "doomed",
"conditions": {
"min_index_age": "3d"
}
}
]
},
{
"name": "doomed",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
],
"ism_template": {
"index_patterns": [
"viya_logs-*"
],
"priority": 100,
"last_updated_time": 1645112966733
}
}
This policy defines a default state of hot, and defines the hot state such that no action is performed on indices in that state, but they should transition to the doomed state when they have reached an age of at least 3d, or 3 days. In the doomed state, they should be deleted.
That's a long-winded way of saying indices managed by this policy should be kept for 3 days, then deleted. The viya_ops_idxmgmt_policy policy for the logging and monitoring pods is similar, except that indices are doomed to be deleted after 1 day.
You can adjust the log retention periods before deployment, during re-deployment or afterwards.
Important: These steps cover situations where:
To set the number of days that SAS Viya and other Kubernetes logs are retained, specify an integer number of days for the LOG_RETENTION_PERIOD environment variable before you deploy the logging components of SAS Viya Monitoring for Kubernetes. (The OPS_LOG_RETENTION_PERIOD environment variable controls retention for the logging and monitoring components' own logs in the same way.)
As with all the other environment variables used to manage deployment-time configuration, there are several ways you can set this environment variable: it can be exported from the parent shell, set on the deployment command line, or set in the user.env files. I won't try to explain all those ways here - they are well covered in the Customize the Deployment section of the Logging README.
Here's an example, showing a partial listing of ${USER_DIR}/logging/user.env which sets both the LOG_RETENTION_PERIOD and the OPS_LOG_RETENTION_PERIOD:
...
LOG_RETENTION_PERIOD=5
OPS_LOG_RETENTION_PERIOD=2
...
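Alternatively, here is a minimal sketch of the other two methods mentioned above - exporting the variables from the parent shell, or setting them on the deployment command line itself (run from the root of the viya4-monitoring-kubernetes repository):

# Export from the parent shell before running the deployment script
export LOG_RETENTION_PERIOD=5
export OPS_LOG_RETENTION_PERIOD=2

# Or set the values just for the deployment command itself
LOG_RETENTION_PERIOD=5 OPS_LOG_RETENTION_PERIOD=2 ./logging/bin/deploy_logging_open.sh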
Once you have set a value for one or both log retention periods by whichever of the available methods you prefer, deploy the logging components with e.g. viya4-monitoring-kubernetes/logging/bin/deploy_logging_open.sh, viya4-monitoring-kubernetes/logging/bin/deploy_logging_open_openshift.sh, or whichever other script is appropriate for your SAS Viya and Kubernetes platform.
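For example, a deployment that picks up the user.env file shown above might look like this - a hedged sketch, in which the location of the customization directory is my own assumption:

# USER_DIR points at the directory that contains logging/user.env (this path is an assumption)
export USER_DIR=~/v4m-customizations
cd ~/viya4-monitoring-kubernetes
./logging/bin/deploy_logging_open.sh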
When the deployment is complete, log into Kibana and view the Index Management policies. The log retention periods should be set to the values you specified, or to their default values if you did not specify a value, e.g.:
Important: If you have previously deployed the logging components, and you want to re-deploy Kibana to pick up new settings, it is not enough to simply re-run e.g. ./logging/bin/deploy_logging_open.sh, nor to run ./logging/bin/remove_logging_open.sh and then run ./logging/bin/deploy_logging_open.sh again.
Neither of those approaches alone will change existing settings - the former values for log retention period will be preserved in logging namespace PVCs. Specifically:
The remove scripts do not delete the logging namespace or its PVCs, unless you set the LOG_DELETE_NAMESPACE_ON_REMOVE flag to true before running the remove script. This flag is not really documented as being intended for this specific purpose, but it works well.
The deploy scripts do not overwrite data or configuration settings in the logging namespace PVCs.
In a test environment, I removed the logging stack by running ./logging/bin/remove_logging_open.sh without setting a value for LOG_DELETE_NAMESPACE_ON_REMOVE, and afterwards my cluster still had my logging namespace (called v4mlog):
# kubectl get namespace | grep log
v4mlog Active 5d23h
...and six PVCs in that namespace:
# kubectl -n v4mlog get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-v4m-es-data-0 Bound pvc-d300d172-6723-4a87-bfa5-9d5035d31328 30Gi RWO nfs-client 5d23h
data-v4m-es-data-1 Bound pvc-3f76debd-e61a-41e0-b8a1-88e924d8ecff 30Gi RWO nfs-client 5d23h
data-v4m-es-data-2 Bound pvc-7cc783b3-c193-4ea6-84e6-019e677b8fd1 30Gi RWO nfs-client 5d23h
data-v4m-es-master-0 Bound pvc-af4abe16-3ab7-47a0-9353-cae3a53be7a6 8Gi RWO nfs-client 5d23h
data-v4m-es-master-1 Bound pvc-d0cb70ad-11d4-4914-8fe1-55bb56abd137 8Gi RWO nfs-client 5d23h
data-v4m-es-master-2 Bound pvc-8165135f-1b7d-4b08-8372-9cc71dbc8e85 8Gi RWO nfs-client 5d23h
So we see that the script to remove the logging components does not remove every last trace of them: the data remains.
This might not be what you expect, but the remove scripts leave the data behind for a reason.
The SAS Viya 4 Monitoring for Kubernetes logging and monitoring stacks are designed to support being 'stopped and started' (e.g. overnight and at weekends, to save resources) by being removed and redeployed. There is no separate 'scale to zero' capability, but removing the deployment affects the logging components in a way that is comparable with scaling SAS Viya to zero: the pods are removed, while the PVCs containing the data, and the namespace, remain.
When you want to 'stop' logging for your cluster, you can run the 'remove' scripts for the logging components. As we have seen, these scripts keep the logging components' data and configuration in PVCs in the logging namespace. Then, when you want to 'start' logging for your cluster again, you can re-run the 'deploy' scripts to redeploy the logging components. Any data and configuration found to already exist in the logging namespace PVCs is preserved and used again.
This means that log data and manual configuration changes are preserved during a normal uninstall and reinstall of the logging stack, and are not overwritten with 'default' settings as it is redeployed. This is generally desirable behavior, but the downside is that editing the user.env file and re-deploying (with or without running the 'remove' scripts first) will not change the settings in the deployed software.
Therefore, to apply new settings, such as new log retention periods, to an existing deployment of the logging stack, you must first do one of the following three things, any of which will delete all logging data and configuration data you have in your logging stack:
1. Set the LOG_DELETE_NAMESPACE_ON_REMOVE flag to true and then run the remove_logging_open.sh script (or its equivalent for your environment). Here is an example:

cd ~/viya4-monitoring-kubernetes/
LOG_DELETE_NAMESPACE_ON_REMOVE=true ./logging/bin/remove_logging_open.sh

2. Run the remove_logging_open.sh script, and then delete any remaining PVCs in the logging namespace (see the sketch after this list).

3. Run the remove_logging_open.sh script, and then delete the logging namespace, which also deletes the PVCs (see the sketch after this list).
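For options 2 and 3, the remaining cleanup can be done with kubectl. A minimal sketch, assuming the logging namespace is called v4mlog, as in my test environment:

# Option 2: remove the logging components, then delete any remaining PVCs in the logging namespace
./logging/bin/remove_logging_open.sh
kubectl -n v4mlog delete pvc --all

# Option 3: remove the logging components, then delete the logging namespace (which also deletes its PVCs)
./logging/bin/remove_logging_open.sh
kubectl delete namespace v4mlog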
After doing one of those things, you can set values for LOG_RETENTION_PERIOD and/or OPS_LOG_RETENTION_PERIOD as desired and deploy as normal, as we described earlier under Before deployment.
While manually changing a log retention period works and is easy, the documentation for Adjusting the Retention Policy describes a consideration which I think is worth repeating:
If you are maintaining customized configuration information (that is, using the USER_DIR functionality), consider updating the contents of the logging/user.env file to reflect the modified retention periods. Doing so ensures that your updated configuration is re-created if you redeploy the log-monitoring components.
You don't have to redeploy when you update the logging/user.env file. Editing it just keeps your script-based and manual configuration consistent with each other.
In the Kibana Index Management page, as admin, click on the policy you want to alter, then click Edit at the bottom of the dialog that pops up. Find the min_index_age key and set its value to the desired number of days, as an integer suffixed with a 'd'. For example, to have Viya logs retained for four days, set the value for min_index_age to 4d like this:
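In the policy JSON, the edited fragment should end up looking something like this - only the min_index_age value changes from the default shown earlier:

"transitions": [
  {
    "state_name": "doomed",
    "conditions": {
      "min_index_age": "4d"
    }
  }
]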
Click Update to save your changes.
Then, we need to apply this updated index policy to the existing indices to which its previous version was applied. This does not happen automatically.
So, still in the Index Management page in Kibana, and still logged in as admin, choose Managed Indices from the menu on the left. Then click the Change policy button at the top right, type viya_logs- in the Managed indices filter, and accept the suggested value of viya_logs-*. Under Choose new policy, select viya_logs_idxmgmt_policy from the New policy list box:
Then click Change at the bottom.
All these steps are explained in a little more detail in the documentation. If it works, you should see a confirmation message briefly pop up at the bottom right of the application window, something like this (though the number of indices whose policy changed will likely be different for you):
If you then go back to the Managed Indices view, and click on any index that has the viya_logs_idxmgmt_policy applied to it, you should now see that it has the new retention period.
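If you prefer the command line, you can also confirm which policy and state each index is in with Open Distro's index state management 'explain' API - a hedged sketch that reuses the port-forward assumption from the earlier _cat/indices example:

# Show the managed policy and current state for each viya_logs-* index
curl -k -u admin:yourPasswordHere "https://localhost:9200/_opendistro/_ism/explain/viya_logs-*?pretty"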
You can set a password of your choice for the Elasticsearch (and Kibana) admin user, and for the logadm user, before you deploy the logging stack. Just set the ES_ADMIN_PASSWD environment variable for the admin user, or the LOG_LOGADM_PASSWD environment variable for the logadm user, and then deploy. I find it easiest to set these in the logging/user.env file, like this:
...
ES_ADMIN_PASSWD=yourPasswordHere
LOG_LOGADM_PASSWD=yourPasswordHere
...
Note that, as discussed in New logadm user earlier, if no password has been set for the logadm user, but a password has been set for the admin user (by using the ES_ADMIN_PASSWD environment variable), that password is used for the logadm user also. So if you want to set the same password for both users, as we often do in our workshop environments, you can just set a value for ES_ADMIN_PASSWD and that will set the same password for both of these users.
Or, if you have already deployed the logging stack, use the ./logging/bin/change_internal_password.sh script to change them, like this:
./logging/bin/change_internal_password.sh admin yourPasswordHere
./logging/bin/change_internal_password.sh logadm yourPasswordHere
In this case, you do have to change each password separately. The script runs in just a few seconds, and the output looks something like this (for the admin user):
INFO User directory: /home/cloud-user/.v4m
INFO Kubernetes client version: v1.18.8
INFO Kubernetes server version: v1.20.8
INFO Loading user environment file: /home/cloud-user/.v4m/logging/user.env
INFO Successfully changed the password for [admin] in Elasticsearch internal database.
secret "internal-user-admin" deleted
INFO Created secret for Elasticsearch user credentials [admin]
secret/internal-user-admin labeled
In contrast to the log retention period settings, in production I would be inclined to use this script to change the admin and logadm passwords interactively, and NOT record them in the user.env file for all to see. Record the new passwords in a safer place instead, such as a password vault.

That's all for this rather long post. My sincere thanks to Greg Smith for his help and generous feedback in preparing this post.
See you next time!
Find more articles from SAS Global Enablement and Learning here.