In my last post, we looked at Managing monitoring data retention period and size in SAS Viya Monitoring for Kubernetes. In this post, let's find out how much log data is currently being stored by OpenSearch or ElasticSearch, how big the PVCs used to store it are individually and all together, and how to change the log data retention period.
In SAS Viya Monitoring for Kubernetes (v4m for short), log messages and their contextual data are gathered by Fluent Bit, streamed to OpenSearch and displayed in OpenSearch Dashboards. Versions of v4m earlier than version 1.2.0 used ElasticSearch and Kibana, which were essentially the same things. As we will see, both the new and old versions of these tools manage log data retention and size in the same way, so we can explain them together.
Incidentally, log retention is covered by recently updated documentation in SAS Help Center here, in place of the README files that were part of the SAS Viya Monitoring for Kubernetes project before. I think this is a big improvement.
Contrary to what you might expect, old log data is not deleted as soon as it becomes more than 3 days old. Rather, a whole day's worth of data is deleted at a time, once per day. Here's how it works in detail.
SAS Viya Monitoring for Kubernetes configures OpenSearch to keep log data in many separate indices, a bit like table partitions in other databases. Among other things, a new set of indices is created for each UTC day, and that 'one set of indices per UTC day' behaviour is the part most relevant for this post.
Each day at 00:00 UTC, OpenSearch stops writing log data to the current set of indices and starts writing it to a new set. Separately, a job runs every few minutes in OpenSearch and checks whether the creation date of each index is more than 3 days ago. If it is, the whole index is deleted. OpenSearch and ElasticSearch don't delete individual log messages from indices as they become older than 3 days; they delete an entire day's worth of indices, and thus that day's entire set of log messages, more or less all at once.
With the default log data retention period of 3 days, let's imagine you look at the system at any time on 4th January UTC. OpenSearch will be writing log data as it arrives into the set of indices for 4th January. (There is a date in the index name).
The sets of indices which were created moments after midnight on 3rd and 2nd January, each containing all the log data from one of those dates, will still be stored in the PVC and loaded into memory.
However, just after midnight UTC on 4th January, the set of indices containing all the log data collected on 1st January became more than 3 days old, so the maintenance job deleted it. This means that in practice, the 3-day retention period results in log data being available for the past 2 days plus the time since the most recent midnight UTC.
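If you would like to see this daily-index pattern for yourself, one option is to query the _cat/indices API. The following is only a sketch, not the documented v4m method: it assumes the log monitoring namespace is v4mlog (more on namespaces later in this post), that the OpenSearch REST API is exposed inside the cluster by a service called v4m-search on port 9200, and that you substitute your own OpenSearch admin credentials - check the real service name with kubectl get svc in your log monitoring namespace first.
# Forward the OpenSearch REST API to your workstation (the service name here is an assumption - verify it first)
kubectl -n v4mlog port-forward svc/v4m-search 9200:9200 &
# List the viya_logs indices with their creation dates, document counts and sizes
curl -s -k -u admin:yourpassword "https://localhost:9200/_cat/indices/viya_logs-*?v&h=index,creation.date.string,docs.count,store.size"
If the mechanism works as described above, you should see one set of indices per UTC day, each with its date in the index name, and nothing whose creation date is older than the retention period.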
One more thing: in contrast to Prometheus's metric data, there is no enforced limit to log data size, other than the size of the PVC used to store the log data. It is quite possible to fill that PVC, which causes OpenSearch to stop working properly, so we need to know how to avoid that happening.
The method for seeing how much storage is available and how much is actually being used depends on the type of Storage Class used for your Persistent Volume Claims. In our workshop environments we use a simple NFS share, but in a production environment you should use something better, like Azure Files or S3.
Note: NFS storage is not ideal in a cloud environment, and we don't recommend it for production deployments. However, it is easy (and free) to use in our RACE-based classroom and lab environments, and it is therefore the only persistent Storage Class currently available in them.
In our workshop environments, the default (and only) storage class is nfs-client. The files written to PVCs ultimately end up in a directory structure in the filesystem on the sasnode01 host in each collection, since that is where the cluster's NFS server runs. From a shell on the sasnode01 host, we can browse that filesystem and find the data under /srv/nfs/kubedata. This is highly implementation-specific, and there is little chance a customer deployment would be set up like this. Talk to your architect or Kubernetes administrator, and they may be able to suggest something similar to the following that makes sense in your environment.
By default, our SAS Viya Monitoring for Kubernetes project configures and deploys OpenSearch or ElasticSearch with three pods in its v4m-search statefulset (OpenSearch, v4m 1.2.0 and later) or its v4m-es-data statefulset (ElasticSearch, v4m 1.1.8 and earlier).
Each of these three pods has its own PVC. The aggregate storage size of these three PVCs is the total storage available to OpenSearch or ElasticSearch. Exactly what data is stored in these PVCs varies slightly between the two versions - OpenSearch uses just one set of PVCs for all of its data, whereas ElasticSearch has a second, smaller set of PVCs for 'master data', which I presume stores provided and user-created objects and other management data, but our focus here will be on the main data PVCs.
Our SAS Viya Monitoring for Kubernetes project is configured to request each of those PVCs to be 30 GiB (30 gibibytes = 30 x 2^30 bytes). So across the 3 pods, by default OpenSearch or ElasticSearch requests 3 x 30 GiB = 90 GiB of storage in total.
You can see how this is specified either in the user-values-opensearch.yaml file (OpenSearch) or the user-values-elasticsearch-open.yaml file (ElasticSearch) in the v4m USER_DIR/logging directory, or in the default Helm charts referenced at the top of each of those files, if those files don't override the defaults. Look for a replicas value giving how many pods there should be in the statefulset, and (often in quite a separate place) a persistence section containing a size value which may be something like 30Gi. So, in theory, that's what we should actually have.
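If you would rather check these settings from the command line than open the files in an editor, something along these lines may help. It assumes USER_DIR is set as an environment variable pointing to your v4m customization directory, as in the standard v4m layout; adjust the paths if yours differs.
# Show any replica count and persistence settings overridden in the OpenSearch user values file
grep -n -A3 -E 'replicas|persistence' $USER_DIR/logging/user-values-opensearch.yaml
# For v4m 1.1.8 and earlier, check the ElasticSearch equivalent instead
grep -n -A3 -E 'replicas|persistence' $USER_DIR/logging/user-values-elasticsearch-open.yaml
If grep finds nothing, the defaults from the Helm chart referenced at the top of the file apply.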
In theory, there's no difference between theory and practice. But in practice, there is. You can find the actual size of the PVCs in the log monitoring namespace using kubectl or Lens. In kubectl, try something like this (where logging is the namespace where the log monitoring components of SAS Viya Monitoring for Kubernetes are deployed):
kubectl get pvc -n logging
Look for the number and size of the v4m-search-v4m-search-* PVCs (OpenSearch) or the data-v4m-es-data-* PVCs (ElasticSearch).
In one of our workshop deployments with v4m 1.2.1, the logging (or if you prefer, log monitoring) namespace is v4mlog, so here's what that looks like for OpenSearch:
[cloud-user@hostname logging]$ kubectl get pvc -n v4mlog
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
v4m-search-v4m-search-0   Bound    pvc-82514968-1728-4bc8-95b7-708b08043af1   30Gi       RWO            nfs-client     25h
v4m-search-v4m-search-1   Bound    pvc-82ffb122-1730-4780-825c-ce8887e0b01f   30Gi       RWO            nfs-client     25h
v4m-search-v4m-search-2   Bound    pvc-47e35a8e-423c-4490-bbb7-a17e5e0def17   30Gi       RWO            nfs-client     25h
In another of our workshop deployments with v4m 1.1.8, this is the equivalent for ElasticSearch:
[cloud-user@hostname ~]$ kubectl get pvc -n v4mlog
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-v4m-es-data-0     Bound    pvc-ed9314dc-06b4-471a-9a37-499cf42ddcc9   30Gi       RWO            nfs-client     19d
data-v4m-es-data-1     Bound    pvc-4127753a-528b-40ec-93c8-3826c6397d5c   30Gi       RWO            nfs-client     19d
data-v4m-es-data-2     Bound    pvc-804a632c-97a4-4e7b-942d-55ba6fd6828c   30Gi       RWO            nfs-client     19d
data-v4m-es-master-0   Bound    pvc-18252229-6e88-4e44-9037-15ccd68664f0   8Gi        RWO            nfs-client     19d
data-v4m-es-master-1   Bound    pvc-2047fd01-0c15-4f90-842a-a86129647580   8Gi        RWO            nfs-client     19d
data-v4m-es-master-2   Bound    pvc-a68baa6b-fde8-47d4-9de7-2970d0be9d48   8Gi        RWO            nfs-client     19d
From this we can see that the total data storage for OpenSearch or ElasticSearch is 3 x 30Gi = 90Gi.
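If you would rather not add the capacities up by hand, a rough one-liner can do it for you. This is a sketch rather than a robust script: it assumes your log monitoring namespace is v4mlog and that all of the relevant PVCs report their capacity in Gi.
# Sum the capacity of the OpenSearch data PVCs (for ElasticSearch, change the grep prefix to data-v4m-es-data-)
kubectl get pvc -n v4mlog -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.storage}{"\n"}{end}' \
  | grep '^v4m-search-v4m-search-' \
  | awk '{gsub("Gi","",$2); total+=$2} END {print total "Gi"}'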
Here is the OpenSearch PVC set in Lens:
Either of these ways of seeing the PVC size should work in any environment, as long as you have kubectl or Lens, a kube config file that lets you access the cluster, and you know your log monitoring namespace name.
So we know how big the PVCs are. How much of that space is being used?
This is more implementation-specific. Since we are using NFS for our PVCs, as explained above the data ends up on sasnode01 under /srv/nfs/kubedata. Let's look at how to find the amount of data stored on our NFS share - the actual values below are not typical; they are likely much smaller than you should expect in a real production environment.
If you have v4m version 1.2.0 or later, something along the lines of this command might show how big the data in the three OpenSearch PVCs actually is:
for f in /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-* ; do sudo du -xsh $f; done
Substitute the path to the kubedata directory in your NFS shared volume in place of /srv/nfs above. Here is some example output from a very lightly-used workshop environment:
[cloud-user@hostname logging]$ for f in /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-* ; do sudo du -xsh $f; done
4.5G    /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-0-pvc-82514968-1728-4bc8-95b7-708b08043af1
3.1G    /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-1-pvc-82ffb122-1730-4780-825c-ce8887e0b01f
5.7G    /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-2-pvc-47e35a8e-423c-4490-bbb7-a17e5e0def17
So in this environment, there is currently 4.5G + 3.1G + 5.7G = 13.3GiB of log data. Well, actually, slightly less than that. These PVC directories also contain a few MB of other stored data. Remember this example is intended to show how to calculate disk usage on our NFS storage, not to give estimates of typical usage.
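Incidentally, du can produce that total for you: adding the -c flag to the same command appends a grand total line, so there is no arithmetic to do. The same assumptions about the NFS path and namespace apply, and the same trick works for the ElasticSearch directories shown below.
# -c appends a 'total' line covering all three PVC directories
sudo du -xcsh /srv/nfs/kubedata/v4mlog-v4m-search-v4m-search-*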
If you have v4m version 1.1.8 or earlier, something along the lines of this command might show how big the data in the three ElasticSearch data PVCs actually is:
for f in /srv/nfs/kubedata/v4mlog-data-v4m-es-data-* ; do sudo du -xsh $f; done
Substitute the path to the kubedata directory in your NFS shared volume in place of /srv/nfs above. Here is some example output from a very lightly-used workshop environment:
[cloud-user@hostname ~]$ for f in /srv/nfs/kubedata/v4mlog-data-v4m-es-data-* ; do sudo du -xsh $f; done
1.3G    /srv/nfs/kubedata/v4mlog-data-v4m-es-data-0-pvc-ed9314dc-06b4-471a-9a37-499cf42ddcc9
511M    /srv/nfs/kubedata/v4mlog-data-v4m-es-data-1-pvc-4127753a-528b-40ec-93c8-3826c6397d5c
854M    /srv/nfs/kubedata/v4mlog-data-v4m-es-data-2-pvc-804a632c-97a4-4e7b-942d-55ba6fd6828c
So in this environment, there is currently 1.3G + (511/1024)G + (854/1024)G = 2.63GiB of log data. Again, this is meant to show the method for calculating usage, not to give estimates of typical usage.
Another way to roughly estimate log data size - and one which does not require command-line access to the servers or a kube config file - is to open the OpenSearch Dashboards (or Kibana) Index Management page and switch to the Indices tab. This tab shows a table of all the indices currently held in OpenSearch (or ElasticSearch), with statistics for each index including its total size.
With a bit of patience, you could manually add up the 'Total size' values of each row. The table is paged, so make sure you include each page. It is quite possible that there is an API or command-line way to do this; I have not explored the OpenSearch or ElasticSearch APIs or command-line tools in detail, but here is one idea.
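Reusing the port-forward and _cat/indices sketch from earlier in this post (the same caveats about service name and credentials apply), the cat API can report index sizes as raw byte counts, which awk can then total:
# bytes=b makes store.size a plain byte count, which is easy to sum
curl -s -k -u admin:yourpassword "https://localhost:9200/_cat/indices/viya_logs-*?h=index,store.size&bytes=b" \
  | awk '{total += $2} END {printf "Total: %.2f GiB\n", total/1024/1024/1024}'
The store.size column counts primary and replica shards together, so the total should be broadly comparable with the du figures above.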
The rate at which log data is collected can be quite volatile over time, as it depends very much on how heavily SAS Viya and other applications in the Kubernetes cluster are used, as well as on whether (and for how long) logging thresholds are changed to, for example, increase log detail. Extrapolating log data growth over time is therefore more of an art than a science, but it is worth estimating (or guesstimating) based on what you know about your collection's current level of activity, historical log data size, and how you are likely to change logging levels. Then monitor log data size closely. This may reveal that your early estimates were quite inaccurate, but it's better than not estimating and monitoring at all.
To see the current log data retention period, follow these steps:
1. Open OpenSearch Dashboards (or Kibana) in a web browser:
   - In our workshop environments with v4m 1.2.0 or later, OpenSearch Dashboards is at https://osd.ingress_controller_hostname/ where ingress_controller_hostname is the full hostname of the sasnode01 host in our Kubernetes cluster. For example, in one RACE collection I happen to have running, this is http://osd.pdcesx02020.race.sas.com. We are considering changing the hostname prefix from osd to something else to avoid confusion with SAS/ODS, so if you don't find it this way, check the workshop instructions for where to find OpenSearch Dashboards.
   - In our workshop environments with v4m 1.1.8 or earlier, Kibana is at https://kibana.ingress_controller_hostname/ where ingress_controller_hostname is the full hostname of the sasnode01 host in our Kubernetes cluster.
   - In other deployments, OpenSearch Dashboards may be at http://ingress_controller_hostname/dashboards and Kibana may be at http://ingress_controller_hostname/kibana, where ingress_controller_hostname is the hostname of your Kubernetes cluster's ingress controller.
   - If in doubt, run kubectl get ingress -n logging where logging is the namespace in which the SAS Viya Monitoring for Kubernetes logging (or log monitoring) components are deployed. Look for an ingress named v4m-osd or v4m-es-kibana-ing, depending on whether you have v4m 1.2.0 or later, or 1.1.8 or earlier, and browse to the host shown for that v4m-osd or v4m-es-kibana-ing ingress.
2. Log in as admin.
3. If prompted to select a tenant, choose an appropriate one (such as cluster_admins), check the checkbox for 'Remember my selection next time I log in from this device' if there is one, and click Confirm.
4. Open the Index Management page and find the viya_log_idxmgmt_policy index management policy. In its policy definition, look for the min_index_age value, which by default is "3d".
Here, '3d' means 3 days. The only units that are sensible in the context of our SAS Viya Monitoring for Kubernetes configuration are whole days.
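If you prefer the command line, the same retention setting can be read from the Index State Management API. This again reuses the port-forward sketch from earlier, and the details are assumptions to verify rather than documented v4m behaviour: in particular, on the older ElasticSearch/Open Distro stack the URL prefix is _opendistro rather than _plugins.
# Fetch the policy definition and pick out the retention setting
curl -s -k -u admin:yourpassword "https://localhost:9200/_plugins/_ism/policies/viya_log_idxmgmt_policy" \
  | grep -o '"min_index_age": *"[^"]*"'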
There are two steps to changing the log data retention period: first, change the index management policy; second (optionally), re-apply that policy to existing indices.
Let's start by changing the policy. From the page where you see the index management policy details, follow these steps:
1. Edit the policy, and in its definition find "min_index_age": "3d".
2. Change it to a new value, e.g. "min_index_age": "4d". Then click Update.
Read the OpenSearch or ElasticSearch documentation if you want to know more about making your own index management policy; that is beyond the scope of this post. We are only trying to change the retention period in an existing policy.
Next, apply the changed policy to existing SAS Viya indices (a command-line alternative is sketched after these steps):
1. On the Indices tab of the Index Management page, identify the SAS Viya log indices: start typing viya_logs-, and accept the suggested value of viya_logs-*.
2. Apply the policy to those indices, selecting viya_log_idxmgmt_policy from the dropdown list.
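For completeness, the Index State Management API offers a command-line equivalent, again via the earlier port-forward. This is based on my understanding of the ISM plugin rather than on the v4m documentation: the add endpoint attaches a policy to indices that are not yet managed by one, and there is a corresponding change_policy endpoint for indices that already are; on the older ElasticSearch/Open Distro stack the URL prefix is _opendistro rather than _plugins.
# Attach the named policy to all matching indices that do not already have one
curl -s -k -u admin:yourpassword -X POST "https://localhost:9200/_plugins/_ism/add/viya_logs-*" \
  -H 'Content-Type: application/json' \
  -d '{"policy_id": "viya_log_idxmgmt_policy"}'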
When you change the log retention period interactively like this, it is recommended to also update the LOG_RETENTION_PERIOD value in your USER_DIR/logging/user.env file, to keep it consistent with this change. This is sensible because, if you later remove and redeploy the v4m logging stack, your new configuration (which in this case keeps indices for e.g. 4 days) is preserved, instead of reverting to the original configuration (which keeps indices for 3 days). We made a similar recommendation for metric data in my previous post.
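For example, to match the 4-day retention period used above, the relevant line in USER_DIR/logging/user.env would look something like this (the setting name comes from the recommendation above; as I understand it, the value is a whole number of days):
# Keep the deployed configuration in line with the policy change made in OpenSearch Dashboards
LOG_RETENTION_PERIOD=4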
Hopefully this gives a reasonably complete explanation of finding the PVC size and usage, and of finding and changing the log data retention period. I have intentionally not covered changing the PVC size, in either this post or my previous one. That is a sufficiently complex topic that it deserves a post of its own, mostly because the type of storage class used for your PVCs greatly affects the method for changing its size, and also greatly affects whether you can change the PVC size without dropping and re-creating all the data currently stored in it. See you next time!
Find more articles from SAS Global Enablement and Learning here.