Last time we took a first look at Azure Monitor Logs for AKS clusters, and saw how to create an Azure log analytics workspace from the Azure CLI and then enable monitoring for your AKS cluster. Today I'll assume you've already done that, and also followed the next steps in that post to confirm log data is being collected in Azure Monitor Logs.
Having spent a little more time with these Azure-native logging tools, I'm fairly confident you're better off using SAS Viya Monitoring for Kubernetes. It's prettier, it has more features, it is much easier to use, and it works just as well in an Azure AKS cluster as anywhere else. But if you really want to, or really have to, use Azure's own tools for viewing SAS Viya logs in tables or charts, or if you want to see a bit about how much they cost, read on!
Still here? This time we're going to create some distinctive log messages and explore them in a bit more depth with Azure Monitor Logs. We'll use a couple of example Kusto queries which format the logs a bit better and make them more readable. We'll view log data in a summary chart, and see how much of SAS's money I spent on Azure Monitor Logs while writing a blog post. All the following code examples have a green bar at the side if they are commands you can run (perhaps after modifying them to suit your needs):
Green bars are for commands you can run
Finally, this post includes the code of an example Kusto query from the Microsoft Azure Monitor Logs site, and that is shown in a box with a purple bar:
Purple bars are for Kusto queries
Open SAS Studio in your cluster. The URL for SAS Studio in my cluster (username: sukdws, region: eastus) was http://sukdwsvk.eastus.cloudapp.azure.com/SASStudio. Obviously your URL for SAS Studio will differ from mine.
Log on to SAS Studio and if necessary skip the setup and the tour. When SAS Studio finishes opening, open a New SAS Program, and run this code to put some distinctive text in the SAS logs:
%put The quick brown fox jumped over the lazy dog;
It should look something like this:
If you successfully followed the steps in my previous post, including the steps to 'Confirm log data is being collected in Azure Monitor Logs', then this distinctive log message output by the compute server (compsrv) will have been captured in the Azure Log Analytics Workspace. Let's go find it! Last time we ran a very basic Kusto query, like this:
// List container logs per namespace
// View container logs from all the namespaces in the cluster.
ContainerLog
| join(KubePodInventory | where TimeGenerated > startofday(ago(1h))) // KubePodInventory contains namespace information
on ContainerID
| where TimeGenerated > startofday(ago(1h))
| project TimeGenerated, Namespace, LogEntrySource, LogEntry
This time, add one extra line to it, as the second-last line:
// List container logs per namespace
// View container logs from all the namespaces in the cluster.
ContainerLog
| join(KubePodInventory | where TimeGenerated > startofday(ago(1h))) // KubePodInventory contains namespace information
on ContainerID
| where TimeGenerated > startofday(ago(1h))
| where LogEntry contains "The quick brown fox"
| project TimeGenerated, Namespace, LogEntrySource, LogEntry
Later on, I'm going to refer to the query above as the 'original, basic query'.
The number of results you get may vary a bit, but when I tried this while writing this blog post, I got 45 (!) rows of results, all containing that string. The longer I waited after running the program in SAS Studio before running or re-running this query in Azure Monitor Logs, the more rows of results I seemed to get:
As before, these are clearly logs from our deployment and the message we're interested in is there, but it is still not very easy to see:
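One likely reason for all those near-duplicate rows, and for the count growing over time, is the join itself: KubePodInventory contains many inventory rows per container (it is sampled regularly), so a join on ContainerID with no deduplication can return the same log line many times, once per matching inventory row, and the number of matching inventory rows keeps growing. If you want to check where the rows are coming from, here is a quick sketch, using only tables and columns that already appear in the queries in this post, that counts the matching rows by pod and log source:

ContainerLog
| join(KubePodInventory
    | where TimeGenerated > startofday(ago(1h))
    | project ContainerID, PodName=Name) on ContainerID
| where TimeGenerated > startofday(ago(1h))
| where LogEntry contains "The quick brown fox"
| summarize msgcount=count() by PodName, LogEntrySource

The new, improved query below sidesteps this problem by taking a distinct set of container, pod and namespace rows before the join.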
Let's replace the original, basic query in the New Query tab in Azure Monitor Logs with the following new, improved query:
let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| distinct ContainerID, PodName, Namespace
| where Namespace == "lab"
| where PodName contains "sas-launcher"
| join(ContainerLog | where TimeGenerated > startTimestamp) on ContainerID
| where LogEntry contains "The quick brown fox"
// The next line parses the JSON doc in LogEntry and stores the result in a dynamic variable called 'log'
// Doing this allows us to extract individual fields from that object
| extend log=parse_json(LogEntry)
| extend Level=log.level
| extend Source=log.source
| extend Message=log.message
| project TimeGenerated, Namespace, Source, Level, Message, PodName, LogEntry, LogEntrySource
| order by TimeGenerated desc
Let's run this new, improved Kusto query first and see the results it produces. Afterwards, we will break it down and compare it with the original, basic query to understand how each works. When you run this new, improved query, the results should look something like this:
This is better than the original, basic query because it filters both tables before joining them (so the join no longer returns the same log line many times), it parses the JSON in LogEntry into separate Source, Level and Message columns that are much easier to read at a glance, and it orders the results newest first.
When querying Azure Monitor Logs, queries can be written either in Kusto Query Language (KQL, the 'preferred' language) or as T-SQL SELECT statements (for compatibility with tools that can't easily be converted to use KQL).
Let's break down the things in this query that improve on the previous one.
For each numbered point below, the relevant part of the original, basic query is shown first (where it has one), then the corresponding part of the new, improved query, followed by notes.

1. Original: (nothing equivalent)
   New:
   let startTimestamp = ago(1h);
   Notes: The new, improved query defines a variable containing the start time, one hour ago. This offers a slightly more convenient way to change the time range over which log messages are displayed, and to do so in one place, since the time is used twice in the new query. Other than that convenience, it is not materially different from setting the time inline, as the original query does.

2. Original:
   ContainerLog
   New:
   KubePodInventory
   | where TimeGenerated > startTimestamp
   | project ContainerID, PodName=Name, Namespace
   | distinct ContainerID, PodName, Namespace
   | where Namespace == "lab"
   | where PodName contains "sas-launcher"
   Notes: The original query takes both the ContainerLog table and the KubePodInventory table and joins them before any filters are applied. This works, but it requires matching more rows from both tables and could potentially be slightly slower, though in practice I didn't notice a meaningful difference in query time. The new query filters down the rows in BOTH tables before they are joined, keeping only rows generated since the start timestamp, filtering the KubePodInventory rows to keep only those for the namespace and pod we are interested in, and taking a distinct set of the resulting container, pod and namespace(s) to avoid duplicate results. This makes for a much more efficient, deduplicated query.

3. Original:
   | join(KubePodInventory | where TimeGenerated > startofday(ago(1h))) // KubePodInventory contains namespace information
   on ContainerID
   New:
   | join(ContainerLog | where TimeGenerated > startTimestamp) on ContainerID
   Notes: Both queries then join the (filtered or unfiltered) rows from KubePodInventory and ContainerLog on ContainerID. We discussed the time filters on these lines already.

4. Original:
   | where TimeGenerated > startofday(ago(1h))
   New: (not needed)
   Notes: The original query then further filters the joined results to keep only those from the most recent day. The new query doesn't need this step; both tables were already filtered by time before the join.

5. Original:
   | where LogEntry contains "The quick brown fox"
   New:
   | where LogEntry contains "The quick brown fox"
   Notes: Both queries filter for messages containing "The quick brown fox" in the same way.

6. Original: (nothing equivalent)
   New:
   // The next line parses the JSON doc in LogEntry and stores the result in a dynamic variable called 'log'
   // Doing this allows us to extract individual fields from that object
   | extend log=parse_json(LogEntry)
   | extend Level=log.level
   | extend Source=log.source
   | extend Message=log.message
   Notes: The new query then uses a function called parse_json to extract the values from the JSON document stored in the LogEntry field. It temporarily stores them in a dynamic object we chose to name 'log', before extracting the values we are interested in from log.level, log.source and log.message. Try experimenting with extracting other values from this field if you like; there is a short sketch of one way to start just after this breakdown.

7. Original:
   | project TimeGenerated, Namespace, LogEntrySource, LogEntry
   New:
   | project TimeGenerated, Namespace, Source, Level, Message, PodName, LogEntry, LogEntrySource
   Notes: The project operator in Kusto selects which columns from the results so far to include, rename or drop, and it can also be used to insert new computed columns. Here, both queries use it to select which columns to keep.

8. Original: (nothing equivalent)
   New:
   | order by TimeGenerated desc
   Notes: The new query orders the resulting log message rows, newest first. Remove the 'desc' if you want oldest first.
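For example, here is a minimal sketch of pulling a couple of extra fields out of the JSON. The field names log.timeStamp and log.properties.logger are only guesses at what a SAS Viya log record might contain; inspect a raw LogEntry value in your own results to see which fields are really there:

let startTimestamp = ago(1h);
ContainerLog
| where TimeGenerated > startTimestamp
| where LogEntry contains "The quick brown fox"
| extend log=parse_json(LogEntry)
// These field names are examples only; fields that don't exist simply come back empty
| extend LoggedAt=log.timeStamp, Logger=log.properties.logger
| project TimeGenerated, LoggedAt, Level=log.level, Logger, Message=log.message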
Search the web for other examples of functions and operators you can use in Kusto to modify log queries to display log messages in a tabular format that is most useful to you.
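As one small illustration, the sketch below builds on the same tables and JSON fields as the new, improved query, keeps only warning and error messages, and shows the 50 most recent. The level values "WARN" and "ERROR" are assumptions; check which level strings actually appear in your logs:

let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| distinct ContainerID, PodName, Namespace
| where Namespace == "lab"
| join(ContainerLog | where TimeGenerated > startTimestamp) on ContainerID
| extend log=parse_json(LogEntry)
| extend Level=tostring(log.level), Message=tostring(log.message)
// "WARN" and "ERROR" are examples; check which level values your logs actually use
| where Level in ("WARN", "ERROR")
| project TimeGenerated, PodName, Level, Message
| top 50 by TimeGenerated desc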
Our resident logging and observability guru Greg Smith shows how you can create rudimentary charts using the Kusto summarize operator and the Chart display tab of Azure Monitor Logs, in the AZURE_LOG_ANALYTICS_WORKSPACES.md readme file in the SAS Viya Monitoring for Kubernetes project. It's worth a read.
To demonstrate, we'll modify our new, improved query slightly: commenting out (//) the line that filters log messages to only those whose PodName contains "sas-launcher", dropping the "The quick brown fox" filter so we see all messages, and adding a new summarize line at the end. This 'chart query' is then:
let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| distinct ContainerID, PodName, Namespace
| where Namespace == "lab"
//| where PodName contains "sas-launcher"
| join(ContainerLog | where TimeGenerated > startTimestamp) on ContainerID
// The next line parses the JSON doc in LogEntry and stores the result in a dynamic variable called 'log'
// Doing this allows us to extract individual fields from that object
| extend log=parse_json(LogEntry)
| extend Level=log.level
| extend Source=log.source
| extend Message=log.message
| project TimeGenerated, Namespace, Source, Level, Message, PodName, LogEntry, LogEntrySource
| order by TimeGenerated desc
| summarize msgcount=count() by tostring(Level), tostring(Source)
The results table doesn't look very impressive:
But if you click the Chart tab, next to Results just above the table, you get a simple histogram showing the same data, which is quite nice:
To change what is shown in the chart, you have to manually edit your Kusto query; for example, you might change how the summarize clause groups or buckets the counts. One possible variation is sketched below.
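For instance, here is one variation I might try (a sketch, not the only option): count messages per log level in five-minute buckets, so the chart becomes a rough timeline of message volume. The render operator at the end suggests a default chart type, though you can still switch chart types in the UI:

let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| distinct ContainerID, PodName, Namespace
| where Namespace == "lab"
| join(ContainerLog | where TimeGenerated > startTimestamp) on ContainerID
| extend log=parse_json(LogEntry)
// Count messages per log level in five-minute buckets
| summarize msgcount=count() by bin(TimeGenerated, 5m), Level=tostring(log.level)
| render columnchart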
But really, there's no one right thing to do next with a chart like this. It depends on what you're interested in: use the tool to follow a train of thought and see if it takes you anywhere useful or interesting.
However, the charting capabilities we've found so far in Azure Monitor Logs are nowhere near as good as those offered by Kibana, as used in the SAS Viya Monitoring for Kubernetes project. For this and many other reasons, that project offers a far more capable logging solution for SAS Viya running on an Azure AKS cluster than Azure Monitor Logs does, and we would recommend you use it if you can.
Collecting log data in Azure Monitor Logs can run up quite a bill if you aren't paying attention. The chart below shows the Azure Cost Management tool, and within it a cost analysis of all my resource groups and the Log Analytics service for the roughly five-hour period that I had the cluster up and running with monitoring enabled, on a practically idle SAS Viya deployment:
That means I spent a little under USD $8 for five hours of the Log Analytics service, to write this blog post. This Azure Monitor service is charged per GB of data ingested and per GB of data retained; of those two, data ingestion is by far the larger cost. For a whole day at the pay-as-you-go rate, we might extrapolate this (just under $8 for 5 hours is roughly $1.60 per hour, or about $38 over 24 hours) to guess that the cost might be around $30-40, for a nearly-idle Viya deployment like the one I used. If the deployment was actually being used for work, I can only assume the cost would go up very significantly from that. The pricing page does explain how you can use 'Capacity Reservations' to reduce these costs, but even so, this is something you will want to keep an eye on and budget for.
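If you want to keep an eye on ingestion from inside the workspace itself, the standard Usage table in Log Analytics records how much billable data each data type has sent. A small sketch (the table and columns are standard ones, but double-check the figures against the Cost Management view):

Usage
| where TimeGenerated > ago(24h)
| where IsBillable == true
// Quantity is reported in MB; divide by 1024 to get GB, which is how ingestion is billed
| summarize IngestedGB=sum(Quantity) / 1024.0 by DataType
| order by IngestedGB desc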
If you only enabled the monitoring add-ons in Azure to try this out, but do not plan on using them any further, please save some money by disabling the add-ons and (optionally) deleting the Log Analytics workspace too. My previous post has a section at the end titled 'Cleanup: Disable Monitoring and Remove the Log Analytics Workspace' which shows you how to do this.
See you next time!