How to configure monitoring for Azure Kubernetes Service

Malek Zaag

In this article, I demonstrate how to configure monitoring for an Azure Kubernetes Service (AKS) cluster and how to receive alert notifications (email and mobile app) when the cluster is under heavy load.

Installing Kubernetes

Since I am fond of IaC, I decided to provision my infrastructure as code, so I wrote the following Terraform file:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.74.0"
    }
    # Needed for the random_id resource used below
    random = {
      source = "hashicorp/random"
    }
  }
}

# Subscription to deploy into; supply the value via a tfvars file or -var
variable "subscription_id" {
  type = string
}

provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}

resource "azurerm_resource_group" "example" {
  name     = "AKS-rg"
  location = "West Europe"
}

resource "random_id" "workspace" {
  keepers = {
    # Generate a new id each time we switch to a new resource group
    group_name = azurerm_resource_group.example.name
  }

  byte_length = 8
}

# Log Analytics workspace that stores the data collected from the cluster
resource "azurerm_log_analytics_workspace" "example" {
  name                = "k8s-workspace-${random_id.workspace.hex}"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
}

# ContainerInsights solution attached to the workspace
resource "azurerm_log_analytics_solution" "example" {
  solution_name         = "ContainerInsights"
  location              = azurerm_resource_group.example.location
  resource_group_name   = azurerm_resource_group.example.name
  workspace_resource_id = azurerm_log_analytics_workspace.example.id
  workspace_name        = azurerm_log_analytics_workspace.example.name

  plan {
    publisher = "Microsoft"
    product   = "OMSGallery/ContainerInsights"
  }
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "my-cluster"
  sku_tier            = "Standard"

  default_node_pool {
    name       = "default"
    node_count = 2
    vm_size    = "Standard_A2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Test"
  }

  automatic_channel_upgrade        = "stable"
  http_application_routing_enabled = true

  # Send container logs and metrics to the Log Analytics workspace
  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
  }
}

In the previous file, I provisioned the Kubernetes cluster, a Log Analytics solution (ContainerInsights), and the Log Analytics workspace, which can be used to write and run log queries against the collected data if needed.

Creating the Azure Logic App

Setting up the Logic App is straightforward, and you can customize the settings as needed:

Figure: Logic app

Now we set the trigger (an HTTP request) and the action (send me an email when triggered):

Figure: Setting up trigger and action
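The Logic App in this article was built in the portal designer. As a rough sketch only, the workflow and its HTTP trigger could also be declared in Terraform alongside the cluster. The resource names `alert_mailer` and `on_alert` and the display names are hypothetical, and the email action is not shown here because it depends on a mail connector that still has to be authorized separately.

```hcl
# Sketch: an empty Logic App workflow with an HTTP request trigger.
# Names are hypothetical; the email action is still added in the designer.
resource "azurerm_logic_app_workflow" "alert_mailer" {
  name                = "aks-alert-mailer"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_logic_app_trigger_http_request" "on_alert" {
  name         = "on-alert"
  logic_app_id = azurerm_logic_app_workflow.alert_mailer.id

  # Accept any JSON object as the request body
  schema = jsonencode({
    type = "object"
  })
}

# URL that callers (for example an action group) use to invoke the workflow
output "logic_app_callback_url" {
  value     = azurerm_logic_app_trigger_http_request.on_alert.callback_url
  sensitive = true
}
```

The callback URL contains a signature, which is why the output is marked sensitive.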

Testing the email:

Figure: Logic app designer

I tested the app using the button in the designer, and it worked fine:

Figure: Email received

Creating the alert

An alert in Azure needs a trigger to fire (triggers are generally tied to metrics). Once the trigger fires, actions are invoked to notify the administrator or the user, or simply to run some automated task:

Figure: Alert explanation

In this tutorial, I create the alert, the action group, and the alert processing rule:

Figure: Alert creation

The alert has a severity of Critical and fires when the average node CPU usage exceeds 70%.
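For readers who prefer to keep the alert in code, the rule described above could be sketched in Terraform roughly as follows. This is an assumption-laden sketch, not the exact rule from the portal: it assumes the AKS platform metric `node_cpu_usage_percentage`, and the alert name, evaluation frequency, and window size are hypothetical values.

```hcl
# Sketch: metric alert on average node CPU for the AKS cluster.
resource "azurerm_monitor_metric_alert" "node_cpu" {
  name                = "aks-node-cpu-high" # hypothetical name
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [azurerm_kubernetes_cluster.example.id]
  description         = "Average node CPU usage is above 70%"
  severity            = 0      # 0 = Critical
  frequency           = "PT1M" # evaluate every minute (assumption)
  window_size         = "PT5M" # over a 5-minute window (assumption)

  criteria {
    metric_namespace = "Microsoft.ContainerService/managedClusters"
    metric_name      = "node_cpu_usage_percentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 70
  }
}
```

No `action` block is set here because, as in this article, notifications are attached afterwards through an alert processing rule.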

Creating an action group

I created the action group.

Then I configured its actions so that I receive a mobile app notification on my phone and the Logic App is triggered by an HTTP request.

Adding the Logic App to the action group.
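The same action group can be approximated in Terraform. In this sketch the three variables are hypothetical inputs: the email of the Azure account signed in to the Azure mobile app, and the resource ID and HTTP trigger callback URL of the Logic App created earlier in the portal. The group name and short name are also made up for illustration.

```hcl
# Hypothetical inputs for the sketch below
variable "notification_email" {
  type = string
}

variable "logic_app_id" {
  type = string
}

variable "logic_app_callback_url" {
  type      = string
  sensitive = true
}

# Sketch: action group with a mobile push receiver and a Logic App receiver
resource "azurerm_monitor_action_group" "aks_alerts" {
  name                = "aks-alerts"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "aksalerts" # at most 12 characters

  # Push notification to the Azure mobile app of this account
  azure_app_push_receiver {
    name          = "mobile-push"
    email_address = var.notification_email
  }

  # Calls the Logic App's HTTP trigger, which then sends the email
  logic_app_receiver {
    name                    = "email-logic-app"
    resource_id             = var.logic_app_id
    callback_url            = var.logic_app_callback_url
    use_common_alert_schema = true
  }
}
```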

Creating an alert processing rule and assigning the action group

Figure: Alert processing rule
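An alert processing rule that attaches an action group to fired alerts can be sketched in Terraform as shown below. The rule name and the `action_group_id` variable are hypothetical, and scoping the rule to the whole resource group and to Sev0 alerts is an assumption, since the exact scope used in the portal is not stated here.

```hcl
# Hypothetical input: resource ID of the action group to attach
variable "action_group_id" {
  type = string
}

# Sketch: add the action group to every Critical alert fired in the resource group
resource "azurerm_monitor_alert_processing_rule_action_group" "aks" {
  name                 = "aks-add-action-group"
  resource_group_name  = azurerm_resource_group.example.name
  scopes               = [azurerm_resource_group.example.id]
  add_action_group_ids = [var.action_group_id]

  condition {
    severity {
      operator = "Equals"
      values   = ["Sev0"] # Sev0 corresponds to Critical
    }
  }
}
```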

Load and stress testing for Kubernetes

Now that everything is in place, we need to start loading the cluster. There are several kinds of test to choose from:

  • **Load test.** How the system responds to a sudden increase in requests.

  • **Endurance test.** How the system survives a constant, moderate load over a long period of time. It is also called a *soak test*, referring to the long time the software spends under test.

  • **Stress test.** How the system responds under a heavy load, with the intent of finding the point at which the system is stressed and stops functioning.

Add load to the application

I went with a stress test to drive up CPU usage, so I deployed a pod running a PHP application.

Once the PHP web application is running in the cluster and we have set up an autoscaling deployment, introduce load on the web application. This tutorial uses a BusyBox image in a container and infinite web requests running from BusyBox to the PHP web application.

BusyBox is a lightweight image that bundles many common UNIX utilities, including a wget implementation. Another tool that enables load testing is the open source Hey, which runs concurrent workers that send requests to an endpoint.
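The stress-test manifest itself is not reproduced in this article. As a rough equivalent written in Terraform, the BusyBox load generator could look like the sketch below. It assumes the hashicorp/kubernetes provider, a `php-apache` service reachable inside the cluster, and a `busybox:1.28` image tag. The real manifest may differ.

```hcl
# Sketch: connect the kubernetes provider to the AKS cluster created earlier
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.example.kube_config[0].host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].cluster_ca_certificate)
}

# BusyBox deployment that requests the PHP service in an endless loop
resource "kubernetes_deployment" "infinite_calls" {
  metadata {
    name = "infinite-calls"
    labels = {
      app = "infinite-calls"
    }
  }

  spec {
    replicas = 1 # scale this up to add more load

    selector {
      match_labels = {
        app = "infinite-calls"
      }
    }

    template {
      metadata {
        labels = {
          app = "infinite-calls"
        }
      }

      spec {
        container {
          name  = "infinite-calls"
          image = "busybox:1.28" # assumed tag
          # Loop forever, fetching the PHP page and discarding the output
          command = ["/bin/sh", "-c", "while true; do wget -q -O- http://php-apache; done"]
        }
      }
    }
  }
}
```

Configuring the kubernetes provider from a cluster created in the same configuration can be fragile, so applying the cluster first and the workload in a second step is the safer order.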

$ kubectl apply -f .\php-apache.yaml
deployment.apps/php-apache created
service/php-apache created

$ kubectl apply -f .\stress-test.yaml
deployment.apps/infinite-calls created

$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
infinite-calls-5cffd59c59-4xnpx   1/1     Running   0          29s
php-apache-7495ff8f5b-nqctr       1/1     Running   0          2m42s

To spice things up, I scaled out the BusyBox pods to generate more requests and push the CPU harder:

kubectl scale deployment/infinite-calls --replicas 4

Finally, I scaled to 40 replicas:

kubectl scale deployment/infinite-calls --replicas 40

The CPU still wasn't climbing as high as I wanted, so after a lot of digging I found the following GitHub repository:

giantswarm/kube-stresscheck (github.com): Script to check Kubernetes nodes on stress (CPU/RAM) resistance.

It is simply a script written in Go that checks how Kubernetes nodes hold up under CPU and RAM stress.

I deployed the YAML file from the repo and voilà! The cluster CPU went up, and after one minute I received an alert both on my phone and by email.

Conclusion

Monitoring Azure Kubernetes Service may seem difficult, but with the managed Azure metrics and ContainerInsights it is easy to set up and very helpful for alerting and incident response.


Was this helpful? Confusing?
If you have any questions, feel free to comment below! Make sure to follow me on LinkedIn: https://www.linkedin.com/in/malekzaag/ and GitHub: https://github.com/Malek-Zaag if you're interested in similar content and want to keep learning alongside me!

  • Title: How to configure monitoring for Azure Kubernetes Service
  • Author: Malek Zaag
  • Created at : 2023-09-28 18:30:36
  • Updated at : 2025-08-17 19:07:15
  • Link: https://malekzaag.me/2023/09/28/How-to-configure-monitoring-for-Azure-Kubernetes-Service/
  • License: This work is licensed under CC BY-NC-SA 4.0.