Kubernetes Controller manager by HTTP
Overview
This template monitors Kubernetes Controller manager with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Kubernetes Controller manager by HTTP collects metrics with an HTTP agent item from the Controller manager /metrics endpoint.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Kubernetes Controller manager 1.19.10
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Internal service metrics are collected from the /metrics endpoint. The template authorizes its requests with an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the --bind-address option for Controller Manager to an address where Zabbix proxy can reach it. For example, for clusters created with kubeadm it can be set in the following manifest file (changes will be applied immediately):
- /etc/kubernetes/manifests/kube-controller-manager.yaml
NOTE. Some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
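Before pointing Zabbix at the endpoint, it can help to check by hand that the URL and token work. The following is a minimal sketch, assuming the token is sent as a bearer token in the Authorization header; the function names and the `verify_tls` switch are illustrative and not part of the template.

```python
import ssl
import urllib.request


def build_metrics_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the /metrics endpoint.

    Assumption: the API token is passed as a bearer token.
    """
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(url: str, token: str, verify_tls: bool = True, timeout: float = 5.0) -> str:
    """Fetch the raw metrics text from the Controller manager."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for quick tests against self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_metrics_request(url, token)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics` with the values planned for {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN} should return plain-text metrics; a connection error usually points at the bind address, and an HTTP 401 or 403 at the token.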
Macros used
Name | Description | Default |
---|---|---|
{$KUBE.CONTROLLER.SERVER.URL} | Kubernetes Controller manager metrics endpoint URL. | https://localhost:10257/metrics |
{$KUBE.API.TOKEN} | API authorization token. | |
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client request failures used for the trigger. | 2 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller: Get Controller metrics | Get raw metrics from the Controller instance /metrics endpoint. | HTTP agent | kubernetes.controller.get_metrics |
Kubernetes Controller Manager: Leader election status | Gauge showing whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. | Dependent item | kubernetes.controller.leader_election_master_status |
Kubernetes Controller Manager: Virtual memory, bytes | Virtual memory size in bytes. | Dependent item | kubernetes.controller.process_virtual_memory_bytes |
Kubernetes Controller Manager: Resident memory, bytes | Resident memory size in bytes. | Dependent item | kubernetes.controller.process_resident_memory_bytes |
Kubernetes Controller Manager: CPU | Total user and system CPU usage ratio. | Dependent item | kubernetes.controller.cpu.util |
Kubernetes Controller Manager: Goroutines | Number of goroutines that currently exist. | Dependent item | kubernetes.controller.go_goroutines |
Kubernetes Controller Manager: Go threads | Number of OS threads created. | Dependent item | kubernetes.controller.go_threads |
Kubernetes Controller Manager: Fds open | Number of open file descriptors. | Dependent item | kubernetes.controller.open_fds |
Kubernetes Controller Manager: Fds max | Maximum allowed open file descriptors. | Dependent item | kubernetes.controller.max_fds |
Kubernetes Controller Manager: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. | Dependent item | kubernetes.controller.client_http_requests_200.rate |
Kubernetes Controller Manager: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. | Dependent item | kubernetes.controller.client_http_requests_300.rate |
Kubernetes Controller Manager: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. | Dependent item | kubernetes.controller.client_http_requests_400.rate |
Kubernetes Controller Manager: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. | Dependent item | kubernetes.controller.client_http_requests_500.rate |
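The REST client items above group requests by status code class and report them per second. The sketch below illustrates that idea outside Zabbix. It assumes the raw counter is exposed as `rest_client_requests_total` with a `code` label, which is not stated in this document, and the parser is a simplified stand-in rather than the template's actual preprocessing.

```python
import re
from collections import defaultdict

# Simplified matchers for the Prometheus text exposition format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def requests_by_status_class(text, metric="rest_client_requests_total"):
    """Sum counter samples per status class ("2xx", "5xx", ...).

    Assumption: the metric name and its "code" label are as described above.
    """
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        code = labels.get("code", "")
        if len(code) == 3 and code.isdigit():
            totals[code[0] + "xx"] += float(match.group("value"))
    return dict(totals)


def per_second_rate(previous, current, elapsed_seconds):
    """Change per second between two counter readings.

    Returns None when the counter went backwards (for example after a restart).
    """
    if elapsed_seconds <= 0 or current < previous:
        return None
    return (current - previous) / elapsed_seconds
```

Two consecutive scrapes can be passed through `requests_by_status_class`, and the per-class totals through `per_second_rate`, to approximate the values the rate items report.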
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller Manager: Too many HTTP client errors | Kubernetes Controller manager is experiencing a high error rate (with 5xx HTTP code). | min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Warning | |
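The trigger expression fires only when the lowest 5xx rate seen in the last 5 minutes is still above the threshold, so a single short spike does not raise a problem. A rough sketch of that logic follows; the function name and the `(timestamp, value)` sample layout are illustrative assumptions, and Zabbix's own evaluation of `min()` may differ in detail.

```python
def too_many_http_client_errors(samples, now, threshold=2.0, window_seconds=300):
    """Return True if every sample in the window exceeds the threshold.

    samples: iterable of (unix_timestamp, rate_per_second) tuples.
    threshold: stands in for {$KUBE.CONTROLLER.HTTP.CLIENT.ERROR}.
    """
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    # With no data in the window there is nothing to evaluate.
    return bool(recent) and min(recent) > threshold
```

Because the minimum is used, one sample at or below the threshold inside the window keeps the trigger quiet.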
LLD rule Workqueue metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | | Dependent item | kubernetes.controller.workqueue.discovery |
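Low-level discovery turns the set of workqueues found in the metrics into one {#NAME} value per queue. The following is a simplified sketch of that step, assuming the workqueue metrics are named `workqueue_*` and carry a `name` label; the actual discovery rule in the template may filter and shape its output differently.

```python
import re

# Matches a workqueue sample line and captures its "name" label value.
NAME_RE = re.compile(r'^workqueue_\w+\{[^}]*\bname="([^"]*)"')


def discover_workqueues(text):
    """Return discovery rows, one per distinct workqueue name."""
    names = set()
    for line in text.splitlines():
        match = NAME_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return [{"{#NAME}": name} for name in sorted(names)]
```

Each returned row would then drive one set of the item prototypes listed in the next table.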
Item prototypes for Workqueue metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. | Dependent item | kubernetes.controller.workqueue_adds_total["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue depth | Current depth of workqueue. | Dependent item | kubernetes.controller.workqueue_depth["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. | Dependent item | kubernetes.controller.workqueue_unfinished_work_seconds["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. | Dependent item | kubernetes.controller.workqueue_retries_total["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue longest running processor, sec | How many seconds the longest running processor for workqueue has been running. | Dependent item | kubernetes.controller.workqueue_longest_running_processor_seconds["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p90 | 90th percentile of how long in seconds processing an item from workqueue takes, by queue. | Calculated | kubernetes.controller.workqueue_work_duration_seconds_p90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p95 | 95th percentile of how long in seconds processing an item from workqueue takes, by queue. | Calculated | kubernetes.controller.workqueue_work_duration_seconds_p95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p99 | 99th percentile of how long in seconds processing an item from workqueue takes, by queue. | Calculated | kubernetes.controller.workqueue_work_duration_seconds_p99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, 50p | 50th percentile of how long in seconds processing an item from workqueue takes, by queue. | Calculated | kubernetes.controller.workqueue_work_duration_seconds_p50["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p90 | 90th percentile of how long in seconds an item stays in workqueue before being requested, by queue. | Calculated | kubernetes.controller.workqueue_queue_duration_seconds_p90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p95 | 95th percentile of how long in seconds an item stays in workqueue before being requested, by queue. | Calculated | kubernetes.controller.workqueue_queue_duration_seconds_p95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p99 | 99th percentile of how long in seconds an item stays in workqueue before being requested, by queue. | Calculated | kubernetes.controller.workqueue_queue_duration_seconds_p99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, 50p | 50th percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minutes, the item value will be discarded. | Calculated | kubernetes.controller.workqueue_queue_duration_seconds_p50["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. | Dependent item | kubernetes.controller.duration_seconds_bucket[{#LE},"{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. | Dependent item | kubernetes.controller.queue_duration_seconds_bucket[{#LE},"{#NAME}"] |
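The percentile items above are calculated from the per-{#LE} bucket items. As an illustration of how a percentile can be estimated from cumulative histogram buckets, here is a minimal sketch that uses linear interpolation inside the matching bucket. It is an assumption-laden approximation, not the formula Zabbix applies, and the function name and input layout are hypothetical.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count); the last upper bound
    must be math.inf, mirroring the "+Inf" bucket of a histogram.
    Returns None when the histogram holds no observations.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    ordered = sorted(buckets)
    if not ordered or ordered[-1][0] != math.inf:
        raise ValueError("the last bucket must have an infinite upper bound")
    total = ordered[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if bound == math.inf:
                # The quantile falls past the last finite bound.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

The estimate is only as fine as the bucket boundaries, so the p99 of a queue whose slowest items fall into the last bucket is reported as the highest finite bound.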
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.