Kubernetes Scheduler by HTTP
Overview
This template monitors Kubernetes Scheduler with Zabbix and works without any external scripts. Most of the metrics are collected in a single request, thanks to Zabbix bulk data collection.
The template Kubernetes Scheduler by HTTP collects metrics with an HTTP agent from the Scheduler /metrics endpoint.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Kubernetes Scheduler 1.19.10
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Internal service metrics are collected from the /metrics endpoint. The template authorizes its requests with an API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
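The request that the HTTP agent item makes can be approximated outside Zabbix to check that the URL and token work. The sketch below is a minimal illustration, not the template's implementation: it assumes the token is sent as an `Authorization: Bearer` header, and the function names `build_metrics_request` and `fetch_metrics` and the `ca_file` parameter are hypothetical.

```python
import ssl
import urllib.request
from typing import Optional


def build_metrics_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the /metrics endpoint.

    Assumption: the API token is passed as a Bearer token.
    """
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_metrics(url: str, token: str,
                  ca_file: Optional[str] = None,
                  timeout: float = 5.0) -> str:
    """Fetch the raw metrics text; ca_file is an optional CA bundle path."""
    context = ssl.create_default_context(cafile=ca_file)
    request = build_metrics_request(url, token)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics` with the values you plan to put into {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN} should return plain-text metrics if the endpoint is reachable and the token is accepted.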
NOTE. You might need to set the --bind-address option for Scheduler to an address where Zabbix proxy can reach it. For example, for clusters created with kubeadm, it can be set in the following manifest file (changes will be applied immediately):
- /etc/kubernetes/manifests/kube-scheduler.yaml
NOTE. Some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
Macros used
Name | Description | Default |
---|---|---|
{$KUBE.SCHEDULER.SERVER.URL} | Kubernetes Scheduler metrics endpoint URL. | https://localhost:10259/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. | |
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client request failures used for the trigger. | 2 |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for the trigger. | 2 |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for the trigger. | 2 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. | HTTP agent | kubernetes.scheduler.get_metrics |
Kubernetes Scheduler: Virtual memory, bytes | Virtual memory size in bytes. | Dependent item | kubernetes.scheduler.process_virtual_memory_bytes |
Kubernetes Scheduler: Resident memory, bytes | Resident memory size in bytes. | Dependent item | kubernetes.scheduler.process_resident_memory_bytes |
Kubernetes Scheduler: CPU | Total user and system CPU usage ratio. | Dependent item | kubernetes.scheduler.cpu.util |
Kubernetes Scheduler: Goroutines | Number of goroutines that currently exist. | Dependent item | kubernetes.scheduler.go_goroutines |
Kubernetes Scheduler: Go threads | Number of OS threads created. | Dependent item | kubernetes.scheduler.go_threads |
Kubernetes Scheduler: Fds open | Number of open file descriptors. | Dependent item | kubernetes.scheduler.open_fds |
Kubernetes Scheduler: Fds max | Maximum allowed open file descriptors. | Dependent item | kubernetes.scheduler.max_fds |
Kubernetes Scheduler: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. | Dependent item | kubernetes.scheduler.client_http_requests_200.rate |
Kubernetes Scheduler: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. | Dependent item | kubernetes.scheduler.client_http_requests_300.rate |
Kubernetes Scheduler: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. | Dependent item | kubernetes.scheduler.client_http_requests_400.rate |
Kubernetes Scheduler: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. | Dependent item | kubernetes.scheduler.client_http_requests_500.rate |
Kubernetes Scheduler: Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. | Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.scheduled.rate |
Kubernetes Scheduler: Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. | Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate |
Kubernetes Scheduler: Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. | Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.error.rate |
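The dependent items above are derived from the single raw metrics payload. As a rough illustration of that idea, the sketch below parses Prometheus text format, sums a request counter by status code class, and turns two counter readings into a per-second rate. It makes several assumptions: the counter is named `rest_client_requests_total` and carries a `code` label, and the rate is computed as a simple change per second. The template's actual preprocessing steps are not shown in this document and may differ.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_requests_by_class(text, metric="rest_client_requests_total"):
    """Sum counter values per status class ("2xx", "5xx", ...)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        code = labels.get("code", "")
        if len(code) == 3 and code.isdigit():
            totals[code[0] + "xx"] += float(match.group("value"))
    return dict(totals)


def per_second_rate(previous, current, seconds):
    """Change per second between two counter readings."""
    return (current - previous) / seconds


SAMPLE = """\
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="h",method="GET"} 100
rest_client_requests_total{code="201",host="h",method="POST"} 5
rest_client_requests_total{code="500",host="h",method="GET"} 3
"""
```

With the sample payload, `sum_requests_by_class(SAMPLE)` groups the two 2xx series together and keeps the 5xx series separate, which mirrors how the 2xx, 3xx, 4xx, and 5xx items each report one aggregated rate.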
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | Kubernetes Scheduler REST Client requests are experiencing a high error rate (with 5xx HTTP code). | min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Warning | |
Kubernetes Scheduler: Too many unschedulable pods | Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled. | min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} | Warning | |
Kubernetes Scheduler: Too many schedule attempts with errors | Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem. | min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} | Warning | |
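All three trigger expressions share the form `min(item,5m)>threshold`: the minimum over the last five minutes exceeds the threshold only if every value collected in that window exceeds it, so a single short spike does not raise a problem. The sketch below illustrates that logic outside Zabbix; the function names and the `(timestamp, value)` sample layout are assumptions made for the example.

```python
def min_over_window(samples, now, window_seconds=300):
    """Return the smallest value among samples within the window, or None.

    samples is a list of (unix_timestamp, value) pairs.
    """
    recent = [value for ts, value in samples if now - window_seconds <= ts <= now]
    return min(recent) if recent else None


def trigger_fires(samples, now, threshold, window_seconds=300):
    """True only when every sample in the window is above the threshold."""
    lowest = min_over_window(samples, now, window_seconds)
    return lowest is not None and lowest > threshold
```

For example, with the default threshold of 2, the rates 3.0, 4.0, and 2.5 inside the window fire the trigger, while one reading of 1.0 in the same window keeps it quiet.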
LLD rule Scheduling algorithm histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. | Dependent item | kubernetes.scheduler.scheduling_algorithm.discovery |
Item prototypes for Scheduling algorithm histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. | Dependent item | kubernetes.scheduler.scheduling_algorithm_duration[{#LE}] |
Kubernetes Scheduler: Scheduling algorithm duration, p90 | 90th percentile of scheduling algorithm latency in seconds. | Calculated | kubernetes.scheduler.scheduling_algorithm_duration_p90[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p95 | 95th percentile of scheduling algorithm latency in seconds. | Calculated | kubernetes.scheduler.scheduling_algorithm_duration_p95[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p99 | 99th percentile of scheduling algorithm latency in seconds. | Calculated | kubernetes.scheduler.scheduling_algorithm_duration_p99[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p50 | 50th percentile of scheduling algorithm latency in seconds. | Calculated | kubernetes.scheduler.scheduling_algorithm_duration_p50[{#SINGLETON}] |
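The percentile items are calculated from the discovered `{#LE}` bucket items. The exact formula the template uses is not shown here; the sketch below is an assumption that follows the common Prometheus-style approach, which locates the bucket containing the target rank and interpolates linearly inside it. The function name and the `(upper_bound, cumulative_count)` input layout are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0..1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs and must
    include the +Inf bucket (math.inf) holding the total count.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets, key=lambda bucket: bucket[0])
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("a +Inf bucket is required")

    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Cannot interpolate: fall back to the lower bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


EXAMPLE_BUCKETS = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
```

Because the result is interpolated within a bucket, its precision depends on how finely the bucket boundaries are spaced, and a quantile that falls in the +Inf bucket can only be reported as the highest finite boundary.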
LLD rule Binding histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. | Dependent item | kubernetes.scheduler.binding.discovery |
Item prototypes for Binding histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Binding duration bucket, {#LE} | Binding latency in seconds. | Dependent item | kubernetes.scheduler.binding_duration[{#LE}] |
Kubernetes Scheduler: Binding duration, p90 | 90th percentile of binding latency in seconds. | Calculated | kubernetes.scheduler.binding_duration_p90[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p95 | 95th percentile of binding latency in seconds. | Calculated | kubernetes.scheduler.binding_duration_p95[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p99 | 99th percentile of binding latency in seconds. | Calculated | kubernetes.scheduler.binding_duration_p99[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p50 | 50th percentile of binding latency in seconds. | Calculated | kubernetes.scheduler.binding_duration_p50[{#SINGLETON}] |
LLD rule e2e scheduling histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. | Dependent item | kubernetes.controller.e2e_scheduling.discovery |
Item prototypes for e2e scheduling histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding). | Dependent item | kubernetes.scheduler.e2e_scheduling_bucket[{#LE},"{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p50 | 50th percentile of e2e scheduling latency. | Calculated | kubernetes.scheduler.e2e_scheduling_p50["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p90 | 90th percentile of e2e scheduling latency. | Calculated | kubernetes.scheduler.e2e_scheduling_p90["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p95 | 95th percentile of e2e scheduling latency. | Calculated | kubernetes.scheduler.e2e_scheduling_p95["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p99 | 99th percentile of e2e scheduling latency. | Calculated | kubernetes.scheduler.e2e_scheduling_p99["{#RESULT}"] |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.