Kubernetes API server by HTTP
Overview
This template monitors the Kubernetes API server and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
- collects metrics with the HTTP agent from the API server /metrics endpoint.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Kubernetes API server 1.19.10
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Internal service metrics are collected from the /metrics endpoint. The template authorizes with an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.
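As a rough illustration of the token-based authorization described above, the sketch below builds the kind of request an HTTP agent would send. It assumes the token is presented as a standard `Bearer` token in the `Authorization` header, which is how Kubernetes API tokens are normally passed. The helper name `build_metrics_request` and the token value are hypothetical and not part of the template.

```python
import urllib.request


def build_metrics_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the /metrics endpoint with a bearer token.

    Assumption: the API token is sent as "Authorization: Bearer <token>".
    """
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# The URL mirrors the {$KUBE.API.SERVER.URL} default; the token is a placeholder.
request = build_metrics_request("https://localhost:6443/metrics", "example-token")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

The request is only constructed here, not sent. Sending it against a real cluster would also require a TLS context that trusts the cluster's certificate authority.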
Macros used
Name | Description | Default |
---|---|---|
{$KUBE.API.SERVER.URL} | Kubernetes API server metrics endpoint URL. | https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API authorization token. | |
{$KUBE.API.CERT.EXPIRATION} | Number of days before client certificate expiration at which the trigger fires. | 7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client request failures used for the trigger. | 2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server request failures used for the trigger. | 2 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Get API instance metrics | Get raw metrics from API instance /metrics endpoint. | HTTP agent | kubernetes.api.get_metrics |
Kubernetes API: Audit events, total | Accumulated number of audit events generated and sent to the audit backend. | Dependent item | kubernetes.api.audit_event_total |
Kubernetes API: Virtual memory, bytes | Virtual memory size in bytes. | Dependent item | kubernetes.api.process_virtual_memory_bytes |
Kubernetes API: Resident memory, bytes | Resident memory size in bytes. | Dependent item | kubernetes.api.process_resident_memory_bytes |
Kubernetes API: CPU | Total user and system CPU usage ratio. | Dependent item | kubernetes.api.cpu.util |
Kubernetes API: Goroutines | Number of goroutines that currently exist. | Dependent item | kubernetes.api.go_goroutines |
Kubernetes API: Go threads | Number of OS threads created. | Dependent item | kubernetes.api.go_threads |
Kubernetes API: Fds open | Number of open file descriptors. | Dependent item | kubernetes.api.open_fds |
Kubernetes API: Fds max | Maximum allowed open file descriptors. | Dependent item | kubernetes.api.max_fds |
Kubernetes API: gRPCs client started, rate | Total number of RPCs started per second. | Dependent item | kubernetes.api.grpc_client_started.rate |
Kubernetes API: gRPCs messages received, rate | Total number of gRPC stream messages received per second. | Dependent item | kubernetes.api.grpc_client_msg_received.rate |
Kubernetes API: gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. | Dependent item | kubernetes.api.grpc_client_msg_sent.rate |
Kubernetes API: Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. | Dependent item | kubernetes.api.apiserver_request_terminations |
Kubernetes API: TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. | Dependent item | kubernetes.api.apiserver_tls_handshake_errors_total.rate |
Kubernetes API: API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. | Dependent item | kubernetes.api.apiserver_request_total_500.rate |
Kubernetes API: API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. | Dependent item | kubernetes.api.apiserver_request_total_400.rate |
Kubernetes API: API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. | Dependent item | kubernetes.api.apiserver_request_total_300.rate |
Kubernetes API: API server requests: 0 | Counter of apiserver requests broken out for each HTTP response code. | Dependent item | kubernetes.api.apiserver_request_total_0.rate |
Kubernetes API: API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. | Dependent item | kubernetes.api.apiserver_request_total_200.rate |
Kubernetes API: HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. | Dependent item | kubernetes.api.rest_client_requests_total_500.rate |
Kubernetes API: HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. | Dependent item | kubernetes.api.rest_client_requests_total_400.rate |
Kubernetes API: HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. | Dependent item | kubernetes.api.rest_client_requests_total_300.rate |
Kubernetes API: HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. | Dependent item | kubernetes.api.rest_client_requests_total_200.rate |
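To make the master/dependent item relationship more concrete, here is a minimal sketch of how several values can be derived from a single raw `/metrics` payload. It assumes the endpoint returns the Prometheus text exposition format without timestamps. The sample payload, the helper `parse_metrics`, and the grouping by status class are illustrative assumptions, not the template's actual preprocessing steps.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of a /metrics response (values are made up).
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 1523
apiserver_request_total{code="200",verb="GET"} 900
apiserver_request_total{code="201",verb="POST"} 40
apiserver_request_total{code="404",verb="GET"} 7
apiserver_request_total{code="500",verb="GET"} 3
apiserver_request_total{code="503",verb="LIST"} 2
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


# The payload is fetched and parsed once...
samples = list(parse_metrics(SAMPLE))

# ...and each "dependent" value is extracted from the same parsed data.
goroutines = next(v for n, _, v in samples if n == "go_goroutines")

# Request counters summed per HTTP status class (2xx, 4xx, 5xx, ...).
by_class = defaultdict(float)
for name, labels, value in samples:
    if name == "apiserver_request_total":
        by_class[labels["code"][0] + "xx"] += value

print(goroutines)
print(dict(by_class))
```

The sums here are raw counter totals. A per-second rate would additionally need the difference between two consecutive collections divided by the elapsed time.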
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). | min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} | Warning | |
Kubernetes API: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). | min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} | Warning | |
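Both trigger expressions above take the minimum of a rate over the last five minutes and compare it with a macro threshold, so a single short spike does not fire the trigger. The sketch below imitates that logic outside Zabbix. The function name, the sample data, and the exact window boundaries are assumptions made for illustration.

```python
def min_over_window_exceeds(samples, now, threshold, window_seconds=300):
    """Return True when the minimum rate in the window is above the threshold.

    samples: iterable of (unix_timestamp, rate) pairs.
    An empty window never fires in this sketch.
    """
    recent = [rate for ts, rate in samples if now - window_seconds < ts <= now]
    return bool(recent) and min(recent) > threshold


# Hypothetical 5xx rates sampled once a minute; 2 is the macro default.
sustained = [(60, 3.0), (120, 4.5), (180, 2.5), (240, 6.0), (300, 3.2)]
spike = [(60, 0.0), (120, 9.0), (180, 0.5), (240, 0.0), (300, 0.1)]

print(min_over_window_exceeds(sustained, now=300, threshold=2))
print(min_over_window_exceeds(spike, now=300, threshold=2))
```

Using `min` rather than `max` or `avg` means every sample in the window has to exceed the threshold, which makes the trigger react to sustained error rates only.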
LLD rule Long-running requests
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running requests | Discovery of long-running requests by verb, resource and scope. | Dependent item | kubernetes.api.longrunning_gauge.discovery |
Item prototypes for Long-running requests
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. | Dependent item | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] |
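The discovery rules in this README all follow the same pattern: each distinct label combination of a metric becomes one set of LLD macros, and the item prototypes are instantiated once per set. A minimal sketch of that idea follows. The metric name `apiserver_longrunning_gauge` is inferred from the item key, and the sample payload and the helper `discover` are hypothetical.

```python
import json
import re

# Hypothetical excerpt of a /metrics response (values are made up).
SAMPLE = """\
apiserver_longrunning_gauge{resource="pods",scope="cluster",verb="WATCH"} 12
apiserver_longrunning_gauge{resource="nodes",scope="cluster",verb="WATCH"} 4
apiserver_longrunning_gauge{resource="pods",scope="namespace",verb="WATCH"} 2
go_goroutines 1523
"""

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def discover(text, metric, label_names):
    """Return one LLD row per distinct label combination of `metric`."""
    rows = []
    for line in text.splitlines():
        if not line.startswith(metric + "{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        # Map each label to an LLD macro, e.g. verb -> {#VERB}.
        row = {"{#" + name.upper() + "}": labels.get(name, "") for name in label_names}
        if row not in rows:
            rows.append(row)
    return rows


rows = discover(SAMPLE, "apiserver_longrunning_gauge", ["verb", "resource", "scope"])
print(json.dumps(rows, indent=2))
```

Each row would then yield one item from the prototype, with `{#VERB}`, `{#RESOURCE}` and `{#SCOPE}` substituted into the item name and key.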
LLD rule Request duration histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Request duration histogram | Discovery of raw data and percentile items of request duration. | Dependent item | kubernetes.api.requests_bucket.discovery |
Item prototypes for Request duration histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. | Dependent item | kubernetes.api.request_duration_seconds_bucket[{#LE},"{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p90 | 90th percentile of response latency distribution in seconds for each verb. | Calculated | kubernetes.api.request_duration_seconds_p90["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p95 | 95th percentile of response latency distribution in seconds for each verb. | Calculated | kubernetes.api.request_duration_seconds_p95["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p99 | 99th percentile of response latency distribution in seconds for each verb. | Calculated | kubernetes.api.request_duration_seconds_p99["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p50 | 50th percentile of response latency distribution in seconds for each verb. | Calculated | kubernetes.api.request_duration_seconds_p50["{#VERB}"] |
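The percentile items above are calculated from the bucket items rather than collected directly. One common way to estimate a percentile from cumulative histogram buckets is linear interpolation inside the bucket that contains the target rank. The sketch below shows that technique. Whether the template's calculated items use exactly this formula is an assumption, and the bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with the +Inf bucket, as in a Prometheus *_bucket series.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the open-ended bucket: report the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bound.
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for one verb: the {#LE} bound, then the count.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]

for q in (0.50, 0.90, 0.95, 0.99):
    print(f"p{int(q * 100)}: {histogram_quantile(q, buckets):.3f} s")
```

The accuracy of such an estimate depends on the bucket boundaries: a percentile can never be resolved more finely than the width of the bucket it falls into.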
LLD rule Requests inflight discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests inflight discovery | Discovery of inflight requests by kind. | Dependent item | kubernetes.api.inflight_requests.discovery |
Item prototypes for Requests inflight discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in the last second. | Dependent item | kubernetes.api.current_inflight_requests["{#KIND}"] |
LLD rule gRPC completed requests discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC completed requests discovery | Discovery of completed gRPC requests by gRPC code. | Dependent item | kubernetes.api.grpc_client_handled.discovery |
Item prototypes for gRPC completed requests discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. | Dependent item | kubernetes.api.grpc_client_handled_total.rate["{#GRPC_CODE}"] |
LLD rule Authentication attempts discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery of authentication attempts by result. | Dependent item | kubernetes.api.authentication_attempts.discovery |
Item prototypes for Authentication attempts discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. | Dependent item | kubernetes.api.authentication_attempts.rate["{#RESULT}"] |
LLD rule Authentication requests discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication requests discovery | Discovery of authenticated requests by name. | Dependent item | kubernetes.api.authenticated_user_requests.discovery |
Item prototypes for Authentication requests discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. | Dependent item | kubernetes.api.authenticated_user_requests.rate["{#NAME}"] |
LLD rule Watchers metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers metrics discovery | Discovery of watchers by kind. | Dependent item | kubernetes.api.apiserver_registered_watchers.discovery |
Item prototypes for Watchers metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Watchers: {#KIND} | Number of currently registered watchers for a given resource. | Dependent item | kubernetes.api.apiserver_registered_watchers["{#KIND}"] |
LLD rule Etcd objects metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd objects metrics discovery | Discovery of etcd objects by resource. | Dependent item | kubernetes.api.etcd_object_counts.discovery |
Item prototypes for Etcd objects metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. | Dependent item | kubernetes.api.etcd_object_counts["{#RESOURCE}"] |
LLD rule Workqueue metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Discovery of workqueue metrics by name. | Dependent item | kubernetes.api.workqueue.discovery |
Item prototypes for Workqueue metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#NAME}"] Workqueue depth | Current depth of workqueue. | Dependent item | kubernetes.api.workqueue_depth["{#NAME}"] |
Kubernetes API: ["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. | Dependent item | kubernetes.api.workqueue_adds_total.rate["{#NAME}"] |
LLD rule Client certificate expiration histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Client certificate expiration histogram | Discovery of raw data of client certificate expiration. | Dependent item | kubernetes.api.certificate_expiration.discovery |
Item prototypes for Client certificate expiration histogram
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. | Dependent item | kubernetes.api.client_certificate_expiration_seconds_bucket[{#LE}] |
Kubernetes API: Client certificate expiration, p1 | 1st percentile of the remaining lifetime on the certificate used to authenticate a request. | Calculated | kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}] |
Trigger prototypes for Client certificate expiration histogram
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. | last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 | Warning | Depends on: |
Kubernetes API: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24 hours. | last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 | Warning | |
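The two expressions above compare the p1 value, which is a remaining lifetime in seconds, with a threshold that is also converted to seconds. A short worked sketch of that arithmetic follows. The function name is hypothetical, and the default of 7 days is taken from the {$KUBE.API.CERT.EXPIRATION} macro.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def certificate_alerts(p1_seconds, warn_days=7):
    """Evaluate both trigger conditions for a p1 remaining lifetime in seconds.

    Returns (is_expiring, expires_soon). The "> 0" guard mirrors the
    expressions above: a value of 0 does not fire either condition.
    """
    is_expiring = 0 < p1_seconds < warn_days * SECONDS_PER_DAY
    expires_soon = 0 < p1_seconds < SECONDS_PER_DAY
    return is_expiring, expires_soon


# Hypothetical remaining lifetimes: 10 days, 3 days, 1 hour.
for remaining in (10 * SECONDS_PER_DAY, 3 * SECONDS_PER_DAY, 60 * 60):
    print(remaining, certificate_alerts(remaining))
```

With the default macro value, the first threshold is 7*24*60*60 = 604800 seconds. Any value below 86400 seconds satisfies both conditions at once.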
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help on the ZABBIX forums.