# Kubernetes API server by HTTP

## Overview

This template monitors the Kubernetes API server and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template `Kubernetes API server by HTTP` collects metrics by HTTP agent from the API server `/metrics` endpoint.

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- Kubernetes API server 1.19.10

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

Internal service metrics are collected from the `/metrics` endpoint.
The template needs to use authorization via an API token.

Don't forget to change the macros `{$KUBE.API.SERVER.URL}` and `{$KUBE.API.TOKEN}`.
Also, see the Macros section for a list of macros used to set trigger values.

*NOTE.* Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$KUBE.API.SERVER.URL}|Kubernetes API server metrics endpoint URL.|`https://localhost:6443/metrics`|
|{$KUBE.API.TOKEN}|API authorization token.||
|{$KUBE.API.CERT.EXPIRATION}|Number of days before client certificate expiration at which the trigger fires.|`7`|
|{$KUBE.API.HTTP.CLIENT.ERROR}|Maximum number of HTTP client request failures used for the trigger.|`2`|
|{$KUBE.API.HTTP.SERVER.ERROR}|Maximum number of HTTP server request failures used for the trigger.|`2`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Get API instance metrics|Get raw metrics from API instance /metrics endpoint.|HTTP agent|kubernetes.api.get_metrics **Preprocessing**: Check for not supported value; ⛔️Custom on fail: Discard value|
Accumulated number of audit events generated and sent to the audit backend.
|Dependent item|kubernetes.api.audit_event_total**Preprocessing**
Prometheus pattern: `SUM(apiserver_audit_event_total)`
⛔️Custom on fail: Discard value
Virtual memory size in bytes.
|Dependent item|kubernetes.api.process_virtual_memory_bytes**Preprocessing**
Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
⛔️Custom on fail: Discard value
Resident memory size in bytes.
|Dependent item|kubernetes.api.process_resident_memory_bytes**Preprocessing**
Prometheus pattern: `VALUE(process_resident_memory_bytes)`
⛔️Custom on fail: Discard value
Total user and system CPU usage ratio.
|Dependent item|kubernetes.api.cpu.util**Preprocessing**
Prometheus pattern: `VALUE(process_cpu_seconds_total)`
Custom multiplier: `100`
Number of goroutines that currently exist.
|Dependent item|kubernetes.api.go_goroutines**Preprocessing**
Prometheus pattern: `SUM(go_goroutines)`
⛔️Custom on fail: Discard value
Number of OS threads created.
|Dependent item|kubernetes.api.go_threads**Preprocessing**
Prometheus pattern: `VALUE(go_threads)`
⛔️Custom on fail: Discard value
Number of open file descriptors.
|Dependent item|kubernetes.api.open_fds**Preprocessing**
Prometheus pattern: `VALUE(process_open_fds)`
⛔️Custom on fail: Discard value
Maximum allowed open file descriptors.
|Dependent item|kubernetes.api.max_fds**Preprocessing**
Prometheus pattern: `VALUE(process_max_fds)`
⛔️Custom on fail: Discard value
Total number of RPCs started per second.
|Dependent item|kubernetes.api.grpc_client_started.rate**Preprocessing**
Prometheus pattern: `SUM(grpc_client_started_total)`
⛔️Custom on fail: Discard value
Total number of gRPC stream messages received per second.
|Dependent item|kubernetes.api.grpc_client_msg_received.rate**Preprocessing**
Prometheus pattern: `SUM(grpc_client_msg_received_total)`
⛔️Custom on fail: Discard value
Total number of gRPC stream messages sent per second.
|Dependent item|kubernetes.api.grpc_client_msg_sent.rate**Preprocessing**
Prometheus pattern: `SUM(grpc_client_msg_sent_total)`
⛔️Custom on fail: Discard value
Number of requests which apiserver terminated in self-defense per second.
|Dependent item|kubernetes.api.apiserver_request_terminations**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_terminations_total)`
⛔️Custom on fail: Discard value
Number of requests dropped with 'TLS handshake error from' error per second.
|Dependent item|kubernetes.api.apiserver_tls_handshake_errors_total.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_tls_handshake_errors_total)`
⛔️Custom on fail: Discard value
Counter of apiserver requests broken out for each HTTP response code.
|Dependent item|kubernetes.api.apiserver_request_total_500.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_total{code =~ "5.."})`
⛔️Custom on fail: Discard value
Counter of apiserver requests broken out for each HTTP response code.
|Dependent item|kubernetes.api.apiserver_request_total_400.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_total{code =~ "4.."})`
⛔️Custom on fail: Discard value
Counter of apiserver requests broken out for each HTTP response code.
|Dependent item|kubernetes.api.apiserver_request_total_300.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_total{code =~ "3.."})`
⛔️Custom on fail: Discard value
Counter of apiserver requests broken out for each HTTP response code.
|Dependent item|kubernetes.api.apiserver_request_total_0.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_total{code = "0"})`
⛔️Custom on fail: Discard value
Counter of apiserver requests broken out for each HTTP response code.
|Dependent item|kubernetes.api.apiserver_request_total_200.rate**Preprocessing**
Prometheus pattern: `SUM(apiserver_request_total{code =~ "2.."})`
⛔️Custom on fail: Discard value
Number of HTTP requests with 5xx status code per second.
|Dependent item|kubernetes.api.rest_client_requests_total_500.rate**Preprocessing**
Prometheus pattern: `SUM(rest_client_requests_total{code =~ "5.."})`
⛔️Custom on fail: Discard value
Number of HTTP requests with 4xx status code per second.
|Dependent item|kubernetes.api.rest_client_requests_total_400.rate**Preprocessing**
Prometheus pattern: `SUM(rest_client_requests_total{code =~ "4.."})`
⛔️Custom on fail: Discard value
Number of HTTP requests with 3xx status code per second.
|Dependent item|kubernetes.api.rest_client_requests_total_300.rate**Preprocessing**
Prometheus pattern: `SUM(rest_client_requests_total{code =~ "3.."})`
⛔️Custom on fail: Discard value
Number of HTTP requests with 2xx status code per second.
|Dependent item|kubernetes.api.rest_client_requests_total_200.rate**Preprocessing**
Prometheus pattern: `SUM(rest_client_requests_total{code =~ "2.."})`
⛔️Custom on fail: Discard value
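
The dependent items above all follow the same idea: one HTTP request fetches the raw `/metrics` text, and each item then extracts a value with a Prometheus pattern such as `SUM(apiserver_request_total{code =~ "5.."})`. The following Python sketch illustrates that idea outside Zabbix. It is not the template's implementation. The function names (`parse_metrics`, `sum_matching`, `fetch_metrics`) and the sample data are hypothetical, the `Bearer` authorization scheme is an assumption about how the API token is sent, and the label regex is assumed to match the whole label value.

```python
import re
import urllib.request

# One sample line of the Prometheus text format: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical sample data, shaped like API server /metrics output.
SAMPLE = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 10
apiserver_request_total{code="500",verb="GET"} 3
apiserver_request_total{code="503",verb="POST"} 2
apiserver_request_total{code="404",verb="GET"} 1
go_threads 12
"""


def parse_metrics(text):
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def sum_matching(text, metric, label=None, pattern=None):
    """Rough analogue of SUM(metric{label =~ "pattern"}); full-match is assumed."""
    total = 0.0
    for name, labels, value in parse_metrics(text):
        if name != metric:
            continue
        if label is not None and not re.fullmatch(pattern, labels.get(label, "")):
            continue
        total += value
    return total


def fetch_metrics(url, token):
    """Fetch raw metrics text; the Bearer scheme is an assumption, not taken from the template."""
    request = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(sum_matching(SAMPLE, "apiserver_request_total", "code", "5.."))  # 5.0
```

Against a live cluster, `fetch_metrics` would be called with the values of `{$KUBE.API.SERVER.URL}` and `{$KUBE.API.TOKEN}`. The API server certificate usually has to be trusted by the client for the TLS connection to succeed.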
"Kubernetes API server is experiencing high error rate (with 5xx HTTP code).
|`min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR}`|Warning||
|Kubernetes API: Too many client errors|Kubernetes API client is experiencing a high error rate (with 5xx HTTP code).|`min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR}`|Warning||

### LLD rule Long-running requests

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Long-running requests|Discovery of long-running requests by verb, resource and scope.
|Dependent item|kubernetes.api.longrunning_gauge.discovery**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way.
|Dependent item|kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Discovery raw data and percentile items of request duration.
|Dependent item|kubernetes.api.requests_bucket.discovery**Preprocessing**
Prometheus to JSON: `{__name__=~ "apiserver_request_duration_*", verb =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Response latency distribution in seconds for each verb.
|Dependent item|kubernetes.api.request_duration_seconds_bucket[{#LE},"{#VERB}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
|Kubernetes API: ["{#VERB}"] Requests, p90|90th percentile of response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p90["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p95|95th percentile of response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p95["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p99|99th percentile of response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p99["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p50|50th percentile of response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p50["{#VERB}"]|

### LLD rule Requests inflight discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Requests inflight discovery|Discovery of in-flight requests by kind.
|Dependent item|kubernetes.api.inflight_requests.discovery**Preprocessing**
Prometheus to JSON: `apiserver_current_inflight_requests{request_kind =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Maximal number of currently used inflight request limit of this apiserver per request kind in last second.
|Dependent item|kubernetes.api.current_inflight_requests["{#KIND}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Discovery grpc completed requests by grpc code.
|Dependent item|kubernetes.api.grpc_client_handled.discovery**Preprocessing**
Prometheus to JSON: `grpc_client_handled_total{grpc_code =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Total number of RPCs completed by the client regardless of success or failure per second.
|Dependent item|kubernetes.api.grpc_client_handled_total.rate["{#GRPC_CODE}"]**Preprocessing**
Prometheus pattern: `SUM(grpc_client_handled_total{grpc_code = "{#GRPC_CODE}"})`
⛔️Custom on fail: Discard value
Discovery authentication attempts by result.
|Dependent item|kubernetes.api.authentication_attempts.discovery**Preprocessing**
Prometheus to JSON: `authentication_attempts{result =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Authentication attempts by result per second.
|Dependent item|kubernetes.api.authentication_attempts.rate["{#RESULT}"]**Preprocessing**
Prometheus pattern: `SUM(authentication_attempts{result = "{#RESULT}"})`
⛔️Custom on fail: Discard value
Discovery authentication attempts by name.
|Dependent item|kubernetes.api.authenticated_user_requests.discovery**Preprocessing**
Prometheus to JSON: `authenticated_user_requests{username =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Counter of authenticated requests broken out by username per second.
|Dependent item|kubernetes.api.authenticated_user_requests.rate["{#NAME}"]**Preprocessing**
Prometheus pattern: `VALUE(authenticated_user_requests{username = "{#NAME}"})`
⛔️Custom on fail: Discard value
Discovery watchers by kind.
|Dependent item|kubernetes.api.apiserver_registered_watchers.discovery**Preprocessing**
Prometheus to JSON: `apiserver_registered_watchers{kind =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Number of currently registered watchers for a given resource.
|Dependent item|kubernetes.api.apiserver_registered_watchers["{#KIND}"]**Preprocessing**
Prometheus pattern: `VALUE(apiserver_registered_watchers{kind = "{#KIND}"})`
⛔️Custom on fail: Discard value
Discovery etcd objects by resource.
|Dependent item|kubernetes.api.etcd_object_counts.discovery**Preprocessing**
Prometheus to JSON: `etcd_object_counts{resource =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Number of stored objects at the time of last check split by kind.
|Dependent item|kubernetes.api.etcd_object_counts["{#RESOURCE}"]**Preprocessing**
Prometheus pattern: `VALUE(etcd_object_counts{ resource = "{#RESOURCE}"})`
⛔️Custom on fail: Discard value
Discovery workqueue metrics by name.
|Dependent item|kubernetes.api.workqueue.discovery**Preprocessing**
Prometheus to JSON: `workqueue_adds_total{name =~ ".*"}`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Current depth of workqueue.
|Dependent item|kubernetes.api.workqueue_depth["{#NAME}"]**Preprocessing**
Prometheus pattern: `VALUE(workqueue_depth{name = "{#NAME}"})`
⛔️Custom on fail: Discard value
Total number of adds handled by workqueue per second.
|Dependent item|kubernetes.api.workqueue_adds_total.rate["{#NAME}"]**Preprocessing**
Prometheus pattern: `VALUE(workqueue_adds_total{name = "{#NAME}"})`
⛔️Custom on fail: Discard value
Discovery raw data of client certificate expiration
|Dependent item|kubernetes.api.certificate_expiration.discovery**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `3h`
Distribution of the remaining lifetime on the certificate used to authenticate a request.
|Dependent item|kubernetes.api.client_certificate_expiration_seconds_bucket[{#LE}]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
1st percentile of the remaining lifetime on the certificate used to authenticate a request.|Calculated|kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]|

### Trigger prototypes for Client certificate expiration histogram

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Kubernetes API: Kubernetes client certificate is expiring|A client certificate used to authenticate to the apiserver is expiring in less than {$KUBE.API.CERT.EXPIRATION} days.|`last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60`|Warning|**Depends on**:
A client certificate used to authenticate to the apiserver is expiring in less than 24 hours.|`last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60`|Warning||

## Feedback

Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).

You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).