# Kubernetes API server by HTTP

## Overview

This template monitors the Kubernetes API server and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template `Kubernetes API server by HTTP` collects metrics with the HTTP agent from the API server `/metrics` endpoint.

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- Kubernetes API server 1.19.10

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

Internal service metrics are collected from the `/metrics` endpoint.
The template requires authorization via an API token.

Don't forget to change the macros `{$KUBE.API.SERVER.URL}` and `{$KUBE.API.TOKEN}`.
Also, see the Macros section for a list of macros used to set trigger values.

*NOTE.* Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$KUBE.API.SERVER.URL}|Kubernetes API server metrics endpoint URL.|`https://localhost:6443/metrics`|
|{$KUBE.API.TOKEN}|API authorization token.||
|{$KUBE.API.CERT.EXPIRATION}|Number of days before client certificate expiration, used as the trigger threshold.|`7`|
|{$KUBE.API.HTTP.CLIENT.ERROR}|Maximum number of HTTP client request failures, used as the trigger threshold.|`2`|
|{$KUBE.API.HTTP.SERVER.ERROR}|Maximum number of HTTP server request failures, used as the trigger threshold.|`2`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Get API instance metrics|Get raw metrics from the API instance `/metrics` endpoint.|HTTP agent|kubernetes.api.get_metrics<p>**Preprocessing**</p>|
|Kubernetes API: Audit events, total|Accumulated number of audit events generated and sent to the audit backend.|Dependent item|kubernetes.api.audit_event_total<p>**Preprocessing**</p>|
|Kubernetes API: Virtual memory, bytes|Virtual memory size in bytes.|Dependent item|kubernetes.api.process_virtual_memory_bytes<p>**Preprocessing**</p>|
|Kubernetes API: Resident memory, bytes|Resident memory size in bytes.|Dependent item|kubernetes.api.process_resident_memory_bytes<p>**Preprocessing**</p>|
|Kubernetes API: CPU|Total user and system CPU usage ratio.|Dependent item|kubernetes.api.cpu.util<p>**Preprocessing**</p>|
|Kubernetes API: Goroutines|Number of goroutines that currently exist.|Dependent item|kubernetes.api.go_goroutines<p>**Preprocessing**</p>|
|Kubernetes API: Go threads|Number of OS threads created.|Dependent item|kubernetes.api.go_threads<p>**Preprocessing**</p>|
|Kubernetes API: Fds open|Number of open file descriptors.|Dependent item|kubernetes.api.open_fds<p>**Preprocessing**</p>|
|Kubernetes API: Fds max|Maximum allowed open file descriptors.|Dependent item|kubernetes.api.max_fds<p>**Preprocessing**</p>|
|Kubernetes API: gRPCs client started, rate|Total number of RPCs started per second.|Dependent item|kubernetes.api.grpc_client_started.rate<p>**Preprocessing**</p>|
|Kubernetes API: gRPCs messages received, rate|Total number of gRPC stream messages received per second.|Dependent item|kubernetes.api.grpc_client_msg_received.rate<p>**Preprocessing**</p>|
|Kubernetes API: gRPCs messages sent, rate|Total number of gRPC stream messages sent per second.|Dependent item|kubernetes.api.grpc_client_msg_sent.rate<p>**Preprocessing**</p>|
|Kubernetes API: Request terminations, rate|Number of requests per second that the apiserver terminated in self-defense.|Dependent item|kubernetes.api.apiserver_request_terminations<p>**Preprocessing**</p>|
|Kubernetes API: TLS handshake errors, rate|Number of requests per second dropped with the 'TLS handshake error from' error.|Dependent item|kubernetes.api.apiserver_tls_handshake_errors_total.rate<p>**Preprocessing**</p>|
|Kubernetes API: API server requests: 5xx, rate|Counter of apiserver requests broken out for each HTTP response code.|Dependent item|kubernetes.api.apiserver_request_total_500.rate<p>**Preprocessing**</p>|
|Kubernetes API: API server requests: 4xx, rate|Counter of apiserver requests broken out for each HTTP response code.|Dependent item|kubernetes.api.apiserver_request_total_400.rate<p>**Preprocessing**</p>|
|Kubernetes API: API server requests: 3xx, rate|Counter of apiserver requests broken out for each HTTP response code.|Dependent item|kubernetes.api.apiserver_request_total_300.rate<p>**Preprocessing**</p>|
|Kubernetes API: API server requests: 0|Counter of apiserver requests broken out for each HTTP response code.|Dependent item|kubernetes.api.apiserver_request_total_0.rate<p>**Preprocessing**</p>|
|Kubernetes API: API server requests: 2xx, rate|Counter of apiserver requests broken out for each HTTP response code.|Dependent item|kubernetes.api.apiserver_request_total_200.rate<p>**Preprocessing**</p>|
|Kubernetes API: HTTP requests: 5xx, rate|Number of HTTP requests with 5xx status code per second.|Dependent item|kubernetes.api.rest_client_requests_total_500.rate<p>**Preprocessing**</p>|
|Kubernetes API: HTTP requests: 4xx, rate|Number of HTTP requests with 4xx status code per second.|Dependent item|kubernetes.api.rest_client_requests_total_400.rate<p>**Preprocessing**</p>|
|Kubernetes API: HTTP requests: 3xx, rate|Number of HTTP requests with 3xx status code per second.|Dependent item|kubernetes.api.rest_client_requests_total_300.rate<p>**Preprocessing**</p>|
|Kubernetes API: HTTP requests: 2xx, rate|Number of HTTP requests with 2xx status code per second.|Dependent item|kubernetes.api.rest_client_requests_total_200.rate<p>**Preprocessing**</p>|

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Kubernetes API: Too many server errors|Kubernetes API server is experiencing a high error rate (with 5xx HTTP code).|`min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR}`|Warning||
|Kubernetes API: Too many client errors|Kubernetes API client is experiencing a high error rate (with 5xx HTTP code).|`min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR}`|Warning||

### LLD rule Long-running requests

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Long-running requests|Discovery of long-running requests by verb, resource and scope.|Dependent item|kubernetes.api.longrunning_gauge.discovery<p>**Preprocessing**</p>|

### Item prototypes for Long-running requests

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE}|Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way.|Dependent item|kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"]<p>**Preprocessing**</p>|

### LLD rule Request duration histogram

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Request duration histogram|Discovery of raw data and percentile items of request duration.|Dependent item|kubernetes.api.requests_bucket.discovery<p>**Preprocessing**</p>|

### Item prototypes for Request duration histogram

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: ["{#VERB}"] Requests bucket: {#LE}|Response latency distribution in seconds for each verb.|Dependent item|kubernetes.api.request_duration_seconds_bucket[{#LE},"{#VERB}"]<p>**Preprocessing**</p>|
|Kubernetes API: ["{#VERB}"] Requests, p90|90th percentile of the response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p90["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p95|95th percentile of the response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p95["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p99|99th percentile of the response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p99["{#VERB}"]|
|Kubernetes API: ["{#VERB}"] Requests, p50|50th percentile of the response latency distribution in seconds for each verb.|Calculated|kubernetes.api.request_duration_seconds_p50["{#VERB}"]|

### LLD rule Requests inflight discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Requests inflight discovery|Discovery of inflight requests by kind.|Dependent item|kubernetes.api.inflight_requests.discovery<p>**Preprocessing**</p>|

### Item prototypes for Requests inflight discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Requests current: {#KIND}|Maximum number of inflight requests counted against this apiserver's inflight request limit, per request kind, in the last second.|Dependent item|kubernetes.api.current_inflight_requests["{#KIND}"]<p>**Preprocessing**</p>|

### LLD rule gRPC completed requests discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|gRPC completed requests discovery|Discovery of gRPC completed requests by gRPC code.|Dependent item|kubernetes.api.grpc_client_handled.discovery<p>**Preprocessing**</p>|

### Item prototypes for gRPC completed requests discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate|Total number of RPCs completed by the client per second, regardless of success or failure.|Dependent item|kubernetes.api.grpc_client_handled_total.rate["{#GRPC_CODE}"]<p>**Preprocessing**</p>|

### LLD rule Authentication attempts discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Authentication attempts discovery|Discovery of authentication attempts by result.|Dependent item|kubernetes.api.authentication_attempts.discovery<p>**Preprocessing**</p>|

### Item prototypes for Authentication attempts discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Authentication attempts: {#RESULT}, rate|Authentication attempts by result per second.|Dependent item|kubernetes.api.authentication_attempts.rate["{#RESULT}"]<p>**Preprocessing**</p>|

### LLD rule Authentication requests discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Authentication requests discovery|Discovery of authentication attempts by name.|Dependent item|kubernetes.api.authenticated_user_requests.discovery<p>**Preprocessing**</p>|

### Item prototypes for Authentication requests discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Authenticated requests: {#NAME}, rate|Counter of authenticated requests broken out by username per second.|Dependent item|kubernetes.api.authenticated_user_requests.rate["{#NAME}"]<p>**Preprocessing**</p>|

### LLD rule Watchers metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Watchers metrics discovery|Discovery of watchers by kind.|Dependent item|kubernetes.api.apiserver_registered_watchers.discovery<p>**Preprocessing**</p>|

### Item prototypes for Watchers metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Watchers: {#KIND}|Number of currently registered watchers for a given resource.|Dependent item|kubernetes.api.apiserver_registered_watchers["{#KIND}"]<p>**Preprocessing**</p>|

### LLD rule Etcd objects metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Etcd objects metrics discovery|Discovery of etcd objects by resource.|Dependent item|kubernetes.api.etcd_object_counts.discovery<p>**Preprocessing**</p>|

### Item prototypes for Etcd objects metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: etcd objects: {#RESOURCE}|Number of stored objects at the time of the last check, split by kind.|Dependent item|kubernetes.api.etcd_object_counts["{#RESOURCE}"]<p>**Preprocessing**</p>|

### LLD rule Workqueue metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Workqueue metrics discovery|Discovery of workqueue metrics by name.|Dependent item|kubernetes.api.workqueue.discovery<p>**Preprocessing**</p>|

### Item prototypes for Workqueue metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: ["{#NAME}"] Workqueue depth|Current depth of the workqueue.|Dependent item|kubernetes.api.workqueue_depth["{#NAME}"]<p>**Preprocessing**</p>|
|Kubernetes API: ["{#NAME}"] Workqueue adds total, rate|Total number of adds handled by the workqueue per second.|Dependent item|kubernetes.api.workqueue_adds_total.rate["{#NAME}"]<p>**Preprocessing**</p>|

### LLD rule Client certificate expiration histogram

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Client certificate expiration histogram|Discovery of raw data of client certificate expiration.|Dependent item|kubernetes.api.certificate_expiration.discovery<p>**Preprocessing**</p>|

### Item prototypes for Client certificate expiration histogram

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kubernetes API: Certificate expiration seconds bucket, {#LE}|Distribution of the remaining lifetime on the certificate used to authenticate a request.|Dependent item|kubernetes.api.client_certificate_expiration_seconds_bucket[{#LE}]<p>**Preprocessing**</p>|
|Kubernetes API: Client certificate expiration, p1|1st percentile of the remaining lifetime on the certificate used to authenticate a request.|Calculated|kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]|

### Trigger prototypes for Client certificate expiration histogram

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Kubernetes API: Kubernetes client certificate is expiring|A client certificate used to authenticate to the apiserver is expiring in less than {$KUBE.API.CERT.EXPIRATION} days.|`last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60`|Warning|**Depends on**:|
|Kubernetes API: Kubernetes client certificate expires soon|A client certificate used to authenticate to the apiserver is expiring in less than 24 hours.|`last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60`|Warning||

## Feedback

Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).

You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).
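
As a supplement to the Setup section, the values chosen for `{$KUBE.API.SERVER.URL}` and `{$KUBE.API.TOKEN}` can be checked outside Zabbix before the template is linked. The following is a minimal Python sketch, not part of the template: `URL`, `TOKEN`, `fetch_metrics` and `sum_metric` are hypothetical names introduced here, and the label filter is a plain substring match rather than the preprocessing the template itself applies.

```python
import re
import ssl
import urllib.request

# Hypothetical stand-ins for {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}.
URL = "https://localhost:6443/metrics"
TOKEN = "<api-token>"

# One Prometheus text-format sample: metric name, optional {labels}, value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def fetch_metrics(url, token, verify_tls=True):
    """Return the raw /metrics body, authenticating with a bearer token."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")


def sum_metric(text, name, label_filter=""):
    """Sum all samples of `name` whose label set contains `label_filter`."""
    total = 0.0
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == name and label_filter in (match.group(2) or ""):
            total += float(match.group(3))
    return total
```

Assuming the endpoint is reachable, `sum_metric(fetch_metrics(URL, TOKEN), "apiserver_request_total", 'code="5')` returns the accumulated count of 5xx responses. This is a raw counter total, whereas the template items report per-second rates. An HTTP 401 or 403 error from `fetch_metrics` suggests the token is missing or lacks permission to read `/metrics`.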