Kubernetes nodes by HTTP
Overview
This template monitors Kubernetes nodes without any external scripts.
It uses the script item to make HTTP requests to the Kubernetes API.
Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F7.0) in your Kubernetes cluster.
Change the values according to the environment in the file $HOME/zabbix_values.yaml.
For example:
- Enables use of Zabbix proxy (enabled: false)
Set the {$KUBE.API.URL} such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
Set up the macros to filter the metrics of discovered nodes.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Kubernetes 1.19.10
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.URL} such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
Set {$KUBE.NODES.ENDPOINT.NAME} with Zabbix agent's endpoint name. See kubectl -n monitoring get ep. Default: zabbix-zabbix-helm-chrt-agent.
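As a rough illustration of how the API URL and the token fit together, the following Python sketch decodes a base64-encoded token (the job of `base64 -d` in the command above) and builds, without sending, an authenticated request. The function names are hypothetical, and `/api/v1/nodes` is the standard Kubernetes node list path, used here as an assumption; it is not taken from the template itself.

```python
import base64
import urllib.request


def decode_secret_token(b64_token: str) -> str:
    """Decode the base64 value stored in .data.token of the secret."""
    return base64.b64decode(b64_token).decode("utf-8")


def build_nodes_request(api_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the node list.

    api_url plays the role of {$KUBE.API.URL} and token the role of
    {$KUBE.API.TOKEN}. The path is an assumption made for illustration.
    """
    url = api_url.rstrip("/") + "/api/v1/nodes"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
        },
    )
```

Passing the resulting request to `urllib.request.urlopen` would perform the call; in a real cluster the API server certificate also has to be trusted, which this sketch leaves out.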
Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:
- {$KUBE.LLD.FILTER.NODE.MATCHES}
- {$KUBE.LLD.FILTER.NODE.NOT_MATCHES}
- {$KUBE.LLD.FILTER.NODE.ROLE.MATCHES}
- {$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES}
Set up macros to filter pod metrics by namespace:
- {$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES}
- {$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES}
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
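The MATCHES / NOT_MATCHES macro pairs can be thought of as two regular expressions that must both agree before an entity is discovered. The Python sketch below is only an approximation under that assumption: the function name is hypothetical, Python's `re` stands in for the regular expression engine Zabbix uses, and the defaults are the ones listed in the macros table.

```python
import re


def passes_lld_filter(value: str,
                      matches: str = ".*",
                      not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """Return True if value matches `matches` and does not match `not_matches`.

    Assumption: both patterns are applied as unanchored searches and
    combined with a logical AND.
    """
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)
```

With the defaults, every namespace passes; setting the NOT_MATCHES pattern to something like `^kube-` would drop `kube-system` while keeping `default`.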
You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS} and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations.
Notes about labels and annotations filters:
- Macro values should be specified separated by commas and must have the key/value form with support for regular expressions in the value (key1: value, key2: regexp).
- ECMAScript syntax is used for regular expressions.
- Filters are applied only if the label key exists on the entity being filtered. Entities that do not have the key are not affected by the filter and are still discovered; only entities that have the key are filtered by its value.
- You can also use the exclamation point symbol (!) to invert the filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node([5-9]|1[0-9]|2[0-5]), !node-role.kubernetes.io/ingress: .*. As a result, nodes 5-25 without the "ingress" role will be discovered.
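The rules above can be sketched in Python as follows. This is not the template's implementation: the function names are hypothetical, Python's `re` stands in for ECMAScript regular expressions, the comma split is deliberately naive, and it is an assumption that a pattern must match the whole label value.

```python
import re


def parse_filter(macro_value: str):
    """Parse 'key1: value, !key2: regexp' into (key, pattern, inverted) tuples.

    Naive split on commas: patterns that contain a comma are not supported.
    """
    rules = []
    for part in macro_value.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, pattern = part.partition(":")
        key = key.strip()
        inverted = key.startswith("!")
        rules.append((key.lstrip("!").strip(), pattern.strip(), inverted))
    return rules


def is_discovered(labels: dict, macro_value: str) -> bool:
    """Apply every rule whose key exists on the entity; skip the others."""
    for key, pattern, inverted in parse_filter(macro_value):
        if key not in labels:
            continue  # entities without the key are not affected by this rule
        matched = re.fullmatch(pattern, labels[key]) is not None
        if matched == inverted:
            return False  # plain rule did not match, or inverted rule matched
    return True
```

For instance, with the filter `kubernetes.io/hostname: kubernetes-node([5-9]|1[0-9]|2[0-5]), !node-role.kubernetes.io/ingress: .*`, a node labeled `kubernetes-node7` with no ingress label passes, while `kubernetes-node30` or any node carrying the ingress label is rejected.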
See the Kubernetes documentation for details about labels and annotations:
- https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
Note: the discovered nodes will be created as separate hosts in Zabbix, with the Linux template automatically assigned to them.
Macros used
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port> | https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.TOKEN} | Service account bearer token. | |
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy to | |
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". | zabbix-zabbix-helm-chrt-agent |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. | .* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. | CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. | .* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered nodes by role. | CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regular expressions in values are supported). See the template's README.md for details. | |
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regular expressions in values are supported). See the template's README.md for details. | |
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regular expressions in values are supported). See the template's README.md for details. | |
{$KUBE.POD.FILTER.LABELS} | Labels to filter pods (regular expressions in values are supported). See the template's README.md for details. | |
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. | .* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. | CHANGE_IF_NEEDED |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get nodes | Collecting and processing cluster nodes data via Kubernetes API. | Script | kube.nodes |
Get nodes check | Data collection check. | Dependent item | kube.nodes.check |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Failed to get nodes | | length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 | Warning | |
LLD rule Node discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | | Dependent item | kube.node.discovery |
Item prototypes for Node discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. | Dependent item | kube.node.get[{#NAME}] |
Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). | Dependent item | kube.node.addresses.external_ip[{#NAME}] |
Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. | Dependent item | kube.node.addresses.internal_ip[{#NAME}] |
Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. | Dependent item | kube.node.allocatable.cpu[{#NAME}] |
Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. | Dependent item | kube.node.allocatable.memory[{#NAME}] |
Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ | Dependent item | kube.node.allocatable.pods[{#NAME}] |
Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity | Dependent item | kube.node.capacity.cpu[{#NAME}] |
Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity | Dependent item | kube.node.capacity.memory[{#NAME}] |
Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ | Dependent item | kube.node.capacity.pods[{#NAME}] |
Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. | Dependent item | kube.node.conditions.diskpressure[{#NAME}] |
Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. | Dependent item | kube.node.conditions.memorypressure[{#NAME}] |
Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. | Dependent item | kube.node.conditions.networkunavailable[{#NAME}] |
Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. | Dependent item | kube.node.conditions.pidpressure[{#NAME}] |
Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). | Dependent item | kube.node.conditions.ready[{#NAME}] |
Node [{#NAME}] Info: Architecture | Node architecture. | Dependent item | kube.node.info.architecture[{#NAME}] |
Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ | Dependent item | kube.node.info.containerruntime[{#NAME}] |
Node [{#NAME}] Info: Kernel version | Node kernel version. | Dependent item | kube.node.info.kernelversion[{#NAME}] |
Node [{#NAME}] Info: Kubelet version | Version of Kubelet. | Dependent item | kube.node.info.kubeletversion[{#NAME}] |
Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. | Dependent item | kube.node.info.kubeproxyversion[{#NAME}] |
Node [{#NAME}] Info: Operating system | Node operating system. | Dependent item | kube.node.info.operatingsystem[{#NAME}] |
Node [{#NAME}] Info: OS image | Node OS image. | Dependent item | kube.node.info.osversion[{#NAME}] |
Node [{#NAME}] Info: Roles | Node roles. | Dependent item | kube.node.info.roles[{#NAME}] |
Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | Dependent item | kube.node.limits.cpu[{#NAME}] |
Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | Dependent item | kube.node.limits.memory[{#NAME}] |
Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | Dependent item | kube.node.requests.cpu[{#NAME}] |
Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | Dependent item | kube.node.requests.memory[{#NAME}] |
Node [{#NAME}] Uptime | Node uptime. | Dependent item | kube.node.uptime[{#NAME}] |
Node [{#NAME}] Used: Pods | Current number of pods on the node. | Dependent item | kube.node.used.pods[{#NAME}] |
Trigger prototypes for Node discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. | last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 | Warning | |
Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False. | last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 | Warning | |
Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False. | last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 | Warning | |
Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. | last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 | Warning | |
Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. | last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 | Warning | |
Node [{#NAME}] Limits: Total CPU limits are too high | | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 | Warning | Depends on: |
Node [{#NAME}] Limits: Total CPU limits are too high | | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 | Average | |
Node [{#NAME}] Limits: Total memory limits are too high | | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 | Warning | Depends on: |
Node [{#NAME}] Limits: Total memory limits are too high | | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 | Average | |
Node [{#NAME}] Requests: Total CPU requests are too high | | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 | Warning | Depends on: |
Node [{#NAME}] Requests: Total CPU requests are too high | | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 | Average | |
Node [{#NAME}] Requests: Total memory requests are too high | | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 | Warning | Depends on: |
Node [{#NAME}] Requests: Total memory requests are too high | | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 | Average | |
Node [{#NAME}]: Has been restarted | Uptime is less than 10 minutes. | last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 | Info | |
Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. | last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 | Warning | |
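The paired "too high" triggers in the table above compare a ratio against two thresholds. The Python sketch below illustrates that logic under one assumption: that each Warning trigger depends on the Average trigger of the same name, so only the higher severity is shown when both conditions hold (the dependency targets are not listed in the table). The function name is hypothetical.

```python
from typing import Optional


def ratio_severity(used: float, allocatable: float,
                   warning: float, average: float) -> Optional[str]:
    """Return the severity shown for a used/allocatable ratio, or None.

    The thresholds come from the trigger expressions, for example
    0.9 and 1 for limits, 0.5 and 0.8 for requests.
    """
    ratio = used / allocatable
    if ratio > average:
        return "Average"  # assumed to suppress the dependent Warning trigger
    if ratio > warning:
        return "Warning"
    return None
```

For example, CPU limits of 3.8 cores on a node with 4 allocatable cores give a ratio of 0.95, which crosses the 0.9 Warning threshold but not the Average threshold of 1.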
LLD rule Pod discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | | Dependent item | kube.pod.discovery |
Item prototypes for Pod discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. | Dependent item | kube.pod.get[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions | Dependent item | kube.pod.conditions.containers_ready[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions | Dependent item | kube.pod.conditions.initialized[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions | Dependent item | kube.pod.conditions.ready[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions | Dependent item | kube.pod.conditions.scheduled[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. | Dependent item | kube.pod.containers.restartcount[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase | Dependent item | kube.pod.status.phase[{#POD}] |
Node [{#NODE}] Pod [{#POD}] Uptime | Pod uptime. | Dependent item | kube.pod.uptime[{#POD}] |
Trigger prototypes for Pod discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. | (last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],15m))>1 | Warning | |
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. | count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m, "regexp","^(1\|4\|5)$")>=9 | High | |
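The two pod trigger expressions above can be read as simple checks over recent samples. The following Python sketch mirrors them under stated assumptions: the function names are hypothetical, the input lists stand for the values Zabbix collected in the 15-minute and 10-minute windows, and the meaning of the numeric phase codes is not covered here because it is not given in this document.

```python
import re


def is_crash_looping(restart_counts: list) -> bool:
    """restart_counts: samples from the last 15 minutes, oldest first.

    Mirrors last(...) - min(..., 15m) > 1.
    """
    return restart_counts[-1] - min(restart_counts) > 1


def is_pod_unhealthy(phases: list) -> bool:
    """phases: phase codes collected in the last 10 minutes.

    Mirrors count(..., 10m, "regexp", "^(1|4|5)$") >= 9.
    """
    pattern = re.compile(r"^(1|4|5)$")
    return sum(1 for p in phases if pattern.search(str(p))) >= 9
```

A restart counter that goes from 3 to 5 within the window trips the first check, while a single restart (3 to 4) does not.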
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums