Kubernetes nodes by HTTP

Overview

This template monitors Kubernetes nodes. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.

Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F7.0) in your Kubernetes cluster.

Change the values according to the environment in the file $HOME/zabbix_values.yaml.

For example:

  • Enables use of Zabbix proxy

    enabled: false

Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.

Get the generated service account token using the command:

kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d

Then set it to the macro {$KUBE.API.TOKEN}.

Set up the macros to filter the metrics of discovered nodes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • Kubernetes 1.19.10

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Install the Zabbix Helm Chart in your Kubernetes cluster.

Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.

Get the generated service account token using the command:

kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d

Then set it to the macro {$KUBE.API.TOKEN}.

Set {$KUBE.NODES.ENDPOINT.NAME} to the Zabbix agent's endpoint name (see kubectl -n monitoring get ep). Default: zabbix-zabbix-helm-chrt-agent.
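
To make the roles of {$KUBE.API.URL} and {$KUBE.API.TOKEN} concrete, the sketch below builds the URL and headers for a node listing request. The /api/v1/nodes path and the Authorization: Bearer header are standard Kubernetes API conventions. The helper name buildNodesRequest is hypothetical, and the template's own script may assemble its requests differently.

```javascript
// Hypothetical helper: combines the values of {$KUBE.API.URL} and
// {$KUBE.API.TOKEN} into the parts of an authenticated API request.
function buildNodesRequest(apiUrl, token) {
  // Drop trailing slashes so the path is not doubled up.
  var base = apiUrl.replace(/\/+$/, '');
  return {
    url: base + '/api/v1/nodes',
    headers: {
      'Authorization': 'Bearer ' + token,
      'Accept': 'application/json'
    }
  };
}

// 'example-token' is a placeholder, not a real service account token.
var request = buildNodesRequest('https://kubernetes.default.svc.cluster.local:443', 'example-token');
console.log(request.url);
```

An empty or expired token normally results in an HTTP 401 response from the API server, which is a quick way to tell a token problem from a URL problem.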

Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:

  • {$KUBE.LLD.FILTER.NODE.MATCHES}
  • {$KUBE.LLD.FILTER.NODE.NOT_MATCHES}
  • {$KUBE.LLD.FILTER.NODE.ROLE.MATCHES}
  • {$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES}

Set up macros to filter pod metrics by namespace:

  • {$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES}
  • {$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES}

Note: If you have a large cluster, it is highly recommended to set a filter for discoverable pods.
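
The MATCHES and NOT_MATCHES macro pairs combine as "matches the first pattern and does not match the second". The sketch below illustrates that logic only. Zabbix evaluates LLD filters itself with its own regular expression engine, so the function name passesLldFilter and the use of JavaScript RegExp are assumptions made for illustration.

```javascript
// Sketch of MATCHES / NOT_MATCHES semantics: keep a name only if it
// matches the first pattern and does not match the second.
function passesLldFilter(name, matches, notMatches) {
  return new RegExp(matches).test(name) && !new RegExp(notMatches).test(name);
}

var namespaces = ['default', 'kube-system', 'monitoring'];
var kept = namespaces.filter(function (ns) {
  // Assumed macro values: MATCHES = ".*", NOT_MATCHES = "^kube-system$"
  return passesLldFilter(ns, '.*', '^kube-system$');
});
console.log(kept.join(','));
```

With the default values (.* and CHANGE_IF_NEEDED), nothing is excluded unless a name happens to contain the literal text CHANGE_IF_NEEDED.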

You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS} and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations.

Notes about labels and annotations filters:

  • Macro values should be specified separated by commas and must have the key/value form with support for regular expressions in the value (key1: value, key2: regexp).
  • ECMAScript syntax is used for regular expressions.
  • Filters are applied only to entities that have the specified label key. Entities without the key are not affected by the filter and are still discovered; entities with the key are filtered by its value.
  • You can also use the exclamation point symbol (!) to invert the filter (!key: value).

For example: kubernetes.io/hostname: kubernetes-node([5-9]|1[0-9]|2[0-5])$, !node-role.kubernetes.io/ingress: .*. As a result, nodes 5-25 without the "ingress" role will be discovered.
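
The filter rules above can be sketched as a small parser and matcher. This is an illustration of the documented behavior, not the template's actual code: the function names are hypothetical, patterns are assumed to be unanchored unless written with ^ and $, and a comma inside a regular expression is not handled.

```javascript
// Parse "key1: value, !key2: regexp" into a list of filter rules.
function parseFilters(macroValue) {
  return macroValue.split(',').map(function (part) {
    var text = part.trim();
    var inverted = text.charAt(0) === '!';
    if (inverted) { text = text.slice(1); }
    var colon = text.indexOf(':');
    return {
      key: text.slice(0, colon).trim(),
      pattern: new RegExp(text.slice(colon + 1).trim()),
      inverted: inverted
    };
  });
}

// An entity passes when every rule whose key it carries is satisfied.
function passesFilters(labels, rules) {
  return rules.every(function (rule) {
    if (!Object.prototype.hasOwnProperty.call(labels, rule.key)) {
      return true; // key absent: the rule does not apply
    }
    var matched = rule.pattern.test(labels[rule.key]);
    return rule.inverted ? !matched : matched;
  });
}

// The hostnames and label values below are made up for the example.
var rules = parseFilters('kubernetes.io/hostname: ^worker-[1-3]$, !node-role.kubernetes.io/ingress: .*');
console.log(passesFilters({ 'kubernetes.io/hostname': 'worker-2' }, rules));
console.log(passesFilters({ 'kubernetes.io/hostname': 'worker-2', 'node-role.kubernetes.io/ingress': '' }, rules));
```

The first node passes because it matches the hostname rule and has no ingress label. The second is rejected because the inverted rule applies as soon as the ingress key is present.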

See the Kubernetes documentation for details about labels and annotations.

Note: The discovered nodes will be created as separate hosts in Zabbix with the Linux template automatically assigned to them.

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port> | https://kubernetes.default.svc.cluster.local:443 |
| {$KUBE.API.TOKEN} | Service account bearer token. | |
| {$KUBE.HTTP.PROXY} | Sets the HTTP proxy to the http_proxy value. If this parameter is empty, then no proxy is used. | |
| {$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". | zabbix-zabbix-helm-chrt-agent |
| {$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. | .* |
| {$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. | CHANGE_IF_NEEDED |
| {$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. | .* |
| {$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered nodes by role. | CHANGE_IF_NEEDED |
| {$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values is supported). See the template's README.md for details. | |
| {$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values is supported). See the template's README.md for details. | |
| {$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values is supported). See the template's README.md for details. | |
| {$KUBE.POD.FILTER.LABELS} | Labels to filter pods (regex in values is supported). See the template's README.md for details. | |
| {$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. | .* |
| {$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. | CHANGE_IF_NEEDED |

Items

Name Description Type Key and additional info
Kubernetes: Get nodes

Collecting and processing cluster nodes data via Kubernetes API.

Script kube.nodes
Get nodes check

Data collection check.

Dependent item kube.nodes.check

Preprocessing

  • JSON Path: $.error

    Custom on fail: Set value to an empty string

  • Discard unchanged with heartbeat: 3h

Triggers

Name Description Expression Severity Dependencies and additional info
Kubernetes: Failed to get nodes length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 Warning

LLD rule Node discovery

Name Description Type Key and additional info
Node discovery Dependent item kube.node.discovery

Preprocessing

  • JSON Path: $.nodes..filternode

Item prototypes for Node discovery

Name Description Type Key and additional info
Node [{#NAME}]: Get data

Collecting and processing data for node [{#NAME}] via the Kubernetes API.

Dependent item kube.node.get[{#NAME}]

Preprocessing

  • JSON Path: $.nodes..[?(@.metadata.name == "{#NAME}")].first()

Node [{#NAME}] Addresses: External IP

Typically the IP address of the node that is externally routable (available from outside the cluster).

Dependent item kube.node.addresses.external_ip[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Addresses: Internal IP

Typically the IP address of the node that is routable only within the cluster.

Dependent item kube.node.addresses.internal_ip[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Allocatable: CPU

Allocatable CPU.

'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now.

Dependent item kube.node.allocatable.cpu[{#NAME}]

Preprocessing

  • JSON Path: $.status.allocatable.cpu

Node [{#NAME}] Allocatable: Memory

Allocatable Memory.

'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now.

Dependent item kube.node.allocatable.memory[{#NAME}]

Preprocessing

  • JSON Path: $.status.allocatable.memory

Node [{#NAME}] Allocatable: Pods

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

Dependent item kube.node.allocatable.pods[{#NAME}]

Preprocessing

  • JSON Path: $.status.allocatable.pods

Node [{#NAME}] Capacity: CPU

CPU resource capacity.

https://kubernetes.io/docs/concepts/architecture/nodes/#capacity

Dependent item kube.node.capacity.cpu[{#NAME}]

Preprocessing

  • JSON Path: $.status.capacity.cpu

Node [{#NAME}] Capacity: Memory

Memory resource capacity.

https://kubernetes.io/docs/concepts/architecture/nodes/#capacity

Dependent item kube.node.capacity.memory[{#NAME}]

Preprocessing

  • JSON Path: $.status.capacity.memory

Node [{#NAME}] Capacity: Pods

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

Dependent item kube.node.capacity.pods[{#NAME}]

Preprocessing

  • JSON Path: $.status.capacity.pods

Node [{#NAME}] Conditions: Disk pressure

True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False.

Dependent item kube.node.conditions.diskpressure[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NAME}] Conditions: Memory pressure

True if pressure exists on the node memory - that is, if the node memory is low; otherwise False.

Dependent item kube.node.conditions.memorypressure[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NAME}] Conditions: Network unavailable

True if the network for the node is not correctly configured, otherwise False.

Dependent item kube.node.conditions.networkunavailable[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NAME}] Conditions: PID pressure

True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False.

Dependent item kube.node.conditions.pidpressure[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NAME}] Conditions: Ready

True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds).

Dependent item kube.node.conditions.ready[{#NAME}]

Preprocessing

  • JSON Path: $.status.conditions[?(@.type == "Ready")].status.first()

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NAME}] Info: Architecture

Node architecture.

Dependent item kube.node.info.architecture[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.architecture

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: Container runtime

Container runtime.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/

Dependent item kube.node.info.containerruntime[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.containerRuntimeVersion

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: Kernel version

Node kernel version.

Dependent item kube.node.info.kernelversion[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.kernelVersion

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: Kubelet version

Version of Kubelet.

Dependent item kube.node.info.kubeletversion[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.kubeletVersion

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: KubeProxy version

Version of KubeProxy.

Dependent item kube.node.info.kubeproxyversion[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.kubeProxyVersion

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: Operating system

Node operating system.

Dependent item kube.node.info.operatingsystem[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.operatingSystem

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: OS image

Node OS image.

Dependent item kube.node.info.osversion[{#NAME}]

Preprocessing

  • JSON Path: $.status.nodeInfo.osImage

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Info: Roles

Node roles.

Dependent item kube.node.info.roles[{#NAME}]

Preprocessing

  • JSON Path: $.status.roles

  • Discard unchanged with heartbeat: 3h

Node [{#NAME}] Limits: CPU

Node CPU limits.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Dependent item kube.node.limits.cpu[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

Node [{#NAME}] Limits: Memory

Node Memory limits.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Dependent item kube.node.limits.memory[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

Node [{#NAME}] Requests: CPU

Node CPU requests.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Dependent item kube.node.requests.cpu[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

Node [{#NAME}] Requests: Memory

Node Memory requests.

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Dependent item kube.node.requests.memory[{#NAME}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

Node [{#NAME}] Uptime

Node uptime.

Dependent item kube.node.uptime[{#NAME}]

Preprocessing

  • JSON Path: $.metadata.creationTimestamp

    Custom on fail: Discard value

  • JavaScript: return Math.floor((Date.now() - new Date(value)) / 1000);

Node [{#NAME}] Used: Pods

Current number of pods on the node.

Dependent item kube.node.used.pods[{#NAME}]

Preprocessing

  • JSON Path: $.status.podsCount
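
Each condition item in the table above ends with a JavaScript preprocessing step whose body this README omits. The sketch below shows one way such a step could turn a condition status into a number. Only the mapping of True to 1 is implied by the trigger expressions in this document; the codes for False and Unknown, and both function names, are assumptions made for illustration.

```javascript
// Hypothetical mapping from a Kubernetes condition status to a number.
// Only "True" -> 1 is implied by the trigger expressions in this README;
// the codes for "False" and "Unknown" are assumptions.
var STATUS_CODES = { 'True': 1, 'False': 0, 'Unknown': 2 };

function conditionToNumber(status) {
  if (!Object.prototype.hasOwnProperty.call(STATUS_CODES, status)) {
    throw new Error('Unexpected condition status: ' + status);
  }
  return STATUS_CODES[status];
}

// Pick the status of one condition type from a node object, mirroring
// the JSON Path $.status.conditions[?(@.type == "Ready")].status.first()
function conditionStatus(node, type) {
  var found = node.status.conditions.filter(function (c) { return c.type === type; });
  return found.length > 0 ? found[0].status : null;
}

// Made-up node fragment in the shape returned by the Kubernetes API.
var node = { status: { conditions: [
  { type: 'MemoryPressure', status: 'False' },
  { type: 'Ready', status: 'True' }
] } };
console.log(conditionToNumber(conditionStatus(node, 'Ready')));
```

Consult the template itself for the real value mapping before building dashboards or custom triggers on these items.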

Trigger prototypes for Node discovery

Name Description Expression Severity Dependencies and additional info
Node [{#NAME}] Conditions: Pressure exists on the disk size

True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False.

last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 Warning
Node [{#NAME}] Conditions: Pressure exists on the node memory

True - pressure exists on the node memory - that is, if the node memory is low; otherwise False

last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 Warning
Node [{#NAME}] Conditions: Network is not correctly configured

True - the network for the node is not correctly configured, otherwise False

last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 Warning
Node [{#NAME}] Conditions: Pressure exists on the processes

True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False

last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 Warning
Node [{#NAME}] Conditions: Is not in Ready state

False - if the node is not healthy and is not accepting pods.
Unknown - if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds).

last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 Warning
Node [{#NAME}] Limits: Total CPU limits are too high last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 Warning Depends on:
  • Node [{#NAME}] Limits: Total CPU limits are too high
Node [{#NAME}] Limits: Total CPU limits are too high last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 Average
Node [{#NAME}] Limits: Total memory limits are too high last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 Warning Depends on:
  • Node [{#NAME}] Limits: Total memory limits are too high
Node [{#NAME}] Limits: Total memory limits are too high last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 Average
Node [{#NAME}] Requests: Total CPU requests are too high last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 Warning Depends on:
  • Node [{#NAME}] Requests: Total CPU requests are too high
Node [{#NAME}] Requests: Total CPU requests are too high last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 Average
Node [{#NAME}] Requests: Total memory requests are too high last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 Warning Depends on:
  • Node [{#NAME}] Requests: Total memory requests are too high
Node [{#NAME}] Requests: Total memory requests are too high last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 Average
Node [{#NAME}]: Has been restarted

Uptime is less than 10 minutes.

last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10m Info
Node [{#NAME}] Used: Kubelet too many pods

Kubelet is running at capacity.

last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 Warning
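
The limits and requests triggers above come in Warning and Average pairs, where the Warning trigger depends on the Average one. A rough sketch of how a pair resolves is shown below. The function name ratioSeverity is hypothetical, and both inputs are assumed to be expressed in the same unit (for example, CPU cores).

```javascript
// Classify a usage ratio against a Warning and an Average threshold.
// The Warning trigger depends on the Average one, so only the higher
// severity is reported when both conditions hold.
function ratioSeverity(used, allocatable, warnAt, averageAt) {
  var ratio = used / allocatable;
  if (ratio > averageAt) { return 'Average'; }
  if (ratio > warnAt) { return 'Warning'; }
  return 'OK';
}

// Limits use thresholds 0.9 and 1; requests use 0.5 and 0.8.
console.log(ratioSeverity(3.8, 4, 0.9, 1));   // limits at 95% of allocatable
console.log(ratioSeverity(2.4, 4, 0.5, 0.8)); // requests at 60% of allocatable
```

Limits can legitimately exceed 100% of allocatable resources because Kubernetes allows limits to be overcommitted, which is why the Average threshold for limits sits at 1 rather than below it.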

LLD rule Pod discovery

Name Description Type Key and additional info
Pod discovery Dependent item kube.pod.discovery

Preprocessing

  • JSON Path: $.Pods

  • Discard unchanged with heartbeat: 3h

Item prototypes for Pod discovery

Name Description Type Key and additional info
Node [{#NODE}] Pod [{#POD}]: Get data

Collecting and processing data for pod [{#POD}] on node [{#NODE}] via the Kubernetes API.

Dependent item kube.pod.get[{#POD}]

Preprocessing

  • JSON Path: $.Pods[?(@.name == "{#POD}")].first()

Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready

All containers in the Pod are ready.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions

Dependent item kube.pod.conditions.containers_ready[{#POD}]

Preprocessing

  • JSON Path: $.conditions[?(@.type == "ContainersReady")].status.first()

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NODE}] Pod [{#POD}] Conditions: Initialized

All init containers have completed successfully.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions

Dependent item kube.pod.conditions.initialized[{#POD}]

Preprocessing

  • JSON Path: $.conditions[?(@.type == "Initialized")].status.first()

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NODE}] Pod [{#POD}] Conditions: Ready

The Pod is able to serve requests and should be added to the load balancing pools of all matching Services.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions

Dependent item kube.pod.conditions.ready[{#POD}]

Preprocessing

  • JSON Path: $.conditions[?(@.type == "Ready")].status.first()

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled

The Pod has been scheduled to a node.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions

Dependent item kube.pod.conditions.scheduled[{#POD}]

Preprocessing

  • JSON Path: $.conditions[?(@.type == "PodScheduled")].status.first()

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NODE}] Pod [{#POD}] Containers: Restarts

The number of times the container has been restarted. The count is currently based on the number of dead containers that have not yet been removed, and those containers are subject to garbage collection.

Dependent item kube.pod.containers.restartcount[{#POD}]

Preprocessing

  • JSON Path: $.containers.restartCount

    Custom on fail: Discard value

Node [{#NODE}] Pod [{#POD}] Status: Phase

The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase

Dependent item kube.pod.status.phase[{#POD}]

Preprocessing

  • JSON Path: $.phase

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Node [{#NODE}] Pod [{#POD}] Uptime

Pod uptime.

Dependent item kube.pod.uptime[{#POD}]

Preprocessing

  • JSON Path: $.startTime

    Custom on fail: Discard value

  • JavaScript: return Math.floor((Date.now() - new Date(value)) / 1000);
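
The uptime preprocessing step used by the node and pod uptime items converts a timestamp into elapsed seconds. The sketch below repeats the same arithmetic with the current time passed in explicitly, so the result is reproducible. The function name uptimeSeconds and the sample timestamps are illustrative.

```javascript
// Same arithmetic as the preprocessing step, with the current time
// supplied as a parameter instead of taken from Date.now().
function uptimeSeconds(startTime, nowMs) {
  return Math.floor((nowMs - new Date(startTime).getTime()) / 1000);
}

// Sample values: started at midnight UTC, observed 10 minutes 30 seconds later.
var now = Date.parse('2024-01-01T00:10:30Z');
console.log(uptimeSeconds('2024-01-01T00:00:00Z', now));
```

Because the node item reads $.metadata.creationTimestamp, its value is the time since the node object was created in the cluster.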

Trigger prototypes for Pod discovery

Name Description Expression Severity Dependencies and additional info
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping

Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state.

(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],15m))>1 Warning
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy

Pod has been in a non-ready state for longer than 10 minutes.

count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m, "regexp","^(1|4|5)$")>=9 High
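
The two pod trigger expressions above can be read as simple checks over recent samples. The sketch below restates them in JavaScript for illustration. The function names are hypothetical, the input arrays stand for the values collected in the 15-minute and 10-minute windows, and the meaning of the phase codes 1, 4 and 5 is defined in the template, not here.

```javascript
// Crash loop check: the latest restart count exceeds the minimum seen
// in the window by more than 1. Samples are ordered oldest first.
function isCrashLooping(restartCounts) {
  var last = restartCounts[restartCounts.length - 1];
  var min = Math.min.apply(null, restartCounts);
  return last - min > 1;
}

// Health check: at least 9 samples in the window match ^(1|4|5)$.
function isNotHealthy(phaseCodes) {
  var bad = phaseCodes.filter(function (code) {
    return /^(1|4|5)$/.test(String(code));
  });
  return bad.length >= 9;
}

console.log(isCrashLooping([3, 3, 4, 5]));
console.log(isCrashLooping([3, 3, 3, 4]));
console.log(isNotHealthy([1, 1, 1, 1, 1, 1, 1, 1, 1, 2]));
```

A single restart inside the window does not fire the crash loop trigger; at least two are needed.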

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums