
Kubernetes nodes by HTTP

Overview

This template monitors Kubernetes nodes without any external scripts.
It uses the script item to make HTTP requests to the Kubernetes API. Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F7.0) in your Kubernetes cluster.

Change the values in the file $HOME/zabbix_values.yaml according to your environment.

For example, this setting controls whether Zabbix proxy is used:

enabled: false

Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.

Get the generated service account token using the command:

kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d

Then set it to the macro {$KUBE.API.TOKEN}.

Set up the macros to filter the metrics of discovered nodes.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Kubernetes 1.19.10

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Install the Zabbix Helm Chart in your Kubernetes cluster.

Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.

Get the generated service account token using the command:

kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d

Then set it to the macro {$KUBE.API.TOKEN}.
Set {$KUBE.NODES.ENDPOINT.NAME} to the Zabbix agent's endpoint name. See kubectl -n monitoring get ep. Default: zabbix-zabbix-helm-chrt-agent.
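
To illustrate how the token and URL macros fit together, here is a minimal Python sketch. It decodes a base64-encoded token (the same thing `base64 -d` does in the command above) and prepares a bearer-authenticated request for the Kubernetes node list. The function names are hypothetical, and the use of the standard `/api/v1/nodes` path is an assumption for illustration; the template's own script item may request different endpoints.

```python
import base64
import urllib.request


def decode_secret_token(encoded):
    """Decode the base64 value stored in the secret's .data.token field."""
    return base64.b64decode(encoded).decode("utf-8").strip()


def build_nodes_request(api_url, token):
    """Prepare (but do not send) a GET request for the cluster node list.

    api_url plays the role of {$KUBE.API.URL} and token the role of
    {$KUBE.API.TOKEN}; the /api/v1/nodes path is assumed for illustration.
    """
    url = api_url.rstrip("/") + "/api/v1/nodes"
    headers = {
        "Authorization": "Bearer " + token,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would send the request; inside a cluster that usually also needs the cluster CA certificate to be trusted.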

Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:

{$KUBE.LLD.FILTER.NODE.MATCHES}
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES}
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES}
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES}

Set up macros to filter pod metrics by namespace:

{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES}
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES}

Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
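
The MATCHES / NOT_MATCHES macro pairs above combine as "include, then exclude". The following Python sketch shows that logic under two assumptions: the patterns are applied as unanchored regular-expression searches, and Python's `re` module stands in for the regex engine Zabbix uses. The function name `lld_allows` is hypothetical; the defaults mirror the `.*` and `CHANGE_IF_NEEDED` values listed in the macro table.

```python
import re


def lld_allows(value, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Return True if value passes an include/exclude regex pair.

    value is a node name, node role, or pod namespace.
    It must match the MATCHES pattern and must not match NOT_MATCHES.
    """
    included = re.search(matches, value) is not None
    excluded = re.search(not_matches, value) is not None
    return included and not excluded
```

With the defaults every value passes, because `.*` matches anything and `CHANGE_IF_NEEDED` is unlikely to appear in a real name. Setting the exclude pattern to something like `^kube-` would drop namespaces such as `kube-system`.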

You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS} and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations.

Notes about labels and annotations filters:

Macro values are comma-separated key/value pairs; regular expressions are supported in the value (key1: value, key2: regexp).
ECMAScript syntax is used for regular expressions.
Filters are applied if such a label key exists for the entity that is being filtered (it means that if you specify a key in a filter, entities which do not have this key will not be affected by the filter and will still be discovered, and only entities containing that key will be filtered by the value).
You can also use the exclamation point symbol (!) to invert the filter (!key: value).

For example: kubernetes.io/hostname: kubernetes-node([5-9]|1[0-9]|2[0-5])$, !node-role.kubernetes.io/ingress: .*. As a result, nodes 5-25 without the "ingress" role will be discovered.
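
The filter rules described above can be sketched in Python as follows. This is an illustration of the documented semantics, not the template's actual script: `parse_filter` and `passes` are hypothetical names, Python's `re` module stands in for ECMAScript regular expressions, and patterns are assumed to be applied as unanchored searches. The hostname pattern in the usage note is an illustrative one written for this sketch.

```python
import re


def parse_filter(spec):
    """Parse 'key: regexp, !key: regexp' into (key, pattern, negate) rules."""
    rules = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, pattern = part.partition(":")
        key = key.strip()
        negate = key.startswith("!")
        if negate:
            key = key[1:].strip()
        rules.append((key, re.compile(pattern.strip()), negate))
    return rules


def passes(labels, rules):
    """Return True if an entity with these labels would be discovered."""
    for key, pattern, negate in rules:
        if key not in labels:
            # A rule only applies to entities that carry its key.
            continue
        matched = pattern.search(labels[key]) is not None
        if matched == negate:
            # Plain rule that did not match, or inverted rule that did.
            return False
    return True
```

For instance, with `rules = parse_filter("kubernetes.io/hostname: kubernetes-node([5-9]|1[0-9]|2[0-5])$, !node-role.kubernetes.io/ingress: .*")`, a node labelled `kubernetes-node7` passes unless it also carries the ingress role label, and an entity with neither label is left untouched by the filter.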

See the Kubernetes documentation for details about labels and annotations.

Note: the discovered nodes will be created as separate hosts in Zabbix, with the Linux template automatically assigned to them.

Macros used

Name	Description	Default
{$KUBE.API.URL}	Kubernetes API endpoint URL in the format <scheme>://<host>:<port>	`https://kubernetes.default.svc.cluster.local:443`
{$KUBE.API.TOKEN}	Service account bearer token.
{$KUBE.HTTP.PROXY}	Sets the HTTP proxy to `http_proxy` value. If this parameter is empty, then no proxy is used.
{$KUBE.NODES.ENDPOINT.NAME}	Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep".	`zabbix-zabbix-helm-chrt-agent`
{$KUBE.LLD.FILTER.NODE.MATCHES}	Filter of discoverable nodes.	`.*`
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE_IF_NEEDED`
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES}	Filter of discoverable nodes by role.	`.*`
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES}	Filter to exclude discovered node by role.	`CHANGE_IF_NEEDED`
{$KUBE.NODE.FILTER.ANNOTATIONS}	Annotations to filter nodes (regex in values are supported). See the template's README.md for details.
{$KUBE.NODE.FILTER.LABELS}	Labels to filter nodes (regex in values are supported). See the template's README.md for details.
{$KUBE.POD.FILTER.ANNOTATIONS}	Annotations to filter pods (regex in values are supported). See the template's README.md for details.
{$KUBE.POD.FILTER.LABELS}	Labels to filter Pods (regex in values are supported). See the template's README.md for details.
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES}	Filter of discoverable pods by namespace.	`.*`
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES}	Filter to exclude discovered pods by namespace.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
Kubernetes: Get nodes	Collecting and processing cluster nodes data via Kubernetes API.	Script	kube.nodes
Get nodes check	Data collection check.	Dependent item	kube.nodes.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Kubernetes: Failed to get nodes		`length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0`	Warning
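
The check item and its trigger work as a pair: the item extracts `$.error` from the collected data, and the trigger fires when the extracted text is non-empty. A rough Python sketch of that flow is shown below. The function names are hypothetical, and it assumes that the "Custom on fail" step replaces a missing error field with an empty string.

```python
import json


def nodes_check_value(raw):
    """Mimic the check item: return the error text, or "" if there is none."""
    try:
        data = json.loads(raw)
    except ValueError:
        # Assumed behaviour: a failed extraction yields an empty value.
        return ""
    error = data.get("error") if isinstance(data, dict) else None
    return "" if error is None else str(error)


def failed_to_get_nodes(check_value):
    """Mimic the trigger expression length(last(...)) > 0."""
    return len(check_value) > 0
```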

LLD rule Node discovery

Name	Description	Type	Key and additional info
Node discovery		Dependent item	kube.node.discovery Preprocessing JSON Path: `$.nodes..filternode`

Item prototypes for Node discovery

Name	Description	Type	Key and additional info
Node [{#NAME}]: Get data	Collecting and processing data for node [{#NAME}] via the Kubernetes API.	Dependent item	kube.node.get[{#NAME}] Preprocessing JSON Path: `$.nodes..[?(@.metadata.name == "{#NAME}")].first()`
Node [{#NAME}] Addresses: External IP	Typically the IP address of the node that is externally routable (available from outside the cluster).	Dependent item	kube.node.addresses.external_ip[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Addresses: Internal IP	Typically the IP address of the node that is routable only within the cluster.	Dependent item	kube.node.addresses.internal_ip[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Allocatable: CPU	Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now.	Dependent item	kube.node.allocatable.cpu[{#NAME}] Preprocessing JSON Path: `$.status.allocatable.cpu`
Node [{#NAME}] Allocatable: Memory	Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now.	Dependent item	kube.node.allocatable.memory[{#NAME}] Preprocessing JSON Path: `$.status.allocatable.memory`
Node [{#NAME}] Allocatable: Pods	https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/	Dependent item	kube.node.allocatable.pods[{#NAME}] Preprocessing JSON Path: `$.status.allocatable.pods`
Node [{#NAME}] Capacity: CPU	CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity	Dependent item	kube.node.capacity.cpu[{#NAME}] Preprocessing JSON Path: `$.status.capacity.cpu`
Node [{#NAME}] Capacity: Memory	Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity	Dependent item	kube.node.capacity.memory[{#NAME}] Preprocessing JSON Path: `$.status.capacity.memory`
Node [{#NAME}] Capacity: Pods	https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/	Dependent item	kube.node.capacity.pods[{#NAME}] Preprocessing JSON Path: `$.status.capacity.pods`
Node [{#NAME}] Conditions: Disk pressure	True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False.	Dependent item	kube.node.conditions.diskpressure[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NAME}] Conditions: Memory pressure	True if pressure exists on the node memory - that is, if the node memory is low; otherwise False.	Dependent item	kube.node.conditions.memorypressure[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NAME}] Conditions: Network unavailable	True if the network for the node is not correctly configured, otherwise False.	Dependent item	kube.node.conditions.networkunavailable[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NAME}] Conditions: PID pressure	True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False.	Dependent item	kube.node.conditions.pidpressure[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NAME}] Conditions: Ready	True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds).	Dependent item	kube.node.conditions.ready[{#NAME}] Preprocessing JSON Path: `$.status.conditions[?(@.type == "Ready")].status.first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NAME}] Info: Architecture	Node architecture.	Dependent item	kube.node.info.architecture[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.architecture` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: Container runtime	Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/	Dependent item	kube.node.info.containerruntime[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.containerRuntimeVersion` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: Kernel version	Node kernel version.	Dependent item	kube.node.info.kernelversion[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.kernelVersion` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: Kubelet version	Version of Kubelet.	Dependent item	kube.node.info.kubeletversion[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.kubeletVersion` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: KubeProxy version	Version of KubeProxy.	Dependent item	kube.node.info.kubeproxyversion[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.kubeProxyVersion` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: Operating system	Node operating system.	Dependent item	kube.node.info.operatingsystem[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.operatingSystem` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: OS image	Node OS image.	Dependent item	kube.node.info.osversion[{#NAME}] Preprocessing JSON Path: `$.status.nodeInfo.osImage` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Info: Roles	Node roles.	Dependent item	kube.node.info.roles[{#NAME}] Preprocessing JSON Path: `$.status.roles` Discard unchanged with heartbeat: `3h`
Node [{#NAME}] Limits: CPU	Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	Dependent item	kube.node.limits.cpu[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.`
Node [{#NAME}] Limits: Memory	Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	Dependent item	kube.node.limits.memory[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.`
Node [{#NAME}] Requests: CPU	Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	Dependent item	kube.node.requests.cpu[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.`
Node [{#NAME}] Requests: Memory	Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	Dependent item	kube.node.requests.memory[{#NAME}] Preprocessing JSON Path: `The text is too long. Please see the template.`
Node [{#NAME}] Uptime	Node uptime.	Dependent item	kube.node.uptime[{#NAME}] Preprocessing JSON Path: `$.metadata.creationTimestamp` ⛔️Custom on fail: Discard value JavaScript: `return Math.floor((Date.now() - new Date(value)) / 1000);`
Node [{#NAME}] Used: Pods	Current number of pods on the node.	Dependent item	kube.node.used.pods[{#NAME}] Preprocessing JSON Path: `$.status.podsCount`
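
The uptime item above converts a timestamp into elapsed time with the JavaScript step `Math.floor((Date.now() - new Date(value)) / 1000)`, so the resulting value is in seconds. A Python equivalent is sketched below; the function name and the optional `now` parameter (added so the result is reproducible) are assumptions for illustration.

```python
import math
from datetime import datetime, timezone


def uptime_seconds(timestamp, now=None):
    """Return whole seconds elapsed since an RFC 3339 timestamp.

    timestamp is a value such as '2024-01-01T00:00:00Z', the format the
    Kubernetes API uses for creationTimestamp.
    """
    started = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return math.floor((now - started).total_seconds())
```

Because the unit is seconds, ten minutes of uptime corresponds to a value of 600.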

Trigger prototypes for Node discovery

Name	Description	Expression	Severity	Dependencies and additional info
Node [{#NAME}] Conditions: Pressure exists on the disk size	True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False.	`last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1`	Warning
Node [{#NAME}] Conditions: Pressure exists on the node memory	True - pressure exists on the node memory - that is, if the node memory is low; otherwise False	`last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1`	Warning
Node [{#NAME}] Conditions: Network is not correctly configured	True - the network for the node is not correctly configured, otherwise False	`last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1`	Warning
Node [{#NAME}] Conditions: Pressure exists on the processes	True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False	`last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1`	Warning
Node [{#NAME}] Conditions: Is not in Ready state	False - if the node is not healthy and is not accepting pods. Unknown - if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds).	`last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1`	Warning
Node [{#NAME}] Limits: Total CPU limits are too high		`last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9`	Warning	Depends on: Node [{#NAME}] Limits: Total CPU limits are too high
Node [{#NAME}] Limits: Total CPU limits are too high		`last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1`	Average
Node [{#NAME}] Limits: Total memory limits are too high		`last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9`	Warning	Depends on: Node [{#NAME}] Limits: Total memory limits are too high
Node [{#NAME}] Limits: Total memory limits are too high		`last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1`	Average
Node [{#NAME}] Requests: Total CPU requests are too high		`last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5`	Warning	Depends on: Node [{#NAME}] Requests: Total CPU requests are too high
Node [{#NAME}] Requests: Total CPU requests are too high		`last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8`	Average
Node [{#NAME}] Requests: Total memory requests are too high		`last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5`	Warning	Depends on: Node [{#NAME}] Requests: Total memory requests are too high
Node [{#NAME}] Requests: Total memory requests are too high		`last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8`	Average
Node [{#NAME}]: Has been restarted	Uptime is less than 10 minutes.	`last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10`	Info
Node [{#NAME}] Used: Kubelet too many pods	Kubelet is running at capacity.	`last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9`	Warning
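
The limit and request triggers above come in Warning/Average pairs, with the Warning trigger depending on the Average one so that only the more severe problem is shown. The sketch below captures that pairing in Python. The function name is hypothetical, and it assumes both inputs are already plain numbers in the same unit (for example CPU cores or bytes).

```python
def ratio_severity(used, allocatable, warning=0.9, average=1.0):
    """Return the severity a Warning/Average trigger pair would show.

    Defaults mirror the limits triggers (0.9 and 1); the requests
    triggers use 0.5 and 0.8 instead.
    """
    ratio = used / allocatable
    if ratio > average:
        # The dependency suppresses the Warning trigger in this case.
        return "Average"
    if ratio > warning:
        return "Warning"
    return None
```

For example, `ratio_severity(3, 4, warning=0.5, average=0.8)` evaluates a node whose memory requests add up to 75% of allocatable memory and yields the Warning level.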

LLD rule Pod discovery

Name	Description	Type	Key and additional info
Pod discovery		Dependent item	kube.pod.discovery Preprocessing JSON Path: `$.Pods` Discard unchanged with heartbeat: `3h`

Item prototypes for Pod discovery

Name	Description	Type	Key and additional info
Node [{#NODE}] Pod [{#POD}]: Get data	Collecting and processing pod [{#POD}] data on node [{#NODE}] via the Kubernetes API.	Dependent item	kube.pod.get[{#POD}] Preprocessing JSON Path: `$.Pods[?(@.name == "{#POD}")].first()`
Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready	All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions	Dependent item	kube.pod.conditions.containers_ready[{#POD}] Preprocessing JSON Path: `$.conditions[?(@.type == "ContainersReady")].status.first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NODE}] Pod [{#POD}] Conditions: Initialized	All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions	Dependent item	kube.pod.conditions.initialized[{#POD}] Preprocessing JSON Path: `$.conditions[?(@.type == "Initialized")].status.first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NODE}] Pod [{#POD}] Conditions: Ready	The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions	Dependent item	kube.pod.conditions.ready[{#POD}] Preprocessing JSON Path: `$.conditions[?(@.type == "Ready")].status.first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled	The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions	Dependent item	kube.pod.conditions.scheduled[{#POD}] Preprocessing JSON Path: `$.conditions[?(@.type == "PodScheduled")].status.first()` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NODE}] Pod [{#POD}] Containers: Restarts	The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection.	Dependent item	kube.pod.containers.restartcount[{#POD}] Preprocessing JSON Path: `$.containers.restartCount` ⛔️Custom on fail: Discard value
Node [{#NODE}] Pod [{#POD}] Status: Phase	The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase	Dependent item	kube.pod.status.phase[{#POD}] Preprocessing JSON Path: `$.phase` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Node [{#NODE}] Pod [{#POD}] Uptime	Pod uptime.	Dependent item	kube.pod.uptime[{#POD}] Preprocessing JSON Path: `$.startTime` ⛔️Custom on fail: Discard value JavaScript: `return Math.floor((Date.now() - new Date(value)) / 1000);`
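
The condition items above all use the same JSONPath shape, `$.conditions[?(@.type == "...")].status.first()`, which picks the status of the first condition with a given type. A plain Python sketch of that lookup follows. The function name is hypothetical and the input shape (a dictionary with a `conditions` list) is inferred from the JSONPath expressions; the later JavaScript step that converts the status to a number is not reproduced here.

```python
def condition_status(pod, condition_type):
    """Return the status of the first condition of the given type, or None."""
    for condition in pod.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None
```

Returning `None` when the condition is absent loosely corresponds to the "Custom on fail: Discard value" step, where no value is stored.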

Trigger prototypes for Pod discovery

Name	Description	Expression	Severity	Dependencies and additional info
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping	Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state.	`(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],15m))>1`	Warning
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy	Pod has been in a non-ready state for longer than 10 minutes.	`count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m, "regexp","^(1|4|5)$")>=9`	High
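
Both pod trigger expressions above evaluate a window of recent values. The Python sketch below restates them over plain lists, assuming the caller has already collected the values from the last 15 and 10 minutes respectively. The function names are hypothetical, and the meaning of the numeric phase codes is defined by the template's value mapping, which is not reproduced here.

```python
import re


def is_crash_looping(restart_counts):
    """last() - min() over the window is greater than 1."""
    return restart_counts[-1] - min(restart_counts) > 1


def is_not_healthy(phase_codes, threshold=9):
    """At least `threshold` values in the window match ^(1|4|5)$."""
    matching = sum(
        1 for code in phase_codes if re.fullmatch(r"1|4|5", str(code))
    )
    return matching >= threshold
```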

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
