
Linux by Prom

Overview

This template collects Linux metrics from node_exporter 0.18 and above. Support for older node_exporter versions is provided as 'best effort'.

Known Issues

  • Description: node_exporter v0.16.0 renamed many metrics. With node_exporter < 0.16, this template does not support CPU utilization for the 'guest' and 'guest_nice' metrics, and it does not support disk IO metrics. Other metrics are provided on a 'best effort' basis. See https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 for details.

    • version: below 0.16.0
  • Description: the metric node_network_info with the label 'device' cannot be found, so network discovery is not possible.

    • version: below 0.18

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • node_exporter 0.17.0
  • node_exporter 0.18.1

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Please refer to the node_exporter docs. Use node_exporter v0.18.0 or above.

Macros used

Name Description Default
{$CPU.UTIL.CRIT} 90
{$IF.ERRORS.WARN} 2
{$IF.UTIL.MAX} 90
{$SYSTEM.FUZZYTIME.MAX} 60
{$KERNEL.MAXFILES.MIN} 256
{$LOAD_AVG_PER_CPU.MAX.WARN}

Load per CPU considered sustainable. Tune if needed.

1.5
{$NODE_EXPORTER_PORT}

The TCP port node_exporter is listening on.

9100
{$SWAP.PFREE.MIN.WARN} 50
{$VFS.DEV.READ.AWAIT.WARN}

Disk read average response time (in ms) above which the trigger fires.

20
{$VFS.DEV.WRITE.AWAIT.WARN}

Disk write average response time (in ms) above which the trigger fires.

20
{$VFS.DEV.DEVNAME.NOT_MATCHES}

This macro is used in block devices discovery. Can be overridden on the host or linked template level.

Macro too long. Please see the template.
{$VFS.DEV.DEVNAME.MATCHES}

This macro is used in block devices discovery. Can be overridden on the host or linked template level.

.+
{$VFS.FS.FSNAME.NOT_MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

^(/dev|/sys|/run|/proc|.+/shm$)
{$VFS.FS.FSNAME.MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

.+
{$VFS.FS.FSTYPE.MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

Macro too long. Please see the template.
{$VFS.FS.FSTYPE.NOT_MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

^\s$
{$VFS.FS.FSDEVICE.MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

^.+$
{$VFS.FS.FSDEVICE.NOT_MATCHES}

This macro is used in filesystems discovery. Can be overridden on the host or linked template level.

^\s$
{$MEMORY.UTIL.MAX} 90
{$MEMORY.AVAILABLE.MIN} 20M
{$IFCONTROL} 1
{$NET.IF.IFNAME.MATCHES} ^.*$
{$NET.IF.IFNAME.NOT_MATCHES}

Filter out loopbacks, nulls, docker veth links and docker0 bridge by default.

Macro too long. Please see the template.
{$NET.IF.IFOPERSTATUS.MATCHES} ^.*$
{$NET.IF.IFOPERSTATUS.NOT_MATCHES}

Ignore notPresent(7).

^7$
{$NET.IF.IFALIAS.MATCHES} ^.*$
{$NET.IF.IFALIAS.NOT_MATCHES} CHANGE_IF_NEEDED
{$VFS.FS.FREE.MIN.CRIT}

The critical threshold for free space on the filesystem.

5G
{$VFS.FS.FREE.MIN.WARN}

The warning threshold for free space on the filesystem.

10G
{$VFS.FS.INODE.PFREE.MIN.CRIT} 10
{$VFS.FS.INODE.PFREE.MIN.WARN} 20
{$VFS.FS.PUSED.MAX.CRIT} 90
{$VFS.FS.PUSED.MAX.WARN} 80
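The MATCHES/NOT_MATCHES macro pairs above act as include/exclude filters for discovery. Below is a minimal sketch for mountpoints. It assumes an entry is kept only when its value matches the MATCHES regex and does not match the NOT_MATCHES regex, and that both are tested as unanchored searches (Python's re.search stands in for Zabbix's own regex engine). The keep() helper is hypothetical.

```python
import re

# Defaults copied from the macro table above.
FSNAME_MATCHES = r".+"
FSNAME_NOT_MATCHES = r"^(/dev|/sys|/run|/proc|.+/shm$)"


def keep(value: str, matches: str, not_matches: str) -> bool:
    """Hypothetical include/exclude filter pair (assumed semantics).

    re.search() looks for the pattern anywhere in the value, so anchors
    such as ^ and $ must be written into the regex, as the defaults do.
    """
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)


mountpoints = [
    "/",
    "/boot",
    "/dev/shm",
    "/run/user/1000",
    "/var/lib/docker/containers/abc/shm",
    "/home",
]
kept = [m for m in mountpoints if keep(m, FSNAME_MATCHES, FSNAME_NOT_MATCHES)]
print(kept)  # ['/', '/boot', '/home']
```

With these defaults, mountpoints under /dev, /sys, /run and /proc, plus any mountpoint ending in /shm, are dropped before items are created from the prototypes. To exclude another path on a single host, override {$VFS.FS.FSNAME.NOT_MATCHES} at the host level.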

Items

Name Description Type Key and additional info
Linux: Get node_exporter metrics HTTP agent node_exporter.get
Linux: Version of node_exporter running Dependent item agent.version[node_exporter]

Preprocessing

  • Prometheus pattern: node_exporter_build_info label version

  • Discard unchanged with heartbeat: 1d

Linux: System boot time Dependent item system.boottime[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"^node_boot_time(?:_seconds)?$"})

Linux: System local time

The local system time of the host.

Dependent item system.localtime[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"^node_time(?:_seconds)?$"})

Linux: System name

The host name of the system.

Dependent item system.name[node_exporter]

Preprocessing

  • Prometheus pattern: node_uname_info label nodename

  • Discard unchanged with heartbeat: 1d

Linux: System description

Labeled system information as provided by the uname system call.

Dependent item system.descr[node_exporter]

Preprocessing

  • Prometheus to JSON: node_uname_info

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 1d

Linux: Maximum number of open file descriptors

It can be increased by using the sysctl utility or by modifying the /etc/sysctl.conf file.

Dependent item kernel.maxfiles[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE(node_filefd_maximum)

  • Discard unchanged with heartbeat: 1d

Linux: Number of open file descriptors Dependent item fd.open[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE(node_filefd_allocated)

Linux: Operating system Dependent item system.sw.os[node_exporter]

Preprocessing

  • Discard unchanged with heartbeat: 1d

Linux: Operating system architecture

The architecture of the operating system.

Dependent item system.sw.arch[node_exporter]

Preprocessing

  • Prometheus pattern: node_uname_info label machine

  • Discard unchanged with heartbeat: 1d

Linux: System uptime

The system uptime expressed in the following format: "N days, hh:mm:ss".

Dependent item system.uptime[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"^node_boot_time(?:_seconds)?$"})

  • JavaScript: The text is too long. Please see the template.

Linux: Load average (1m avg) Dependent item system.cpu.load.avg1[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE(node_load1)

Linux: Load average (5m avg) Dependent item system.cpu.load.avg5[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE(node_load5)

Linux: Load average (15m avg) Dependent item system.cpu.load.avg15[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE(node_load15)

Linux: Number of CPUs Dependent item system.cpu.num[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

Linux: CPU idle time

The time the CPU has spent doing nothing.

Dependent item system.cpu.idle[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU utilization

The CPU utilization expressed in %.

Dependent item system.cpu.util[node_exporter]

Preprocessing

  • JavaScript:

      //Calculate utilization
      return (100 - value)

Linux: CPU system time

The time the CPU has spent running the kernel and its processes.

Dependent item system.cpu.system[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU user time

The time the CPU has spent running users' processes that are not niced.

Dependent item system.cpu.user[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU steal time

The amount of CPU time "stolen" from this virtual machine by the hypervisor for other tasks, such as running another virtual machine.

Dependent item system.cpu.steal[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU softirq time

The amount of time the CPU has been servicing software interrupts.

Dependent item system.cpu.softirq[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU nice time

The time the CPU has spent running users' processes that have been niced.

Dependent item system.cpu.nice[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU iowait time

The amount of time the CPU has been waiting for I/O to complete.

Dependent item system.cpu.iowait[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU interrupt time

The amount of time the CPU has been servicing hardware interrupts.

Dependent item system.cpu.interrupt[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU guest time

Guest time: the time spent running a virtual CPU for a guest operating system.

Dependent item system.cpu.guest[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: CPU guest nice time

The time spent running a niced guest (a virtual CPU for guest operating systems under the control of the Linux kernel).

Dependent item system.cpu.guest_nice[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Change per second
  • Custom multiplier: 100

Linux: Interrupts per second Dependent item system.cpu.intr[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_intr"})

  • Change per second
Linux: Context switches per second Dependent item system.cpu.switches[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_context_switches"})

  • Change per second
Linux: Memory utilization

Memory used percentage is calculated as (total-available)/total*100.

Calculated vm.memory.util[node_exporter]
Linux: Total memory

The total memory expressed in bytes.

Dependent item vm.memory.total[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_memory_MemTotal"})

Linux: Available memory

The available memory:

- in Linux: available = free + buffers + cache;

- on other platforms, the calculation may vary.

See also the Appendixes in the Zabbix documentation for the parameters of the vm.memory.size item.

Dependent item vm.memory.available[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_memory_MemAvailable"})

Linux: Total swap space

The total space of the swap volume/file expressed in bytes.

Dependent item system.swap.total[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_memory_SwapTotal"})

Linux: Free swap space

The free space of the swap volume/file expressed in bytes.

Dependent item system.swap.free[node_exporter]

Preprocessing

  • Prometheus pattern: VALUE({__name__=~"node_memory_SwapFree"})

Linux: Free swap space in %

The free space of the swap volume/file expressed in %.

Calculated system.swap.pfree[node_exporter]
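Several items above reduce to simple arithmetic on node_exporter values. The sketch below restates that arithmetic in Python under stated assumptions. The template's JavaScript steps are elided in this README ("The text is too long"), so two details are assumed: the idle counter is averaged across CPUs, and uptime is computed as the current time minus the boot time. format_uptime() only mimics the display format quoted in the System uptime description. All function names are hypothetical.

```python
def uptime_seconds(boot_time: float, now: float) -> int:
    # system.uptime: seconds elapsed since node_boot_time_seconds (assumed logic).
    return int(now - boot_time)


def format_uptime(seconds: int) -> str:
    # Render seconds in the "N days, hh:mm:ss" form quoted above.
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}:{secs:02d}"


def cpu_idle_percent(prev_idle, curr_idle, interval_s):
    # prev_idle / curr_idle: per-CPU node_cpu_seconds_total{mode="idle"} counters.
    # Assumption: averaged across CPUs, then "Change per second" and
    # "Custom multiplier: 100" are applied.
    prev_avg = sum(prev_idle) / len(prev_idle)
    curr_avg = sum(curr_idle) / len(curr_idle)
    return (curr_avg - prev_avg) / interval_s * 100


def cpu_util_percent(idle_percent):
    # system.cpu.util preprocessing: return (100 - value)
    return 100 - idle_percent


def memory_util_percent(total, available):
    # vm.memory.util: (total - available) / total * 100
    return (total - available) / total * 100


def swap_pfree(total, free):
    # system.swap.pfree: free / total * 100. Returning None when no swap is
    # configured is an assumption of this sketch (it avoids dividing by zero).
    if total == 0:
        return None
    return free / total * 100


GIB = 1024 ** 3
print(format_uptime(uptime_seconds(1_700_000_000, 1_700_100_000)))  # 1 days, 03:46:40
idle = cpu_idle_percent([100.0, 200.0], [145.0, 230.0], 60)
print(idle, cpu_util_percent(idle))  # 62.5 37.5
print(memory_util_percent(8 * GIB, 2 * GIB))  # 75.0
print(swap_pfree(4 * GIB, 1 * GIB), swap_pfree(0, 0))  # 25.0 None
```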

Triggers

Name Description Expression Severity Dependencies and additional info
Linux: node_exporter is not available

Failed to fetch system metrics from node_exporter in time.

nodata(/Linux by Prom/node_exporter.get,30m)=1 Warning Manual close: Yes
Linux: System time is out of sync

The host's system time differs from the Zabbix server time.

fuzzytime(/Linux by Prom/system.localtime[node_exporter],{$SYSTEM.FUZZYTIME.MAX})=0 Warning Manual close: Yes
Linux: System name has changed

The name of the system has changed. Acknowledge to close the problem manually.

last(/Linux by Prom/system.name[node_exporter],#1)<>last(/Linux by Prom/system.name[node_exporter],#2) and length(last(/Linux by Prom/system.name[node_exporter]))>0 Info Manual close: Yes
Linux: Configured max number of open filedescriptors is too low last(/Linux by Prom/kernel.maxfiles[node_exporter])<{$KERNEL.MAXFILES.MIN} Info Depends on:
  • Linux: Running out of file descriptors
Linux: Running out of file descriptors last(/Linux by Prom/fd.open[node_exporter])/last(/Linux by Prom/kernel.maxfiles[node_exporter])*100>80 Warning
Linux: Operating system description has changed

The description of the operating system has changed. Possible reasons are that the system has been updated or replaced. Acknowledge to close the problem manually.

last(/Linux by Prom/system.sw.os[node_exporter],#1)<>last(/Linux by Prom/system.sw.os[node_exporter],#2) and length(last(/Linux by Prom/system.sw.os[node_exporter]))>0 Info Manual close: Yes
Depends on:
  • Linux: System name has changed
Linux: {HOST.NAME} has been restarted

The device uptime is less than 10 minutes.

last(/Linux by Prom/system.uptime[node_exporter])<10m Warning Manual close: Yes
Linux: Load average is too high

The load average per CPU is too high. The system may be slow to respond.

min(/Linux by Prom/system.cpu.load.avg1[node_exporter],5m)/last(/Linux by Prom/system.cpu.num[node_exporter])>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/Linux by Prom/system.cpu.load.avg5[node_exporter])>0 and last(/Linux by Prom/system.cpu.load.avg15[node_exporter])>0 Average
Linux: High CPU utilization

The CPU utilization is too high. The system might be slow to respond.

min(/Linux by Prom/system.cpu.util[node_exporter],5m)>{$CPU.UTIL.CRIT} Warning Depends on:
  • Linux: Load average is too high
Linux: High memory utilization

The system is running out of free memory.

min(/Linux by Prom/vm.memory.util[node_exporter],5m)>{$MEMORY.UTIL.MAX} Average Depends on:
  • Linux: Lack of available memory
Linux: Lack of available memory max(/Linux by Prom/vm.memory.available[node_exporter],5m)<{$MEMORY.AVAILABLE.MIN} and last(/Linux by Prom/vm.memory.total[node_exporter])>0 Average
Linux: High swap space usage

If there is no swap configured, this trigger is ignored.

max(/Linux by Prom/system.swap.pfree[node_exporter],5m)<{$SWAP.PFREE.MIN.WARN} and last(/Linux by Prom/system.swap.total[node_exporter])>0 Warning Depends on:
  • Linux: Lack of available memory
  • Linux: High memory utilization
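Below is a sketch of several trigger expressions above written as plain predicates, which can help when checking thresholds against sample values. Each function mirrors one expression and uses the macro defaults from the macro table. Windowed functions such as min(...,5m) and max(...,5m) are modelled by passing the samples from that window as a list. fuzzytime() is assumed to treat a difference exactly equal to the limit as in sync. All function names are hypothetical.

```python
def fuzzytime(item_timestamp, collection_clock, max_diff=60):
    # 1 if the host-reported time is within {$SYSTEM.FUZZYTIME.MAX} seconds of
    # the time the value was collected, else 0 (inclusive boundary assumed).
    return 1 if abs(item_timestamp - collection_clock) <= max_diff else 0


def fd_running_out(fd_open, maxfiles):
    # last(fd.open) / last(kernel.maxfiles) * 100 > 80
    return fd_open / maxfiles * 100 > 80


def load_average_too_high(load1_window, load5, load15, cpu_num, max_per_cpu=1.5):
    # load1_window: load1 samples from the last 5 minutes, so min() mirrors
    # min(system.cpu.load.avg1, 5m). The load5/load15 > 0 checks are the
    # guards from the trigger expression.
    return (min(load1_window) / cpu_num > max_per_cpu
            and load5 > 0 and load15 > 0)


def swap_usage_high(pfree_window, swap_total, pfree_min=50):
    # max(pfree, 5m) < {$SWAP.PFREE.MIN.WARN} and total swap > 0,
    # so hosts without swap never raise this problem.
    return max(pfree_window) < pfree_min and swap_total > 0


print(load_average_too_high([7.0, 6.5, 8.2], 6.0, 5.0, 4))  # True (6.5 / 4 = 1.625 > 1.5)
```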

LLD rule Network interface discovery

Name Description Type Key and additional info
Network interface discovery

Discovery of network interfaces. Requires node_exporter v0.18 and up.

Dependent item net.if.discovery[node_exporter]

Preprocessing

  • Prometheus to JSON: {__name__=~"^node_network_info$"}
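Discovery depends on node_network_info carrying a 'device' label, which is why exporters older than 0.18 discover nothing. The sketch below parses the Prometheus text exposition format and emits one row per node_network_info sample. Mapping 'device' to {#IFNAME} and 'ifalias' to {#IFALIAS} is an assumption about the template's LLD macro paths. The sample input is illustrative and trimmed, and discover_interfaces() is hypothetical. Filtering by the {$NET.IF.*} macros happens afterwards and is not shown.

```python
import re

# One sample line: metric name, optional {labels}, whitespace, value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# label="value" pairs, allowing backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def discover_interfaces(exposition: str) -> list:
    """Hypothetical sketch: one LLD row per node_network_info sample."""
    rows = []
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = LINE_RE.match(line)
        if m is None or m.group("name") != "node_network_info":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if "device" not in labels:
            continue  # no 'device' label (exporters below 0.18): nothing to discover
        rows.append({
            "{#IFNAME}": labels["device"],
            "{#IFALIAS}": labels.get("ifalias", ""),
        })
    return rows


SAMPLE = """\
# TYPE node_network_info gauge
node_network_info{address="00:00:00:00:00:00",device="lo",ifalias="",operstate="unknown"} 1
node_network_info{address="52:54:00:12:34:56",device="eth0",ifalias="uplink",operstate="up"} 1
node_load1 0.42
"""
print(discover_interfaces(SAMPLE))
# [{'{#IFNAME}': 'lo', '{#IFALIAS}': ''}, {'{#IFNAME}': 'eth0', '{#IFALIAS}': 'uplink'}]
```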

Item prototypes for Network interface discovery

Name Description Type Key and additional info
Interface {#IFNAME}({#IFALIAS}): Bits received Dependent item net.if.in[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_receive_bytes_total{device="{#IFNAME}"})

  • Change per second
  • Custom multiplier: 8

Interface {#IFNAME}({#IFALIAS}): Bits sent Dependent item net.if.out[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_transmit_bytes_total{device="{#IFNAME}"})

  • Change per second
  • Custom multiplier: 8

Interface {#IFNAME}({#IFALIAS}): Outbound packets with errors Dependent item net.if.out.errors[node_exporter"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_transmit_errs_total{device="{#IFNAME}"})

  • Change per second
Interface {#IFNAME}({#IFALIAS}): Inbound packets with errors Dependent item net.if.in.errors[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_receive_errs_total{device="{#IFNAME}"})

  • Change per second
Interface {#IFNAME}({#IFALIAS}): Inbound packets discarded Dependent item net.if.in.discards[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_receive_drop_total{device="{#IFNAME}"})

  • Change per second
Interface {#IFNAME}({#IFALIAS}): Outbound packets discarded Dependent item net.if.out.discards[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_transmit_drop_total{device="{#IFNAME}"})

  • Change per second
Interface {#IFNAME}({#IFALIAS}): Speed

Sets the value to 0 if the metric is missing from the node_exporter output.

Dependent item net.if.speed[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_speed_bytes{device="{#IFNAME}"})

    Custom on fail: Set value to: 0

  • Custom multiplier: 8

Interface {#IFNAME}({#IFALIAS}): Interface type

The protocol_type value of /sys/class/net/, as exported by node_network_protocol_type.

Dependent item net.if.type[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_network_protocol_type{device="{#IFNAME}"})

Interface {#IFNAME}({#IFALIAS}): Operational status

Reference: https://www.kernel.org/doc/Documentation/networking/operstates.txt

Dependent item net.if.status[node_exporter,"{#IFNAME}"]

Preprocessing

  • Prometheus pattern: node_network_info{device="{#IFNAME}"} label operstate

  • JavaScript: The text is too long. Please see the template.
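The bit-rate and speed prototypes convert byte values into bits per second. The bit-rate items apply "Change per second" to a byte counter and then multiply by 8, and the speed item multiplies node_network_speed_bytes by 8. Below is a minimal sketch, assuming Zabbix discards a negative change (for example, after a counter reset) rather than storing it. The function names and sample values are hypothetical.

```python
def bits_per_second(prev_bytes, curr_bytes, interval_s):
    # "Change per second" on a byte counter, then "Custom multiplier: 8".
    # A decreasing counter yields no value here (assumed discard behaviour).
    if curr_bytes < prev_bytes:
        return None
    return (curr_bytes - prev_bytes) / interval_s * 8


def interface_speed_bps(speed_bytes_by_device, ifname):
    # node_network_speed_bytes is in bytes per second. A missing sample maps to 0
    # ("Custom on fail: Set value to: 0"), then "Custom multiplier: 8".
    value = speed_bytes_by_device.get(ifname)
    return 0 if value is None else value * 8


speeds = {"eth0": 125_000_000}  # a 1 Gbit/s link, expressed in bytes/s
print(bits_per_second(1_000_000, 1_750_000, 60))  # 100000.0
print(interface_speed_bps(speeds, "eth0"), interface_speed_bps(speeds, "wlan0"))  # 1000000000 0
```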

Trigger prototypes for Network interface discovery

Name Description Expression Severity Dependencies and additional info
Interface {#IFNAME}({#IFALIAS}): High bandwidth usage

The utilization of the network interface is close to its estimated maximum bandwidth.

(avg(/Linux by Prom/net.if.in[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"]) or avg(/Linux by Prom/net.if.out[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])) and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0 Warning Manual close: Yes
Depends on:
  • Interface {#IFNAME}({#IFALIAS}): Link down
Interface {#IFNAME}({#IFALIAS}): High error rate

It recovers when it is below 80% of the {$IF.ERRORS.WARN:"{#IFNAME}"} threshold.

min(/Linux by Prom/net.if.in.errors[node_exporter,"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} or min(/Linux by Prom/net.if.out.errors[node_exporter"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Warning Manual close: Yes
Depends on:
  • Interface {#IFNAME}({#IFALIAS}): Link down
Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was before

This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually.

change(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0 and ( last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=7 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=11 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=62 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=69 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=117 ) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2) Info Manual close: Yes
Depends on:
  • Interface {#IFNAME}({#IFALIAS}): Link down
Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was before

This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually.

change(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])>0 and (last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=1) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2) Info Manual close: Yes
Depends on:
  • Interface {#IFNAME}({#IFALIAS}): Link down
Interface {#IFNAME}({#IFALIAS}): Link down

This trigger expression works as follows:
1. It can be triggered if the operational status is down.
2. {$IFCONTROL:"{#IFNAME}"}=1: a user can redefine the context macro to 0, which marks this interface as not important. No new trigger will fire if this interface is down.
3. The status must have changed since the previous poll (last(#1)<>last(#2), the equivalent of the older {TEMPLATE_NAME:METRIC.diff()}=1 check): the trigger fires only if the operational status was up (1) at some point before, so it does not fire for 'eternal off' interfaces.

WARNING: if closed manually, the trigger will not fire again on the next poll because of the diff check.

{$IFCONTROL:"{#IFNAME}"}=1 and last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])=2 and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"],#1)<>last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"],#2)) Average Manual close: Yes
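The interface trigger prototypes can be sketched as the predicates below, using the macro defaults {$IF.UTIL.MAX}=90, {$IF.ERRORS.WARN}=2 and {$IFCONTROL}=1. The "High error rate" recovery rule is only described in prose ("below 80% of the threshold"). The hysteresis class therefore assumes the problem closes once both directions drop below that level, and it ignores the 5-minute windows. This is a hypothetical illustration, not the template's code.

```python
def high_bandwidth(avg_in_15m, avg_out_15m, speed_bps, util_max=90):
    # Either direction above {$IF.UTIL.MAX}% of the interface speed. The
    # speed > 0 guard skips interfaces whose speed is unknown (reported as 0).
    limit = util_max / 100 * speed_bps
    return (avg_in_15m > limit or avg_out_15m > limit) and speed_bps > 0


class ErrorRateTrigger:
    """Simplified hysteresis: opens above the threshold, closes only when
    both error rates fall below 80% of it (assumed recovery rule)."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.problem = False

    def update(self, in_errors_rate, out_errors_rate):
        worst = max(in_errors_rate, out_errors_rate)
        if not self.problem:
            self.problem = worst > self.threshold
        elif worst < 0.8 * self.threshold:
            self.problem = False
        return self.problem


def link_down(ifcontrol, status_history):
    # status_history[-1] is #1 (latest), status_history[-2] is #2; 2 means down.
    # Requiring a change keeps 'eternal off' interfaces quiet.
    latest, previous = status_history[-1], status_history[-2]
    return ifcontrol == 1 and latest == 2 and latest != previous


trigger = ErrorRateTrigger()
print([trigger.update(rate, 0) for rate in (3, 1.8, 1.5)])  # [True, True, False]
```

The middle value (1.8) is below the threshold of 2 but above 1.6, so the problem stays open. This is the point of recovering at 80% rather than at the threshold itself.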

LLD rule Mounted filesystem discovery

Name Description Type Key and additional info
Mounted filesystem discovery

Discovery of file systems of different types.

Dependent item vfs.fs.discovery[node_exporter]

Preprocessing

  • Prometheus to JSON: The text is too long. Please see the template.

Item prototypes for Mounted filesystem discovery

Name Description Type Key and additional info
{#FSNAME}: Free space Dependent item vfs.fs.free[node_exporter,"{#FSNAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

{#FSNAME}: Total space

Total space in bytes

Dependent item vfs.fs.total[node_exporter,"{#FSNAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

{#FSNAME}: Used space

Used storage in bytes

Calculated vfs.fs.used[node_exporter,"{#FSNAME}"]
{#FSNAME}: Space utilization

The space utilization expressed in % for {#FSNAME}.

Calculated vfs.fs.pused[node_exporter,"{#FSNAME}"]
{#FSNAME}: Free inodes in % Dependent item vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"]

Preprocessing

  • Prometheus to JSON: {__name__=~"node_filesystem_files.*",mountpoint="{#FSNAME}"}

  • JavaScript: The text is too long. Please see the template.
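Used space, space utilization and free inodes are simple ratios. The sketch below makes three assumptions: vfs.fs.used = total - free, vfs.fs.pused = used / total * 100, and "Prometheus to JSON" yields, for one mountpoint, a list of objects with "name" and string "value" fields. The template's actual Prometheus patterns and JavaScript are elided above, and the function names are hypothetical.

```python
import json


def fs_used(total, free):
    # vfs.fs.used: total - free (assumed calculated-item formula)
    return total - free


def fs_pused(total, free):
    # vfs.fs.pused: used / total * 100
    return fs_used(total, free) / total * 100


def inode_pfree(prometheus_json):
    # Assumed "Prometheus to JSON" shape: [{"name": ..., "value": "..."}, ...]
    values = {m["name"]: float(m["value"]) for m in json.loads(prometheus_json)}
    return values["node_filesystem_files_free"] / values["node_filesystem_files"] * 100


GIB = 1024 ** 3
sample = json.dumps([
    {"name": "node_filesystem_files", "value": "4000000", "labels": {"mountpoint": "/"}},
    {"name": "node_filesystem_files_free", "value": "1000000", "labels": {"mountpoint": "/"}},
])
print(fs_used(100 * GIB, 25 * GIB) // GIB, fs_pused(100 * GIB, 25 * GIB))  # 75 75.0
print(inode_pfree(sample))  # 25.0
```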

Trigger prototypes for Mounted filesystem discovery

Name Description Expression Severity Dependencies and additional info
{#FSNAME}: Disk space is critically low

Two conditions should match:
1. The first condition - utilization of the space should be above {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}.
2. The second condition should be one of the following:
- the disk free space is less than {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"};
- the disk will be full in less than 24 hours.

last(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and ((last(/Linux by Prom/vfs.fs.total[node_exporter,"{#FSNAME}"])-last(/Linux by Prom/vfs.fs.used[node_exporter,"{#FSNAME}"]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or timeleft(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"],1h,100)<1d) Average Manual close: Yes
{#FSNAME}: Disk space is low

Two conditions should match:
1. The first condition - utilization of the space should be above {$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}.
2. The second condition should be one of the following:
- the disk free space is less than {$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"};
- the disk will be full in less than 24 hours.

last(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"])>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and ((last(/Linux by Prom/vfs.fs.total[node_exporter,"{#FSNAME}"])-last(/Linux by Prom/vfs.fs.used[node_exporter,"{#FSNAME}"]))<{$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"} or timeleft(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"],1h,100)<1d) Warning Manual close: Yes
Depends on:
  • {#FSNAME}: Disk space is critically low
{#FSNAME}: Running out of free inodes

It may become impossible to write to a disk if there are no index nodes left.
The following error messages may be returned as symptoms, even though the free space is available:
- 'No space left on device';
- 'Disk is full'.

min(/Linux by Prom/vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"],5m)<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"} Average
{#FSNAME}: Running out of free inodes

It may become impossible to write to a disk if there are no index nodes left.
The following error messages may be returned as symptoms, even though the free space is available:
- 'No space left on device';
- 'Disk is full'.

min(/Linux by Prom/vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"],5m)<{$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"} Warning Depends on:
  • {#FSNAME}: Running out of free inodes
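Both disk space triggers combine a utilization threshold with either a free-space floor or a forecast from timeleft(). The sketch below approximates timeleft(...,1h,100) with an ordinary least-squares line through the last hour of pused samples and counts the remaining time from the newest sample. Three points are assumptions: that approximation, returning math.inf for a flat or falling trend, and reading 5G as 5 * 1024**3 bytes. The thresholds passed in are the critical macro defaults, and the names are hypothetical.

```python
import math

GIB = 1024 ** 3  # assumption: the "G" suffix means 1024**3 bytes


def timeleft(samples, threshold):
    """Seconds until a least-squares line through (unix_time, value) samples
    reaches threshold, counted from the newest sample; math.inf if the trend
    is flat or falling (assumed behaviour)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    if sxx == 0 or sxy <= 0:
        return math.inf
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return max(0.0, (threshold - intercept) / slope - samples[-1][0])


def disk_space_problem(pused, total, used, seconds_left, pused_max, free_min):
    # Condition 1 AND (free space below the floor OR full in less than 1d).
    return pused > pused_max and ((total - used) < free_min or seconds_left < 86400)


# pused rising one point every 30 minutes during the last hour
history = [(0, 90.0), (1800, 91.0), (3600, 92.0)]
left = timeleft(history, 100)
print(round(left))  # 14400, i.e. about 4 hours
print(disk_space_problem(92.0, 100 * GIB, 92 * GIB, left, 90, 5 * GIB))  # True
```

Here 8 GiB is still free, so the free-space floor alone would not fire. The forecast is what trips the trigger.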

LLD rule Block devices discovery

Name Description Type Key and additional info
Block devices discovery Dependent item vfs.dev.discovery[node_exporter]

Preprocessing

  • Prometheus to JSON: node_disk_io_now{device=~".+"}

Item prototypes for Block devices discovery

Name Description Type Key and additional info
{#DEVNAME}: Disk read rate

r/s. The number (after merges) of read requests completed per second for the device.

Dependent item vfs.dev.read.rate[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_disk_reads_completed_total{device="{#DEVNAME}"})

  • Change per second
{#DEVNAME}: Disk write rate

w/s. The number (after merges) of write requests completed per second for the device.

Dependent item vfs.dev.write.rate[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_disk_writes_completed_total{device="{#DEVNAME}"})

  • Change per second
{#DEVNAME}: Disk read time (rate)

The rate of the total read time counter. Used in the r_await calculation.

Dependent item vfs.dev.read.time.rate[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
{#DEVNAME}: Disk write time (rate)

The rate of the total write time counter. Used in the w_await calculation.

Dependent item vfs.dev.write.time.rate[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
{#DEVNAME}: Disk read request avg waiting time (r_await)

This formula contains two Boolean expressions that evaluate to 1 or 0. They set the calculated metric to zero when no read requests were completed and avoid a division-by-zero error.

Calculated vfs.dev.read.await[node_exporter,"{#DEVNAME}"]
{#DEVNAME}: Disk write request avg waiting time (w_await)

This formula contains two Boolean expressions that evaluate to 1 or 0. They set the calculated metric to zero when no write requests were completed and avoid a division-by-zero error.

Calculated vfs.dev.write.await[node_exporter,"{#DEVNAME}"]
{#DEVNAME}: Disk average queue size (avgqu-sz)

The current average disk queue; the number of requests outstanding on the disk while the performance data is being collected.

Dependent item vfs.dev.queue_size[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
{#DEVNAME}: Disk utilization

This item is the percentage of elapsed time during which the selected disk drive was busy while servicing read or write requests.

Dependent item vfs.dev.util[node_exporter,"{#DEVNAME}"]

Preprocessing

  • Prometheus pattern: VALUE(node_disk_io_time_seconds_total{device="{#DEVNAME}"})

  • Change per second
  • Custom multiplier: 100

Trigger prototypes for Block devices discovery

Name Description Expression Severity Dependencies and additional info
{#DEVNAME}: Disk read/write request responses are too high

This trigger might indicate saturation of the disk {#DEVNAME}.

min(/Linux by Prom/vfs.dev.read.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or min(/Linux by Prom/vfs.dev.write.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"} Warning Manual close: Yes
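The r_await/w_await descriptions refer to two Boolean terms. Below is a sketch of that pattern. It assumes the formula divides the time rate by the request rate, adds (rate = 0) to keep the divisor non-zero, and multiplies by (rate > 0) to force zero when the disk is idle. It also assumes a factor of 1000 converts seconds to the milliseconds that the {$VFS.DEV.*.AWAIT.WARN} macros expect. Disk utilization and the response-time trigger are included for completeness, and all names are hypothetical.

```python
def await_ms(time_rate, ops_rate):
    """time_rate: seconds spent on requests per second (vfs.dev.*.time.rate);
    ops_rate: completed requests per second (vfs.dev.*.rate).
    Python bools act as 0/1 here, like the Boolean terms in the formula."""
    return time_rate / (ops_rate + (ops_rate == 0)) * 1000 * (ops_rate > 0)


def disk_util_percent(prev_io_seconds, curr_io_seconds, interval_s):
    # node_disk_io_time_seconds_total: "Change per second", "Custom multiplier: 100"
    return (curr_io_seconds - prev_io_seconds) / interval_s * 100


def response_too_high(read_await_15m, write_await_15m, read_warn=20, write_warn=20):
    # min over 15 minutes above the threshold, for reads or for writes
    return min(read_await_15m) > read_warn or min(write_await_15m) > write_warn


print(await_ms(0.5, 50.0))  # 10.0
print(await_ms(0.0, 0.0))  # 0.0, no division by zero
print(disk_util_percent(10.0, 40.0, 60))  # 50.0
print(response_too_high([25.0, 31.0], [3.0, 4.0]))  # True
```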

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.