Linux by Prom
Overview
This template collects Linux metrics from node_exporter 0.18 and above. Support for older node_exporter versions is provided as 'best effort'.
Known Issues
- Description: node_exporter v0.16.0 renamed many metrics. CPU utilization for the 'guest' and 'guest_nice' metrics is not supported by this template with node_exporter < 0.16. Disk I/O metrics are not supported. Other metrics are provided as 'best effort'. See https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 for details.
  - Version: below 0.16.0
- Description: the metric node_network_info with the 'device' label cannot be found, so network discovery is not possible.
  - Version: below 0.18
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- node_exporter 0.17.0
- node_exporter 0.18.1
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Please refer to the node_exporter docs. Use node_exporter v0.18.0 or above.
Macros used
Name | Description | Default |
---|---|---|
{$CPU.UTIL.CRIT} | | 90 |
{$IF.ERRORS.WARN} | | 2 |
{$IF.UTIL.MAX} | | 90 |
{$SYSTEM.FUZZYTIME.MAX} | | 60 |
{$KERNEL.MAXFILES.MIN} | | 256 |
{$LOAD_AVG_PER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. | 1.5 |
{$NODE_EXPORTER_PORT} | TCP port node_exporter is listening on. | 9100 |
{$SWAP.PFREE.MIN.WARN} | | 50 |
{$VFS.DEV.READ.AWAIT.WARN} | Disk read average response time (in ms) above which the trigger fires. | 20 |
{$VFS.DEV.WRITE.AWAIT.WARN} | Disk write average response time (in ms) above which the trigger fires. | 20 |
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level. | Macro too long. Please see the template. |
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level. | `.+` |
{$VFS.FS.FSNAME.NOT_MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | `^(/dev\|/sys\|/run\|/proc\|.+/shm$)` |
{$VFS.FS.FSNAME.MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | `.+` |
{$VFS.FS.FSTYPE.MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | Macro too long. Please see the template. |
{$VFS.FS.FSTYPE.NOT_MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | `^\s$` |
{$VFS.FS.FSDEVICE.MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | `^.+$` |
{$VFS.FS.FSDEVICE.NOT_MATCHES} | This macro is used in filesystems discovery. Can be overridden on the host or linked template level. | `^\s$` |
{$MEMORY.UTIL.MAX} | | 90 |
{$MEMORY.AVAILABLE.MIN} | | 20M |
{$IFCONTROL} | | 1 |
{$NET.IF.IFNAME.MATCHES} | | `^.*$` |
{$NET.IF.IFNAME.NOT_MATCHES} | Filter out loopbacks, nulls, docker veth links and docker0 bridge by default. | Macro too long. Please see the template. |
{$NET.IF.IFOPERSTATUS.MATCHES} | | `^.*$` |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(7). | `^7$` |
{$NET.IF.IFALIAS.MATCHES} | | `^.*$` |
{$NET.IF.IFALIAS.NOT_MATCHES} | | CHANGE_IF_NEEDED |
{$VFS.FS.FREE.MIN.CRIT} | The critical threshold of free space on a filesystem. | 5G |
{$VFS.FS.FREE.MIN.WARN} | The warning threshold of free space on a filesystem. | 10G |
{$VFS.FS.INODE.PFREE.MIN.CRIT} | | 10 |
{$VFS.FS.INODE.PFREE.MIN.WARN} | | 20 |
{$VFS.FS.PUSED.MAX.CRIT} | | 90 |
{$VFS.FS.PUSED.MAX.WARN} | | 80 |
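Each `*.MATCHES` / `*.NOT_MATCHES` macro pair acts as an include/exclude filter during discovery: an entity is kept when its name matches the first regular expression and does not match the second. A minimal Python sketch of that logic using the filesystem defaults above, assuming unanchored matching (Python's `re.search` stands in for Zabbix's own regular expression engine); the function name `passes_filter` and the mount points are hypothetical:

```python
import re

# Defaults of {$VFS.FS.FSNAME.MATCHES} and {$VFS.FS.FSNAME.NOT_MATCHES}.
FSNAME_MATCHES = r".+"
FSNAME_NOT_MATCHES = r"^(/dev|/sys|/run|/proc|.+/shm$)"


def passes_filter(value: str, matches: str, not_matches: str) -> bool:
    """Keep `value` if it matches `matches` and does not match `not_matches`.

    re.search is unanchored, so the patterns rely on their own ^ and $ anchors.
    """
    return re.search(matches, value) is not None and re.search(not_matches, value) is None


if __name__ == "__main__":
    for mount in ["/", "/home", "/data/dev", "/dev", "/run/user/1000", "/dev/shm"]:
        print(mount, passes_filter(mount, FSNAME_MATCHES, FSNAME_NOT_MATCHES))
```

Because the default exclusion pattern is anchored with `^`, `/data/dev` is kept while `/dev` and anything under `/run` is filtered out; mount points ending in `/shm` are excluded wherever they appear.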
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Linux: Get node_exporter metrics | | HTTP agent | `node_exporter.get` |
Linux: Version of node_exporter running | | Dependent item | `agent.version[node_exporter]` (with preprocessing) |
Linux: System boot time | | Dependent item | `system.boottime[node_exporter]` (with preprocessing) |
Linux: System local time | The local system time of the host. | Dependent item | `system.localtime[node_exporter]` (with preprocessing) |
Linux: System name | The host name of the system. | Dependent item | `system.name[node_exporter]` (with preprocessing) |
Linux: System description | Labeled system information as provided by the uname system call. | Dependent item | `system.descr[node_exporter]` (with preprocessing) |
Linux: Maximum number of open file descriptors | It could be increased by using | Dependent item | `kernel.maxfiles[node_exporter]` (with preprocessing) |
Linux: Number of open file descriptors | | Dependent item | `fd.open[node_exporter]` (with preprocessing) |
Linux: Operating system | | Dependent item | `system.sw.os[node_exporter]` (with preprocessing) |
Linux: Operating system architecture | The architecture of the operating system. | Dependent item | `system.sw.arch[node_exporter]` (with preprocessing) |
Linux: System uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". | Dependent item | `system.uptime[node_exporter]` (with preprocessing) |
Linux: Load average (1m avg) | | Dependent item | `system.cpu.load.avg1[node_exporter]` (with preprocessing) |
Linux: Load average (5m avg) | | Dependent item | `system.cpu.load.avg5[node_exporter]` (with preprocessing) |
Linux: Load average (15m avg) | | Dependent item | `system.cpu.load.avg15[node_exporter]` (with preprocessing) |
Linux: Number of CPUs | | Dependent item | `system.cpu.num[node_exporter]` (with preprocessing) |
Linux: CPU idle time | The time the CPU has spent doing nothing. | Dependent item | `system.cpu.idle[node_exporter]` (with preprocessing) |
Linux: CPU utilization | The CPU utilization expressed in %. | Dependent item | `system.cpu.util[node_exporter]` (with preprocessing) |
Linux: CPU system time | The time the CPU has spent running the kernel and its processes. | Dependent item | `system.cpu.system[node_exporter]` (with preprocessing) |
Linux: CPU user time | The time the CPU has spent running users' processes that are not niced. | Dependent item | `system.cpu.user[node_exporter]` (with preprocessing) |
Linux: CPU steal time | The amount of CPU time "stolen" from this virtual machine by the hypervisor for other tasks, such as running another virtual machine. | Dependent item | `system.cpu.steal[node_exporter]` (with preprocessing) |
Linux: CPU softirq time | The amount of time the CPU has been servicing software interrupts. | Dependent item | `system.cpu.softirq[node_exporter]` (with preprocessing) |
Linux: CPU nice time | The time the CPU has spent running users' processes that have been niced. | Dependent item | `system.cpu.nice[node_exporter]` (with preprocessing) |
Linux: CPU iowait time | The amount of time the CPU has been waiting for I/O to complete. | Dependent item | `system.cpu.iowait[node_exporter]` (with preprocessing) |
Linux: CPU interrupt time | The amount of time the CPU has been servicing hardware interrupts. | Dependent item | `system.cpu.interrupt[node_exporter]` (with preprocessing) |
Linux: CPU guest time | Guest time: the time spent running a virtual CPU for a guest operating system. | Dependent item | `system.cpu.guest[node_exporter]` (with preprocessing) |
Linux: CPU guest nice time | The time spent running a niced guest (a virtual CPU for guest operating systems under the control of the Linux kernel). | Dependent item | `system.cpu.guest_nice[node_exporter]` (with preprocessing) |
Linux: Interrupts per second | | Dependent item | `system.cpu.intr[node_exporter]` (with preprocessing) |
Linux: Context switches per second | | Dependent item | `system.cpu.switches[node_exporter]` (with preprocessing) |
Linux: Memory utilization | Memory used percentage is calculated as (total-available)/total*100. | Calculated | `vm.memory.util[node_exporter]` |
Linux: Total memory | The total memory expressed in bytes. | Dependent item | `vm.memory.total[node_exporter]` (with preprocessing) |
Linux: Available memory | The available memory: in Linux, available = free + buffers + cache; on other platforms, the calculation may vary. See also Appendixes in Zabbix Documentation about parameters of the | Dependent item | `vm.memory.available[node_exporter]` (with preprocessing) |
Linux: Total swap space | The total space of the swap volume/file expressed in bytes. | Dependent item | `system.swap.total[node_exporter]` (with preprocessing) |
Linux: Free swap space | The free space of the swap volume/file expressed in bytes. | Dependent item | `system.swap.free[node_exporter]` (with preprocessing) |
Linux: Free swap space in % | The free space of the swap volume/file expressed in %. | Calculated | `system.swap.pfree[node_exporter]` |
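The two calculated items above reduce to simple arithmetic. A minimal Python sketch, assuming byte counts as inputs: memory utilization uses the formula given in the table, while the free swap percentage is assumed to be free / total * 100 (the exact template formula is not reproduced here). Returning `None` for a host without swap is an illustrative choice that mirrors the `system.swap.total > 0` guard in the swap trigger; the function names are hypothetical.

```python
from typing import Optional


def memory_utilization(total: float, available: float) -> float:
    """Memory used in %, computed as (total - available) / total * 100."""
    if total <= 0:
        raise ValueError("total memory must be greater than zero")
    return (total - available) / total * 100


def swap_pfree(total: float, free: float) -> Optional[float]:
    """Free swap in %; None when no swap is configured (total == 0)."""
    if total <= 0:
        return None
    return free / total * 100


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(memory_utilization(8 * GiB, 2 * GiB))  # 75.0
    print(swap_pfree(4 * GiB, 1 * GiB))          # 25.0
    print(swap_pfree(0, 0))                      # None
```

With the defaults, 25% free swap is below {$SWAP.PFREE.MIN.WARN} (50), so the High swap space usage trigger would fire once the 5-minute maximum also stays below that value.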
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Linux: node_exporter is not available | Failed to fetch system metrics from node_exporter in time. | `nodata(/Linux by Prom/node_exporter.get,30m)=1` | Warning | Manual close: Yes |
Linux: System time is out of sync | The host's system time differs from the Zabbix server time by more than {$SYSTEM.FUZZYTIME.MAX} seconds. | `fuzzytime(/Linux by Prom/system.localtime[node_exporter],{$SYSTEM.FUZZYTIME.MAX})=0` | Warning | Manual close: Yes |
Linux: System name has changed | The name of the system has changed. Acknowledge to close the problem manually. | `last(/Linux by Prom/system.name[node_exporter],#1)<>last(/Linux by Prom/system.name[node_exporter],#2) and length(last(/Linux by Prom/system.name[node_exporter]))>0` | Info | Manual close: Yes |
Linux: Configured max number of open filedescriptors is too low | | `last(/Linux by Prom/kernel.maxfiles[node_exporter])<{$KERNEL.MAXFILES.MIN}` | Info | Depends on: see the template |
Linux: Running out of file descriptors | | `last(/Linux by Prom/fd.open[node_exporter])/last(/Linux by Prom/kernel.maxfiles[node_exporter])*100>80` | Warning | |
Linux: Operating system description has changed | The description of the operating system has changed. Possible reasons are that the system has been updated or replaced. Acknowledge to close the problem manually. | `last(/Linux by Prom/system.sw.os[node_exporter],#1)<>last(/Linux by Prom/system.sw.os[node_exporter],#2) and length(last(/Linux by Prom/system.sw.os[node_exporter]))>0` | Info | Manual close: Yes; Depends on: see the template |
Linux: {HOST.NAME} has been restarted | The device uptime is less than 10 minutes. | `last(/Linux by Prom/system.uptime[node_exporter])<10m` | Warning | Manual close: Yes |
Linux: Load average is too high | The load average per CPU is too high. The system may be slow to respond. | `min(/Linux by Prom/system.cpu.load.avg1[node_exporter],5m)/last(/Linux by Prom/system.cpu.num[node_exporter])>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/Linux by Prom/system.cpu.load.avg5[node_exporter])>0 and last(/Linux by Prom/system.cpu.load.avg15[node_exporter])>0` | Average | |
Linux: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | `min(/Linux by Prom/system.cpu.util[node_exporter],5m)>{$CPU.UTIL.CRIT}` | Warning | Depends on: see the template |
Linux: High memory utilization | The system is running out of free memory. | `min(/Linux by Prom/vm.memory.util[node_exporter],5m)>{$MEMORY.UTIL.MAX}` | Average | Depends on: see the template |
Linux: Lack of available memory | | `max(/Linux by Prom/vm.memory.available[node_exporter],5m)<{$MEMORY.AVAILABLE.MIN} and last(/Linux by Prom/vm.memory.total[node_exporter])>0` | Average | |
Linux: High swap space usage | If there is no swap configured, this trigger is ignored. | `max(/Linux by Prom/system.swap.pfree[node_exporter],5m)<{$SWAP.PFREE.MIN.WARN} and last(/Linux by Prom/system.swap.total[node_exporter])>0` | Warning | Depends on: see the template |
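The load average trigger divides the minimum 1-minute load over the last 5 minutes by the number of CPUs, compares the result with {$LOAD_AVG_PER_CPU.MAX.WARN}, and additionally requires the 5- and 15-minute averages to be greater than zero. A minimal Python sketch of that evaluation, assuming the 1-minute samples for the window are already collected; the function name and sample values are hypothetical:

```python
from typing import Sequence

# Default of {$LOAD_AVG_PER_CPU.MAX.WARN} from the macro table.
LOAD_AVG_PER_CPU_MAX_WARN = 1.5


def load_average_too_high(
    avg1_last_5m: Sequence[float],
    num_cpus: int,
    last_avg5: float,
    last_avg15: float,
    threshold: float = LOAD_AVG_PER_CPU_MAX_WARN,
) -> bool:
    """Mirror min(avg1,5m)/last(cpu.num) > threshold and avg5 > 0 and avg15 > 0."""
    if not avg1_last_5m or num_cpus <= 0:
        # Simplification: missing data or an unknown CPU count is treated as "not firing".
        return False
    per_cpu = min(avg1_last_5m) / num_cpus
    return per_cpu > threshold and last_avg5 > 0 and last_avg15 > 0


if __name__ == "__main__":
    # 4 CPUs; the lowest 1-minute load in the window is 6.5, i.e. 1.625 per CPU.
    print(load_average_too_high([7.2, 6.8, 6.5], 4, 6.0, 5.0))  # True
    # A single dip to 5.0 (1.25 per CPU) keeps the trigger quiet.
    print(load_average_too_high([7.2, 5.0, 6.5], 4, 6.0, 5.0))  # False
```

Using `min` over the window means every 1-minute sample in the last 5 minutes must exceed the per-CPU limit, so short spikes do not fire the trigger.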
LLD rule Network interface discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interface discovery | Discovery of network interfaces. Requires node_exporter v0.18 and up. | Dependent item | `net.if.discovery[node_exporter]` (with preprocessing) |
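Network interface discovery depends on the `device` label of `node_network_info`, which is why it needs node_exporter v0.18 or later. The Python sketch below shows one way such discovery could work on the Prometheus text output collected by `node_exporter.get`; it is not the template's actual preprocessing. The function name, the use of an `ifalias` label for {#IFALIAS}, the sample lines, and the exclusion pattern (a stand-in for the real {$NET.IF.IFNAME.NOT_MATCHES} default, which the macro table does not reproduce) are all assumptions.

```python
import re
from typing import Dict, List

# Matches one node_network_info sample line and captures its label set.
INFO_LINE = re.compile(r"^node_network_info\{(?P<labels>[^}]*)\}\s+\S+")
# Matches key="value" pairs, allowing backslash-escaped characters in values.
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical exclusion pattern; the template's real default is longer.
IFNAME_NOT_MATCHES = r"^(lo|docker0|veth.*)$"


def discover_interfaces(
    metrics_text: str,
    ifname_matches: str = r"^.*$",
    ifname_not_matches: str = IFNAME_NOT_MATCHES,
) -> List[Dict[str, str]]:
    """Build LLD-style rows ({#IFNAME}, {#IFALIAS}) from node_network_info samples."""
    rows = []
    for line in metrics_text.splitlines():
        match = INFO_LINE.match(line)
        if match is None:
            continue  # comments and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        device = labels.get("device")
        if device is None:
            continue  # no 'device' label, as with node_exporter < 0.18
        if re.search(ifname_matches, device) and not re.search(ifname_not_matches, device):
            rows.append({"{#IFNAME}": device, "{#IFALIAS}": labels.get("ifalias", "")})
    return rows


# Illustrative scrape fragment, not captured from a real host.
SAMPLE = """\
# TYPE node_network_info gauge
node_network_info{device="lo",ifalias="",operstate="unknown"} 1
node_network_info{device="eth0",ifalias="uplink",operstate="up"} 1
node_network_info{device="veth1a2b3c",ifalias="",operstate="up"} 1
"""

if __name__ == "__main__":
    print(discover_interfaces(SAMPLE))
    # [{'{#IFNAME}': 'eth0', '{#IFALIAS}': 'uplink'}]
```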
Item prototypes for Network interface discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface {#IFNAME}({#IFALIAS}): Bits received | | Dependent item | `net.if.in[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Bits sent | | Dependent item | `net.if.out[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Outbound packets with errors | | Dependent item | `net.if.out.errors[node_exporter"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Inbound packets with errors | | Dependent item | `net.if.in.errors[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Inbound packets discarded | | Dependent item | `net.if.in.discards[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Outbound packets discarded | | Dependent item | `net.if.out.discards[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Speed | Set to 0 if the metric is missing from the node_exporter output. | Dependent item | `net.if.speed[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Interface type | node_network_protocol_type protocol_type value of /sys/class/net/. | Dependent item | `net.if.type[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Interface {#IFNAME}({#IFALIAS}): Operational status | Reference: https://www.kernel.org/doc/Documentation/networking/operstates.txt | Dependent item | `net.if.status[node_exporter,"{#IFNAME}"]` (with preprocessing) |
Trigger prototypes for Network interface discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Interface {#IFNAME}({#IFALIAS}): High bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. | `(avg(/Linux by Prom/net.if.in[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"]) or avg(/Linux by Prom/net.if.out[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])) and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0` | Warning | Manual close: Yes; Depends on: see the template |
Interface {#IFNAME}({#IFALIAS}): High error rate | It recovers when it is below 80% of the | `min(/Linux by Prom/net.if.in.errors[node_exporter,"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} or min(/Linux by Prom/net.if.out.errors[node_exporter"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"}` | Warning | Manual close: Yes; Depends on: see the template |
Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. | `change(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0 and ( last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=7 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=11 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=62 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=69 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=117 ) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2)` | Info | Manual close: Yes; Depends on: see the template |
Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. | `change(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])>0 and (last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=1) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2)` | Info | Manual close: Yes; Depends on: see the template |
Interface {#IFNAME}({#IFALIAS}): Link down | This trigger expression works as follows: it fires only when {$IFCONTROL:"{#IFNAME}"} is 1, the operational status is down (2), and the status differs from its previous value. Setting {$IFCONTROL:"{#IFNAME}"} to 0 marks the interface as not important. | `{$IFCONTROL:"{#IFNAME}"}=1 and last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])=2 and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"],#1)<>last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"],#2))` | Average | Manual close: Yes |
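The High bandwidth usage prototype compares the 15-minute averages of received and sent traffic with {$IF.UTIL.MAX} percent of the interface speed, and only fires when the speed is known (greater than zero). A minimal Python sketch of that check, assuming the traffic samples and the speed share the same unit (bits per second); the function name and sample numbers are hypothetical:

```python
from statistics import mean
from typing import Sequence

# Default of {$IF.UTIL.MAX} from the macro table, in percent.
IF_UTIL_MAX = 90


def high_bandwidth_usage(
    bits_in_15m: Sequence[float],
    bits_out_15m: Sequence[float],
    speed_bps: float,
    util_max: float = IF_UTIL_MAX,
) -> bool:
    """Mirror (avg(in,15m) > limit or avg(out,15m) > limit) and speed > 0."""
    if speed_bps <= 0 or not bits_in_15m or not bits_out_15m:
        return False  # unknown speed or no data: not firing (simplification)
    limit = util_max / 100 * speed_bps
    return mean(bits_in_15m) > limit or mean(bits_out_15m) > limit


if __name__ == "__main__":
    gigabit = 1_000_000_000
    # Inbound averages 950 Mbps on a 1 Gbps link, above the 900 Mbps limit.
    print(high_bandwidth_usage([950e6, 940e6, 960e6], [20e6, 30e6], gigabit))  # True
    print(high_bandwidth_usage([500e6, 520e6], [400e6, 410e6], gigabit))      # False
```

The `speed > 0` guard matters because the Speed item is set to 0 when node_exporter does not report it; without the guard every interface with unknown speed would look saturated.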
LLD rule Mounted filesystem discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of file systems of different types. | Dependent item | `vfs.fs.discovery[node_exporter]` (with preprocessing) |
Item prototypes for Mounted filesystem discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
{#FSNAME}: Free space | | Dependent item | `vfs.fs.free[node_exporter,"{#FSNAME}"]` (with preprocessing) |
{#FSNAME}: Total space | Total space in bytes. | Dependent item | `vfs.fs.total[node_exporter,"{#FSNAME}"]` (with preprocessing) |
{#FSNAME}: Used space | Used storage in bytes. | Calculated | `vfs.fs.used[node_exporter,"{#FSNAME}"]` |
{#FSNAME}: Space utilization | The space utilization expressed in % for {#FSNAME}. | Calculated | `vfs.fs.pused[node_exporter,"{#FSNAME}"]` |
{#FSNAME}: Free inodes in % | | Dependent item | `vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"]` (with preprocessing) |
Trigger prototypes for Mounted filesystem discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#FSNAME}: Disk space is critically low | Two conditions should match: space utilization is above {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}, and either free space is less than {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or the filesystem is projected to be full in less than 24 hours. | `last(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and ((last(/Linux by Prom/vfs.fs.total[node_exporter,"{#FSNAME}"])-last(/Linux by Prom/vfs.fs.used[node_exporter,"{#FSNAME}"]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or timeleft(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"],1h,100)<1d)` | Average | Manual close: Yes |
{#FSNAME}: Disk space is low | Two conditions should match: space utilization is above {$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}, and either free space is less than {$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"} or the filesystem is projected to be full in less than 24 hours. | `last(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"])>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and ((last(/Linux by Prom/vfs.fs.total[node_exporter,"{#FSNAME}"])-last(/Linux by Prom/vfs.fs.used[node_exporter,"{#FSNAME}"]))<{$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"} or timeleft(/Linux by Prom/vfs.fs.pused[node_exporter,"{#FSNAME}"],1h,100)<1d)` | Warning | Manual close: Yes; Depends on: see the template |
{#FSNAME}: Running out of free inodes | It may become impossible to write to a disk if there are no index nodes left. | `min(/Linux by Prom/vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"],5m)<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}` | Average | |
{#FSNAME}: Running out of free inodes | It may become impossible to write to a disk if there are no index nodes left. | `min(/Linux by Prom/vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"],5m)<{$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"}` | Warning | Depends on: see the template |
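The disk space triggers combine a utilization threshold with either a free-space floor or a projected "full within a day" check. A minimal Python sketch of the critical variant, assuming `(unix_time, pused)` samples from the last hour; `timeleft(...,1h,100)` is approximated here with an ordinary least-squares line, and Zabbix's own `timeleft` implementation may differ in details. The function names and sample values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

GiB = 1024 ** 3
# Defaults from the macro table ({$VFS.FS.PUSED.MAX.CRIT}, {$VFS.FS.FREE.MIN.CRIT}).
PUSED_MAX_CRIT = 90.0
FREE_MIN_CRIT = 5 * GiB  # "5G"; Zabbix size suffixes are powers of 1024
ONE_DAY = 24 * 3600


def seconds_until_full(
    samples: Sequence[Tuple[float, float]], threshold: float = 100.0
) -> Optional[float]:
    """Least-squares line through (unix_time, pused) samples; seconds from the
    last sample until the line reaches `threshold`, or None if usage is not growing."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    return max(0.0, (threshold - intercept) / slope - samples[-1][0])


def disk_space_critically_low(
    pused_last_1h: Sequence[Tuple[float, float]],
    total_bytes: int,
    used_bytes: int,
    pused_max: float = PUSED_MAX_CRIT,
    free_min: int = FREE_MIN_CRIT,
) -> bool:
    """Mirror: last(pused) > max and (total - used < free_min or timeleft < 1d)."""
    if not pused_last_1h or pused_last_1h[-1][1] <= pused_max:
        return False
    eta = seconds_until_full(pused_last_1h)
    return (total_bytes - used_bytes) < free_min or (eta is not None and eta < ONE_DAY)


if __name__ == "__main__":
    rising = [(0, 94.0), (1800, 94.5), (3600, 95.0)]  # +1% per hour
    flat = [(0, 95.0), (1800, 95.0), (3600, 95.0)]
    print(seconds_until_full(rising))  # about 18000 s (5 hours)
    print(disk_space_critically_low(rising, 1024 * GiB, 973 * GiB))  # True: full within a day
    print(disk_space_critically_low(flat, 1024 * GiB, 973 * GiB))    # False: 51 GiB free, not growing
    print(disk_space_critically_low(flat, 20 * GiB, 19 * GiB))       # True: only 1 GiB free
```

The combination keeps large, slowly filling volumes quiet even above 90% utilization, while still alerting on small volumes with little absolute space left.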
LLD rule Block devices discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | | Dependent item | `vfs.dev.discovery[node_exporter]` (with preprocessing) |
Item prototypes for Block devices discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
{#DEVNAME}: Disk read rate | r/s. The number (after merges) of read requests completed per second for the device. | Dependent item | `vfs.dev.read.rate[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
{#DEVNAME}: Disk write rate | w/s. The number (after merges) of write requests completed per second for the device. | Dependent item | `vfs.dev.write.rate[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
{#DEVNAME}: Disk read time (rate) | Rate of total read time counter. Used in | Dependent item | `vfs.dev.read.time.rate[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
{#DEVNAME}: Disk write time (rate) | Rate of total write time counter. Used in | Dependent item | `vfs.dev.write.time.rate[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
{#DEVNAME}: Disk read request avg waiting time (r_await) | This formula contains two Boolean expressions that evaluate to 1 or 0 in order to set the calculated metric to zero and to avoid division by zero. | Calculated | `vfs.dev.read.await[node_exporter,"{#DEVNAME}"]` |
{#DEVNAME}: Disk write request avg waiting time (w_await) | This formula contains two Boolean expressions that evaluate to 1 or 0 in order to set the calculated metric to zero and to avoid division by zero. | Calculated | `vfs.dev.write.await[node_exporter,"{#DEVNAME}"]` |
{#DEVNAME}: Disk average queue size (avgqu-sz) | The current average disk queue; the number of requests outstanding on the disk while the performance data is being collected. | Dependent item | `vfs.dev.queue_size[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
{#DEVNAME}: Disk utilization | This item is the percentage of elapsed time during which the selected disk drive was busy while servicing read or write requests. | Dependent item | `vfs.dev.util[node_exporter,"{#DEVNAME}"]` (with preprocessing) |
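The r_await and w_await descriptions mention two Boolean expressions that guard the division. A minimal Python sketch of that idea for reads (writes are symmetric), assuming the read time rate is in seconds of read time per second and the read rate in completed reads per second, so their ratio times 1000 gives milliseconds per read. The exact template formula is not reproduced here, and the function name is hypothetical.

```python
def read_await_ms(read_time_rate: float, read_rate: float) -> float:
    """Average wait per completed read, in ms.

    (read_rate == 0) adds 1 to the denominator when the disk is idle, so the
    division never fails; (read_rate > 0) then multiplies the result by 0.
    """
    return read_time_rate / (read_rate + (read_rate == 0)) * 1000 * (read_rate > 0)


if __name__ == "__main__":
    # 0.05 s of read time per second spread over 10 reads/s: about 5 ms per read.
    print(read_await_ms(0.05, 10.0))
    # Idle disk: no reads completed, result is 0 instead of a ZeroDivisionError.
    print(read_await_ms(0.0, 0.0))  # 0.0
```

With the default {$VFS.DEV.READ.AWAIT.WARN} of 20 ms, a steady 5 ms read wait would stay well below the disk response trigger threshold.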
Trigger prototypes for Block devices discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#DEVNAME}: Disk read/write request responses are too high | This trigger might indicate saturation of the disk {#DEVNAME}. | `min(/Linux by Prom/vfs.dev.read.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or min(/Linux by Prom/vfs.dev.write.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"}` | Warning | Manual close: Yes |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.