# Linux by Prom

## Overview

This template collects Linux metrics from node_exporter 0.18 and above. Support for older node_exporter versions is provided on a 'best effort' basis.

### Known Issues

- Description: node_exporter v0.16.0 renamed many metrics. CPU utilization for the 'guest' and 'guest_nice' metrics is not supported by this template with node_exporter < 0.16. Disk IO metrics are not supported. Other metrics are provided on a 'best effort' basis. See https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 for details.
  - version: below 0.16.0
- Description: the metric node_network_info with the label 'device' cannot be found, so network discovery is not possible.
  - version: below 0.18

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- node_exporter 0.17.0
- node_exporter 0.18.1

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

Please refer to the node_exporter docs. Use node_exporter v0.18.0 or above.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$CPU.UTIL.CRIT}||`90`|
|{$IF.ERRORS.WARN}||`2`|
|{$IF.UTIL.MAX}||`90`|
|{$SYSTEM.FUZZYTIME.MAX}||`60`|
|{$KERNEL.MAXFILES.MIN}||`256`|
|{$LOAD_AVG_PER_CPU.MAX.WARN}|Load per CPU considered sustainable. Tune if needed.|`1.5`|
|{$NODE_EXPORTER_PORT}|The TCP port node_exporter is listening on.|`9100`|
|{$SWAP.PFREE.MIN.WARN}||`50`|
|{$VFS.DEV.READ.AWAIT.WARN}|The disk read average response time (in ms) above which the trigger fires.|`20`|
|{$VFS.DEV.WRITE.AWAIT.WARN}|The disk write average response time (in ms) above which the trigger fires.|`20`|
|{$VFS.DEV.DEVNAME.NOT_MATCHES}|This macro is used in block device discovery. It can be overridden on the host or linked template level.|`Macro too long. Please see the template.`|
|{$VFS.DEV.DEVNAME.MATCHES}|This macro is used in block device discovery. It can be overridden on the host or linked template level.|`.+`|
|{$VFS.FS.FSNAME.NOT_MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`^(/dev\|/sys\|/run\|/proc\|.+/shm$)`|
|{$VFS.FS.FSNAME.MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`.+`|
|{$VFS.FS.FSTYPE.MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`Macro too long. Please see the template.`|
|{$VFS.FS.FSTYPE.NOT_MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`^\s$`|
|{$VFS.FS.FSDEVICE.MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`^.+$`|
|{$VFS.FS.FSDEVICE.NOT_MATCHES}|This macro is used in filesystem discovery. It can be overridden on the host or linked template level.|`^\s$`|
|{$MEMORY.UTIL.MAX}||`90`|
|{$MEMORY.AVAILABLE.MIN}||`20M`|
|{$IFCONTROL}||`1`|
|{$NET.IF.IFNAME.MATCHES}||`^.*$`|
|{$NET.IF.IFNAME.NOT_MATCHES}|Filters out loopbacks, nulls, docker veth links, and the docker0 bridge by default.|`Macro too long. Please see the template.`|
|{$NET.IF.IFOPERSTATUS.MATCHES}||`^.*$`|
|{$NET.IF.IFOPERSTATUS.NOT_MATCHES}|Ignore notPresent(7).|`^7$`|
|{$NET.IF.IFALIAS.MATCHES}||`^.*$`|
|{$NET.IF.IFALIAS.NOT_MATCHES}||`CHANGE_IF_NEEDED`|
|{$VFS.FS.FREE.MIN.CRIT}|The critical threshold for free space on the filesystem.|`5G`|
|{$VFS.FS.FREE.MIN.WARN}|The warning threshold for free space on the filesystem.|`10G`|
|{$VFS.FS.INODE.PFREE.MIN.CRIT}||`10`|
|{$VFS.FS.INODE.PFREE.MIN.WARN}||`20`|
|{$VFS.FS.PUSED.MAX.CRIT}||`90`|
|{$VFS.FS.PUSED.MAX.WARN}||`80`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Linux: Get node_exporter metrics||HTTP agent|node_exporter.get|
|Linux: Version of node_exporter running||Dependent item|agent.version[node_exporter]<br>**Preprocessing**<br>Prometheus pattern: `node_exporter_build_info` label `version`<br>Discard unchanged with heartbeat: `1d`|
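
Several items here use a `Prometheus pattern: <metric> label <label>` step, which returns a label value instead of the sample value. The sketch below illustrates that idea on node_exporter's text exposition format. It assumes simple label values with no escaped quotes. `extract_label` is a hypothetical helper, not Zabbix's own parser, and the sample input is illustrative only.

```python
import re

# One sample line: metric name, optional {label set}, then the sample value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One name="value" pair; assumes label values contain no escaped quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def extract_label(exposition, metric, label):
    """Return `label` from the first sample of `metric`, or None if absent."""
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            return dict(LABEL_RE.findall(m.group(2) or "")).get(label)
    return None


sample = """# TYPE node_exporter_build_info gauge
node_exporter_build_info{branch="HEAD",version="0.18.1"} 1
"""
print(extract_label(sample, "node_exporter_build_info", "version"))  # 0.18.1
```

The same lookup with `node_uname_info` and `nodename` or `machine` matches how the host name and architecture items are described.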
**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"^node_boot_time(?:_seconds)?$"})`
The local system time of the host.
|Dependent item|system.localtime[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"^node_time(?:_seconds)?$"})`
The host name of the system.
|Dependent item|system.name[node_exporter]**Preprocessing**
Prometheus pattern: `node_uname_info` label `nodename`
Discard unchanged with heartbeat: `1d`
Labeled system information as provided by the uname system call.
|Dependent item|system.descr[node_exporter]**Preprocessing**
Prometheus to JSON: `node_uname_info`
JavaScript: `The text is too long. Please see the template.`
Discard unchanged with heartbeat: `1d`
It can be increased using the `sysctl` utility or by modifying the `/etc/sysctl.conf` file.
|Dependent item|kernel.maxfiles[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE(node_filefd_maximum)`
Discard unchanged with heartbeat: `1d`
**Preprocessing**
Prometheus pattern: `VALUE(node_filefd_allocated)`
**Preprocessing**
Discard unchanged with heartbeat: `1d`
The architecture of the operating system.
|Dependent item|system.sw.arch[node_exporter]**Preprocessing**
Prometheus pattern: `node_uname_info` label `machine`
Discard unchanged with heartbeat: `1d`
The system uptime expressed in the following format: "N days, hh:mm:ss".
|Dependent item|system.uptime[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"^node_boot_time(?:_seconds)?$"})`
JavaScript: `The text is too long. Please see the template.`
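
`system.uptime[node_exporter]` starts from `node_boot_time_seconds`, and its JavaScript step is not reproduced here. The sketch below assumes that uptime is the current time minus the boot time. It shows how that number of seconds maps to the "N days, hh:mm:ss" format. Both helper names are hypothetical.

```python
import time
from typing import Optional


def uptime_seconds(boot_time: float, now: Optional[float] = None) -> int:
    """Seconds since boot; boot_time is Unix epoch seconds, as in node_boot_time_seconds."""
    if now is None:
        now = time.time()
    return max(0, int(now - boot_time))  # clamp: a clock skew must not yield negative uptime


def format_uptime(seconds: int) -> str:
    """Render seconds as "N days, hh:mm:ss"."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}:{secs:02d}"


now = 1_700_000_000
boot = now - (86400 + 2 * 3600 + 3 * 60 + 4)  # booted 1 day, 2 h, 3 min, 4 s earlier
print(format_uptime(uptime_seconds(boot, now)))  # 1 days, 02:03:04
```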
**Preprocessing**
Prometheus pattern: `VALUE(node_load1)`
**Preprocessing**
Prometheus pattern: `VALUE(node_load5)`
**Preprocessing**
Prometheus pattern: `VALUE(node_load15)`
**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
The time the CPU has spent doing nothing.
|Dependent item|system.cpu.idle[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The CPU utilization expressed in %.
|Dependent item|system.cpu.util[node_exporter]**Preprocessing**
JavaScript: `return (100 - value)` (calculates utilization as 100 minus the idle percentage)
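
`system.cpu.util[node_exporter]` is `100 - idle`. The idle item's Prometheus-to-JSON and JavaScript steps are not reproduced here, so the sketch below assumes one common way to get an idle percentage. It takes the growth of the `node_cpu_seconds_total{mode="idle"}` counters between two scrapes, divides it by the number of CPUs and the interval, and scales the result by 100. The function names are hypothetical.

```python
def cpu_idle_percent(idle_prev: dict, idle_curr: dict, interval_s: float) -> float:
    """Average idle % across CPUs from two samples of node_cpu_seconds_total{mode="idle"}.

    Both dicts map a cpu label (e.g. "0") to its cumulative idle seconds.
    """
    cpus = idle_prev.keys() & idle_curr.keys()
    if not cpus or interval_s <= 0:
        raise ValueError("need at least one common CPU and a positive interval")
    idle_seconds = sum(idle_curr[c] - idle_prev[c] for c in cpus)
    # Idle seconds per CPU per wall-clock second, scaled to % ("Custom multiplier: 100").
    return idle_seconds * 100 / (len(cpus) * interval_s)


def cpu_util_percent(idle_percent: float) -> float:
    # Same arithmetic as the template's JavaScript step: return (100 - value)
    return 100 - idle_percent


prev = {"0": 1000.0, "1": 2000.0}
curr = {"0": 1045.0, "1": 2051.0}  # sampled 60 s later
idle = cpu_idle_percent(prev, curr, 60)
print(idle, cpu_util_percent(idle))  # 80.0 20.0
```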
The time the CPU has spent running the kernel and its processes.
|Dependent item|system.cpu.system[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The time the CPU has spent running users' processes that are not niced.
|Dependent item|system.cpu.user[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The amount of CPU time "stolen" from this virtual machine by the hypervisor for other tasks, such as running another virtual machine.
|Dependent item|system.cpu.steal[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The amount of time the CPU has been servicing software interrupts.
|Dependent item|system.cpu.softirq[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The time the CPU has spent running users' processes that have been niced.
|Dependent item|system.cpu.nice[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The amount of time the CPU has been waiting for I/O to complete.
|Dependent item|system.cpu.iowait[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The amount of time the CPU has been servicing hardware interrupts.
|Dependent item|system.cpu.interrupt[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
Guest time: the time spent running a virtual CPU for a guest operating system.
|Dependent item|system.cpu.guest[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
The time spent running a niced guest (a virtual CPU for guest operating systems under the control of the Linux kernel).
|Dependent item|system.cpu.guest_nice[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
JavaScript: `The text is too long. Please see the template.`
Custom multiplier: `100`
**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_intr"})`
**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_context_switches"})`
Memory used percentage is calculated as (total-available)/total*100.
|Calculated|vm.memory.util[node_exporter]| |Linux: Total memory|The total memory expressed in bytes.
|Dependent item|vm.memory.total[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_memory_MemTotal"})`
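
The sketch below applies the memory utilization formula stated above, `(total-available)/total*100`, to illustrative byte counts. The trigger described as "The system is running out of free memory" compares the 5-minute minimum with `{$MEMORY.UTIL.MAX}`. This sketch checks a single value only, and the function name is hypothetical.

```python
def memory_util_percent(total_bytes: float, available_bytes: float) -> float:
    """(total - available) / total * 100, as described for vm.memory.util[node_exporter]."""
    if total_bytes <= 0:
        raise ValueError("total memory must be positive")
    return (total_bytes - available_bytes) / total_bytes * 100


MEMORY_UTIL_MAX = 90  # default of {$MEMORY.UTIL.MAX}

total = 8 * 1024**3      # e.g. node_memory_MemTotal_bytes
available = 1 * 1024**3  # e.g. node_memory_MemAvailable_bytes
util = memory_util_percent(total, available)
print(util, util > MEMORY_UTIL_MAX)  # 87.5 False
```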
The available memory:
- in Linux - available = free + buffers + cache;
- on other platforms, the calculation may vary.
See also the Appendixes in the Zabbix documentation for the parameters of the `vm.memory.size` item.
|Dependent item|vm.memory.available[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_memory_MemAvailable"})`
The total space of the swap volume/file expressed in bytes.
|Dependent item|system.swap.total[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_memory_SwapTotal"})`
The free space of the swap volume/file expressed in bytes.
|Dependent item|system.swap.free[node_exporter]**Preprocessing**
Prometheus pattern: `VALUE({__name__=~"node_memory_SwapFree"})`
The free space of the swap volume/file expressed in %.
|Calculated|system.swap.pfree[node_exporter]|

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Linux: node_exporter is not available|Failed to fetch system metrics from node_exporter in time.|`nodata(/Linux by Prom/node_exporter.get,30m)=1`|Warning|**Manual close**: Yes|
|Linux: System time is out of sync|The host's system time is different from the Zabbix server time.|`fuzzytime(/Linux by Prom/system.localtime[node_exporter],{$SYSTEM.FUZZYTIME.MAX})=0`|Warning|**Manual close**: Yes|
|Linux: System name has changed|The name of the system has changed. Acknowledge to close the problem manually.
|`last(/Linux by Prom/system.name[node_exporter],#1)<>last(/Linux by Prom/system.name[node_exporter],#2) and length(last(/Linux by Prom/system.name[node_exporter]))>0`|Info|**Manual close**: Yes| |Linux: Configured max number of open filedescriptors is too low||`last(/Linux by Prom/kernel.maxfiles[node_exporter])<{$KERNEL.MAXFILES.MIN}`|Info|**Depends on**:The description of the operating system has changed. Possible reasons are that the system has been updated or replaced. Acknowledge to close the problem manually.
|`last(/Linux by Prom/system.sw.os[node_exporter],#1)<>last(/Linux by Prom/system.sw.os[node_exporter],#2) and length(last(/Linux by Prom/system.sw.os[node_exporter]))>0`|Info|**Manual close**: YesThe device uptime is less than 10 minutes.
|`last(/Linux by Prom/system.uptime[node_exporter])<10m`|Warning|**Manual close**: Yes| |Linux: Load average is too high|The load average per CPU is too high. The system may be slow to respond.
|`min(/Linux by Prom/system.cpu.load.avg1[node_exporter],5m)/last(/Linux by Prom/system.cpu.num[node_exporter])>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/Linux by Prom/system.cpu.load.avg5[node_exporter])>0 and last(/Linux by Prom/system.cpu.load.avg15[node_exporter])>0`|Average|| |Linux: High CPU utilization|The CPU utilization is too high. The system might be slow to respond.
|`min(/Linux by Prom/system.cpu.util[node_exporter],5m)>{$CPU.UTIL.CRIT}`|Warning|**Depends on**:The system is running out of free memory.
|`min(/Linux by Prom/vm.memory.util[node_exporter],5m)>{$MEMORY.UTIL.MAX}`|Average|**Depends on**:If there is no swap configured, this trigger is ignored.
|`max(/Linux by Prom/system.swap.pfree[node_exporter],5m)<{$SWAP.PFREE.MIN.WARN} and last(/Linux by Prom/system.swap.total[node_exporter])>0`|Warning|**Depends on**:Discovery of network interfaces. Requires node_exporter v0.18 and up.
|Dependent item|net.if.discovery[node_exporter]**Preprocessing**
Prometheus to JSON: `{__name__=~"^node_network_info$"}`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_receive_bytes_total{device="{#IFNAME}"})`
Custom multiplier: `8`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_transmit_bytes_total{device="{#IFNAME}"})`
Custom multiplier: `8`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_transmit_errs_total{device="{#IFNAME}"})`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_receive_errs_total{device="{#IFNAME}"})`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_receive_drop_total{device="{#IFNAME}"})`
**Preprocessing**
Prometheus pattern: `VALUE(node_network_transmit_drop_total{device="{#IFNAME}"})`
Sets the value to 0 if the metric is missing from the node_exporter output.
|Dependent item|net.if.speed[node_exporter,"{#IFNAME}"]**Preprocessing**
Prometheus pattern: `VALUE(node_network_speed_bytes{device="{#IFNAME}"})`
⛔️Custom on fail: Set value to: `0`
Custom multiplier: `8`
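
`net.if.speed[node_exporter,"{#IFNAME}"]` converts `node_network_speed_bytes` to bits per second (×8) and falls back to 0 when the metric is missing. The interface utilization trigger, described as "close to its estimated maximum bandwidth", compares the 15-minute average in or out traffic with `{$IF.UTIL.MAX}` percent of that speed. It only fires when the speed is greater than 0. The sketch below mirrors those steps. The helper names are hypothetical, and the traffic averages are passed in rather than computed.

```python
from typing import Optional

IF_UTIL_MAX = 90  # default of {$IF.UTIL.MAX}, in percent


def if_speed_bps(speed_bytes: Optional[float]) -> float:
    """Bytes/s from node_network_speed_bytes to bits/s; a missing metric becomes 0."""
    if speed_bytes is None:
        return 0.0  # "Custom on fail: Set value to: 0"
    return float(speed_bytes) * 8  # "Custom multiplier: 8"


def high_bandwidth(avg_in_bps: float, avg_out_bps: float, speed_bps: float,
                   util_max: float = IF_UTIL_MAX) -> bool:
    """True if either 15-minute average exceeds util_max % of the link speed."""
    if speed_bps <= 0:
        return False  # unknown speed: the trigger requires last(net.if.speed) > 0
    limit = util_max * speed_bps / 100
    return avg_in_bps > limit or avg_out_bps > limit


speed = if_speed_bps(125_000_000)  # a 1 Gbit/s link reported as 125,000,000 bytes/s
print(speed)                                            # 1000000000.0
print(high_bandwidth(950e6, 10e6, speed))               # True
print(high_bandwidth(950e6, 10e6, if_speed_bps(None)))  # False
```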
node_network_protocol_type protocol_type value of /sys/class/net/
**Preprocessing**
Prometheus pattern: `VALUE(node_network_protocol_type{device="{#IFNAME}"})`
Reference: https://www.kernel.org/doc/Documentation/networking/operstates.txt
|Dependent item|net.if.status[node_exporter,"{#IFNAME}"]**Preprocessing**
Prometheus pattern: `node_network_info{device="{#IFNAME}"}` label `operstate`
JavaScript: `The text is too long. Please see the template.`
The utilization of the network interface is close to its estimated maximum bandwidth.
|`(avg(/Linux by Prom/net.if.in[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"]) or avg(/Linux by Prom/net.if.out[node_exporter,"{#IFNAME}"],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])) and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0`|Warning|**Manual close**: YesIt recovers when it is below 80% of the `{$IF.ERRORS.WARN:"{#IFNAME}"}` threshold.
|`min(/Linux by Prom/net.if.in.errors[node_exporter,"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} or min(/Linux by Prom/net.if.out.errors[node_exporter,"{#IFNAME}"],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"}`|Warning|**Manual close**: YesThis Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually.
|`change(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0 and ( last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=7 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=11 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=62 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=69 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=117 ) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2)`|Info|**Manual close**: YesThis Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually.
|`change(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])<0 and last(/Linux by Prom/net.if.speed[node_exporter,"{#IFNAME}"])>0 and (last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=6 or last(/Linux by Prom/net.if.type[node_exporter,"{#IFNAME}"])=1) and (last(/Linux by Prom/net.if.status[node_exporter,"{#IFNAME}"])<>2)`|Info|**Manual close**: YesThis trigger expression works as follows:
1. It can be triggered if the operational status is down.
2. `{$IFCONTROL:"{#IFNAME}"}=1`: a user can redefine the context macro to `0`, which marks this interface as not important. The trigger will then not fire if this interface is down.
3. `{TEMPLATE_NAME:METRIC.diff()}=1`: the trigger fires only if the operational status was up (1) at some point before, so it does not fire for 'eternal off' interfaces.
WARNING: if the problem is closed manually, the trigger will not fire again on the next poll because of `.diff`.
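
The three link-down conditions listed above are combined with a logical AND. The sketch below makes two assumptions. First, the down state is encoded as `2`, which is inferred from the `<>2` checks in the speed-change trigger expressions rather than confirmed by the template. Second, comparing with the previous value stands in for the `.diff()` check. `link_down_fires` is a hypothetical helper.

```python
STATUS_DOWN = 2  # assumption: inferred from the `<>2` checks, not confirmed by the template


def link_down_fires(ifcontrol: int, status_now: int, status_prev: int) -> bool:
    """True when all three listed conditions hold."""
    important = ifcontrol == 1           # condition 2: {$IFCONTROL:"{#IFNAME}"}=1
    is_down = status_now == STATUS_DOWN  # condition 1: operational status is down
    changed = status_now != status_prev  # condition 3: stand-in for .diff()
    return important and is_down and changed


print(link_down_fires(1, STATUS_DOWN, 1))            # True: the link just went down
print(link_down_fires(1, STATUS_DOWN, STATUS_DOWN))  # False: it was already down
print(link_down_fires(0, STATUS_DOWN, 1))            # False: interface marked as not important
```

The "changed" condition is why a manually closed problem does not reappear on the next poll: the status is still down but no longer differs from the previous value.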
Discovery of file systems of different types.
|Dependent item|vfs.fs.discovery[node_exporter]**Preprocessing**
Prometheus to JSON: `The text is too long. Please see the template.`
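
The `{$VFS.FS.FSNAME.MATCHES}` and `{$VFS.FS.FSNAME.NOT_MATCHES}` macros act as a keep/drop pair on discovered mount points. The Macros used table writes `|` as `\|` only because of table syntax; the regular expression itself uses a plain `|`. The sketch below assumes the pair is applied as "matches" / "does not match" regex conditions. It covers only the `{#FSNAME}` pair, not the FSTYPE or FSDEVICE macros, and `keep_filesystem` is a hypothetical helper.

```python
import re

# Default values from the Macros used table, with the table escaping removed.
FSNAME_MATCHES = r".+"
FSNAME_NOT_MATCHES = r"^(/dev|/sys|/run|/proc|.+/shm$)"


def keep_filesystem(mountpoint: str) -> bool:
    """Keep a mount point if it matches MATCHES and does not match NOT_MATCHES."""
    return (re.search(FSNAME_MATCHES, mountpoint) is not None
            and re.search(FSNAME_NOT_MATCHES, mountpoint) is None)


mounts = ["/", "/boot", "/dev", "/run/user/1000", "/dev/shm", "/var/lib/docker"]
print([m for m in mounts if keep_filesystem(m)])  # ['/', '/boot', '/var/lib/docker']
```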
**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
Total space in bytes
|Dependent item|vfs.fs.total[node_exporter,"{#FSNAME}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
Used storage in bytes
|Calculated|vfs.fs.used[node_exporter,"{#FSNAME}"]| |{#FSNAME}: Space utilization|The space utilization expressed in % for {#FSNAME}.
|Calculated|vfs.fs.pused[node_exporter,"{#FSNAME}"]| |{#FSNAME}: Free inodes in %||Dependent item|vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"]**Preprocessing**
Prometheus to JSON: `{__name__=~"node_filesystem_files.*",mountpoint="{#FSNAME}"}`
JavaScript: `The text is too long. Please see the template.`
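
The free inodes item collects the `node_filesystem_files*` series for one mount point, and its JavaScript step is not reproduced here. The sketch below assumes the percentage is `files_free / files * 100`. Its input objects loosely follow Zabbix's Prometheus-to-JSON output, with `name` and string `value` keys. The severity mapping assumes the inode triggers fire when the value drops below the `{$VFS.FS.INODE.PFREE.MIN.*}` defaults, because the trigger expressions are not listed here. All function names are hypothetical.

```python
INODE_PFREE_MIN_CRIT = 10  # default of {$VFS.FS.INODE.PFREE.MIN.CRIT}, %
INODE_PFREE_MIN_WARN = 20  # default of {$VFS.FS.INODE.PFREE.MIN.WARN}, %


def inode_pfree(series):
    """Free inodes in % for one mount point, or None when the inode total is 0."""
    values = {s["name"]: float(s["value"]) for s in series}
    total = values.get("node_filesystem_files", 0.0)
    if total <= 0:
        return None  # no inode count to compare against
    return values.get("node_filesystem_files_free", 0.0) * 100 / total


def inode_state(pfree):
    """Assumed mapping: below the CRIT minimum is critical, below WARN is warning."""
    if pfree < INODE_PFREE_MIN_CRIT:
        return "critical"
    if pfree < INODE_PFREE_MIN_WARN:
        return "warning"
    return "ok"


series = [
    {"name": "node_filesystem_files", "value": "6553600"},
    {"name": "node_filesystem_files_free", "value": "983040"},
]
pfree = inode_pfree(series)
print(pfree, inode_state(pfree))  # 15.0 warning
```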
Two conditions should match:
1. The first condition: space utilization should be above `{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}`.
2. The second condition should be one of the following:
   - the disk free space is less than `{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"}`;
   - the disk will be full in less than 24 hours.
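
The two-condition rule above can be sketched as shown below. The template's trigger expression is not reproduced here, so the "full in less than 24 hours" check uses a simple linear estimate from a given growth rate. That estimate is an assumption, not Zabbix's own forecasting. The `G` suffix in `{$VFS.FS.FREE.MIN.CRIT}` is taken as 1024³ bytes. The warning variant works the same way with the `*.WARN` macros.

```python
from typing import Optional

GIB = 1024 ** 3
PUSED_MAX_CRIT = 90      # default of {$VFS.FS.PUSED.MAX.CRIT}, %
FREE_MIN_CRIT = 5 * GIB  # default of {$VFS.FS.FREE.MIN.CRIT} = 5G


def hours_until_full(total: float, used: float, growth_per_hour: float) -> Optional[float]:
    """Linear estimate (assumption); None when usage is not growing."""
    if growth_per_hour <= 0:
        return None
    return (total - used) / growth_per_hour


def space_critical(total: float, used: float, growth_per_hour: float) -> bool:
    pused = used / total * 100
    free = total - used
    eta = hours_until_full(total, used, growth_per_hour)
    full_soon = eta is not None and eta < 24
    # Condition 1 AND (condition 2a OR condition 2b)
    return pused > PUSED_MAX_CRIT and (free < FREE_MIN_CRIT or full_soon)


print(space_critical(100 * GIB, 93 * GIB, 0.5 * GIB))  # True: full in about 14 h
print(space_critical(100 * GIB, 93 * GIB, 0))          # False: 7 GiB free, not growing
print(space_critical(100 * GIB, 96 * GIB, 0))          # True: under 5 GiB free
```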
Two conditions should match:
1. The first condition: space utilization should be above `{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}`.
2. The second condition should be one of the following:
   - the disk free space is less than `{$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"}`;
   - the disk will be full in less than 24 hours.
It may become impossible to write to a disk if there are no index nodes left.
The following error messages may be returned as symptoms even though free space is still available:
- 'No space left on device';
- 'Disk is full'.
It may become impossible to write to a disk if there are no index nodes left.
The following error messages may be returned as symptoms even though free space is still available:
- 'No space left on device';
- 'Disk is full'.
**Preprocessing**
Prometheus to JSON: `node_disk_io_now{device=~".+"}`
r/s. The number (after merges) of read requests completed per second for the device.
|Dependent item|vfs.dev.read.rate[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `VALUE(node_disk_reads_completed_total{device="{#DEVNAME}"})`
w/s. The number (after merges) of write requests completed per second for the device.
|Dependent item|vfs.dev.write.rate[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `VALUE(node_disk_writes_completed_total{device="{#DEVNAME}"})`
Rate of total read time counter. Used in `r_await` calculation.
|Dependent item|vfs.dev.read.time.rate[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
Rate of total write time counter. Used in `w_await` calculation.
|Dependent item|vfs.dev.write.time.rate[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
This formula contains two Boolean expressions that evaluate to 1 or 0. They set the calculated metric to zero when there are no requests and avoid a division-by-zero error.
|Calculated|vfs.dev.read.await[node_exporter,"{#DEVNAME}"]| |{#DEVNAME}: Disk write request avg waiting time (w_await)|This formula contains two Boolean expressions that evaluate to 1 or 0. They set the calculated metric to zero when there are no requests and avoid a division-by-zero error.
|Calculated|vfs.dev.write.await[node_exporter,"{#DEVNAME}"]| |{#DEVNAME}: Disk average queue size (avgqu-sz)|The current average disk queue; the number of requests outstanding on the disk while the performance data is being collected.
|Dependent item|vfs.dev.queue_size[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `The text is too long. Please see the template.`
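
The `r_await` and `w_await` descriptions above rely on two Boolean factors. The sketch below illustrates that idea, assuming the time-rate item is in seconds per second and the request-rate item is in requests per second. It is not the template's exact calculated-item formula, which is not reproduced here, and `await_ms` is a hypothetical helper. Python booleans behave as 0 and 1 in arithmetic, which is what makes the trick work.

```python
def await_ms(time_rate_s: float, ops_rate: float) -> float:
    """Average wait per request in ms, e.g. r_await from the read time rate and read rate.

    time_rate_s: rate of the total read (or write) time counter, seconds per second
    ops_rate:    completed read (or write) requests per second
    """
    # (ops_rate == 0) adds 1 to the divisor when there were no requests, so it is never 0;
    # (ops_rate > 0) then multiplies the result by 0 in that same case.
    return time_rate_s / (ops_rate + (ops_rate == 0)) * 1000 * (ops_rate > 0)


print(await_ms(0.5, 100))  # 5.0: 0.5 s of read time spread over 100 reads
print(await_ms(0.0, 0))    # 0.0: idle disk, no division by zero
```

The disk latency trigger at the end of this section compares the 15-minute minimum of these values with `{$VFS.DEV.READ.AWAIT.WARN}` and `{$VFS.DEV.WRITE.AWAIT.WARN}` (20 ms by default).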
This item is the percentage of elapsed time during which the selected disk drive was busy while servicing read or write requests.
|Dependent item|vfs.dev.util[node_exporter,"{#DEVNAME}"]**Preprocessing**
Prometheus pattern: `VALUE(node_disk_io_time_seconds_total{device="{#DEVNAME}"})`
Custom multiplier: `100`
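
The busy-time percentage above is derived from `node_disk_io_time_seconds_total`, a counter of seconds spent doing I/O. The sketch below assumes the counter is turned into a per-second rate before the `×100` multiplier is applied; the listing above does not show that step. It computes the rate from two samples, and the function name is hypothetical.

```python
def disk_util_percent(io_time_prev_s: float, io_time_curr_s: float, interval_s: float) -> float:
    """Busy time in % from two samples of node_disk_io_time_seconds_total."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    busy_per_second = (io_time_curr_s - io_time_prev_s) / interval_s  # busy seconds per second
    return busy_per_second * 100  # "Custom multiplier: 100"


print(disk_util_percent(1200.0, 1230.0, 60))  # 50.0
```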
This trigger might indicate saturation of disk {#DEVNAME}.
|`min(/Linux by Prom/vfs.dev.read.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or min(/Linux by Prom/vfs.dev.write.await[node_exporter,"{#DEVNAME}"],15m) > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"}`|Warning|**Manual close**: Yes|

## Feedback

Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).

You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).