# HashiCorp Nomad by HTTP
## Overview
This template is designed to monitor HashiCorp Nomad by Zabbix.
It works without any external scripts.
Currently, the template supports discovery of Nomad servers and clients.
## Requirements
Zabbix version: 7.0 and higher.
## Tested versions
This template has been tested on:
- HashiCorp Nomad version 1.5.6/1.6.0
## Configuration
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
## Setup
1. Create a synthetic Nomad host. It should point to one of the Nomad cluster members, a load-balancing service (if a cluster is used), or a single node in the selected Nomad region.
2. Define the `{$NOMAD.ENDPOINT.API.URL}` macro value with the correct web protocol, host, and port.
3. Prepare an ACL token with `node:read`, `namespace:read-job`, `agent:read` and `management` permissions applied. Define the `{$NOMAD.TOKEN}` macro value.
> Refer to the vendor documentation about [`Nomad native ACL`](https://developer.hashicorp.com/nomad/tutorials/access-control/access-control-policies) or [`Nomad Vault-generated tokens`](https://developer.hashicorp.com/nomad/tutorials/access-control/vault-nomad-secrets) if you have the HashiCorp Vault integration configured.
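
Before linking the template, the endpoint URL and token can be checked outside Zabbix. The sketch below is not part of the template. It assumes the token is sent in the `X-Nomad-Token` header and that the discovery items rely on the `/v1/nodes`, `/v1/agent/members` and `/v1/operator/raft/configuration` endpoints of the Nomad HTTP API. The function names `build_request` and `probe` are hypothetical.

```python
import urllib.error
import urllib.request

# Endpoints assumed to sit behind the template's discovery items.
PATHS = ("/v1/nodes", "/v1/agent/members", "/v1/operator/raft/configuration")


def build_request(base_url, path, token=None):
    """Build a GET request for one Nomad API path; attach the ACL token if given."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if token:  # with ACL disabled, no token header is needed
        req.add_header("X-Nomad-Token", token)
    return req


def probe(base_url, token=None, timeout=15):
    """Return {path: HTTP status} for each endpoint; requires a reachable Nomad agent."""
    statuses = {}
    for path in PATHS:
        try:
            req = build_request(base_url, path, token)
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                statuses[path] = resp.status
        except urllib.error.HTTPError as err:
            statuses[path] = err.code  # e.g. 403 when a permission is missing
    return statuses
```

Calling `probe("http://localhost:4646", "<token>")` against a live agent should report `200` for every path when the URL and permissions are correct.
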
**Additional information**:
* The synthetic Nomad host is used only as an endpoint for server and client discovery (general cluster information). It is not monitored as a Nomad server or client itself, which prevents duplicate entities.
* If you're not using ACL, skip the 3rd setup step.
* Nomad server/client discovery is limited to a single region. If you're running a multi-region cluster, create one synthetic host per region.
* The Nomad server and client templates can also be used on their own if you prefer to create hosts manually.
**Useful links**:
* [HashiCorp Nomad multi-region federation](https://developer.hashicorp.com/nomad/tutorials/manage-clusters/federation)
* [HashiCorp Nomad agent API reference](https://developer.hashicorp.com/nomad/api-docs/agent)
* [HashiCorp Nomad raft operator API reference](https://developer.hashicorp.com/nomad/api-docs/operator/raft)
* [HashiCorp Nomad nodes API reference](https://developer.hashicorp.com/nomad/api-docs/nodes)
### Macros used
|Name|Description|Default|
|----|-----------|-------|
|{$NOMAD.ENDPOINT.API.URL}|<p>API endpoint URL for one of the Nomad cluster members.</p>|`http://localhost:4646`|
|{$NOMAD.TOKEN}|<p>Nomad authentication token.</p>|`<PUT YOUR AUTH TOKEN>`|
|{$NOMAD.DATA.TIMEOUT}|<p>Response timeout for an API.</p>|`15s`|
|{$NOMAD.HTTP.PROXY}|<p>Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used.</p>||
|{$NOMAD.API.RESPONSE.SUCCESS}|<p>HTTP API successful response code. Availability triggers threshold. Change, if needed.</p>|`200`|
|{$NOMAD.SERVER.NAME.MATCHES}|<p>The filter to include HashiCorp Nomad servers by name.</p>|`.*`|
|{$NOMAD.SERVER.NAME.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad servers by name.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.SERVER.DC.MATCHES}|<p>The filter to include HashiCorp Nomad servers by datacenter belonging.</p>|`.*`|
|{$NOMAD.SERVER.DC.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad servers by datacenter belonging.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.CLIENT.NAME.MATCHES}|<p>The filter to include HashiCorp Nomad clients by name.</p>|`.*`|
|{$NOMAD.CLIENT.NAME.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad clients by name.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.CLIENT.DC.MATCHES}|<p>The filter to include HashiCorp Nomad clients by datacenter belonging.</p>|`.*`|
|{$NOMAD.CLIENT.DC.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad clients by datacenter belonging.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.MATCHES}|<p>The filter to include HashiCorp Nomad clients by scheduling eligibility.</p>|`.*`|
|{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad clients by scheduling eligibility.</p>|`CHANGE_IF_NEEDED`|
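
The `MATCHES` / `NOT_MATCHES` macro pairs above act as include and exclude filters during discovery. A minimal sketch of that logic follows, assuming an entity is kept when it matches the include expression and does not match the exclude expression. Python's `re` module only approximates the regular expression engine used by Zabbix, and the server names are made up.

```python
import re


def passes_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep an entity if it matches the include regex and not the exclude regex."""
    return bool(re.search(matches, value)) and not re.search(not_matches, value)


# Hypothetical server names returned by discovery.
servers = ["nomad-server-1", "nomad-server-2", "test-server"]

# With the defaults, everything is kept; overriding the exclude macro drops test hosts.
kept_default = [s for s in servers if passes_filter(s)]
kept_filtered = [s for s in servers if passes_filter(s, not_matches="^test-")]
```

The default `CHANGE_IF_NEEDED` value works as a placeholder: it is unlikely to match any real name, so nothing is excluded until the macro is changed.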
### Items
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad: Nomad clients get|<p>Nomad clients data in raw format.</p>|HTTP agent|nomad.client.nodes.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad: Client nodes API response|<p>Client nodes API response message.</p>|Dependent item|nomad.client.nodes.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad: Nomad servers get|<p>Nomad servers data in raw format.</p>|Script|nomad.server.nodes.get|
|HashiCorp Nomad: Server-related APIs response|<p>Server-related (`operator/raft/configuration`, `agent/members`) APIs error response message.</p>|Dependent item|nomad.server.api.response<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.error`</p><p>Custom on fail: Set value to: `HTTP/1.1 200 OK`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad: Region|<p>Current cluster region.</p>|Dependent item|nomad.region<p>**Preprocessing**</p><ul><li><p>JSON Path: `$..region.first()`</p></li></ul>|
|HashiCorp Nomad: Nomad servers count|<p>Nomad servers count.</p>|Dependent item|nomad.servers.count<p>**Preprocessing**</p><ul><li><p>JSON Path: `$[?(@.Name)].length()`</p></li></ul>|
|HashiCorp Nomad: Nomad clients count|<p>Nomad clients count.</p>|Dependent item|nomad.clients.count<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body[?(@.Name)].length()`</p></li></ul>|
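
The count items above reduce the raw node list to a single number. The sketch below mimics the `$.body[?(@.Name)].length()` step in Python. It assumes that the raw value wraps the response as an object with `header` and `body` keys, as the fallback value in the table suggests; the sample nodes are invented.

```python
import json

# Hypothetical raw value of the "Nomad clients get" item.
raw = json.dumps({
    "header": {"HTTP/1.1 200 OK": ""},
    "body": [
        {"Name": "client-1", "Datacenter": "dc1"},
        {"Name": "client-2", "Datacenter": "dc2"},
    ],
})


def count_named(nodes):
    """Rough equivalent of [?(@.Name)].length(): count entries that carry a Name field."""
    return sum(1 for node in nodes if "Name" in node)


clients_count = count_named(json.loads(raw)["body"])
```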
### Triggers
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|HashiCorp Nomad: Client nodes API connection has failed|<p>Client nodes API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad by HTTP/nomad.client.nodes.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes|
|HashiCorp Nomad: Server-related API connection has failed|<p>Server-related API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad by HTTP/nomad.server.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes|
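
Both triggers above use `find(...,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`, which fires when the most recent response message does not contain the success code. A minimal Python sketch of that check, with a hypothetical function name:

```python
def api_connection_failed(last_response, success_code="200"):
    """Mirror find(..., "like", code)=0: True when the code is absent from the last value."""
    return success_code not in last_response


ok = api_connection_failed("HTTP/1.1 200 OK")
timed_out = api_connection_failed("HTTP/1.1 408 Request timeout")
forbidden = api_connection_failed("HTTP/1.1 403 Forbidden")
```

Because `like` is a substring match, the check only tests whether the configured code appears anywhere in the message. This is also why the raw items fall back to a `408 Request timeout` header when the value is not supported: the fallback does not contain `200`, so the trigger fires.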
### LLD rule Clients discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Clients discovery|<p>Client nodes discovery.</p>|Dependent item|nomad.clients.discovery<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body`</p><p>Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
### LLD rule Servers discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Servers discovery|<p>Server nodes discovery.</p>|Dependent item|nomad.servers.discovery<p>**Preprocessing**</p><ul><li><p>Check for error in JSON: `$.error`</p><p>Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
# HashiCorp Nomad Client by HTTP
## Overview
This template is designed to monitor HashiCorp Nomad clients by Zabbix.
It works without any external scripts.
## Requirements
Zabbix version: 7.0 and higher.
## Tested versions
This template has been tested on:
- HashiCorp Nomad version 1.5.6/1.6.0
## Configuration
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
## Setup
1. Enable telemetry in the HashiCorp Nomad agent configuration file and set the Prometheus metrics format.
> Refer to the [`vendor documentation`](https://developer.hashicorp.com/nomad/docs/configuration/telemetry).
2. Prepare an ACL token with `node:read`, `namespace:read-job` permissions applied. Define the `{$NOMAD.TOKEN}` macro value.
> Refer to the vendor documentation about [`Nomad native ACL`](https://developer.hashicorp.com/nomad/tutorials/access-control/access-control-policies) or [`Nomad Vault-generated tokens`](https://developer.hashicorp.com/nomad/tutorials/access-control/vault-nomad-secrets) if you're using integration with HashiCorp Vault.
3. Set the values for the `{$NOMAD.CLIENT.API.SCHEME}` and `{$NOMAD.CLIENT.API.PORT}` macros to define the common Nomad API web schema and connection port.
**Additional information**:
* You have to prepare an additional ACL token only if you wish to monitor Nomad clients as separate entities. If you're using client discovery, the token is inherited from the master host linked to the HashiCorp Nomad by HTTP template.
* If you're not using ACL, skip the 2nd setup step.
* By default, Nomad clients use the `http` scheme and API port `4646`. If you're using client discovery and need to redefine these macros for a particular host created from a prototype, use context macros such as `{$NOMAD.CLIENT.API.SCHEME:NECESSARY.IP}` and/or `{$NOMAD.CLIENT.API.PORT:NECESSARY.IP}` at the master host or template level.
* Some metrics may not be collected depending on your HashiCorp Nomad agent version and configuration.
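
The context macro override described above can be pictured as a lookup that prefers a context-specific value and falls back to the default. The sketch below is a simplified model, not how Zabbix implements macro resolution (quoted and regular-expression contexts are ignored). The IP addresses, the overridden values, and the `/v1/metrics?format=prometheus` path used to build the URL are assumptions for illustration.

```python
def resolve_macro(macros, name, context=None):
    """Return the context-specific macro value if one is defined, else the default."""
    if context is not None:
        key = "{$%s:%s}" % (name, context)
        if key in macros:
            return macros[key]
    return macros["{$%s}" % name]


def metrics_url(macros, host_ip):
    """Build a client metrics URL from the scheme and port macros for one host."""
    scheme = resolve_macro(macros, "NOMAD.CLIENT.API.SCHEME", host_ip)
    port = resolve_macro(macros, "NOMAD.CLIENT.API.PORT", host_ip)
    return "%s://%s:%s/v1/metrics?format=prometheus" % (scheme, host_ip, port)


# Defaults from the macros table plus a hypothetical override for one client.
macros = {
    "{$NOMAD.CLIENT.API.SCHEME}": "http",
    "{$NOMAD.CLIENT.API.PORT}": "4646",
    "{$NOMAD.CLIENT.API.SCHEME:10.0.0.5}": "https",
    "{$NOMAD.CLIENT.API.PORT:10.0.0.5}": "4747",
}
```

Here only the client at `10.0.0.5` gets the overridden scheme and port; every other discovered client keeps the defaults.
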
**Useful links**:
* [HashiCorp Nomad metrics list](https://developer.hashicorp.com/nomad/docs/operations/metrics-reference)
* [HashiCorp Nomad telemetry configuration reference](https://developer.hashicorp.com/nomad/docs/configuration/telemetry)
* [HashiCorp Nomad metrics API reference](https://developer.hashicorp.com/nomad/api-docs/metrics)
* [HashiCorp Nomad nodes API reference](https://developer.hashicorp.com/nomad/api-docs/nodes)
* [HashiCorp Nomad allocations API reference](https://developer.hashicorp.com/nomad/api-docs/allocations)
* [Zabbix user macros with context](https://www.zabbix.com/documentation/7.0/manual/config/macros/user_macros_context)
### Macros used
|Name|Description|Default|
|----|-----------|-------|
|{$NOMAD.CLIENT.API.SCHEME}|<p>Nomad client API scheme.</p>|`http`|
|{$NOMAD.CLIENT.API.PORT}|<p>Nomad client API port.</p>|`4646`|
|{$NOMAD.TOKEN}|<p>Nomad authentication token.</p>|`<PUT YOUR AUTH TOKEN>`|
|{$NOMAD.DATA.TIMEOUT}|<p>Response timeout for an API.</p>|`15s`|
|{$NOMAD.HTTP.PROXY}|<p>Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used.</p>||
|{$NOMAD.API.RESPONSE.SUCCESS}|<p>HTTP API successful response code. Availability triggers threshold. Change, if needed.</p>|`200`|
|{$NOMAD.CLIENT.RPC.PORT}|<p>Nomad RPC service port.</p>|`4647`|
|{$NOMAD.CLIENT.SERF.PORT}|<p>Nomad serf service port.</p>|`4648`|
|{$NOMAD.CLIENT.OPEN.FDS.MAX.WARN}|<p>Maximum percentage of used file descriptors.</p>|`90`|
|{$NOMAD.DISK.NAME.MATCHES}|<p>The filter to include HashiCorp Nomad client disks by name.</p>|`.*`|
|{$NOMAD.DISK.NAME.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client disks by name.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.JOB.NAME.MATCHES}|<p>The filter to include HashiCorp Nomad client jobs by name.</p>|`.*`|
|{$NOMAD.JOB.NAME.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client jobs by name.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.JOB.NAMESPACE.MATCHES}|<p>The filter to include HashiCorp Nomad client jobs by namespace.</p>|`.*`|
|{$NOMAD.JOB.NAMESPACE.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client jobs by namespace.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.JOB.TYPE.MATCHES}|<p>The filter to include HashiCorp Nomad client jobs by type.</p>|`.*`|
|{$NOMAD.JOB.TYPE.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client jobs by type.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.JOB.TASK.GROUP.MATCHES}|<p>The filter to include HashiCorp Nomad client jobs by task group belonging.</p>|`.*`|
|{$NOMAD.JOB.TASK.GROUP.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client jobs by task group belonging.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.DRIVER.NAME.MATCHES}|<p>The filter to include HashiCorp Nomad client drivers by name.</p>|`.*`|
|{$NOMAD.DRIVER.NAME.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client drivers by name.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.DRIVER.DETECT.MATCHES}|<p>The filter to include HashiCorp Nomad client drivers by detection state. Possible filtering values: `true`, `false`.</p>|`.*`|
|{$NOMAD.DRIVER.DETECT.NOT_MATCHES}|<p>The filter to exclude HashiCorp Nomad client drivers by detection state. Possible filtering values: `true`, `false`.</p>|`CHANGE_IF_NEEDED`|
|{$NOMAD.CPU.UTIL.MIN}|<p>CPU utilization threshold. Measured as a percentage.</p>|`90`|
|{$NOMAD.RAM.AVAIL.MIN}|<p>Available memory threshold. Measured as a percentage.</p>|`5`|
|{$NOMAD.INODES.FREE.MIN.WARN}|<p>Warning threshold of the filesystem metadata utilization. Measured as a percentage.</p>|`20`|
|{$NOMAD.INODES.FREE.MIN.CRIT}|<p>Critical threshold of the filesystem metadata utilization. Measured as a percentage.</p>|`10`|
### Items
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad Client: Telemetry get|<p>Telemetry data in raw format.</p>|HTTP agent|nomad.client.data.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad Client: Metrics|<p>Nomad client metrics in raw format.</p>|Dependent item|nomad.client.metrics.get<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: Monitoring API response|<p>Monitoring API response message.</p>|Dependent item|nomad.client.data.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Service [rpc] state|<p>Current [rpc] service state.</p>|Simple check|net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]<p>**Preprocessing**</p><ul><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Service [serf] state|<p>Current [serf] service state.</p>|Simple check|net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]<p>**Preprocessing**</p><ul><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: CPU allocated|<p>Total amount of CPU shares the scheduler has allocated to tasks.</p>|Dependent item|nomad.client.allocated.cpu<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocated_cpu)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: CPU unallocated|<p>Total amount of CPU shares free for the scheduler to allocate to tasks.</p>|Dependent item|nomad.client.unallocated.cpu<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_unallocated_cpu)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: Memory allocated|<p>Total amount of memory the scheduler has allocated to tasks.</p>|Dependent item|nomad.client.allocated.memory<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocated_memory)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1.0E+6`</p></li></ul>|
|HashiCorp Nomad Client: Memory unallocated|<p>Total amount of memory free for the scheduler to allocate to tasks.</p>|Dependent item|nomad.client.unallocated.memory<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_unallocated_memory)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1.0E+6`</p></li></ul>|
|HashiCorp Nomad Client: Disk allocated|<p>Total amount of disk space the scheduler has allocated to tasks.</p>|Dependent item|nomad.client.allocated.disk<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocated_disk)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1.0E+6`</p></li></ul>|
|HashiCorp Nomad Client: Disk unallocated|<p>Total amount of disk space free for the scheduler to allocate to tasks.</p>|Dependent item|nomad.client.unallocated.disk<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_unallocated_disk)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1.0E+6`</p></li></ul>|
|HashiCorp Nomad Client: Allocations blocked|<p>Number of allocations waiting for previous versions.</p>|Dependent item|nomad.client.allocations.blocked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_blocked)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations migrating|<p>Number of allocations migrating data from previous versions.</p>|Dependent item|nomad.client.allocations.migrating<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_migrating)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations pending|<p>Number of allocations pending (received by the client but not yet running).</p>|Dependent item|nomad.client.allocations.pending<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_pending)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations starting|<p>Number of allocations starting.</p>|Dependent item|nomad.client.allocations.start<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_start)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations running|<p>Number of allocations running.</p>|Dependent item|nomad.client.allocations.running<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_running)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations terminal|<p>Number of allocations in a terminal state.</p>|Dependent item|nomad.client.allocations.terminal<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocations_terminal)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Client: Allocations failed, rate|<p>Number of allocations failed.</p>|Dependent item|nomad.client.allocations.failed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_client_allocs_failed)`</p><p>Custom on fail: Set value to: `0`</p></li><li>Change per second</li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Allocations completed, rate|<p>Number of allocations completed.</p>|Dependent item|nomad.client.allocations.complete<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_client_allocs_complete)`</p><p>Custom on fail: Set value to: `0`</p></li><li>Change per second</li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Allocations restarted, rate|<p>Number of allocations restarted.</p>|Dependent item|nomad.client.allocations.restart<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_client_allocs_restart)`</p><p>Custom on fail: Set value to: `0`</p></li><li>Change per second</li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Allocations OOM killed|<p>Number of allocations OOM killed.</p>|Dependent item|nomad.client.allocations.oom_killed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_allocs_oom_killed)`</p><p>Custom on fail: Set value to: `0`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: CPU idle utilization|<p>CPU utilization in idle state.</p>|Dependent item|nomad.client.cpu.idle<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_client_host_cpu_idle)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: CPU system utilization|<p>CPU utilization in system space.</p>|Dependent item|nomad.client.cpu.system<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_client_host_cpu_system)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: CPU total utilization|<p>Total CPU utilization.</p>|Dependent item|nomad.client.cpu.total<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_client_host_cpu_total)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: CPU user utilization|<p>CPU utilization in user space.</p>|Dependent item|nomad.client.cpu.user<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_client_host_cpu_user)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: Memory available|<p>Total amount of memory available to processes which includes free and cached memory.</p>|Dependent item|nomad.client.memory.available<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_memory_available)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Client: Memory free|<p>Amount of memory which is free.</p>|Dependent item|nomad.client.memory.free<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_memory_free)`</p></li></ul>|
|HashiCorp Nomad Client: Memory size|<p>Total amount of physical memory on the node.</p>|Dependent item|nomad.client.memory.total<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_memory_total)`</p></li></ul>|
|HashiCorp Nomad Client: Memory used|<p>Amount of memory used by processes.</p>|Dependent item|nomad.client.memory.used<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_memory_used)`</p></li></ul>|
|HashiCorp Nomad Client: Uptime|<p>Uptime of the host running the Nomad client.</p>|Dependent item|nomad.client.uptime<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_uptime)`</p></li></ul>|
|HashiCorp Nomad Client: Node info get|<p>Node info data in raw format.</p>|HTTP agent|nomad.client.node.info.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad Client: Nomad client version|<p>Nomad client version.</p>|Dependent item|nomad.client.version<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body..Version.first()`</p></li></ul>|
|HashiCorp Nomad Client: Nodes API response|<p>Nodes API response message.</p>|Dependent item|nomad.client.node.info.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Allocated jobs get|<p>Allocated jobs data in raw format.</p>|HTTP agent|nomad.client.job.allocs.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad Client: Allocations API response|<p>Allocations API response message.</p>|Dependent item|nomad.client.job.allocs.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
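
Most dependent items above extract one metric from the Prometheus text and, for memory and disk, apply a custom multiplier of `1.0E+6`. The sketch below imitates the `VALUE(...)` and `AVG(...)` steps in plain Python. The sample payload and its labels are invented, the parser handles only simple lines, and treating more than one match as an error in `value` is an assumption about how `VALUE` behaves.

```python
import re

# Hypothetical excerpt of the client's Prometheus-format telemetry.
SAMPLE = """\
# TYPE nomad_client_allocated_memory gauge
nomad_client_allocated_memory{datacenter="dc1",node_id="abc"} 512
nomad_client_host_cpu_total{cpu="cpu0",datacenter="dc1"} 10
nomad_client_host_cpu_total{cpu="cpu1",datacenter="dc1"} 30
"""

LINE = re.compile(r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{.*\})?\s+(?P<value>\S+)")


def samples(text, metric):
    """Collect the numeric values of every sample line for the given metric name."""
    found = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = LINE.match(line)
        if match and match.group("name") == metric:
            found.append(float(match.group("value")))
    return found


def value(text, metric):
    """VALUE(metric): expects exactly one matching sample."""
    found = samples(text, metric)
    if len(found) != 1:
        raise ValueError("expected one sample for %s, got %d" % (metric, len(found)))
    return found[0]


def avg(text, metric):
    """AVG(metric): mean over all matching samples, e.g. one per CPU core."""
    found = samples(text, metric)
    return sum(found) / len(found)


allocated_memory = value(SAMPLE, "nomad_client_allocated_memory") * 1.0e6
cpu_total = avg(SAMPLE, "nomad_client_host_cpu_total")
```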
### Triggers
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|HashiCorp Nomad Client: Monitoring API connection has failed|<p>Monitoring API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad Client by HTTP/nomad.client.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Client: Service [rpc] is down|<p>Cannot establish the connection to [rpc] service port {$NOMAD.CLIENT.RPC.PORT}.<br>Check the Nomad state and network connectivity between Nomad and Zabbix.</p>|`last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]) = 0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Client: Service [serf] is down|<p>Cannot establish the connection to [serf] service port {$NOMAD.CLIENT.SERF.PORT}.<br>Check the Nomad state and network connectivity between Nomad and Zabbix.</p>|`last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]) = 0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Client: OOM killed allocations found|<p>OOM killed allocations found.</p>|`last(/HashiCorp Nomad Client by HTTP/nomad.client.allocations.oom_killed) > 0`|Warning|**Manual close**: Yes|
|HashiCorp Nomad Client: High CPU utilization|<p>CPU utilization is too high. The system might be slow to respond.</p>|`min(/HashiCorp Nomad Client by HTTP/nomad.client.cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN}`|Average||
|HashiCorp Nomad Client: High memory utilization|<p>RAM utilization is too high. The system might be slow to respond.</p>|`(min(/HashiCorp Nomad Client by HTTP/nomad.client.memory.available, 10m) / last(/HashiCorp Nomad Client by HTTP/nomad.client.memory.total))*100 <= {$NOMAD.RAM.AVAIL.MIN}`|Average||
|HashiCorp Nomad Client: The host has been restarted|<p>The host uptime is less than 10 minutes.</p>|`last(/HashiCorp Nomad Client by HTTP/nomad.client.uptime) < 10m`|Warning|**Manual close**: Yes|
|HashiCorp Nomad Client: Nomad client version has changed|<p>Nomad client version has changed.</p>|`change(/HashiCorp Nomad Client by HTTP/nomad.client.version)<>0`|Info|**Manual close**: Yes|
|HashiCorp Nomad Client: Nodes API connection has failed|<p>Nodes API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad Client by HTTP/nomad.client.node.info.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes<br>**Depends on**:<br><ul><li>HashiCorp Nomad Client: Monitoring API connection has failed</li></ul>|
|HashiCorp Nomad Client: Allocations API connection has failed|<p>Allocations API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad Client by HTTP/nomad.client.job.allocs.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes<br>**Depends on**:<br><ul><li>HashiCorp Nomad Client: Monitoring API connection has failed</li></ul>|
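
The CPU and memory triggers above take the minimum over a 10-minute window, so a short spike alone does not fire them. A rough Python rendering of the two expressions, with the thresholds taken from the macro defaults and the sample readings invented:

```python
def high_cpu_utilization(cpu_total_10m, threshold_pct=90):
    """min(cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN}"""
    return min(cpu_total_10m) >= threshold_pct


def high_memory_utilization(available_10m, total, avail_min_pct=5):
    """(min(memory.available, 10m) / last(memory.total)) * 100 <= {$NOMAD.RAM.AVAIL.MIN}"""
    return min(available_10m) / total * 100 <= avail_min_pct


# Hypothetical readings collected over the last 10 minutes.
sustained_cpu = high_cpu_utilization([95, 92, 91])
spiky_cpu = high_cpu_utilization([95, 50, 99])
low_memory = high_memory_utilization([1.0e9, 0.7e9, 0.9e9], total=16e9)
enough_memory = high_memory_utilization([2.0e9, 1.0e9], total=16e9)
```

In the CPU case every sample in the window has to stay at or above the threshold. In the memory case the lowest available value in the window is compared against total memory.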
### LLD rule Drivers discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Drivers discovery|<p>Client drivers discovery.</p>|Dependent item|nomad.client.drivers.discovery<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
### Item prototypes for Drivers discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] state|<p>Driver [{#DRIVER.NAME}] state.</p>|Dependent item|nomad.client.driver.state["{#DRIVER.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body..Drivers.{#DRIVER.NAME}.Healthy.first()`</p></li><li>Boolean to decimal</li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state|<p>Driver [{#DRIVER.NAME}] detection state.</p>|Dependent item|nomad.client.driver.detected["{#DRIVER.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body..Drivers.{#DRIVER.NAME}.Detected.first()`</p></li><li>Boolean to decimal</li></ul>|
### Trigger prototypes for Drivers discovery
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] is in unhealthy state|<p>The [{#DRIVER.NAME}] driver detected, but its state is unhealthy.</p>|`last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.state["{#DRIVER.NAME}"]) = 0 and last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) = 1`|Warning|**Manual close**: Yes|
|HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state has changed|<p>The [{#DRIVER.NAME}] driver detection state has changed.</p>|`change(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) <> 0`|Info|**Manual close**: Yes|
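
The driver items convert the `Healthy` and `Detected` booleans to `1`/`0`, and the "unhealthy" trigger fires only for drivers that are detected but not healthy. A small sketch of that combination follows; the driver data is a made-up excerpt, not real node output.

```python
def bool_to_decimal(flag):
    """The 'Boolean to decimal' step: true -> 1, false -> 0."""
    return 1 if flag else 0


def driver_unhealthy(driver):
    """Trigger condition: state = 0 and detected = 1."""
    state = bool_to_decimal(driver["Healthy"])
    detected = bool_to_decimal(driver["Detected"])
    return state == 0 and detected == 1


# Hypothetical excerpt of a node's "Drivers" object.
drivers = {
    "docker": {"Detected": True, "Healthy": False},
    "java": {"Detected": False, "Healthy": False},
    "exec": {"Detected": True, "Healthy": True},
}
unhealthy = [name for name, info in drivers.items() if driver_unhealthy(info)]
```

A driver that is simply not installed (`java` in this sample) is not reported as unhealthy, which keeps the trigger quiet on nodes that never had the driver.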
### LLD rule Physical disks discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Physical disks discovery|<p>Physical disks discovery.</p>|Dependent item|nomad.client.disk.discovery<p>**Preprocessing**</p><ul><li><p>Prometheus to JSON: `nomad_client_host_disk_available{disk=~".*"}`</p></li></ul>|
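
Disk discovery starts from the `nomad_client_host_disk_available` samples and produces one entity per `disk` label. The sketch below shows the idea in Python, assuming the label value becomes `{#DEV.NAME}` and that the `{$NOMAD.DISK.NAME.MATCHES}` / `{$NOMAD.DISK.NAME.NOT_MATCHES}` filters are applied to it. The sample lines and device names are invented.

```python
import re

# Hypothetical Prometheus lines from a client with two disks.
SAMPLE = """\
nomad_client_host_disk_available{datacenter="dc1",disk="/dev/sda1"} 1.0e+10
nomad_client_host_disk_available{datacenter="dc1",disk="/dev/sdb1"} 2.5e+10
nomad_client_host_disk_size{datacenter="dc1",disk="/dev/sda1"} 5.0e+10
"""

DISK_LABEL = re.compile(r'[{,]disk="([^"]*)"')


def discover_disks(text, include=".*", exclude="CHANGE_IF_NEEDED"):
    """Return one LLD row per disk label found on the disk_available metric."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("nomad_client_host_disk_available{"):
            continue
        match = DISK_LABEL.search(line)
        if not match:
            continue
        name = match.group(1)
        if re.search(include, name) and not re.search(exclude, name):
            rows.append({"{#DEV.NAME}": name})
    return rows


all_disks = discover_disks(SAMPLE)
without_sdb = discover_disks(SAMPLE, exclude="sdb")
```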
### Item prototypes for Physical disks discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space available|<p>Amount of space which is available on ["{#DEV.NAME}"] disk.</p>|Dependent item|nomad.client.disk.available["{#DEV.NAME}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_disk_available{disk="{#DEV.NAME}"})`</p></li></ul>|
|HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] inodes utilization|<p>Disk space consumed by the inodes on ["{#DEV.NAME}"] disk.</p>|Dependent item|nomad.client.disk.inodes_percent["{#DEV.NAME}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] size|<p>Total size of the ["{#DEV.NAME}"] device.</p>|Dependent item|nomad.client.disk.size["{#DEV.NAME}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_disk_size{disk="{#DEV.NAME}"})`</p></li></ul>|
|HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space utilization|<p>Percentage of disk ["{#DEV.NAME}"] space used.</p>|Dependent item|nomad.client.disk.used_percent["{#DEV.NAME}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space used|<p>Amount of disk ["{#DEV.NAME}"] space which has been used.</p>|Dependent item|nomad.client.disk.used["{#DEV.NAME}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_client_host_disk_used{disk="{#DEV.NAME}"})`</p></li></ul>|
### Trigger prototypes for Physical disks discovery
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device|<p>It may become impossible to write to the disk if there are no index nodes left.<br>The following error messages may be returned as symptoms, even though free space is still available:<br>- No space left on device;<br>- Disk is full.</p>|`min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.WARN:"{#DEV.NAME}"}`|Warning|**Manual close**: Yes<br>**Depends on**:<br><ul><li>HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device</li></ul>|
|HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device|<p>It may become impossible to write to the disk if there are no index nodes left.<br>The following error messages may be returned as symptoms, even though free space is still available:<br>- No space left on device;<br>- Disk is full.</p>|`min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.CRIT:"{#DEV.NAME}"}`|Average|**Manual close**: Yes|
|HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization|<p>High disk [{#DEV.NAME}] utilization.</p>|`min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.WARN:"{#DEV.NAME}"}`|Warning|**Manual close**: Yes<br>**Depends on**:<br><ul><li>HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device</li></ul>|
|HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization|<p>High disk [{#DEV.NAME}] utilization.</p>|`min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.CRIT:"{#DEV.NAME}"}`|Average|**Manual close**: Yes|
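
The trigger expressions above use `min(...,5m) >= threshold`, so a trigger fires only when every value collected in the last five minutes is at or above the threshold. The sketch below illustrates that logic in Python. The function names, sample data and threshold values are hypothetical, and the handling of an empty window (treated here as "does not fire") is a simplification of what Zabbix does.

```python
def min_over_window(samples, now, window_s):
    """Lowest value among (timestamp, value) samples within the last `window_s` seconds."""
    recent = [value for ts, value in samples if now - window_s < ts <= now]
    return min(recent) if recent else None


def trigger_fires(samples, now, threshold, window_s=300):
    """True when the minimum over the window is at or above the threshold."""
    lowest = min_over_window(samples, now, window_s)
    return lowest is not None and lowest >= threshold


# Hypothetical utilization samples: (timestamp in seconds, percent).
SAMPLES = [(0, 95.0), (60, 91.0), (120, 93.0), (180, 92.0), (240, 94.0), (300, 96.0)]
```

With these samples and `now=300`, the lowest value in the window is `91.0`, so a threshold of `90` fires while a threshold of `92` does not. A single dip below the threshold is enough to keep the trigger quiet, which helps suppress alerts on short spikes.
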
### LLD rule Allocated jobs discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Allocated jobs discovery|<p>Allocated jobs discovery.</p>|Dependent item|nomad.client.alloc.discovery<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
### Item prototypes for Allocated jobs discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU allocated|<p>Total CPU resources allocated by the ["{#JOB.NAME}"] job across all cores.</p>|Dependent item|nomad.client.allocs.cpu.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU system utilization|<p>Total CPU resources consumed by the ["{#JOB.NAME}"] job in system space.</p>|Dependent item|nomad.client.allocs.cpu.system["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU user utilization|<p>Total CPU resources consumed by the ["{#JOB.NAME}"] job in user space.</p>|Dependent item|nomad.client.allocs.cpu.user["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU total utilization|<p>Total CPU resources consumed by the ["{#JOB.NAME}"] job across all cores.</p>|Dependent item|nomad.client.allocs.cpu.total_percent["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled periods time|<p>Total number of CPU periods that the ["{#JOB.NAME}"] job was throttled.</p>|Dependent item|nomad.client.allocs.cpu.throttled_periods["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled time|<p>Total time that the ["{#JOB.NAME}"] job was throttled.</p>|Dependent item|nomad.client.allocs.cpu.throttled_time["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU ticks|<p>CPU ticks consumed by the process for the ["{#JOB.NAME}"] job in the last collection interval.</p>|Dependent item|nomad.client.allocs.cpu.total_ticks["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory allocated|<p>Amount of memory allocated by the ["{#JOB.NAME}"] job.</p>|Dependent item|nomad.client.allocs.memory.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory cached|<p>Amount of memory cached by the ["{#JOB.NAME}"] job.</p>|Dependent item|nomad.client.allocs.memory.cache["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory used|<p>Total amount of memory used by the ["{#JOB.NAME}"] job.</p>|Dependent item|nomad.client.allocs.memory.usage["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory swapped|<p>Amount of memory swapped by the ["{#JOB.NAME}"] job.</p>|Dependent item|nomad.client.allocs.memory.swap["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p></li></ul>|
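
Some of the items above apply a `1e-09` custom multiplier after the Prometheus pattern step. Assuming the raw metric is reported in nanoseconds, which the multiplier suggests, the step converts the value to seconds. A minimal sketch (the function name is hypothetical):

```python
NANOSECONDS_TO_SECONDS = 1e-09


def apply_custom_multiplier(raw_value, multiplier=NANOSECONDS_TO_SECONDS):
    """Scale a raw metric value the way a 'Custom multiplier' preprocessing step would."""
    return raw_value * multiplier
```

Under that assumption, a raw throttled time of `2_500_000_000` is stored as roughly `2.5` seconds.
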
# HashiCorp Nomad Server by HTTP
## Overview
This template is designed to monitor HashiCorp Nomad servers by Zabbix.
It works without any external scripts.
## Requirements
Zabbix version: 7.0 and higher.
## Tested versions
This template has been tested on:
- HashiCorp Nomad version 1.5.6/1.6.0
## Configuration
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
## Setup
1. Enable telemetry in the HashiCorp Nomad agent configuration file and set the Prometheus metrics format.
>Refer to the [`vendor documentation`](https://developer.hashicorp.com/nomad/docs/configuration/telemetry).
2. Set the values for the `{$NOMAD.SERVER.API.SCHEME}` and `{$NOMAD.SERVER.API.PORT}` macros to define the common Nomad API web scheme and connection port.
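
Before linking the template, it can help to confirm that the agent actually serves Prometheus-format metrics with the chosen scheme and port. The sketch below assumes the Nomad `/v1/metrics?format=prometheus` endpoint and the `X-Nomad-Token` header; the helper names and the example address are hypothetical, and the exact URL used by the template should be checked in the template itself.

```python
import urllib.request


def metrics_url(scheme, host, port):
    """Build the metrics URL from the scheme and port macro values."""
    return f"{scheme}://{host}:{port}/v1/metrics?format=prometheus"


def fetch_metrics(url, token, timeout=15):
    """Fetch raw Prometheus text, authenticating with a Nomad ACL token."""
    request = urllib.request.Request(url, headers={"X-Nomad-Token": token})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires a reachable Nomad server, address is made up):
# text = fetch_metrics(metrics_url("http", "192.0.2.10", 4646), "<PUT YOUR AUTH TOKEN>")
# print(text.splitlines()[:5])
```

If the response is JSON rather than lines such as `# TYPE ...`, the Prometheus format is most likely not enabled in the agent telemetry configuration.
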
**Additional information**:
* The Nomad servers use the default web scheme `HTTP` and the default API port `4646`. If you use server discovery and need to redefine these macros for a particular host created from a prototype, use context macros such as `{$NOMAD.SERVER.API.SCHEME:"NECESSARY.IP"}` and/or `{$NOMAD.SERVER.API.PORT:"NECESSARY.IP"}` at the master host or template level.
* Some metrics may not be collected depending on your HashiCorp Nomad agent version, configuration and cluster role.
* Don't forget to define the `{$NOMAD.REDUNDANCY.MIN}` macro value based on the number of nodes in your cluster, so that the failure tolerance triggers are configured correctly.
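
As a rough guide for choosing the `{$NOMAD.REDUNDANCY.MIN}` value, the failure tolerance of a Raft cluster can be derived from the number of servers: a quorum is a strict majority, and the tolerance is the number of servers that can be lost while a quorum remains. The sketch below uses hypothetical function names; whether the macro should equal this number exactly depends on how conservative you want the triggers to be.

```python
def quorum_size(servers):
    """Smallest strict majority of `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """Number of servers that can fail while a quorum is still available."""
    return servers - quorum_size(servers)


for count in (1, 3, 5, 7):
    print(count, quorum_size(count), failure_tolerance(count))
```

A 3-node cluster tolerates one failure, which matches the default macro value of `1`. A 5-node cluster tolerates two.
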
**Useful links**:
* [HashiCorp Nomad metrics list](https://developer.hashicorp.com/nomad/docs/operations/metrics-reference)
* [HashiCorp Nomad telemetry configuration reference](https://developer.hashicorp.com/nomad/docs/configuration/telemetry)
* [HashiCorp Nomad metrics API reference](https://developer.hashicorp.com/nomad/api-docs/metrics)
* [HashiCorp Nomad agent API reference](https://developer.hashicorp.com/nomad/api-docs/agent#query-self)
* [HashiCorp Nomad cluster failure tolerance reference](https://developer.hashicorp.com/nomad/docs/concepts/consensus#deployment-table)
* [Zabbix user macros with context](https://www.zabbix.com/documentation/7.0/manual/config/macros/user_macros_context)
### Macros used
|Name|Description|Default|
|----|-----------|-------|
|{$NOMAD.SERVER.API.SCHEME}|<p>Nomad SERVER API scheme.</p>|`http`|
|{$NOMAD.SERVER.API.PORT}|<p>Nomad SERVER API port.</p>|`4646`|
|{$NOMAD.TOKEN}|<p>Nomad authentication token.</p>|`<PUT YOUR AUTH TOKEN>`|
|{$NOMAD.DATA.TIMEOUT}|<p>Response timeout for an API.</p>|`15s`|
|{$NOMAD.HTTP.PROXY}|<p>Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used.</p>||
|{$NOMAD.API.RESPONSE.SUCCESS}|<p>HTTP API successful response code. Availability triggers threshold. Change, if needed.</p>|`200`|
|{$NOMAD.SERVER.RPC.PORT}|<p>Nomad RPC service port.</p>|`4647`|
|{$NOMAD.SERVER.SERF.PORT}|<p>Nomad serf service port.</p>|`4648`|
|{$NOMAD.REDUNDANCY.MIN}|<p>Number of redundant servers required to keep the cluster safe.</p><p>Default value - '1' for a 3-node cluster.</p><p>Change if needed.</p>|`1`|
|{$NOMAD.OPEN.FDS.MAX}|<p>Maximum percentage of used file descriptors.</p>|`90`|
|{$NOMAD.SERVER.LEADER.LATENCY}|<p>Leader last contact latency threshold.</p>|`0.3s`|
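
The `{$NOMAD.OPEN.FDS.MAX}` macro is expressed as a percentage, while the template collects the open and maximum file descriptor counts as separate items. The sketch below shows one plausible way the two values combine into a percentage that is compared with the macro. The function names and the strict `>` comparison are assumptions, so refer to the template for the actual trigger expression.

```python
def fds_used_percent(open_fds, max_fds):
    """Percentage of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds * 100.0 / max_fds


def fds_threshold_exceeded(open_fds, max_fds, max_percent=90.0):
    """True when usage is above the configured maximum percentage."""
    return fds_used_percent(open_fds, max_fds) > max_percent
```

For example, 950 open descriptors out of a limit of 1000 is 95% and exceeds the default threshold of `90`, while 800 out of 1000 does not.
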
### Items
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HashiCorp Nomad Server: Telemetry get|<p>Telemetry data in raw format.</p>|HTTP agent|nomad.server.data.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad Server: Metrics|<p>Nomad server metrics in raw format.</p>|Dependent item|nomad.server.metrics.get<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Monitoring API response|<p>Monitoring API response message.</p>|Dependent item|nomad.server.data.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Server: Internal stats get|<p>Internal stats data in raw format.</p>|HTTP agent|nomad.server.stats.get<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Set value to: `{"header":{"HTTP/1.1 408 Request timeout":""}}`</p></li></ul>|
|HashiCorp Nomad Server: Internal stats API response|<p>Internal stats API response message.</p>|Dependent item|nomad.server.stats.api.response<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Server: Nomad server version|<p>Nomad server version.</p>|Dependent item|nomad.server.version<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body.config.Version.Version`</p></li></ul>|
|HashiCorp Nomad Server: Nomad raft version|<p>Nomad raft version.</p>|Dependent item|nomad.raft.version<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body.stats.raft.protocol_version`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Raft peers|<p>Current number of raft peers in the cluster.</p>|Dependent item|nomad.server.raft.peers<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body.stats.raft.num_peers`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Cluster role|<p>Current role in the cluster.</p>|Dependent item|nomad.server.raft.cluster_role<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.body.stats.raft.state`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|HashiCorp Nomad Server: CPU time, rate|<p>Total user and system CPU time spent in seconds.</p>|Dependent item|nomad.server.cpu.time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_cpu_seconds_total)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: Memory used|<p>Memory utilization in bytes.</p>|Dependent item|nomad.server.runtime.alloc_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_alloc_bytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Virtual memory size|<p>Virtual memory size in bytes.</p>|Dependent item|nomad.server.virtual_memory_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_virtual_memory_bytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Resident memory size|<p>Resident memory size in bytes.</p>|Dependent item|nomad.server.resident_memory_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_resident_memory_bytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Heap objects|<p>Number of objects on the heap.</p><p>General memory pressure indicator.</p>|Dependent item|nomad.server.runtime.heap_objects<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_heap_objects)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Open file descriptors|<p>Number of open file descriptors.</p>|Dependent item|nomad.server.process_open_fds<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_open_fds)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Open file descriptors, max|<p>Maximum number of open file descriptors.</p>|Dependent item|nomad.server.process_max_fds<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_max_fds)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Goroutines|<p>Number of goroutines and general load pressure indicator.</p>|Dependent item|nomad.server.runtime.num_goroutines<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_num_goroutines)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations pending|<p>Evaluations that are pending until an existing evaluation for the same job completes.</p>|Dependent item|nomad.server.broker.total_pending<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_total_pending)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations ready|<p>Number of evaluations ready to be processed.</p>|Dependent item|nomad.server.broker.total_ready<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_total_ready)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations unacked|<p>Evaluations dispatched for processing but incomplete.</p>|Dependent item|nomad.server.broker.total_unacked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_total_unacked)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: CPU shares for blocked evaluations|<p>Amount of CPU shares requested by blocked evals.</p>|Dependent item|nomad.server.blocked_evals.cpu<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_cpu)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Memory shares by blocked evaluations|<p>Amount of memory requested by blocked evals.</p>|Dependent item|nomad.server.blocked_evals.memory<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_memory)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: CPU shares for blocked job evaluations|<p>Amount of CPU shares requested by blocked evals of a job.</p>|Dependent item|nomad.server.blocked_evals.job.cpu<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_job_cpu)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Memory shares for blocked job evaluations|<p>Amount of memory requested by blocked evals of a job.</p>|Dependent item|nomad.server.blocked_evals.job.memory<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_job_memory)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations blocked|<p>Count of evals in the blocked state for any reason (cluster resource exhaustion or quota limits).</p>|Dependent item|nomad.server.blocked_evals.total_blocked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_total_blocked)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations escaped|<p>Count of evals that have escaped computed node classes.</p><p>This indicates a scheduler optimization was skipped and is not usually a source of concern.</p>|Dependent item|nomad.server.blocked_evals.total_escaped<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_total_escaped)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations waiting|<p>Count of evals waiting to be enqueued.</p>|Dependent item|nomad.server.broker.total_waiting<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_total_waiting)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations blocked due to quota limit|<p>Count of blocked evals due to quota limits (the resources for these jobs are not counted in other blocked_evals metrics, except for total_blocked).</p>|Dependent item|nomad.server.blocked_evals.total_quota_limit<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_blocked_evals_total_quota_limit)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Evaluations enqueue time|<p>Average time elapsed with evaluations waiting to be enqueued.</p>|Dependent item|nomad.server.broker.eval_waiting<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_nomad_eval_ack_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC evaluation acknowledgement time|<p>Time elapsed for Eval.Ack RPC call.</p>|Dependent item|nomad.server.eval.ack<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_eval_ack_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC job summary time|<p>Time elapsed for Job.Summary RPC call.</p>|Dependent item|nomad.server.job_summary.get_job_summary<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_summary_get_job_summary_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Heartbeats active|<p>Number of active heartbeat timers.</p><p>Each timer represents a Nomad client connection.</p>|Dependent item|nomad.server.heartbeat.active<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_heartbeat_active)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: RPC requests, rate|<p>Number of RPC requests being handled.</p>|Dependent item|nomad.server.rpc.request<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_rpc_request)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: RPC error requests, rate|<p>Number of RPC requests being handled that result in an error.</p>|Dependent item|nomad.server.rpc.request_error<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_rpc_request)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: RPC queries, rate|<p>Number of RPC queries.</p>|Dependent item|nomad.server.rpc.query<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_rpc_query)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: RPC job allocations time|<p>Time elapsed for Job.Allocations RPC call.</p>|Dependent item|nomad.server.job.allocations<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_allocations_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC job evaluations time|<p>Time elapsed for Job.Evaluations RPC call.</p>|Dependent item|nomad.server.job.evaluations<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_evaluations_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC get job time|<p>Time elapsed for Job.GetJob RPC call.</p>|Dependent item|nomad.server.job.get_job<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_get_job_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Plan apply time|<p>Time elapsed to apply a plan.</p>|Dependent item|nomad.server.plan.apply<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_plan_apply_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Plan evaluate time|<p>Time elapsed to evaluate a plan.</p>|Dependent item|nomad.server.plan.evaluate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_plan_evaluate_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC plan submit time|<p>Time elapsed for Plan.Submit RPC call.</p>|Dependent item|nomad.server.plan.submit<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_plan_submit_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Plan raft index processing time|<p>Time elapsed that planner waits for the raft index of the plan to be processed.</p>|Dependent item|nomad.server.plan.wait_for_index<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_plan_wait_for_index_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC list time|<p>Time elapsed for Node.List RPC call.</p>|Dependent item|nomad.server.client.list<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_client_list_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC update allocations time|<p>Time elapsed for Node.UpdateAlloc RPC call.</p>|Dependent item|nomad.server.client.update_alloc<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_client_update_alloc_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC update status time|<p>Time elapsed for Node.UpdateStatus RPC call.</p>|Dependent item|nomad.server.client.update_status<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_client_update_status_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC get client allocs time|<p>Time elapsed for Node.GetClientAllocs RPC call.</p>|Dependent item|nomad.server.client.get_client_allocs<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_client_get_client_allocs_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: RPC eval dequeue time|<p>Time elapsed for Eval.Dequeue RPC call.</p>|Dependent item|nomad.server.client.dequeue<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_eval_dequeue_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Vault token last renewal|<p>Time since last successful Vault token renewal.</p>|Dependent item|nomad.server.vault.token_last_renewal<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_vault_token_last_renewal)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `0.001`</p></li></ul>|
|HashiCorp Nomad Server: Vault token next renewal|<p>Time until next Vault token renewal attempt.</p>|Dependent item|nomad.server.vault.token_next_renewal<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_vault_token_next_renewal)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `0.001`</p></li></ul>|
|HashiCorp Nomad Server: Vault token TTL|<p>Time to live for Vault token.</p>|Dependent item|nomad.server.vault.token_ttl<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_vault_token_ttl)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `0.001`</p></li></ul>|
|HashiCorp Nomad Server: Vault tokens revoked|<p>Count of revoked tokens.</p>|Dependent item|nomad.server.vault.distributed_tokens_revoked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_vault_distributed_tokens_revoking)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Jobs dead|<p>Number of dead jobs.</p>|Dependent item|nomad.server.job_status.dead<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_status_dead)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Jobs pending|<p>Number of pending jobs.</p>|Dependent item|nomad.server.job_status.pending<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_status_pending)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Jobs running|<p>Number of running jobs.</p>|Dependent item|nomad.server.job_status.running<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_job_status_running)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations completed|<p>Number of complete allocations for a job.</p>|Dependent item|nomad.server.job_summary.complete<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_complete)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations failed|<p>Number of failed allocations for a job.</p>|Dependent item|nomad.server.job_summary.failed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_failed)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations lost|<p>Number of lost allocations for a job.</p>|Dependent item|nomad.server.job_summary.lost<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_lost)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations unknown|<p>Number of unknown allocations for a job.</p>|Dependent item|nomad.server.job_summary.unknown<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_unknown)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations queued|<p>Number of queued allocations for a job.</p>|Dependent item|nomad.server.job_summary.queued<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_queued)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations running|<p>Number of running allocations for a job.</p>|Dependent item|nomad.server.job_summary.running<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_running)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Job allocations starting|<p>Number of starting allocations for a job.</p>|Dependent item|nomad.server.job_summary.starting<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_nomad_job_summary_starting)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Gossip time|<p>Time elapsed to broadcast gossip messages.</p>|Dependent item|nomad.server.memberlist.gossip<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_memberlist_gossip_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Leader barrier time|<p>Time elapsed to establish a raft barrier during leader transition.</p>|Dependent item|nomad.server.leader.barrier<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_leader_barrier_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Reconcile peer time|<p>Time elapsed to reconcile a serf peer with state store.</p>|Dependent item|nomad.server.leader.reconcile_member<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_leader_reconcileMember_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Total reconcile time|<p>Time elapsed to reconcile all serf peers with state store.</p>|Dependent item|nomad.server.leader.reconcile<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_leader_reconcile_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Leader last contact|<p>Time since last contact to leader.</p><p>General indicator of Raft latency.</p>|Dependent item|nomad.server.raft.leader.lastContact<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_leader_lastContact{quantile="0.99"})`</p><p>Custom on fail: Discard value</p></li><li><p>Replace: `NaN -> 0`</p></li><li><p>Custom multiplier: `0.001`</p></li></ul>|
|HashiCorp Nomad Server: Plan queue|<p>Count of evals in the plan queue.</p>|Dependent item|nomad.server.plan.queue_depth<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_plan_queue_depth)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Worker evaluation create time|<p>Time elapsed for worker to create an eval.</p>|Dependent item|nomad.server.worker.create_eval<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_dequeue_eval_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker evaluation dequeue time|<p>Time elapsed for worker to dequeue an eval.</p>|Dependent item|nomad.server.worker.dequeue_eval<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_dequeue_eval_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker invoke scheduler time|<p>Time elapsed for worker to invoke the scheduler.</p>|Dependent item|nomad.server.worker.invoke_scheduler_service<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_invoke_scheduler_service_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker acknowledgement send time|<p>Time elapsed for worker to send acknowledgement.</p>|Dependent item|nomad.server.worker.send_ack<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_send_ack_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker submit plan time|<p>Time elapsed for worker to submit plan.</p>|Dependent item|nomad.server.worker.submit_plan<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_submit_plan_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker update evaluation time|<p>Time elapsed for worker to submit updated eval.</p>|Dependent item|nomad.server.worker.update_eval<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_update_eval_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Worker log replication time|<p>Time elapsed that worker waits for the raft index of the eval to be processed.</p>|Dependent item|nomad.server.worker.wait_for_index<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_worker_wait_for_index_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Raft calls blocked, rate|<p>Count of blocking raft API calls.</p>|Dependent item|nomad.server.raft.barrier<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_barrier)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: Raft commit logs enqueued|<p>Count of logs enqueued.</p>|Dependent item|nomad.server.raft.commit_num_logs<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_commitNumLogs)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Raft transactions, rate|<p>Number of Raft transactions.</p>|Dependent item|nomad.server.raft.apply<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_apply)`</p><p>Custom on fail: Set value to: `0`</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: Raft commit time|<p>Time elapsed to commit writes.</p>|Dependent item|nomad.server.raft.commit_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_commitTime_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Raft transaction commit time|<p>Raft transaction commit time.</p>|Dependent item|nomad.server.raft.replication.appendEntries<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `AVG(nomad_raft_replication_appendEntries_rpc)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `0.001`</p></li></ul>|
|HashiCorp Nomad Server: FSM apply time|<p>Time elapsed to apply write to FSM.</p>|Dependent item|nomad.server.raft.fsm.apply<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_fsm_apply_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM enqueue time|<p>Time elapsed to enqueue write to FSM.</p>|Dependent item|nomad.server.raft.fsm.enqueue<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_fsm_enqueue_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM autopilot time|<p>Time elapsed to apply Autopilot raft entry.</p>|Dependent item|nomad.server.raft.fsm.autopilot<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_autopilot_sum)`</p><p>Custom on fail: Set value to: `0`</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM register node time|<p>Time elapsed to apply RegisterNode raft entry.</p>|Dependent item|nomad.server.raft.fsm.register_node<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_register_node_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM index|<p>Current index applied to FSM.</p>|Dependent item|nomad.server.raft.applied_index<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_appliedIndex)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Raft last index|<p>Most recent index seen.</p>|Dependent item|nomad.server.raft.last_index<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_lastIndex)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Dispatch log time|<p>Time elapsed to write log, mark in flight, and start replication.</p>|Dependent item|nomad.server.raft.leader.dispatch_log<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_leader_dispatchLog_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Logs dispatched|<p>Count of logs dispatched.</p>|Dependent item|nomad.server.raft.leader.dispatch_num_logs<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_leader_dispatchNumLogs)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
|HashiCorp Nomad Server: Heartbeat fails|<p>Count of failing to heartbeat and starting election.</p>|Dependent item|nomad.server.raft.transition.heartbeat_timeout<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_transition_heartbeat_timeout)`</p><p>Custom on fail: Set value to: `0`</p></li><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Server: Objects freed, rate|<p>Count of objects freed from heap by go runtime GC.</p>|Dependent item|nomad.server.runtime.free_count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_free_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: GC pause time|<p>Go runtime GC pause times.</p>|Dependent item|nomad.server.runtime.gc_pause_ns<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_gc_pause_ns_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: GC metadata size|<p>Go runtime GC metadata size in bytes.</p>|Dependent item|nomad.server.runtime.sys_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_sys_bytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: GC runs|<p>Count of go runtime GC runs.</p>|Dependent item|nomad.server.runtime.total_gc_runs<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_runtime_total_gc_runs)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Memberlist events|<p>Count of memberlist events received.</p>|Dependent item|nomad.server.serf.queue.event<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_serf_queue_Event_sum)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Memberlist changes|<p>Count of memberlist changes.</p>|Dependent item|nomad.server.serf.queue.intent<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_serf_queue_Intent_sum)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Memberlist queries|<p>Count of memberlist queries.</p>|Dependent item|nomad.server.serf.queue.queries<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_serf_queue_Query_sum)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Snapshot index|<p>Current snapshot index.</p>|Dependent item|nomad.server.state.snapshot.index<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_state_snapshotIndex)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Services ready to schedule|<p>Count of service evals ready to be scheduled.</p>|Dependent item|nomad.server.broker.service_ready<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_service_ready)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Services unacknowledged|<p>Count of unacknowledged service evals.</p>|Dependent item|nomad.server.broker.service_unacked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_service_unacked)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: System evaluations ready to schedule|<p>Count of system evals ready to be scheduled.</p>|Dependent item|nomad.server.broker.system_ready<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_system_ready)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: System evaluations unacknowledged|<p>Count of unacknowledged system evals.</p>|Dependent item|nomad.server.broker.system_unacked<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_broker_system_unacked)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB free pages|<p>Number of BoltDB free pages.</p>|Dependent item|nomad.server.raft.boltdb.num_free_pages<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_numFreePages)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB pending pages|<p>Number of BoltDB pending pages.</p>|Dependent item|nomad.server.raft.boltdb.num_pending_pages<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_numPendingPages)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB free page bytes|<p>Number of free page bytes.</p>|Dependent item|nomad.server.raft.boltdb.free_page_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_freePageBytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB freelist bytes|<p>Number of freelist bytes.</p>|Dependent item|nomad.server.raft.boltdb.freelist_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_freelistBytes)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB read transactions, rate|<p>Count of total read transactions.</p>|Dependent item|nomad.server.raft.boltdb.total_read_txn<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_totalReadTxn)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB open read transactions|<p>Number of current open read transactions.</p>|Dependent item|nomad.server.raft.boltdb.open_read_txn<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_openReadTxn)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB pages in use|<p>Number of pages in use.</p>|Dependent item|nomad.server.raft.boltdb.txstats.page_count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_pageCount)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: BoltDB page allocations, rate|<p>Number of page allocations.</p>|Dependent item|nomad.server.raft.boltdb.txstats.page_alloc<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_pageAlloc)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB cursors|<p>Count of total database cursors.</p>|Dependent item|nomad.server.raft.boltdb.txstats.cursor_count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_cursorCount)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB nodes, rate|<p>Count of total database nodes.</p>|Dependent item|nomad.server.raft.boltdb.txstats.node_count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_nodeCount)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB node dereferences, rate|<p>Count of total database node dereferences.</p>|Dependent item|nomad.server.raft.boltdb.txstats.node_deref<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_nodeDeref)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB rebalance operations, rate|<p>Count of total rebalance operations.</p>|Dependent item|nomad.server.raft.boltdb.txstats.rebalance<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_rebalance)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB split operations, rate|<p>Count of total split operations.</p>|Dependent item|nomad.server.raft.boltdb.txstats.split<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_split)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB spill operations, rate|<p>Count of total spill operations.</p>|Dependent item|nomad.server.raft.boltdb.txstats.spill<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_spill)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB write operations, rate|<p>Count of total write operations.</p>|Dependent item|nomad.server.raft.boltdb.txstats.write<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_write)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|HashiCorp Nomad Server: BoltDB rebalance time|<p>Sample of rebalance operation times.</p>|Dependent item|nomad.server.raft.boltdb.txstats.rebalance_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_rebalanceTime_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: BoltDB spill time|<p>Sample of spill operation times.</p>|Dependent item|nomad.server.raft.boltdb.txstats.spill_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_spillTime_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: BoltDB write time|<p>Sample of write operation times.</p>|Dependent item|nomad.server.raft.boltdb.txstats.write_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_raft_boltdb_txstats_writeTime_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Service [rpc] state|<p>Current [rpc] service state.</p>|Simple check|net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]<p>**Preprocessing**</p><ul><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Server: Service [serf] state|<p>Current [serf] service state.</p>|Simple check|net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]<p>**Preprocessing**</p><ul><li><p>Discard unchanged with heartbeat: `1h`</p></li></ul>|
|HashiCorp Nomad Server: Namespace list time|<p>Time elapsed for Namespace.ListNamespaces.</p>|Dependent item|nomad.server.namespace.list_namespace<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_namespace_list_namespace_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Autopilot state|<p>Current autopilot state.</p>|Dependent item|nomad.server.autopilot.state<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_autopilot_healthy)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: Autopilot failure tolerance|<p>The number of redundant healthy servers that can fail without causing an outage.</p>|Dependent item|nomad.server.autopilot.failure_tolerance<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_autopilot_failure_tolerance)`</p><p>Custom on fail: Discard value</p></li></ul>|
|HashiCorp Nomad Server: FSM allocation client update time|<p>Time elapsed to apply AllocClientUpdate raft entry.</p>|Dependent item|nomad.server.alloc_client_update<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_alloc_client_update_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM apply plan results time|<p>Time elapsed to apply ApplyPlanResults raft entry.</p>|Dependent item|nomad.server.fsm.apply_plan_results<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_apply_plan_results_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM update evaluation time|<p>Time elapsed to apply UpdateEval raft entry.</p>|Dependent item|nomad.server.fsm.update_eval<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_update_eval_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: FSM job registration time|<p>Time elapsed to apply RegisterJob raft entry.</p>|Dependent item|nomad.server.fsm.register_job<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(nomad_nomad_fsm_register_job_sum)`</p><p>Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1e-09`</p></li></ul>|
|HashiCorp Nomad Server: Allocation reschedule attempts|<p>Count of attempts to reschedule an allocation.</p>|Dependent item|nomad.server.scheduler.allocs.rescheduled.attempted<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(nomad_scheduler_allocs_reschedule_attempted)`</p><p>Custom on fail: Set value to: `0`</p></li></ul>|
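
The preprocessing chains in the table above follow one pattern: pick a single sample out of the Prometheus text output, optionally multiply it, and either discard the value or substitute `0` when the metric is missing. The Python sketch below illustrates that flow outside Zabbix. It is not the Zabbix implementation; the function names, the `on_fail` parameter, and the sample payload are hypothetical. Reading the `1e-09` multiplier as a nanoseconds-to-seconds conversion is also an assumption.

```python
from typing import Optional


def prometheus_value(exposition: str, metric: str) -> Optional[float]:
    """Return the value of the single sample named `metric`.

    Returns None when the metric is absent or matches more than one line,
    which this sketch treats as a failed preprocessing step.
    """
    matches = []
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        try:
            if "{" in line:
                # name{labels} value [timestamp]
                name, rest = line.split("{", 1)
                value = rest.rsplit("}", 1)[1].split()[0]
            else:
                # name value [timestamp]
                fields = line.split()
                name, value = fields[0], fields[1]
            if name == metric:
                matches.append(float(value))
        except (IndexError, ValueError):
            continue  # ignore malformed lines
    return matches[0] if len(matches) == 1 else None


def preprocess(exposition: str, metric: str, multiplier: float = 1.0,
               on_fail: Optional[float] = None) -> Optional[float]:
    """Mimic "Prometheus pattern" followed by "Custom multiplier".

    on_fail=None models "Discard value"; on_fail=0 models "Set value to: 0".
    """
    value = prometheus_value(exposition, metric)
    if value is None:
        if on_fail is None:
            return None  # value is discarded, nothing is stored
        value = on_fail
    return value * multiplier


# Illustrative payload only; the numbers are made up.
SAMPLE = """\
# TYPE nomad_raft_fsm_apply summary
nomad_raft_fsm_apply{quantile="0.5"} 41000
nomad_raft_fsm_apply_sum 2500000000
nomad_raft_fsm_apply_count 12
"""

if __name__ == "__main__":
    print(preprocess(SAMPLE, "nomad_raft_fsm_apply_sum", 1e-09))
    print(preprocess(SAMPLE, "nomad_raft_apply", on_fail=0))
```

Running a payload saved from the Nomad metrics endpoint through `preprocess` can help check why a dependent item stays empty: a `None` result corresponds to a discarded value.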
### Triggers
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|HashiCorp Nomad Server: Monitoring API connection has failed|<p>Monitoring API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad Server by HTTP/nomad.server.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Server: Internal stats API connection has failed|<p>Internal stats API connection has failed.<br>Ensure that Nomad API URL and the necessary permissions have been defined correctly, check the service state and network connectivity between Nomad and Zabbix.</p>|`find(/HashiCorp Nomad Server by HTTP/nomad.server.stats.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0`|Average|**Manual close**: Yes<br>**Depends on**:<br><ul><li>HashiCorp Nomad Server: Monitoring API connection has failed</li></ul>|
|HashiCorp Nomad Server: Nomad server version has changed|<p>Nomad server version has changed.</p>|`change(/HashiCorp Nomad Server by HTTP/nomad.server.version)<>0`|Info|**Manual close**: Yes|
|HashiCorp Nomad Server: Cluster role has changed|<p>Cluster role has changed.</p>|`change(/HashiCorp Nomad Server by HTTP/nomad.server.raft.cluster_role) <> 0`|Info|**Manual close**: Yes|
|HashiCorp Nomad Server: Current number of open files is too high|<p>Heavy file descriptor usage (i.e., near the process file descriptor limit) indicates a potential file descriptor exhaustion issue.</p>|`min(/HashiCorp Nomad Server by HTTP/nomad.server.process_open_fds,5m)/last(/HashiCorp Nomad Server by HTTP/nomad.server.process_max_fds)*100>{$NOMAD.OPEN.FDS.MAX}`|Warning||
|HashiCorp Nomad Server: Dead jobs found|<p>Jobs with the `Dead` state discovered.<br>Check the {$NOMAD.SERVER.API.SCHEME}://{HOST.IP}:{$NOMAD.SERVER.API.PORT}/v1/jobs URL for the details.</p>|`last(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead) > 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead,5m) = 0`|Warning|**Manual close**: Yes|
|HashiCorp Nomad Server: Leader last contact timeout exceeded|<p>The nomad.raft.leader.lastContact metric is a general indicator of Raft latency which can be used to observe how Raft timing is performing and guide infrastructure provisioning.<br>If this number trends upwards, look at CPU, disk IOPS, and network latency. nomad.raft.leader.lastContact should not get too close to the leader lease timeout of 500 ms.</p>|`min(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) >= {$NOMAD.SERVER.LEADER.LATENCY} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) = 0`|Warning||
|HashiCorp Nomad Server: Service [rpc] is down|<p>Cannot establish the connection to [rpc] service port {$NOMAD.SERVER.RPC.PORT}.<br>Check the Nomad state and network connectivity between Nomad and Zabbix.</p>|`last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]) = 0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Server: Service [serf] is down|<p>Cannot establish the connection to [serf] service port {$NOMAD.SERVER.SERF.PORT}.<br>Check the Nomad state and network connectivity between Nomad and Zabbix.</p>|`last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]) = 0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Server: Autopilot is unhealthy|<p>Autopilot is in an unhealthy state. The probability of a successful failover is extremely low.</p>|`last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state) = 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state,5m) = 0`|Average|**Manual close**: Yes|
|HashiCorp Nomad Server: Autopilot redundancy is low|<p>Autopilot redundancy is low.<br>One more server failure puts the cluster at high risk of an outage.</p>|`last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance) < {$NOMAD.REDUNDANCY.MIN} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance,5m) = 0`|Warning|**Manual close**: Yes|
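
Two expression shapes recur in the trigger table above: a ratio compared against a percentage threshold, and a `last(...)` comparison guarded by `nodata(...,5m) = 0` so that the trigger stays quiet when no fresh data exists. The Python sketch below restates both as plain functions to make the logic easier to follow. The function names and the threshold values in the usage lines are hypothetical; the real thresholds come from the `{$NOMAD.OPEN.FDS.MAX}` and `{$NOMAD.REDUNDANCY.MIN}` macros, whose defaults are not shown in this section.

```python
from typing import Sequence


def open_fds_too_high(open_fds_5m: Sequence[float], max_fds: float,
                      threshold_pct: float) -> bool:
    """Model of: min(open_fds,5m) / last(max_fds) * 100 > threshold.

    Using the minimum over the window means every sample in the last
    5 minutes must exceed the threshold before the trigger fires.
    """
    if not open_fds_5m or max_fds <= 0:
        return False  # no data or an unusable limit: do not fire
    return min(open_fds_5m) / max_fds * 100 > threshold_pct


def redundancy_low(tolerance_5m: Sequence[float],
                   redundancy_min: float) -> bool:
    """Model of: last(tolerance) < min and nodata(tolerance,5m) = 0.

    `tolerance_5m` holds the samples received in the last 5 minutes,
    oldest first. An empty window corresponds to nodata() = 1.
    """
    if not tolerance_5m:
        return False  # no fresh data: the nodata guard suppresses the trigger
    return tolerance_5m[-1] < redundancy_min


if __name__ == "__main__":
    # Hypothetical thresholds: 90 percent and a minimum tolerance of 1.
    print(open_fds_too_high([950, 960, 940], max_fds=1000, threshold_pct=90))
    print(redundancy_low([2, 0], redundancy_min=1))
```

The `nodata` guard matters because a dependent item whose value was discarded would otherwise leave the trigger evaluating a stale `last(...)` value.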
## Feedback
Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).
You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).