You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

200 lines
34 KiB

1 year ago
# HashiCorp Consul Node by HTTP
## Overview
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable Prometheus format for export metrics.
See [documentation](https://www.consul.io/docs/agent/options#telemetry-prometheus_retention_time).
More information about metrics you can find in [official documentation](https://www.consul.io/docs/agent/telemetry).
Template `HashiCorp Consul Node by HTTP` — collects metrics by HTTP agent from /v1/agent/metrics endpoint.
## Requirements
Zabbix version: 7.0 and higher.
## Tested versions
This template has been tested on:
- HashiCorp Consul 1.10.0
## Configuration
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
## Setup
Internal service metrics are collected from /v1/agent/metrics endpoint.
Do not forget to enable Prometheus format for export metrics. See [documentation](https://www.consul.io/docs/agent/options#telemetry-prometheus_retention_time).
Template need to use Authorization via API token.
Don't forget to change macros {$CONSUL.NODE.API.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
More information about metrics you can find in [official documentation](https://www.consul.io/docs/agent/telemetry).
This template support [Consul namespaces](https://www.consul.io/docs/enterprise/namespaces). You can set macros {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}, {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} if you want to filter discovered services by namespace.
In case of Open Source version service namespace will be set to 'None'.
*NOTE.* Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
*NOTE.* You maybe are interested in Envoy Proxy by HTTP [template](../../envoy_proxy_http).
### Macros used
|Name|Description|Default|
|----|-----------|-------|
|{$CONSUL.NODE.API.URL}|<p>Consul instance URL.</p>|`http://localhost:8500`|
|{$CONSUL.TOKEN}|<p>Consul auth token.</p>|`<PUT YOUR AUTH TOKEN>`|
|{$CONSUL.OPEN.FDS.MAX.WARN}|<p>Maximum percentage of used file descriptors.</p>|`90`|
|{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}|<p>Filter of discoverable discovered services on local node.</p>|`.*`|
|{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}|<p>Filter to exclude discovered services on local node.</p>|`CHANGE IF NEEDED`|
|{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}|<p>Filter of discoverable discovered service by namespace on local node. Enterprise only, in case of Open Source version Namespace will be set to 'None'.</p>|`.*`|
|{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}|<p>Filter to exclude discovered service by namespace on local node. Enterprise only, in case of Open Source version Namespace will be set to 'None'.</p>|`CHANGE IF NEEDED`|
|{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}|<p>Maximum acceptable value of node's health score for WARNING trigger expression.</p>|`2`|
|{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}|<p>Maximum acceptable value of node's health score for AVERAGE trigger expression.</p>|`4`|
### Items
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Consul: Get instance metrics|<p>Get raw metrics from Consul instance /metrics endpoint.</p>|HTTP agent|consul.get_metrics<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Get node info|<p>Get configuration and member information of the local agent.</p>|HTTP agent|consul.get_node_info<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Role|<p>Role of current Consul agent.</p>|Dependent item|consul.role<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Config.Server`</p></li><li>Boolean to decimal</li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Version|<p>Version of Consul agent.</p>|Dependent item|consul.version<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Config.Version`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Number of services|<p>Number of services on current node.</p>|Dependent item|consul.services_number<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Stats.agent.services`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Number of checks|<p>Number of checks on current node.</p>|Dependent item|consul.checks_number<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Stats.agent.checks`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Number of check monitors|<p>Number of check monitors on current node.</p>|Dependent item|consul.check_monitors_number<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Stats.agent.check_monitors`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Process CPU seconds, total|<p>Total user and system CPU time spent in seconds.</p>|Dependent item|consul.cpu_seconds_total.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_cpu_seconds_total)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Virtual memory size|<p>Virtual memory size in bytes.</p>|Dependent item|consul.virtual_memory_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_virtual_memory_bytes)`</p></li></ul>|
|Consul: RSS memory usage|<p>Resident memory size in bytes.</p>|Dependent item|consul.resident_memory_bytes<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_resident_memory_bytes)`</p></li></ul>|
|Consul: Goroutine count|<p>The number of Goroutines on Consul instance.</p>|Dependent item|consul.goroutines<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(go_goroutines)`</p></li></ul>|
|Consul: Open file descriptors|<p>Number of open file descriptors.</p>|Dependent item|consul.process_open_fds<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_open_fds)`</p></li></ul>|
|Consul: Open file descriptors, max|<p>Maximum number of open file descriptors.</p>|Dependent item|consul.process_max_fds<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(process_max_fds)`</p></li></ul>|
|Consul: Client RPC, per second|<p>Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server.</p><p>This gives a measure of how much a given agent is loading the Consul servers.</p><p>This is only generated by agents in client mode, not Consul servers.</p>|Dependent item|consul.client_rpc<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_client_rpc)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Client RPC failed ,per second|<p>Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.</p>|Dependent item|consul.client_rpc_failed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_client_rpc_failed)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: TCP connections, accepted per second|<p>This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.</p>|Dependent item|consul.memberlist.tcp_accept<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: TCP connections, per second|<p>This metric counts the number of times a Consul agent has initiated a push/pull sync with an other agent per second.</p>|Dependent item|consul.memberlist.tcp_connect<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: TCP send bytes, per second|<p>This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.</p>|Dependent item|consul.memberlist.tcp_sent<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: UDP received bytes, per second|<p>This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.</p>|Dependent item|consul.memberlist.udp_received<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_udp_received)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: UDP sent bytes, per second|<p>This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.</p>|Dependent item|consul.memberlist.udp_sent<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_udp_sent)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: GC pause, p90|<p>The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.</p>|Dependent item|consul.gc_pause.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Custom multiplier: `1.0E-9`</p></li></ul>|
|Consul: GC pause, p50|<p>The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.</p>|Dependent item|consul.gc_pause.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Custom multiplier: `1.0E-9`</p></li></ul>|
|Consul: Memberlist: degraded|<p>This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate.</p><p>The agent uses its own health metric as an indicator to perform this action.</p><p>If its health score is low, it means that the node is healthy, and vice versa.</p>|Dependent item|consul.memberlist.degraded<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_degraded)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Memberlist: health score|<p>This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol.</p><p>This metric ranges from 0 to 8, where 0 indicates "totally healthy".</p>|Dependent item|consul.memberlist.health_score<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_health_score)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Memberlist: gossip, p90|<p>The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.</p>|Dependent item|consul.memberlist.dispatch_log.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Memberlist: gossip, p50|<p>The 50 for the number of gossips (messages) broadcasted to a set of randomly selected nodes.</p>|Dependent item|consul.memberlist.gossip.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Memberlist: msg alive|<p>This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.</p>|Dependent item|consul.memberlist.msg.alive<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_msg_alive)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Memberlist: msg dead|<p>This metric counts the number of times a Consul agent has marked another agent to be a dead node.</p>|Dependent item|consul.memberlist.msg.dead<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_msg_dead)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Memberlist: msg suspect|<p>The number of times a Consul agent suspects another as failed while probing during gossip protocol.</p>|Dependent item|consul.memberlist.msg.suspect<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Memberlist: probe node, p90|<p>The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.</p>|Dependent item|consul.memberlist.probe_node.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Memberlist: probe node, p50|<p>The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.</p>|Dependent item|consul.memberlist.probe_node.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Memberlist: push pull node, p90|<p>The 90 percentile for the number of Consul agents that have exchanged state with this agent.</p>|Dependent item|consul.memberlist.push_pull_node.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Memberlist: push pull node, p50|<p>The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.</p>|Dependent item|consul.memberlist.push_pull_node.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: KV store: apply, p90|<p>The 90 percentile for the time it takes to complete an update to the KV store.</p>|Dependent item|consul.kvs.apply.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: KV store: apply, p50|<p>The 50 percentile (median) for the time it takes to complete an update to the KV store.</p>|Dependent item|consul.kvs.apply.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: KV store: apply, rate|<p>The number of updates to the KV store per second.</p>|Dependent item|consul.kvs.apply.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_kvs_apply_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Serf member: flap, rate|<p>Increments when an agent is marked dead and then recovers within a short time period.</p><p>This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.</p><p>Shown as events per second.</p>|Dependent item|consul.serf.member.flap.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_member_flap)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Serf member: failed, rate|<p>Increments when an agent is marked dead.</p><p>This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.</p><p>Shown as events per second.</p>|Dependent item|consul.serf.member.failed.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_member_failed)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Serf member: join, rate|<p>Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins.</p><p>Shown as events per second.</p>|Dependent item|consul.serf.member.join.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_member_join)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Serf member: left, rate|<p>Increments when an agent leaves the cluster. Shown as events per second.</p>|Dependent item|consul.serf.member.left.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_member_left)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Serf member: update, rate|<p>Increments when a Consul agent updates. Shown as events per second.</p>|Dependent item|consul.serf.member.update.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_member_update)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: ACL: resolves, rate|<p>The number of ACL resolves per second.</p>|Dependent item|consul.acl.resolves.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Catalog: register, rate|<p>The number of catalog register operation per second.</p>|Dependent item|consul.catalog.register.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_catalog_register_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Catalog: deregister, rate|<p>The number of catalog deregister operation per second.</p>|Dependent item|consul.catalog.deregister.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_catalog_deregister_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Snapshot: append line, p90|<p>The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.</p>|Dependent item|consul.snapshot.append_line.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Snapshot: append line, p50|<p>The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.</p>|Dependent item|consul.snapshot.append_line.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Snapshot: append line, rate|<p>The number of snapshot appendLine operations per second.</p>|Dependent item|consul.snapshot.append_line.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Snapshot: compact, p90|<p>The 90 percentile for the time taken by the Consul agent to compact a log.</p><p>This operation occurs only when the snapshot becomes large enough to justify the compaction.</p>|Dependent item|consul.snapshot.compact.p90<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Snapshot: compact, p50|<p>The 50 percentile (median) for the time taken by the Consul agent to compact a log.</p><p>This operation occurs only when the snapshot becomes large enough to justify the compaction.</p>|Dependent item|consul.snapshot.compact.p50<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Snapshot: compact, rate|<p>The number of snapshot compact operations per second.</p>|Dependent item|consul.snapshot.compact.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Get local services|<p>Get all the services that are registered with the local agent and their status.</p>|Script|consul.get_local_services|
|Consul: Get local services check|<p>Data collection check.</p>|Dependent item|consul.get_local_services.check<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.error`</p><p>Custom on fail: Set value to</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Triggers
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Consul: Version has been changed|<p>Consul version has changed. Acknowledge to close the problem manually.</p>|`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`|Info|**Manual close**: Yes|
|Consul: Current number of open files is too high|<p>"Heavy file descriptor usage (i.e., near the processs file descriptor limit) indicates a potential file descriptor exhaustion issue."</p>|`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`|Warning||
|Consul: Node's health score is warning|<p>This metric ranges from 0 to 8, where 0 indicates "totally healthy".<br>This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.<br>For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf</p>|`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`|Warning|**Depends on**:<br><ul><li>Consul: Node's health score is critical</li></ul>|
|Consul: Node's health score is critical|<p>This metric ranges from 0 to 8, where 0 indicates "totally healthy".<br>This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.<br>For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf</p>|`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`|Average||
|Consul: Failed to get local services|<p>Failed to get local services. Check debug log for more information.</p>|`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`|Warning||
### LLD rule Local node services discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Local node services discovery|<p>Discover metrics for services that are registered with the local agent.</p>|Dependent item|consul.node_services_lld<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Item prototypes for Local node services discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Consul: ["{#SERVICE_NAME}"]: Aggregated status|<p>Aggregated values of all health checks for the service instance.</p>|Dependent item|consul.service.aggregated_state["{#SERVICE_ID}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()`</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status|<p>Current state of health check for the service.</p>|Dependent item|consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output|<p>Current output of health check for the service.</p>|Dependent item|consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Trigger prototypes for Local node services discovery
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|Consul: Aggregated status is 'warning'|<p>Aggregated state of service on the local agent is 'warning'.</p>|`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`|Warning||
|Consul: Aggregated status is 'critical'|<p>Aggregated state of service on the local agent is 'critical'.</p>|`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`|Average||
### LLD rule HTTP API methods discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|HTTP API methods discovery|<p>Discovery HTTP API methods specific metrics.</p>|Dependent item|consul.http_api_discovery<p>**Preprocessing**</p><ul><li><p>Prometheus to JSON: `consul_api_http{method =~ ".*"}`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Item prototypes for HTTP API methods discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Consul: HTTP request: ["{#HTTP_METHOD}"], p90|<p>The 90 percentile of how long it takes to service the given HTTP request for the given verb.</p>|Dependent item|consul.http.api.p90["{#HTTP_METHOD}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: HTTP request: ["{#HTTP_METHOD}"], p50|<p>The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.</p>|Dependent item|consul.http.api.p50["{#HTTP_METHOD}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `The text is too long. Please see the template.`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: HTTP request: ["{#HTTP_METHOD}"], rate|<p>The number of HTTP request for the given verb per second.</p>|Dependent item|consul.http.api.rate["{#HTTP_METHOD}"]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
### LLD rule Raft server metrics discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Raft server metrics discovery|<p>Discover raft metrics for server nodes.</p>|Dependent item|consul.raft.server.discovery<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Item prototypes for Raft server metrics discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Consul: Raft state|<p>Current state of Consul agent.</p>|Dependent item|consul.raft.state[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.Stats.raft.state`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|Consul: Raft state: leader|<p>Increments when a server becomes a leader.</p>|Dependent item|consul.raft.state_leader[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_state_leader)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Raft state: candidate|<p>The number of initiated leader elections.</p>|Dependent item|consul.raft.state_candidate[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_state_candidate)`</p><p>Custom on fail: Discard value</p></li></ul>|
|Consul: Raft: apply, rate|<p>Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation).</p><p>This metric describes the arrival rate of new logs into Raft per second.</p>|Dependent item|consul.raft.apply.rate[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_apply)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
### LLD rule Raft leader metrics discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Raft leader metrics discovery|<p>Discover raft metrics for leader nodes.</p>|Dependent item|consul.raft.leader.discovery<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
### Item prototypes for Raft leader metrics discovery
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Consul: Raft state: leader last contact, p90|<p>The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.</p>|Dependent item|consul.raft.leader_last_contact.p90[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: leader last contact, p50|<p>The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.</p>|Dependent item|consul.raft.leader_last_contact.p50[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: commit time, p90|<p>The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.</p>|Dependent item|consul.raft.commit_time.p90[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: commit time, p50|<p>The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.</p>|Dependent item|consul.raft.commit_time.p50[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: dispatch log, p90|<p>The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.</p>|Dependent item|consul.raft.dispatch_log.p90[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: dispatch log, p50|<p>The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.</p>|Dependent item|consul.raft.dispatch_log.p50[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})`</p><p>Custom on fail: Discard value</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|Consul: Raft state: dispatch log, rate|<p>The number of times a Raft leader writes a log to disk per second.</p>|Dependent item|consul.raft.dispatch_log.rate[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Raft state: commit, rate|<p>The number of commits a new entry to the Raft log on the leader per second.</p>|Dependent item|consul.raft.commit_time.rate[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_raft_commitTime_count)`</p><p>Custom on fail: Discard value</p></li><li>Change per second</li></ul>|
|Consul: Autopilot healthy|<p>Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.</p>|Dependent item|consul.autopilot.healthy[{#SINGLETON}]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(consul_autopilot_healthy)`</p><p>Custom on fail: Discard value</p></li></ul>|
## Feedback
Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com)
You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback)