HashiCorp Consul Node by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics. See documentation.
More information about the metrics can be found in the official documentation.

The HashiCorp Consul Node by HTTP template collects metrics with the HTTP agent from the /v1/agent/metrics endpoint.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • HashiCorp Consul 1.10.0

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See documentation. The template needs to use authorization via an API token.

Don't forget to change the macros {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values. More information about the metrics can be found in the official documentation.
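As a rough illustration of the setup above, the following sketch builds (but does not send) the kind of request the HTTP agent item would make. It assumes the token is passed as a Bearer token in the Authorization header and that the Prometheus format is requested with the `format=prometheus` query parameter. The helper name `build_metrics_request` is hypothetical and not part of the template.

```python
import urllib.request


def build_metrics_request(api_url: str, token: str) -> urllib.request.Request:
    """Build (without sending) a request for agent metrics in Prometheus format."""
    base = api_url.rstrip("/")
    url = f"{base}/v1/agent/metrics?format=prometheus"
    # Assumption: the API token is sent as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Values correspond to the defaults of {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN}.
req = build_metrics_request("http://localhost:8500", "<PUT YOUR AUTH TOKEN>")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` against a reachable agent is a quick way to check the URL and token before assigning the template.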

This template supports Consul namespaces. You can set the macros {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} if you want to filter discovered services by namespace.
In the Open Source version, the service namespace will be set to 'None'.
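The effect of a MATCHES / NOT_MATCHES macro pair can be sketched as follows. This is only an approximation: Zabbix evaluates these filters with its own regular expression engine, and the service records and the `passes_filter` helper below are hypothetical examples.

```python
import re


def passes_filter(value: str, matches: str, not_matches: str) -> bool:
    """Keep a value that matches `matches` and does not match `not_matches`."""
    return (
        re.search(matches, value) is not None
        and re.search(not_matches, value) is None
    )


# Hypothetical discovered services; 'None' is the namespace in the Open Source version.
services = [
    {"name": "web", "namespace": "None"},
    {"name": "db", "namespace": "team-a"},
    {"name": "cache", "namespace": "team-b"},
]

# Keep every namespace except team-b.
kept = [s["name"] for s in services if passes_filter(s["namespace"], r".*", r"^team-b$")]
print(kept)  # ['web', 'db']
```

With the default values (`.*` and `CHANGE IF NEEDED`), every namespace passes, because no namespace is expected to contain the placeholder text.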

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$CONSUL.NODE.API.URL} | Consul instance URL. | http://localhost:8500 |
| {$CONSUL.TOKEN} | Consul auth token. | <PUT YOUR AUTH TOKEN> |
| {$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 90 |
| {$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES} | Filter for discoverable services on the local node. | .* |
| {$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services on the local node. | CHANGE IF NEEDED |
| {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter for discoverable services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'. | .* |
| {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'. | CHANGE IF NEEDED |
| {$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of the node's health score for the WARNING trigger expression. | 2 |
| {$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of the node's health score for the AVERAGE trigger expression. | 4 |

Items

Name Description Type Key and additional info
Consul: Get instance metrics

Get raw metrics from Consul instance /metrics endpoint.

HTTP agent consul.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

Consul: Get node info

Get configuration and member information of the local agent.

HTTP agent consul.get_node_info

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

Consul: Role

Role of current Consul agent.

Dependent item consul.role

Preprocessing

  • JSON Path: $.Config.Server

  • Boolean to decimal
  • Discard unchanged with heartbeat: 3h

Consul: Version

Version of Consul agent.

Dependent item consul.version

Preprocessing

  • JSON Path: $.Config.Version

  • Discard unchanged with heartbeat: 3h

Consul: Number of services

Number of services on current node.

Dependent item consul.services_number

Preprocessing

  • JSON Path: $.Stats.agent.services

  • Discard unchanged with heartbeat: 3h

Consul: Number of checks

Number of checks on current node.

Dependent item consul.checks_number

Preprocessing

  • JSON Path: $.Stats.agent.checks

  • Discard unchanged with heartbeat: 3h

Consul: Number of check monitors

Number of check monitors on current node.

Dependent item consul.check_monitors_number

Preprocessing

  • JSON Path: $.Stats.agent.check_monitors

  • Discard unchanged with heartbeat: 3h

Consul: Process CPU seconds, total

Total user and system CPU time spent in seconds.

Dependent item consul.cpu_seconds_total.rate

Preprocessing

  • Prometheus pattern: VALUE(process_cpu_seconds_total)

    Custom on fail: Discard value

  • Change per second
Consul: Virtual memory size

Virtual memory size in bytes.

Dependent item consul.virtual_memory_bytes

Preprocessing

  • Prometheus pattern: VALUE(process_virtual_memory_bytes)

Consul: RSS memory usage

Resident memory size in bytes.

Dependent item consul.resident_memory_bytes

Preprocessing

  • Prometheus pattern: VALUE(process_resident_memory_bytes)

Consul: Goroutine count

The number of Goroutines on Consul instance.

Dependent item consul.goroutines

Preprocessing

  • Prometheus pattern: VALUE(go_goroutines)

Consul: Open file descriptors

Number of open file descriptors.

Dependent item consul.process_open_fds

Preprocessing

  • Prometheus pattern: VALUE(process_open_fds)

Consul: Open file descriptors, max

Maximum number of open file descriptors.

Dependent item consul.process_max_fds

Preprocessing

  • Prometheus pattern: VALUE(process_max_fds)

Consul: Client RPC, per second

Number of times per second that a Consul agent in client mode makes an RPC request to a Consul server.

This gives a measure of how much a given agent is loading the Consul servers.

This is only generated by agents in client mode, not Consul servers.

Dependent item consul.client_rpc

Preprocessing

  • Prometheus pattern: VALUE(consul_client_rpc)

    Custom on fail: Discard value

  • Change per second
Consul: Client RPC failed, per second

Number of times per second that a Consul agent in client mode makes an RPC request to a Consul server and fails.

Dependent item consul.client_rpc_failed

Preprocessing

  • Prometheus pattern: VALUE(consul_client_rpc_failed)

    Custom on fail: Discard value

  • Change per second
Consul: TCP connections, accepted per second

This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.

Dependent item consul.memberlist.tcp_accept

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_tcp_accept)

    Custom on fail: Discard value

  • Change per second
Consul: TCP connections, per second

This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.

Dependent item consul.memberlist.tcp_connect

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_tcp_connect)

    Custom on fail: Discard value

  • Change per second
Consul: TCP send bytes, per second

This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.

Dependent item consul.memberlist.tcp_sent

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_tcp_sent)

    Custom on fail: Discard value

  • Change per second
Consul: UDP received bytes, per second

This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.

Dependent item consul.memberlist.udp_received

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_udp_received)

    Custom on fail: Discard value

  • Change per second
Consul: UDP sent bytes, per second

This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.

Dependent item consul.memberlist.udp_sent

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_udp_sent)

    Custom on fail: Discard value

  • Change per second
Consul: GC pause, p90

The 90 percentile for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The raw metric is in nanoseconds; the custom multiplier of 1.0E-9 converts it to seconds.

Dependent item consul.gc_pause.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Custom multiplier: 1.0E-9

Consul: GC pause, p50

The 50 percentile (median) for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The raw metric is in nanoseconds; the custom multiplier of 1.0E-9 converts it to seconds.

Dependent item consul.gc_pause.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Custom multiplier: 1.0E-9

Consul: Memberlist: degraded

This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate.

The agent uses its own health metric as an indicator to perform this action.

If its health score is low, it means that the node is healthy, and vice versa.

Dependent item consul.memberlist.degraded

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_degraded)

    Custom on fail: Discard value

Consul: Memberlist: health score

This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol.

This metric ranges from 0 to 8, where 0 indicates "totally healthy".

Dependent item consul.memberlist.health_score

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_health_score)

    Custom on fail: Discard value

Consul: Memberlist: gossip, p90

The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.

Dependent item consul.memberlist.dispatch_log.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_gossip{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Memberlist: gossip, p50

The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.

Dependent item consul.memberlist.gossip.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_gossip{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Memberlist: msg alive

This metric counts the number of alive Consul agents that the agent has mapped out so far, based on the message information given by the network layer.

Dependent item consul.memberlist.msg.alive

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_msg_alive)

    Custom on fail: Discard value

Consul: Memberlist: msg dead

This metric counts the number of times a Consul agent has marked another agent to be a dead node.

Dependent item consul.memberlist.msg.dead

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_msg_dead)

    Custom on fail: Discard value

Consul: Memberlist: msg suspect

The number of times a Consul agent suspects another as failed while probing during gossip protocol.

Dependent item consul.memberlist.msg.suspect

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_msg_suspect)

    Custom on fail: Discard value

Consul: Memberlist: probe node, p90

The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.

Dependent item consul.memberlist.probe_node.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_probeNode{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Memberlist: probe node, p50

The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.

Dependent item consul.memberlist.probe_node.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_probeNode{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Memberlist: push pull node, p90

The 90 percentile for the number of Consul agents that have exchanged state with this agent.

Dependent item consul.memberlist.push_pull_node.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_pushPullNode{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Memberlist: push pull node, p50

The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.

Dependent item consul.memberlist.push_pull_node.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_memberlist_pushPullNode{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: KV store: apply, p90

The 90 percentile for the time it takes to complete an update to the KV store.

Dependent item consul.kvs.apply.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_kvs_apply{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: KV store: apply, p50

The 50 percentile (median) for the time it takes to complete an update to the KV store.

Dependent item consul.kvs.apply.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_kvs_apply{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: KV store: apply, rate

The number of updates to the KV store per second.

Dependent item consul.kvs.apply.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_kvs_apply_count)

    Custom on fail: Discard value

  • Change per second
Consul: Serf member: flap, rate

Increments when an agent is marked dead and then recovers within a short time period.

This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.

Shown as events per second.

Dependent item consul.serf.member.flap.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_member_flap)

    Custom on fail: Discard value

  • Change per second
Consul: Serf member: failed, rate

Increments when an agent is marked dead.

This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.

Shown as events per second.

Dependent item consul.serf.member.failed.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_member_failed)

    Custom on fail: Discard value

  • Change per second
Consul: Serf member: join, rate

Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins.

Shown as events per second.

Dependent item consul.serf.member.join.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_member_join)

    Custom on fail: Discard value

  • Change per second
Consul: Serf member: left, rate

Increments when an agent leaves the cluster. Shown as events per second.

Dependent item consul.serf.member.left.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_member_left)

    Custom on fail: Discard value

  • Change per second
Consul: Serf member: update, rate

Increments when a Consul agent updates. Shown as events per second.

Dependent item consul.serf.member.update.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_member_update)

    Custom on fail: Discard value

  • Change per second
Consul: ACL: resolves, rate

The number of ACL resolves per second.

Dependent item consul.acl.resolves.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_acl_ResolveToken_count)

    Custom on fail: Discard value

  • Change per second
Consul: Catalog: register, rate

The number of catalog register operations per second.

Dependent item consul.catalog.register.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_catalog_register_count)

    Custom on fail: Discard value

  • Change per second
Consul: Catalog: deregister, rate

The number of catalog deregister operations per second.

Dependent item consul.catalog.deregister.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_catalog_deregister_count)

    Custom on fail: Discard value

  • Change per second
Consul: Snapshot: append line, p90

The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.

Dependent item consul.snapshot.append_line.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Snapshot: append line, p50

The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.

Dependent item consul.snapshot.append_line.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Snapshot: append line, rate

The number of snapshot appendLine operations per second.

Dependent item consul.snapshot.append_line.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_appendLine_count)

    Custom on fail: Discard value

  • Change per second
Consul: Snapshot: compact, p90

The 90 percentile for the time taken by the Consul agent to compact a log.

This operation occurs only when the snapshot becomes large enough to justify the compaction.

Dependent item consul.snapshot.compact.p90

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_compact{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Snapshot: compact, p50

The 50 percentile (median) for the time taken by the Consul agent to compact a log.

This operation occurs only when the snapshot becomes large enough to justify the compaction.

Dependent item consul.snapshot.compact.p50

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_compact{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Snapshot: compact, rate

The number of snapshot compact operations per second.

Dependent item consul.snapshot.compact.rate

Preprocessing

  • Prometheus pattern: VALUE(consul_serf_snapshot_compact_count)

    Custom on fail: Discard value

  • Change per second
Consul: Get local services

Get all the services that are registered with the local agent and their status.

Script consul.get_local_services
Consul: Get local services check

Data collection check.

Dependent item consul.get_local_services.check

Preprocessing

  • JSON Path: $.error

    Custom on fail: Set value to

  • Discard unchanged with heartbeat: 3h
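Many of the items above chain a Prometheus pattern step with either "Change per second" or a custom multiplier. The sketch below mimics that chain outside Zabbix to show what each step does. It is a simplified stand-in, not the Zabbix implementation: the parser handles only plain `name{labels} value` lines, the sample text is invented, and the function names are hypothetical.

```python
import re

SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')


def prometheus_value(text, metric, labels=None):
    """Return the value of the first sample with this name and labels, else None."""
    labels = labels or {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_LINE.match(line)
        if not m or m.group(1) != metric:
            continue
        found = dict(re.findall(r'(\w+)="([^"]*)"', m.group(2) or ""))
        if all(found.get(k) == v for k, v in labels.items()):
            return float(m.group(3))
    return None  # roughly the case handled by "Custom on fail: Discard value"


def change_per_second(prev_value, prev_time, value, time):
    """(value - prev_value) / (time - prev_time), with times in seconds."""
    return (value - prev_value) / (time - prev_time)


# Invented sample of what the metrics endpoint might return.
text = """
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 14.0
consul_runtime_gc_pause_ns{quantile="0.5"} 40000
consul_runtime_gc_pause_ns{quantile="0.9"} 250000
"""

cpu_now = prometheus_value(text, "process_cpu_seconds_total")
# Previous poll: 12.5 CPU seconds, collected 60 seconds earlier.
cpu_rate = change_per_second(12.5, 0, cpu_now, 60)

gc_p90_ns = prometheus_value(text, "consul_runtime_gc_pause_ns", {"quantile": "0.9"})
gc_p90_seconds = gc_p90_ns * 1.0e-9  # same effect as "Custom multiplier: 1.0E-9"
```

A counter such as `process_cpu_seconds_total` only becomes a rate after the second poll, because "Change per second" needs a previous value to compare against.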

Triggers

Name Description Expression Severity Dependencies and additional info
Consul: Version has been changed

Consul version has changed. Acknowledge to close the problem manually.

last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 Info Manual close: Yes
Consul: Current number of open files is too high

"Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue."

min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} Warning
Consul: Node's health score is warning

This metric ranges from 0 to 8, where 0 indicates "totally healthy".
This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.
For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf

max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} Warning Depends on:
  • Consul: Node's health score is critical
Consul: Node's health score is critical

This metric ranges from 0 to 8, where 0 indicates "totally healthy".
This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.
For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf

max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} Average
Consul: Failed to get local services

Failed to get local services. Check debug log for more information.

length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 Warning
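The file descriptor and health score trigger expressions above can be read as the following sketch. The thresholds are the macro defaults (90, 2 and 4); the function names and sample values are hypothetical, and real evaluation happens inside Zabbix.

```python
def fds_too_high(open_fds_5m, max_fds, warn_pct=90):
    """min(open_fds over 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}."""
    return min(open_fds_5m) / max_fds * 100 > warn_pct


def health_score_severity(scores, warn=2, high=4):
    """Evaluate the two health score triggers over the last three values.

    The Warning trigger depends on the Average ("critical") one, so only
    the higher severity is reported when both expressions are true.
    """
    worst = max(scores[-3:])
    if worst > high:
        return "Average"
    if worst > warn:
        return "Warning"
    return None


print(fds_too_high([950, 940, 960], 1024))   # True: the lowest sample is about 91.8%
print(fds_too_high([950, 100, 960], 1024))   # False: one sample dropped below 90%
print(health_score_severity([0, 3, 1]))      # Warning
print(health_score_severity([5, 1, 0]))      # Average
```

Using `min` over the window means the file descriptor trigger fires only when usage stays above the threshold for the whole five minutes, whereas `max` over three values makes the health score triggers react to a single bad sample.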

LLD rule Local node services discovery

Name Description Type Key and additional info
Local node services discovery

Discover metrics for services that are registered with the local agent.

Dependent item consul.node_services_lld

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Local node services discovery

Name Description Type Key and additional info
Consul: ["{#SERVICE_NAME}"]: Aggregated status

Aggregated values of all health checks for the service instance.

Dependent item consul.service.aggregated_state["{#SERVICE_ID}"]

Preprocessing

  • JSON Path: $[?(@.id == "{#SERVICE_ID}")].status.first()

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status

Current state of health check for the service.

Dependent item consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output

Current output of health check for the service.

Dependent item consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
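The JSON Path step of the aggregated status item, `$[?(@.id == "{#SERVICE_ID}")].status.first()`, selects the status of the first entry whose id matches the discovered service. A minimal equivalent is sketched below. The input shape (a list of objects with `id` and `status` fields) is inferred from the path itself, the sample records are invented, and the later JavaScript step that converts the status to a number is not modeled.

```python
def first_status(services, service_id):
    """Return the status of the first entry with the given id, or None if absent."""
    matches = [s["status"] for s in services if s.get("id") == service_id]
    return matches[0] if matches else None


# Invented sample of the data produced by the "Get local services" item.
services = [
    {"id": "web-1", "status": "passing"},
    {"id": "db-1", "status": "critical"},
]

print(first_status(services, "db-1"))  # critical
```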

Trigger prototypes for Local node services discovery

Name Description Expression Severity Dependencies and additional info
Consul: Aggregated status is 'warning'

Aggregated state of service on the local agent is 'warning'.

last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 Warning
Consul: Aggregated status is 'critical'

Aggregated state of service on the local agent is 'critical'.

last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 Average

LLD rule HTTP API methods discovery

Name Description Type Key and additional info
HTTP API methods discovery

Discover HTTP API method-specific metrics.

Dependent item consul.http_api_discovery

Preprocessing

  • Prometheus to JSON: consul_api_http{method =~ ".*"}

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for HTTP API methods discovery

Name Description Type Key and additional info
Consul: HTTP request: ["{#HTTP_METHOD}"], p90

The 90 percentile of how long it takes to service the given HTTP request for the given verb.

Dependent item consul.http.api.p90["{#HTTP_METHOD}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

    Custom on fail: Discard value

Consul: HTTP request: ["{#HTTP_METHOD}"], p50

The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.

Dependent item consul.http.api.p50["{#HTTP_METHOD}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

    Custom on fail: Discard value

Consul: HTTP request: ["{#HTTP_METHOD}"], rate

The number of HTTP requests for the given verb per second.

Dependent item consul.http.api.rate["{#HTTP_METHOD}"]

Preprocessing

  • Prometheus pattern: SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})

    Custom on fail: Discard value

  • Change per second
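The rate prototype uses `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})`, which adds up every sample for one HTTP method regardless of its other labels. The sketch below shows that aggregation on an invented sample; the `path` label values and the `sum_by_method` helper are assumptions made for illustration.

```python
import re

COUNT_LINE = re.compile(r'^consul_api_http_count\{(.*)\}\s+(\S+)$')


def sum_by_method(text, method):
    """Sum consul_api_http_count samples whose method label equals `method`."""
    total = 0.0
    for line in text.splitlines():
        m = COUNT_LINE.match(line.strip())
        if not m:
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', m.group(1)))
        if labels.get("method") == method:
            total += float(m.group(2))
    return total


# Invented sample; real label sets depend on the Consul version.
text = """
consul_api_http_count{method="GET",path="v1_agent_metrics"} 120
consul_api_http_count{method="GET",path="v1_agent_self"} 30
consul_api_http_count{method="PUT",path="v1_kv"} 7
"""

print(sum_by_method(text, "GET"))  # 150.0
```

The summed counter is then passed through "Change per second", so the item reports requests per second for each discovered method.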

LLD rule Raft server metrics discovery

Name Description Type Key and additional info
Raft server metrics discovery

Discover raft metrics for server nodes.

Dependent item consul.raft.server.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Raft server metrics discovery

Name Description Type Key and additional info
Consul: Raft state

Current state of Consul agent.

Dependent item consul.raft.state[{#SINGLETON}]

Preprocessing

  • JSON Path: $.Stats.raft.state

  • Discard unchanged with heartbeat: 3h

Consul: Raft state: leader

Increments when a server becomes a leader.

Dependent item consul.raft.state_leader[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_state_leader)

    Custom on fail: Discard value

Consul: Raft state: candidate

The number of initiated leader elections.

Dependent item consul.raft.state_candidate[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_state_candidate)

    Custom on fail: Discard value

Consul: Raft: apply, rate

Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation).

This metric describes the arrival rate of new logs into Raft per second.

Dependent item consul.raft.apply.rate[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_apply)

    Custom on fail: Discard value

  • Change per second

LLD rule Raft leader metrics discovery

Name Description Type Key and additional info
Raft leader metrics discovery

Discover raft metrics for leader nodes.

Dependent item consul.raft.leader.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Raft leader metrics discovery

Name Description Type Key and additional info
Consul: Raft state: leader last contact, p90

The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.

Dependent item consul.raft.leader_last_contact.p90[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_leader_lastContact{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: leader last contact, p50

The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.

Dependent item consul.raft.leader_last_contact.p50[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_leader_lastContact{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: commit time, p90

The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.

Dependent item consul.raft.commit_time.p90[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_commitTime{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: commit time, p50

The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.

Dependent item consul.raft.commit_time.p50[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_commitTime{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: dispatch log, p90

The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.

Dependent item consul.raft.dispatch_log.p90[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: dispatch log, p50

The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.

Dependent item consul.raft.dispatch_log.p50[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Consul: Raft state: dispatch log, rate

The number of times a Raft leader writes a log to disk per second.

Dependent item consul.raft.dispatch_log.rate[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_leader_dispatchLog_count)

    Custom on fail: Discard value

  • Change per second
Consul: Raft state: commit, rate

The number of new entries committed to the Raft log on the leader per second.

Dependent item consul.raft.commit_time.rate[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_raft_commitTime_count)

    Custom on fail: Discard value

  • Change per second
Consul: Autopilot healthy

Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.

Dependent item consul.autopilot.healthy[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(consul_autopilot_healthy)

    Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.