HashiCorp Vault by HTTP

Overview

This template monitors HashiCorp Vault with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the /sys/metrics API endpoint. See https://www.vaultproject.io/api-docs/system/metrics.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • Vault 1.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

See Zabbix template operation for basic instructions.

Configure the Vault API (see Vault Configuration). Create a Vault service token and set it in the macro {$VAULT.TOKEN}.

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$VAULT.API.PORT} | Vault port. | 8200 |
| {$VAULT.API.SCHEME} | Vault API scheme. | http |
| {$VAULT.HOST} | Vault host name. | <PUT YOUR VAULT HOST> |
| {$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. | 90 |
| {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. | 5 |
| {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. | 5 |
| {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. | 5 |
| {$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. | .+ |
| {$VAULT.TOKEN} | Vault auth token. | <PUT YOUR AUTH TOKEN> |
| {$VAULT.TOKEN.ACCESSORS} | Vault token accessors, separated by spaces, used to monitor token expiration time. | |
| {$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. | 3d |
| {$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. | 7d |
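
As a rough illustration of how the connection macros fit together, the sketch below assembles a metrics request from them. It assumes the URL is formed as scheme://host:port followed by Vault's /v1/sys/metrics path with format=prometheus, and that the token is sent in the X-Vault-Token header. The function name and the example host and token are hypothetical, and the template's actual item configuration may differ.

```python
from urllib.request import Request


def build_metrics_request(scheme: str, host: str, port: int, token: str) -> Request:
    """Assemble (but do not send) a GET request for Vault's metrics endpoint."""
    # Assumption: the URL is composed from the scheme, host and port macros.
    url = f"{scheme}://{host}:{port}/v1/sys/metrics?format=prometheus"
    # Vault reads the client token from the X-Vault-Token header.
    return Request(url, headers={"X-Vault-Token": token}, method="GET")


# Hypothetical values standing in for {$VAULT.API.SCHEME}, {$VAULT.HOST},
# {$VAULT.API.PORT} and {$VAULT.TOKEN}.
req = build_metrics_request("http", "vault.example.com", 8200, "s.example-token")
print(req.full_url)
```

Nothing is sent over the network here; the request object only shows which pieces come from which macro.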

Items

Name Description Type Key and additional info
Vault: Get health HTTP agent vault.get_health

Preprocessing

  • Check for not supported value

    Custom on fail: Set value to: {"healthcheck": 0}

Vault: Get leader HTTP agent vault.get_leader

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

Vault: Get metrics HTTP agent vault.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

Vault: Clear metrics Dependent item vault.clear_metrics

Preprocessing

  • Check for error in JSON: $.errors

    Custom on fail: Discard value

Vault: Get tokens

Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}".

Script vault.get_tokens
Vault: Check WAL discovery Dependent item vault.check_wal_discovery

Preprocessing

  • Prometheus to JSON: {__name__=~"^vault_wal_(?:.+)$"}

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 15m

Vault: Check replication discovery Dependent item vault.check_replication_discovery

Preprocessing

  • Prometheus to JSON: {__name__=~"^replication_(?:.+)$"}

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 15m

Vault: Check storage discovery Dependent item vault.check_storage_discovery

Preprocessing

  • Prometheus to JSON: {__name__=~"^vault_(?:.+)_(?:get

Vault: Check mountpoint discovery Dependent item vault.check_mountpoint_discovery

Preprocessing

  • Prometheus to JSON: {__name__=~"^vault_rollback_attempt_(?:.+?)_count$"}

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 15m

Vault: Initialized

Initialization status.

Dependent item vault.health.initialized

Preprocessing

  • JSON Path: $.initialized

    Custom on fail: Discard value

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Sealed

Seal status.

Dependent item vault.health.sealed

Preprocessing

  • JSON Path: $.sealed

    Custom on fail: Discard value

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Standby

Standby status.

Dependent item vault.health.standby

Preprocessing

  • JSON Path: $.standby

    Custom on fail: Discard value

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Performance standby

Performance standby status.

Dependent item vault.health.performance_standby

Preprocessing

  • JSON Path: $.performance_standby

    Custom on fail: Discard value

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Performance replication

Performance replication mode

https://www.vaultproject.io/docs/enterprise/replication

Dependent item vault.health.replication_performance_mode

Preprocessing

  • JSON Path: $.replication_performance_mode

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 1h

Vault: Disaster Recovery replication

Disaster recovery replication mode

https://www.vaultproject.io/docs/enterprise/replication

Dependent item vault.health.replication_dr_mode

Preprocessing

  • JSON Path: $.replication_dr_mode

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 1h

Vault: Version

Server version.

Dependent item vault.health.version

Preprocessing

  • JSON Path: $.version

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 1h

Vault: Healthcheck

Vault healthcheck.

Dependent item vault.health.check

Preprocessing

  • JSON Path: $.healthcheck

    Custom on fail: Set value to: 1

  • Discard unchanged with heartbeat: 1h

Vault: HA enabled

HA enabled status.

Dependent item vault.leader.ha_enabled

Preprocessing

  • JSON Path: $.ha_enabled

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Is leader

Leader status.

Dependent item vault.leader.is_self

Preprocessing

  • JSON Path: $.is_self

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Get metrics error

Get metrics error.

Dependent item vault.get_metrics.error

Preprocessing

  • JSON Path: $.errors[0]

    Custom on fail: Set value to: ``

  • Discard unchanged with heartbeat: 1h

Vault: Process CPU seconds, total

Total user and system CPU time spent in seconds.

Dependent item vault.metrics.process.cpu.seconds.total

Preprocessing

  • Prometheus pattern: VALUE(process_cpu_seconds_total)

    Custom on fail: Discard value

Vault: Open file descriptors, max

Maximum number of open file descriptors.

Dependent item vault.metrics.process.max.fds

Preprocessing

  • Prometheus pattern: VALUE(process_max_fds)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 1h

Vault: Open file descriptors, current

Number of open file descriptors.

Dependent item vault.metrics.process.open.fds

Preprocessing

  • Prometheus pattern: VALUE(process_open_fds)

    Custom on fail: Discard value

Vault: Process resident memory

Resident memory size in bytes.

Dependent item vault.metrics.process.resident_memory.bytes

Preprocessing

  • Prometheus pattern: VALUE(process_resident_memory_bytes)

    Custom on fail: Discard value

Vault: Uptime

Server uptime.

Dependent item vault.metrics.process.uptime

Preprocessing

  • Prometheus pattern: VALUE(process_start_time_seconds)

    Custom on fail: Discard value

  • JavaScript: The text is too long. Please see the template.

Vault: Process virtual memory, current

Virtual memory size in bytes.

Dependent item vault.metrics.process.virtual_memory.bytes

Preprocessing

  • Prometheus pattern: VALUE(process_virtual_memory_bytes)

    Custom on fail: Discard value

Vault: Process virtual memory, max

Maximum amount of virtual memory available in bytes.

Dependent item vault.metrics.process.virtual_memory.max.bytes

Preprocessing

  • Prometheus pattern: VALUE(process_virtual_memory_max_bytes)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 1h

Vault: Audit log requests, rate

Number of all audit log requests across all audit log devices.

Dependent item vault.metrics.audit.log.request.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_audit_log_request_count)

    Custom on fail: Discard value

  • Change per second
Vault: Audit log request failures, rate

Number of audit log request failures.

Dependent item vault.metrics.audit.log.request.failure.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_audit_log_request_failure)

    Custom on fail: Discard value

  • Change per second
Vault: Audit log response, rate

Number of audit log responses across all audit log devices.

Dependent item vault.metrics.audit.log.response.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_audit_log_response_count)

    Custom on fail: Discard value

  • Change per second
Vault: Audit log response failures, rate

Number of audit log response failures.

Dependent item vault.metrics.audit.log.response.failure.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_audit_log_response_failure)

    Custom on fail: Discard value

  • Change per second
Vault: Barrier DELETE ops, rate

Number of DELETE operations at the barrier.

Dependent item vault.metrics.barrier.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_barrier_delete_count)

    Custom on fail: Discard value

  • Change per second
Vault: Barrier GET ops, rate

Number of GET operations at the barrier.

Dependent item vault.metrics.vault.barrier.get.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_barrier_get_count)

    Custom on fail: Discard value

  • Change per second
Vault: Barrier LIST ops, rate

Number of LIST operations at the barrier.

Dependent item vault.metrics.barrier.list.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_barrier_list_count)

    Custom on fail: Discard value

  • Change per second
Vault: Barrier PUT ops, rate

Number of PUT operations at the barrier.

Dependent item vault.metrics.barrier.put.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_barrier_put_count)

    Custom on fail: Discard value

  • Change per second
Vault: Cache hit, rate

Number of times a value was retrieved from the LRU cache.

Dependent item vault.metrics.cache.hit.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_cache_hit)

    Custom on fail: Discard value

  • Change per second
Vault: Cache miss, rate

Number of times a value was not in the LRU cache. This results in a read from the configured storage.

Dependent item vault.metrics.cache.miss.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_cache_miss)

    Custom on fail: Discard value

  • Change per second
Vault: Cache write, rate

Number of times a value was written to the LRU cache.

Dependent item vault.metrics.cache.write.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_cache_write)

    Custom on fail: Discard value

  • Change per second
Vault: Check token, rate

Number of token checks handled by Vault core.

Dependent item vault.metrics.core.check.token.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_core_check_token_count)

    Custom on fail: Discard value

  • Change per second
Vault: Fetch ACL and token, rate

Number of ACL and corresponding token entry fetches handled by Vault core.

Dependent item vault.metrics.core.fetch.acl_and_token

Preprocessing

  • Prometheus pattern: VALUE(vault_core_fetch_acl_and_token_count)

    Custom on fail: Discard value

  • Change per second
Vault: Requests, rate

Number of requests handled by Vault core.

Dependent item vault.metrics.core.handle.request

Preprocessing

  • Prometheus pattern: VALUE(vault_core_handle_request_count)

    Custom on fail: Discard value

  • Change per second
Vault: Leadership setup failed, counter

Cluster leadership setup failures which have occurred in a highly available Vault cluster.

Dependent item vault.metrics.core.leadership.setup_failed

Preprocessing

  • Prometheus to JSON: vault_core_leadership_setup_failed

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Set value to: 0

Vault: Leadership setup lost, counter

Cluster leadership losses which have occurred in a highly available Vault cluster.

Dependent item vault.metrics.core.leadership_lost

Preprocessing

  • Prometheus to JSON: vault_core_leadership_lost_count

  • JSON Path: $[?(@.name=="vault_core_leadership_lost_count")].value.sum()

    Custom on fail: Set value to: 0

Vault: Post-unseal ops, counter

Duration of time taken by post-unseal operations handled by Vault core.

Dependent item vault.metrics.core.post_unseal

Preprocessing

  • Prometheus pattern: VALUE(vault_core_post_unseal_count)

    Custom on fail: Discard value

Vault: Pre-seal ops, counter

Duration of time taken by pre-seal operations.

Dependent item vault.metrics.core.pre_seal

Preprocessing

  • Prometheus pattern: VALUE(vault_core_pre_seal_count)

    Custom on fail: Discard value

Vault: Requested seal ops, counter

Duration of time taken by requested seal operations.

Dependent item vault.metrics.core.seal_with_request

Preprocessing

  • Prometheus pattern: VALUE(vault_core_seal_with_request_count)

    Custom on fail: Discard value

Vault: Seal ops, counter

Duration of time taken by seal operations.

Dependent item vault.metrics.core.seal

Preprocessing

  • Prometheus pattern: VALUE(vault_core_seal_count)

    Custom on fail: Discard value

Vault: Internal seal ops, counter

Duration of time taken by internal seal operations.

Dependent item vault.metrics.core.seal_internal

Preprocessing

  • Prometheus pattern: VALUE(vault_core_seal_internal_count)

    Custom on fail: Discard value

Vault: Leadership step downs, counter

Cluster leadership step down.

Dependent item vault.metrics.core.step_down

Preprocessing

  • Prometheus to JSON: vault_core_step_down_count

  • JSON Path: $[?(@.name=="vault_core_step_down_count")].value.sum()

    Custom on fail: Set value to: 0

Vault: Unseal ops, counter

Duration of time taken by unseal operations.

Dependent item vault.metrics.core.unseal

Preprocessing

  • Prometheus pattern: VALUE(vault_core_unseal_count)

    Custom on fail: Discard value

Vault: Fetch lease times, counter

Time taken to fetch lease times.

Dependent item vault.metrics.expire.fetch.lease.times

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_fetch_lease_times_count)

    Custom on fail: Discard value

Vault: Fetch lease times by token, counter

Time taken to fetch lease times by token.

Dependent item vault.metrics.expire.fetch.lease.times.by_token

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_fetch_lease_times_by_token_count)

    Custom on fail: Discard value

Vault: Number of expiring leases

Number of all leases which are eligible for eventual expiry.

Dependent item vault.metrics.expire.num_leases

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_num_leases)

    Custom on fail: Discard value

Vault: Expire revoke, count

Time taken to revoke a token.

Dependent item vault.metrics.expire.revoke

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_revoke_count)

    Custom on fail: Discard value

Vault: Expire revoke force, count

Time taken to forcibly revoke a token.

Dependent item vault.metrics.expire.revoke.force

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_revoke_force_count)

    Custom on fail: Discard value

Vault: Expire revoke prefix, count

Time taken to revoke tokens on a prefix.

Dependent item vault.metrics.expire.revoke.prefix

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_revoke_prefix_count)

    Custom on fail: Discard value

Vault: Revoke secrets by token, count

Time taken to revoke all secrets issued with a given token.

Dependent item vault.metrics.expire.revoke.by_token

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_revoke_by_token_count)

    Custom on fail: Discard value

Vault: Expire renew, count

Time taken to renew a lease.

Dependent item vault.metrics.expire.renew

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_renew_count)

    Custom on fail: Discard value

Vault: Renew token, count

Time taken to renew a token which does not need to invoke a logical backend.

Dependent item vault.metrics.expire.renew_token

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_renew_token_count)

    Custom on fail: Discard value

Vault: Register ops, count

Time taken for register operations.

Dependent item vault.metrics.expire.register

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_register_count)

    Custom on fail: Discard value

Vault: Register auth ops, count

Time taken for register authentication operations which create lease entries without lease ID.

Dependent item vault.metrics.expire.register.auth

Preprocessing

  • Prometheus pattern: VALUE(vault_expire_register_auth_count)

    Custom on fail: Discard value

Vault: Policy GET ops, rate

Number of operations to get a policy.

Dependent item vault.metrics.policy.get_policy.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_policy_get_policy_count)

    Custom on fail: Discard value

  • Change per second
Vault: Policy LIST ops, rate

Number of operations to list policies.

Dependent item vault.metrics.policy.list_policies.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_policy_list_policies_count)

    Custom on fail: Discard value

  • Change per second
Vault: Policy DELETE ops, rate

Number of operations to delete a policy.

Dependent item vault.metrics.policy.delete_policy.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_policy_delete_policy_count)

    Custom on fail: Discard value

  • Change per second
Vault: Policy SET ops, rate

Number of operations to set a policy.

Dependent item vault.metrics.policy.set_policy.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_policy_set_policy_count)

    Custom on fail: Discard value

  • Change per second
Vault: Token create, count

The time taken to create a token.

Dependent item vault.metrics.token.create

Preprocessing

  • Prometheus pattern: VALUE(vault_token_create_count)

    Custom on fail: Discard value

Vault: Token createAccessor, count

The time taken to create a token accessor.

Dependent item vault.metrics.token.createAccessor

Preprocessing

  • Prometheus pattern: VALUE(vault_token_createAccessor_count)

    Custom on fail: Discard value

Vault: Token lookup, rate

Number of token lookups.

Dependent item vault.metrics.token.lookup.rate

Preprocessing

  • Prometheus pattern: VALUE(vault_token_lookup_count)

    Custom on fail: Discard value

  • Change per second
Vault: Token revoke, count

The time taken to revoke a token.

Dependent item vault.metrics.token.revoke

Preprocessing

  • Prometheus pattern: VALUE(vault_token_revoke_count)

    Custom on fail: Discard value

Vault: Token revoke tree, count

Time taken to revoke a token tree.

Dependent item vault.metrics.token.revoke.tree

Preprocessing

  • Prometheus pattern: VALUE(vault_token_revoke_tree_count)

    Custom on fail: Discard value

Vault: Token store, count

Time taken to store an updated token entry without writing to the secondary index.

Dependent item vault.metrics.token.store

Preprocessing

  • Prometheus pattern: VALUE(vault_token_store_count)

    Custom on fail: Discard value

Vault: Runtime allocated bytes

Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value.

Dependent item vault.metrics.runtime.alloc.bytes

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_alloc_bytes)

    Custom on fail: Discard value

Vault: Runtime freed objects

Number of freed objects.

Dependent item vault.metrics.runtime.free.count

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_free_count)

    Custom on fail: Discard value

Vault: Runtime heap objects

Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting.

Dependent item vault.metrics.runtime.heap.objects

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_heap_objects)

    Custom on fail: Discard value

Vault: Runtime malloc count

Cumulative count of allocated heap objects.

Dependent item vault.metrics.runtime.malloc.count

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_malloc_count)

    Custom on fail: Discard value

Vault: Runtime num goroutines

Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting.

Dependent item vault.metrics.runtime.num_goroutines

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_num_goroutines)

    Custom on fail: Discard value

Vault: Runtime sys bytes

Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system.

Dependent item vault.metrics.runtime.sys.bytes

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_sys_bytes)

    Custom on fail: Discard value

Vault: Runtime GC pause, total

The total garbage collector pause time since Vault was last started.

Dependent item vault.metrics.total.gc.pause

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_total_gc_pause_ns)

    Custom on fail: Discard value

  • Custom multiplier: 1e-09

Vault: Runtime GC runs, total

Total number of garbage collection runs since Vault was last started.

Dependent item vault.metrics.runtime.total.gc.runs

Preprocessing

  • Prometheus pattern: VALUE(vault_runtime_total_gc_runs)

    Custom on fail: Discard value

Vault: Token count, total

Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes.

Dependent item vault.metrics.token

Preprocessing

  • Prometheus to JSON: vault_token_count

  • JSON Path: $[?(@.name=="vault_token_count")].value.sum()

    Custom on fail: Set value to: 0

Vault: Token count by auth, total

Total number of service tokens that were created by an auth method.

Dependent item vault.metrics.token.by_auth

Preprocessing

  • Prometheus to JSON: vault_token_count_by_auth

  • JSON Path: $[?(@.name=="vault_token_count_by_auth")].value.sum()

    Custom on fail: Set value to: 0

Vault: Token count by policy, total

Total number of service tokens that have a policy attached.

Dependent item vault.metrics.token.by_policy

Preprocessing

  • Prometheus to JSON: vault_token_count_by_policy

  • JSON Path: $[?(@.name=="vault_token_count_by_policy")].value.sum()

    Custom on fail: Set value to: 0

Vault: Token count by ttl, total

Number of service tokens, grouped by the TTL range they were assigned at creation.

Dependent item vault.metrics.token.by_ttl

Preprocessing

  • Prometheus to JSON: vault_token_count_by_ttl

  • JSON Path: $[?(@.name=="vault_token_count_by_ttl")].value.sum()

    Custom on fail: Set value to: 0

Vault: Token creation, rate

Number of service or batch tokens created.

Dependent item vault.metrics.token.creation.rate

Preprocessing

  • Prometheus to JSON: vault_token_creation

  • JSON Path: $[?(@.name=="vault_token_creation")].value.sum()

    Custom on fail: Set value to: 0

  • Change per second
Vault: Secret kv entries

Number of entries in each key-value secret engine.

Dependent item vault.metrics.secret.kv.count

Preprocessing

  • Prometheus to JSON: vault_secret_kv_count

  • JSON Path: $[?(@.name=="vault_secret_kv_count")].value.sum()

    Custom on fail: Set value to: 0

Vault: Token secret lease creation, rate

Counts the number of leases created by secret engines.

Dependent item vault.metrics.secret.lease.creation.rate

Preprocessing

  • Prometheus to JSON: vault_secret_lease_creation

  • JSON Path: $[?(@.name=="vault_secret_lease_creation")].value.sum()

    Custom on fail: Set value to: 0

  • Change per second
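
Most items above follow one of two preprocessing shapes: a Prometheus pattern such as VALUE(metric) optionally followed by "Change per second", or a sum over all labelled series with a fallback of 0. The sketch below imitates both on a made-up metrics sample. It is not Zabbix's own parser; the function names and sample values are hypothetical, and label values containing a closing brace are not handled.

```python
import re

# Made-up sample in Prometheus text format.
SAMPLE = """\
# TYPE vault_barrier_get_count counter
vault_barrier_get_count 120
vault_token_count{namespace="root"} 3
vault_token_count{namespace="dev"} 4
"""


def prometheus_value(text: str, metric: str) -> float:
    """Return the value of an unlabelled metric line, like VALUE(metric)."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = re.fullmatch(rf"{re.escape(metric)}\s+(\S+)", line.strip())
        if match:
            return float(match.group(1))
    raise KeyError(metric)  # roughly corresponds to "Discard value"


def summed_value(text: str, metric: str) -> float:
    """Sum every series of a metric; fall back to 0 when none is present."""
    pattern = re.compile(rf"{re.escape(metric)}(?:\{{[^}}]*\}})?\s+(\S+)")
    values = []
    for line in text.splitlines():
        match = pattern.fullmatch(line.strip())
        if match:
            values.append(float(match.group(1)))
    return sum(values) if values else 0.0


def change_per_second(prev: float, curr: float, seconds: float) -> float:
    """Rate between two samples of a counter taken `seconds` apart."""
    return (curr - prev) / seconds


print(prometheus_value(SAMPLE, "vault_barrier_get_count"))
print(summed_value(SAMPLE, "vault_token_count"))
print(change_per_second(120, 180, 60))
```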

Triggers

Name Description Expression Severity Dependencies and additional info
Vault: Vault server is sealed

https://www.vaultproject.io/docs/concepts/seal

last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 Average
Vault: Version has changed

Vault version has changed. Acknowledge to close the problem manually.

last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 Info Manual close: Yes
Vault: Vault server is not responding last(/HashiCorp Vault by HTTP/vault.health.check)=0 High
Vault: Failed to get metrics length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 Warning Depends on:
  • Vault: Vault server is sealed
Vault: Current number of open files is too high min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} Warning
Vault: has been restarted

Uptime is less than 10 minutes.

last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m Info Manual close: Yes
Vault: High frequency of leadership setup failures

There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h.

(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Average
Vault: High frequency of leadership losses

There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h.

(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Average
Vault: High frequency of leadership step downs

There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h.

(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Average
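
The arithmetic behind the file descriptor and leadership triggers can be sketched as follows. This is only an illustration of the expressions above on made-up samples, with the macro defaults (90 and 5) used as default arguments; the function names are hypothetical and Zabbix evaluates the real expressions itself.

```python
def open_fds_too_high(open_fds_5m, max_fds, warn_pct=90.0):
    """min(open fds over 5m) / last(max fds) * 100 > threshold."""
    return min(open_fds_5m) / max_fds * 100 > warn_pct


def too_many_events(counter_1h, max_warn=5):
    """(max - min) of a counter over the last hour exceeds the threshold."""
    return max(counter_1h) - min(counter_1h) > max_warn


# Made-up samples: the lowest value in the window is still about 91.8% of 1024.
print(open_fds_too_high([950, 940, 990], 1024))
# The counter grew by 6 within the hour, which is above the default of 5.
print(too_many_events([10, 12, 16]))
```

Using min() over the window means a short spike in open file descriptors does not fire the trigger; the usage has to stay above the threshold for the whole five minutes.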

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Storage backend metrics discovery.

Dependent item vault.storage.discovery

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
Vault: Storage [{#STORAGE}] {#OPERATION} ops, rate

Number of {#OPERATION} operations against the {#STORAGE} storage backend.

Dependent item vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}]

Preprocessing

  • Prometheus pattern: VALUE({#PATTERN_C})

    Custom on fail: Discard value

  • Change per second

LLD rule Mountpoint metrics discovery

Name Description Type Key and additional info
Mountpoint metrics discovery

Mountpoint metrics discovery.

Dependent item vault.mountpoint.discovery

Item prototypes for Mountpoint metrics discovery

Name Description Type Key and additional info
Vault: Rollback attempt [{#MOUNTPOINT}] ops, rate

Number of operations to perform a rollback operation on the given mount point.

Dependent item vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}]

Preprocessing

  • Prometheus pattern: VALUE({#PATTERN_C})

    Custom on fail: Discard value

  • Change per second
Vault: Route rollback [{#MOUNTPOINT}] ops, rate

Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors.

Dependent item vault.metrics.route.rollback.rate[{#MOUNTPOINT}]

Preprocessing

  • Prometheus pattern: VALUE({#PATTERN_C})

    Custom on fail: Discard value

  • Change per second
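
One way to picture mountpoint discovery is as a scan of metric names with the rollback pattern, emitting one row per mount point. The sketch below is an assumption about the general idea only: the template's own JavaScript is not reproduced here, the metric names are made up, and only the {#MOUNTPOINT} macro is filled in.

```python
import re

# Same pattern as the discovery preprocessing step, with a capturing group added.
ROLLBACK_RE = re.compile(r"^vault_rollback_attempt_(.+?)_count$")


def discover_mountpoints(metric_names):
    """Return LLD-style rows, one per mount point found in the metric names."""
    found = []
    for name in metric_names:
        match = ROLLBACK_RE.match(name)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return [{"{#MOUNTPOINT}": mount} for mount in found]


# Made-up metric names for illustration.
names = [
    "vault_rollback_attempt_secret_count",
    "vault_rollback_attempt_sys_count",
    "vault_barrier_get_count",
]
print(discover_mountpoints(names))
```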

LLD rule WAL metrics discovery

Name Description Type Key and additional info
WAL metrics discovery

Discovery for WAL metrics.

Dependent item vault.wal.discovery

Item prototypes for WAL metrics discovery

Name Description Type Key and additional info
Vault: Delete WALs, count{#SINGLETON}

Time taken to delete a Write Ahead Log (WAL).

Dependent item vault.metrics.wal.deletewals[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_deletewals_count)

    Custom on fail: Discard value

Vault: GC deleted WAL{#SINGLETON}

Number of Write Ahead Logs (WAL) deleted during each garbage collection run.

Dependent item vault.metrics.wal.gc.deleted[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_gc_deleted)

    Custom on fail: Discard value

Vault: WALs on disk, total{#SINGLETON}

Total number of Write Ahead Logs (WAL) on disk.

Dependent item vault.metrics.wal.gc.total[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_gc_total)

    Custom on fail: Discard value

Vault: Load WALs, count{#SINGLETON}

Time taken to load a Write Ahead Log (WAL).

Dependent item vault.metrics.wal.loadWAL[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_loadWAL_count)

    Custom on fail: Discard value

Vault: Persist WALs, count{#SINGLETON}

Time taken to persist a Write Ahead Log (WAL).

Dependent item vault.metrics.wal.persistwals[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_persistwals_count)

    Custom on fail: Discard value

Vault: Flush ready WAL, count{#SINGLETON}

Time taken to flush a ready Write Ahead Log (WAL) to storage.

Dependent item vault.metrics.wal.flushready[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(vault_wal_flushready_count)

    Custom on fail: Discard value
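
The {#SINGLETON} macro suggests that these items are created only when WAL metrics are present at all. A minimal sketch of that idea follows, assuming the discovery step emits a single row with an empty {#SINGLETON} value when any metric name matches the WAL pattern and no rows otherwise. The function name and metric names are hypothetical, and the template's actual JavaScript may differ.

```python
import re

# Pattern taken from the "Check WAL discovery" preprocessing step.
WAL_RE = re.compile(r"^vault_wal_(?:.+)$")


def discover_singleton(metric_names):
    """Return one LLD row if any WAL metric is present, otherwise none."""
    if any(WAL_RE.match(name) for name in metric_names):
        # Assumption: an empty macro value keeps the item names unchanged.
        return [{"{#SINGLETON}": ""}]
    return []


print(discover_singleton(["vault_wal_gc_total", "vault_core_unseal_count"]))
print(discover_singleton(["vault_core_unseal_count"]))
```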

LLD rule Replication metrics discovery

Name Description Type Key and additional info
Replication metrics discovery

Discovery for replication metrics.

Dependent item vault.replication.discovery

Item prototypes for Replication metrics discovery

Name Description Type Key and additional info
Vault: Stream WAL missing guard, count{#SINGLETON}

Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found.

Dependent item vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(logshipper_streamWALs_missing_guard)

    Custom on fail: Discard value

Vault: Stream WAL guard found, count{#SINGLETON}

Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found.

Dependent item vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(logshipper_streamWALs_guard_found)

    Custom on fail: Discard value

Vault: Merkle commit index{#SINGLETON}

The last committed index in the Merkle Tree.

Dependent item vault.metrics.replication.merkle.commit_index[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(replication_merkle_commit_index)

    Custom on fail: Discard value

Vault: Last WAL{#SINGLETON}

The index of the last WAL.

Dependent item vault.metrics.replication.wal.last_wal[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(replication_wal_last_wal)

    Custom on fail: Discard value

Vault: Last DR WAL{#SINGLETON}

The index of the last DR WAL.

Dependent item vault.metrics.replication.wal.last_dr_wal[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(replication_wal_last_dr_wal)

    Custom on fail: Discard value

Vault: Last performance WAL{#SINGLETON}

The index of the last Performance WAL.

Dependent item vault.metrics.replication.wal.last_performance_wal[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(replication_wal_last_performance_wal)

    Custom on fail: Discard value

Vault: Last remote WAL{#SINGLETON}

The index of the last remote WAL.

Dependent item vault.metrics.replication.fsm.last_remote_wal[{#SINGLETON}]

Preprocessing

  • Prometheus pattern: VALUE(replication_fsm_last_remote_wal)

    Custom on fail: Discard value

LLD rule Token metrics discovery

Name Description Type Key and additional info
Token metrics discovery

Token metrics discovery.

Dependent item vault.tokens.discovery

Item prototypes for Token metrics discovery

Name Description Type Key and additional info
Vault: Token [{#TOKEN_NAME}] error

Token lookup error text.

Dependent item vault.token_via_accessor.error["{#ACCESSOR}"]

Preprocessing

  • JSON Path: $.[?(@.accessor == "{#ACCESSOR}")].error.first()

  • Discard unchanged with heartbeat: 1h

Vault: Token [{#TOKEN_NAME}] has TTL

Whether the token has a TTL.

Dependent item vault.token_via_accessor.has_ttl["{#ACCESSOR}"]

Preprocessing

  • JSON Path: $.[?(@.accessor == "{#ACCESSOR}")].has_ttl.first()

  • Boolean to decimal
  • Discard unchanged with heartbeat: 1h

Vault: Token [{#TOKEN_NAME}] TTL

The TTL period of the token.

Dependent item vault.token_via_accessor.ttl["{#ACCESSOR}"]

Preprocessing

  • JSON Path: $.[?(@.accessor == "{#ACCESSOR}")].ttl.first()
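
The JSON Path steps above pick one field from the record whose accessor matches {#ACCESSOR}. The sketch below imitates that selection in plain Python. The record layout is a hypothetical stand-in for the output of vault.get_tokens, and the accessor values are made up.

```python
def first_field(records, accessor, field):
    """Mimic $.[?(@.accessor == "<accessor>")].<field>.first()."""
    for record in records:
        if record.get("accessor") == accessor and field in record:
            return record[field]
    raise LookupError(f"no {field!r} for accessor {accessor!r}")


# Hypothetical token records, one per configured accessor.
tokens = [
    {"accessor": "acc-1", "has_ttl": True, "ttl": 86400},
    {"accessor": "acc-2", "has_ttl": False, "ttl": 0},
]

print(first_field(tokens, "acc-1", "ttl"))
# "Boolean to decimal" then turns true/false into 1/0.
print(int(first_field(tokens, "acc-2", "has_ttl")))
```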

Trigger prototypes for Token metrics discovery

Name Description Expression Severity Dependencies and additional info
Vault: Token [{#TOKEN_NAME}] lookup error occurred length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 Warning Depends on:
  • Vault: Vault server is sealed
Vault: Token [{#TOKEN_NAME}] will expire soon last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} Average
Vault: Token [{#TOKEN_NAME}] will expire soon last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} Warning Depends on:
  • Vault: Token [{#TOKEN_NAME}] will expire soon
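
The two expiry triggers differ only in their threshold, and the Warning trigger depends on the Average one, so at most one of them is shown for a token. A minimal sketch of that decision follows, assuming the TTL is reported in seconds and using the default thresholds of 3d and 7d. The function names are hypothetical.

```python
# Zabbix time suffixes and their length in seconds.
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def to_seconds(value: str) -> int:
    """Convert a time value such as '3d' to seconds."""
    if value[-1] in SUFFIX_SECONDS:
        return int(value[:-1]) * SUFFIX_SECONDS[value[-1]]
    return int(value)


def token_severity(has_ttl: int, ttl: int, crit: str = "3d", warn: str = "7d"):
    """Return the severity that would be shown for a token, or None."""
    if has_ttl != 1:
        return None  # tokens without a TTL never fire
    if ttl < to_seconds(crit):
        return "Average"
    if ttl < to_seconds(warn):
        return "Warning"  # only reached when the Average condition is false
    return None


print(token_severity(1, 2 * 86400))
print(token_severity(1, 5 * 86400))
```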

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.