CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template CockroachDB by HTTP collects metrics with the HTTP agent from the Prometheus endpoint and the health endpoints.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Don't forget to change the macro {$COCKROACHDB.API.SCHEME} according to your situation (secure/insecure node). Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.
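
As a rough illustration of the endpoints named above, the following Python sketch builds the three URLs from a scheme, host, and port. The function name `endpoint_urls` and the host `db1.example.com` are hypothetical; the defaults mirror the macro defaults listed in this README, and the sketch is not part of the template itself.

```python
def endpoint_urls(host, scheme="http", port=8080):
    """Return the URLs of the three endpoints named in the Setup section.

    `scheme` and `port` stand in for {$COCKROACHDB.API.SCHEME} and
    {$COCKROACHDB.API.PORT}; the function itself is illustrative only.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    base = f"{scheme}://{host}:{port}"
    return {
        "metrics": f"{base}/_status/vars",     # Prometheus metrics
        "health": f"{base}/health",            # node health
        "readiness": f"{base}/health?ready=1", # node readiness
    }


if __name__ == "__main__":
    # Hypothetical secure node; use scheme="http" for an insecure one.
    for name, url in endpoint_urls("db1.example.com", scheme="https").items():
        print(name, url)
```

Opening these URLs with a browser or an HTTP client before linking the template is a quick way to confirm that the scheme and port macros match the node.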

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. | 8080 |
| {$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. | http |
| {$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. | 20 |
| {$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. | 10 |
| {$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
| {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. | 30 |
| {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. | 90 |
| {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. | 300 |
| {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. | 2 |

Items

Name Description Type Key and additional info
CockroachDB: Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

CockroachDB: Get health

Get the response of the node's /health endpoint.

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

CockroachDB: Get readiness

Get the response of the node's /health?ready=1 endpoint.

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

CockroachDB: Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

CockroachDB: Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

CockroachDB: Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CockroachDB: CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

CockroachDB: Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
CockroachDB: Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
CockroachDB: Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
CockroachDB: Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
CockroachDB: Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
CockroachDB: File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

CockroachDB: File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

CockroachDB: GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
CockroachDB: Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

CockroachDB: KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
CockroachDB: KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
CockroachDB: Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

CockroachDB: Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
CockroachDB: Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

CockroachDB: Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

CockroachDB: Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

CockroachDB: Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

CockroachDB: Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

CockroachDB: Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
CockroachDB: Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
CockroachDB: Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
CockroachDB: Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
CockroachDB: Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

CockroachDB: SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
CockroachDB: SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
CockroachDB: Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

CockroachDB: SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
CockroachDB: SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

CockroachDB: SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

CockroachDB: SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
CockroachDB: SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
CockroachDB: SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
CockroachDB: SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
CockroachDB: SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
CockroachDB: SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
CockroachDB: SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
CockroachDB: SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
CockroachDB: SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
CockroachDB: SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

CockroachDB: SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
CockroachDB: SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
CockroachDB: SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
CockroachDB: SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
CockroachDB: Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

CockroachDB: Node certificate expiration date

The date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CockroachDB: CA certificate expiration date

The date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h
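
The preprocessing steps listed for the items above can be approximated in a few lines of Python. This is a simplified sketch, not the Zabbix implementation: the function names and the sample payload are hypothetical, the Prometheus parser ignores labels and optional timestamps, and Python's `re` module stands in for the regular expression engine used by Zabbix.

```python
import re


def status_code(status_line):
    """Extract the numeric HTTP status from a status line."""
    # Same pattern as the 'Regular expression' step; group 1 is the output.
    match = re.search(r"HTTP.*\s(\d+)", status_line)
    return int(match.group(1)) if match else None


def prometheus_value(text, metric):
    """Return the value of the first sample named `metric`, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE
            continue
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.split("{", 1)[0] == metric:
            return float(value)
    return None


def change_per_second(prev_value, prev_time, value, time):
    """(value - prev_value) / (time - prev_time) for a growing counter."""
    if value < prev_value or time <= prev_time:
        return None  # counter reset or no elapsed time: nothing to store
    return (value - prev_value) / (time - prev_time)


# Hypothetical excerpt of a /_status/vars response.
SAMPLE = """\
# TYPE sys_cpu_sys_ns gauge
sys_cpu_sys_ns 2.6e+09
sys_fd_open 57
"""

if __name__ == "__main__":
    print(status_code("HTTP/1.1 200 OK"))
    current = prometheus_value(SAMPLE, "sys_cpu_sys_ns")
    # Previous sample of 2.0e9 ns taken 60 seconds earlier (made-up numbers).
    rate_ns = change_per_second(2.0e9, 0, current, 60)
    # 'Custom multiplier: 0.000000001' converts nanoseconds to seconds.
    print(rate_ns * 0.000000001)
```

The order matters for the CPU and GC items: the per-second rate is computed on the raw nanosecond counter first, and the multiplier is applied afterwards.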

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, servers kill themselves at 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

The number of open file descriptors is getting close to the limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
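
The unit conversions in three of the trigger expressions above are easy to misread, so here is a small Python sketch of the same arithmetic. The function names and sample numbers are hypothetical. It assumes, as the expressions imply, that the clock offset item holds seconds and that the certificate items hold Unix timestamps in seconds.

```python
import time


def clock_offset_too_high(offsets_seconds, max_warn_ms=300):
    """min(offset, 5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001."""
    # The macro is in milliseconds; * 0.001 converts it to seconds.
    return min(offsets_seconds) > max_warn_ms * 0.001


def open_files_too_high(open_fds, limit, max_warn_percent=80):
    """min(open, 10m) / last(limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN}."""
    return min(open_fds) / limit * 100 > max_warn_percent


def cert_expires_soon(expiry_ts, warn_days, now=None):
    """(last(expire_date) - now()) / 86400 < warning threshold in days."""
    now = time.time() if now is None else now
    return (expiry_ts - now) / 86400 < warn_days


if __name__ == "__main__":
    # Made-up samples collected over the evaluation window.
    print(clock_offset_too_high([0.35, 0.32, 0.31]))
    print(open_files_too_high([900, 850], limit=1000))
    print(cert_expires_soon(time.time() + 10 * 86400, warn_days=30))
```

Because the first two expressions use `min()` over the window, a single spike does not fire the trigger; every sample in the window has to exceed the threshold.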

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discover per store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
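
To make the discovery step more concrete, the following Python sketch converts the `capacity` samples of a Prometheus payload to JSON rows and derives one discovery entry per store. It is an approximation under stated assumptions: the sample payload is invented, the row layout is a reduced version of what a Prometheus to JSON step produces, and the mapping of `{#STORE}` to the `store` label is assumed from the item prototype patterns rather than stated in this README.

```python
import json
import re

# Hypothetical excerpt of a /_status/vars response with two stores.
SAMPLE = """\
# TYPE capacity gauge
capacity{store="1"} 5.0e+11
capacity{store="2"} 2.5e+11
capacity_available{store="1"} 4.0e+11
"""

LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>.*)\})?"               # optional label set
    r"\s+(?P<value>\S+)$"                    # sample value
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def prometheus_to_json(text, metric):
    """Return one dict per sample whose name equals `metric`."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group("name") == metric:
            rows.append({
                "name": match.group("name"),
                "value": match.group("value"),
                "labels": dict(LABEL.findall(match.group("labels") or "")),
            })
    return rows


def discover_stores(rows):
    """Map each row's `store` label to the {#STORE} LLD macro (assumed)."""
    return [{"{#STORE}": r["labels"]["store"]} for r in rows
            if "store" in r["labels"]]


if __name__ == "__main__":
    print(json.dumps(discover_stores(prometheus_to_json(SAMPLE, "capacity"))))
```

Each discovered entry then yields one set of item and trigger prototypes, with `{#STORE}` substituted into the Prometheus patterns.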

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
CockroachDB: Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

CockroachDB: Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
CockroachDB: Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
CockroachDB: Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
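
The two calculated item prototypes above are listed without their formulas. The Python sketch below shows one plausible way to derive them from the dependent items; both formulas are assumptions made for illustration, and the function names are hypothetical.

```python
def available_percent(available_bytes, total_bytes):
    """Assumed formula: capacity available / capacity total * 100."""
    if total_bytes <= 0:
        return None  # no meaningful percentage without a total
    return available_bytes / total_bytes * 100


def cache_hit_ratio(hits_rate, misses_rate):
    """Assumed formula: hits / (hits + misses) * 100."""
    lookups = hits_rate + misses_rate
    if lookups == 0:
        return None  # no cache lookups in the interval
    return hits_rate / lookups * 100


if __name__ == "__main__":
    # Made-up values: 125 GB free of 500 GB, 90 hits and 10 misses per second.
    print(available_percent(125e9, 500e9))
    print(cache_hit_ratio(90, 10))
```

The zero checks mirror a practical concern with calculated items: a division by zero leaves the item without a usable value until real data arrives.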

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
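
The interaction between the two trigger prototypes above can be sketched in Python as follows. The function name and sample values are hypothetical; the thresholds default to the macro values from this README, and the dependency is modelled as suppressing the warning while the critical trigger is in a problem state.

```python
def storage_problems(available_percent_5m, warn=20, crit=10):
    """Return the severities that would be raised for one store.

    `available_percent_5m` holds the samples from the last 5 minutes.
    """
    # max() < threshold means every sample in the window is below it.
    best_in_window = max(available_percent_5m)
    critical = best_in_window < crit
    warning = best_in_window < warn
    problems = []
    if critical:
        problems.append("Average")   # critically low trigger
    elif warning:
        problems.append("Warning")   # shown only when not suppressed
    return problems


if __name__ == "__main__":
    print(storage_problems([15, 18, 12]))
    print(storage_problems([5, 8]))
    print(storage_problems([15, 25]))
```

Using `max()` rather than `last()` means a short dip in free space does not raise a problem; the store has to stay below the threshold for the whole window.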

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.