|
|
|
|
|
|
|
|
|
# CockroachDB by HTTP
|
|
|
|
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
|
|
|
|
The template to monitor CockroachDB nodes by Zabbix that works without any external scripts.
|
|
|
|
|
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
|
|
|
|
|
|
|
|
|
|
Template `CockroachDB node by HTTP` — collects metrics by HTTP agent from Prometheus endpoint and health endpoints.
|
|
|
|
|
|
|
|
|
|
## Requirements
|
|
|
|
|
|
|
|
|
|
Zabbix version: 7.0 and higher.
|
|
|
|
|
|
|
|
|
|
## Tested versions
|
|
|
|
|
|
|
|
|
|
This template has been tested on:
|
|
|
|
|
- CockroachDB 21.2.8
|
|
|
|
|
|
|
|
|
|
## Configuration
|
|
|
|
|
|
|
|
|
|
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
|
|
|
|
|
|
|
|
|
|
## Setup
|
|
|
|
|
|
|
|
|
|
Internal node metrics are collected from Prometheus /_status/vars endpoint.
|
|
|
|
|
Node health metrics are collected from /health and /health?ready=1 endpoints.
|
|
|
|
|
Template doesn't require usage of session token.
|
|
|
|
|
|
|
|
|
|
Don't forget change macros {$COCKROACHDB.API.SCHEME} according to your situation (secure/insecure node).
|
|
|
|
|
Also, see the Macros section for a list of macros used to set trigger values.
|
|
|
|
|
|
|
|
|
|
*NOTE.* Some metrics may not be collected depending on your CockroachDB version and configuration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Macros used
|
|
|
|
|
|
|
|
|
|
|Name|Description|Default|
|
|
|
|
|
|----|-----------|-------|
|
|
|
|
|
|{$COCKROACHDB.API.PORT}|<p>The port of CockroachDB API and Prometheus endpoint.</p>|`8080`|
|
|
|
|
|
|{$COCKROACHDB.API.SCHEME}|<p>Request scheme which may be http or https.</p>|`http`|
|
|
|
|
|
|{$COCKROACHDB.STORE.USED.MIN.WARN}|<p>The warning threshold of the available disk space in percent.</p>|`20`|
|
|
|
|
|
|{$COCKROACHDB.STORE.USED.MIN.CRIT}|<p>The critical threshold of the available disk space in percent.</p>|`10`|
|
|
|
|
|
|{$COCKROACHDB.OPEN.FDS.MAX.WARN}|<p>Maximum percentage of used file descriptors.</p>|`80`|
|
|
|
|
|
|{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}|<p>Number of days until the node certificate expires.</p>|`30`|
|
|
|
|
|
|{$COCKROACHDB.CERT.CA.EXPIRY.WARN}|<p>Number of days until the CA certificate expires.</p>|`90`|
|
|
|
|
|
|{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}|<p>Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression.</p>|`300`|
|
|
|
|
|
|{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}|<p>Maximum number of SQL statements errors for trigger expression.</p>|`2`|
|
|
|
|
|
|
|
|
|
|
### Items
|
|
|
|
|
|
|
|
|
|
|Name|Description|Type|Key and additional info|
|
|
|
|
|
|----|-----------|----|-----------------------|
|
|
|
|
|
|CockroachDB: Get metrics|<p>Get raw metrics from the Prometheus endpoint.</p>|HTTP agent|cockroachdb.get_metrics<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|
|
|
|
|
|CockroachDB: Get health|<p>Get node /health endpoint</p>|HTTP agent|cockroachdb.get_health<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Regular expression: `HTTP.*\s(\d+) \1`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Get readiness|<p>Get node /health?ready=1 endpoint</p>|HTTP agent|cockroachdb.get_readiness<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Regular expression: `HTTP.*\s(\d+) \1`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Service ping|<p>Check if HTTP/HTTPS service accepts TCP connections.</p>|Simple check|net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]<p>**Preprocessing**</p><ul><li><p>Discard unchanged with heartbeat: `10m`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Clock offset|<p>Mean clock offset of the node against the rest of the cluster.</p>|Dependent item|cockroachdb.clock.offset<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(clock_offset_meannanos)`</p></li><li><p>Custom multiplier: `0.000000001`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Version|<p>Build information.</p>|Dependent item|cockroachdb.version<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `build_timestamp` label `tag`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: CPU: System time|<p>System CPU time.</p>|Dependent item|cockroachdb.cpu.system_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_cpu_sys_ns)`</p></li><li>Change per second</li><li><p>Custom multiplier: `0.000000001`</p></li></ul>|
|
|
|
|
|
|CockroachDB: CPU: User time|<p>User CPU time.</p>|Dependent item|cockroachdb.cpu.user_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_cpu_user_ns)`</p></li><li>Change per second</li><li><p>Custom multiplier: `0.000000001`</p></li></ul>|
|
|
|
|
|
|CockroachDB: CPU: Utilization|<p>The CPU utilization expressed in %.</p>|Dependent item|cockroachdb.cpu.util<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_cpu_combined_percent_normalized)`</p></li><li><p>Custom multiplier: `100`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Disk: IOPS in progress, rate|<p>Number of disk IO operations currently in progress on this host.</p>|Dependent item|cockroachdb.disk.iops.in_progress.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_disk_iopsinprogress)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Disk: Reads, rate|<p>Bytes read from all disks per second since this process started</p>|Dependent item|cockroachdb.disk.read.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_disk_read_bytes)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Disk: Read IOPS, rate|<p>Number of disk read operations per second across all disks since this process started.</p>|Dependent item|cockroachdb.disk.iops.read.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_disk_read_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Disk: Writes, rate|<p>Bytes written to all disks per second since this process started.</p>|Dependent item|cockroachdb.disk.write.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_disk_write_bytes)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Disk: Write IOPS, rate|<p>Disk write operations per second across all disks since this process started.</p>|Dependent item|cockroachdb.disk.iops.write.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_disk_write_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: File descriptors: Limit|<p>Open file descriptors soft limit of the process.</p>|Dependent item|cockroachdb.descriptors.limit<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_fd_softlimit)`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: File descriptors: Open|<p>The number of open file descriptors.</p>|Dependent item|cockroachdb.descriptors.open<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_fd_open)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: GC: Pause time|<p>The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.</p>|Dependent item|cockroachdb.gc.pause_time<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_gc_pause_ns)`</p></li><li>Change per second</li><li><p>Custom multiplier: `0.000000001`</p></li></ul>|
|
|
|
|
|
|CockroachDB: GC: Runs, rate|<p>The number of times that Go's garbage collector was invoked per second across all nodes.</p>|Dependent item|cockroachdb.gc.runs.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_gc_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Go: Goroutines count|<p>Current number of Goroutines. This count should rise and fall based on load.</p>|Dependent item|cockroachdb.go.goroutines.count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_goroutines)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: KV transactions: Aborted, rate|<p>Number of aborted KV transactions per second.</p>|Dependent item|cockroachdb.kv.transactions.aborted.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(txn_aborts)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: KV transactions: Committed, rate|<p>Number of KV transactions (including 1PC) committed per second.</p>|Dependent item|cockroachdb.kv.transactions.committed.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(txn_commits)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Live nodes count|<p>The number of live nodes in the cluster (will be 0 if this node is not itself live).</p>|Dependent item|cockroachdb.live_count<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(liveness_livenodes)`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Liveness heartbeats, rate|<p>Number of successful node liveness heartbeats per second from this node.</p>|Dependent item|cockroachdb.heartbeaths.success.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(liveness_heartbeatsuccesses)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Allocated by Cgo|<p>Current bytes of memory allocated by the C layer.</p>|Dependent item|cockroachdb.memory.cgo.allocated<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_cgo_allocbytes)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Allocated by Go|<p>Current bytes of memory allocated by the Go layer.</p>|Dependent item|cockroachdb.memory.go.allocated<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_go_allocbytes)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Managed by Cgo|<p>Total bytes of memory managed by the C layer.</p>|Dependent item|cockroachdb.memory.cgo.managed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_cgo_totalbytes)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Managed by Go|<p>Total bytes of memory managed by the Go layer.</p>|Dependent item|cockroachdb.memory.go.managed<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_go_totalbytes)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Total usage|<p>Resident set size (RSS) of memory in use by the node.</p>|Dependent item|cockroachdb.memory.total<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_rss)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Network: Bytes received, rate|<p>Bytes received per second on all network interfaces since this process started.</p>|Dependent item|cockroachdb.network.bytes.received.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_net_recv_bytes)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Network: Bytes sent, rate|<p>Bytes sent per second on all network interfaces since this process started.</p>|Dependent item|cockroachdb.network.bytes.sent.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_host_net_send_bytes)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Time series: Sample errors, rate|<p>The number of errors encountered while attempting to write metrics to disk, per second.</p>|Dependent item|cockroachdb.ts.samples.errors.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(timeseries_write_errors)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Time series: Samples written, rate|<p>The number of successfully written metric samples per second.</p>|Dependent item|cockroachdb.ts.samples.written.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(timeseries_write_samples)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Slow requests: DistSender RPCs|<p>Number of RPCs stuck or retrying for a long time.</p>|Dependent item|cockroachdb.slow_requests.rpc<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(requests_slow_distsender)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: SQL: Bytes received, rate|<p>Total amount of incoming SQL client network traffic in bytes per second.</p>|Dependent item|cockroachdb.sql.bytes.received.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_bytesin)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL: Bytes sent, rate|<p>Total amount of outgoing SQL client network traffic in bytes per second.</p>|Dependent item|cockroachdb.sql.bytes.sent.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_bytesout)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Memory: Allocated by SQL|<p>Current SQL statement memory usage for root.</p>|Dependent item|cockroachdb.memory.sql<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_mem_root_current)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: SQL: Schema changes, rate|<p>Total number of SQL DDL statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.schema_changes.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_ddl_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL sessions: Open|<p>Total number of open SQL sessions.</p>|Dependent item|cockroachdb.sql.sessions<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_conns)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Active|<p>Total number of SQL statements currently active.</p>|Dependent item|cockroachdb.sql.statements.active<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_distsql_queries_active)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: DELETE, rate|<p>A moving average of the number of DELETE statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.statements.delete.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_delete_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Executed, rate|<p>Number of SQL queries executed per second.</p>|Dependent item|cockroachdb.sql.statements.executed.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_query_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Denials, rate|<p>The number of statements denied per second by a feature flag.</p>|Dependent item|cockroachdb.sql.statements.denials.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_feature_flag_denial)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Active flows distributed, rate|<p>The number of distributed SQL flows currently active per second.</p>|Dependent item|cockroachdb.sql.statements.flows.active.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_distsql_flows_active)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: INSERT, rate|<p>A moving average of the number of INSERT statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.statements.insert.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_insert_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: SELECT, rate|<p>A moving average of the number of SELECT statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.statements.select.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_select_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: UPDATE, rate|<p>A moving average of the number of UPDATE statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.statements.update.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_update_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Contention, rate|<p>Total number of SQL statements that experienced contention per second.</p>|Dependent item|cockroachdb.sql.statements.contention.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_distsql_contended_queries_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL statements: Errors, rate|<p>Total number of statements which returned a planning or runtime error per second.</p>|Dependent item|cockroachdb.sql.statements.errors.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_failure_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL transactions: Open|<p>Total number of currently open SQL transactions.</p>|Dependent item|cockroachdb.sql.transactions.open<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_txns_open)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: SQL transactions: Aborted, rate|<p>Total number of SQL transaction abort errors per second.</p>|Dependent item|cockroachdb.sql.transactions.aborted.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_txn_abort_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL transactions: Committed, rate|<p>Total number of SQL transaction COMMIT statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.transactions.committed.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_txn_commit_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL transactions: Initiated, rate|<p>Total number of SQL transaction BEGIN statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.transactions.initiated.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_txn_begin_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: SQL transactions: Rolled back, rate|<p>Total number of SQL transaction ROLLBACK statements successfully executed per second.</p>|Dependent item|cockroachdb.sql.transactions.rollbacks.rate<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sql_txn_rollback_count)`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Uptime|<p>Process uptime.</p>|Dependent item|cockroachdb.uptime<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sys_uptime)`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Node certificate expiration date|<p>Node certificate expires at that date.</p>|Dependent item|cockroachdb.cert.expire_date.node<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(security_certificate_expiration_node)`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `6h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: CA certificate expiration date|<p>CA certificate expires at that date.</p>|Dependent item|cockroachdb.cert.expire_date.ca<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(security_certificate_expiration_ca)`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `6h`</p></li></ul>|
|
|
|
|
|
|
|
|
|
|
### Triggers
|
|
|
|
|
|
|
|
|
|
|Name|Description|Expression|Severity|Dependencies and additional info|
|
|
|
|
|
|----|-----------|----------|--------|--------------------------------|
|
|
|
|
|
|CockroachDB: Node is unhealthy|<p>Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode.</p>|`last(/CockroachDB by HTTP/cockroachdb.get_health) = 500`|Average|**Depends on**:<br><ul><li>CockroachDB: Service is down</li></ul>|
|
|
|
|
|
|CockroachDB: Node is not ready|<p>Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:<br>- node is in the wait phase of the node shutdown sequence;<br>- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.</p>|`last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m`|Average|**Depends on**:<br><ul><li>CockroachDB: Service is down</li></ul>|
|
|
|
|
|
|CockroachDB: Service is down||`last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0`|Average||
|
|
|
|
|
|CockroachDB: Clock offset is too high|<p>Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean).</p>|`min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001`|Warning||
|
|
|
|
|
|CockroachDB: Version has changed||`last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0`|Info||
|
|
|
|
|
|CockroachDB: Current number of open files is too high|<p>Getting close to open file descriptor limit.</p>|`min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN}`|Warning||
|
|
|
|
|
|CockroachDB: Node is not executing SQL|<p>Node is not executing SQL despite having connections.</p>|`last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0`|Warning||
|
|
|
|
|
|CockroachDB: SQL statements errors rate is too high||`min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}`|Warning||
|
|
|
|
|
|CockroachDB: Node has been restarted|<p>Uptime is less than 10 minutes.</p>|`last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m`|Info||
|
|
|
|
|
|CockroachDB: Failed to fetch node data|<p>Zabbix has not received data for items for the last 5 minutes.</p>|`nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1`|Warning|**Depends on**:<br><ul><li>CockroachDB: Service is down</li></ul>|
|
|
|
|
|
|CockroachDB: Node certificate expires soon|<p>Node certificate expires soon.</p>|`(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN}`|Warning||
|
|
|
|
|
|CockroachDB: CA certificate expires soon|<p>CA certificate expires soon.</p>|`(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN}`|Warning||
|
|
|
|
|
|
|
|
|
|
### LLD rule Storage metrics discovery
|
|
|
|
|
|
|
|
|
|
|Name|Description|Type|Key and additional info|
|
|
|
|
|
|----|-----------|----|-----------------------|
|
|
|
|
|
|Storage metrics discovery|<p>Discover per store metrics.</p>|Dependent item|cockroachdb.store.discovery<p>**Preprocessing**</p><ul><li><p>Prometheus to JSON: `capacity`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|
|
|
|
|
### Item prototypes for Storage metrics discovery
|
|
|
|
|
|
|
|
|
|
|Name|Description|Type|Key and additional info|
|
|
|
|
|
|----|-----------|----|-----------------------|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Bytes: Live|<p>Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.</p>|Dependent item|cockroachdb.storage.bytes.[{#STORE},live]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(livebytes{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Bytes: System|<p>Number of physical bytes stored in system key-value pairs.</p>|Dependent item|cockroachdb.storage.bytes.[{#STORE},system]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(sysbytes{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Capacity available|<p>Available storage capacity.</p>|Dependent item|cockroachdb.storage.capacity.[{#STORE},available]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(capacity_available{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Capacity total|<p>Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.</p>|Dependent item|cockroachdb.storage.capacity.[{#STORE},total]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(capacity{store="{#STORE}"})`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Capacity used|<p>Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.</p>|Dependent item|cockroachdb.storage.capacity.[{#STORE},used]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(capacity_used{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Capacity available in %|<p>Available storage capacity in %.</p>|Calculated|cockroachdb.storage.capacity.[{#STORE},available_percent]|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Replication: Lease holders|<p>Number of lease holders.</p>|Dependent item|cockroachdb.replication.[{#STORE},lease_holders]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(replicas_leaseholders{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Bytes: Logical|<p>Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.</p>|Dependent item|cockroachdb.storage.bytes.[{#STORE},logical]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(totalbytes{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate|<p>Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.</p>|Dependent item|cockroachdb.rebalancing.queries.average.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rebalancing_queriespersecond{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate|<p>Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.</p>|Dependent item|cockroachdb.rebalancing.writes.average.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rebalancing_writespersecond{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate|<p>Number of replicas which failed processing in the consistency checker queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_consistency_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate|<p>Number of replicas which failed processing in the GC queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.gc.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_gc_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate|<p>Number of replicas which failed processing in the Raft log queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_raftlog_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate|<p>Number of replicas which failed processing in the Raft repair queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate|<p>Number of replicas which failed processing in the replica GC queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_replicagc_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate|<p>Number of replicas which failed processing in the replicate queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_replicate_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate|<p>Number of replicas which failed processing in the split queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.split.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_split_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate|<p>Number of replicas which failed processing in the time series maintenance queue per second.</p>|Dependent item|cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Ranges count|<p>Number of ranges.</p>|Dependent item|cockroachdb.ranges.[{#STORE},count]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(ranges{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Ranges unavailable|<p>Number of ranges with fewer live replicas than needed for quorum.</p>|Dependent item|cockroachdb.ranges.[{#STORE},unavailable]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(ranges_unavailable{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Ranges underreplicated|<p>Number of ranges with fewer live replicas than the replication target.</p>|Dependent item|cockroachdb.ranges.[{#STORE},underreplicated]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(ranges_underreplicated{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: RocksDB read amplification|<p>The average number of real read operations executed per logical read operation.</p>|Dependent item|cockroachdb.rocksdb.[{#STORE},read_amp]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rocksdb_read_amplification{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate|<p>Count of block cache hits per second.</p>|Dependent item|cockroachdb.rocksdb.cache.hits.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rocksdb_block_cache_hits{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate|<p>Count of block cache misses per second.</p>|Dependent item|cockroachdb.rocksdb.cache.misses.[{#STORE},rate]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rocksdb_block_cache_misses{store="{#STORE}"})`</p></li><li>Change per second</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio|<p>Block cache hit ratio in %.</p>|Calculated|cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Replication: Replicas|<p>Number of replicas.</p>|Dependent item|cockroachdb.replication.replicas.[{#STORE},count]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(replicas{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced|<p>Number of quiesced replicas.</p>|Dependent item|cockroachdb.replication.replicas.[{#STORE},quiesced]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(replicas_quiescent{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions|<p>Number of requests that have been stuck for a long time acquiring latches.</p>|Dependent item|cockroachdb.slow_requests.[{#STORE},latch_acquisitions]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(requests_slow_latch{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions|<p>Number of requests that have been stuck for a long time acquiring a lease.</p>|Dependent item|cockroachdb.slow_requests.[{#STORE},lease_acquisitions]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(requests_slow_lease{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals|<p>Number of requests that have been stuck for a long time in raft.</p>|Dependent item|cockroachdb.slow_requests.[{#STORE},raft_proposals]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(requests_slow_raft{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: RocksDB SSTables|<p>The number of SSTables in use.</p>|Dependent item|cockroachdb.rocksdb.[{#STORE},sstables]<p>**Preprocessing**</p><ul><li><p>Prometheus pattern: `VALUE(rocksdb_num_sstables{store="{#STORE}"})`</p></li></ul>|
|
|
|
|
|
|
|
|
|
|
### Trigger prototypes for Storage metrics discovery
|
|
|
|
|
|
|
|
|
|
|Name|Description|Expression|Severity|Dependencies and additional info|
|
|
|
|
|
|----|-----------|----------|--------|--------------------------------|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Available storage capacity is low|<p>Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).</p>|`max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN}`|Warning|**Depends on**:<br><ul><li>CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low</li></ul>|
|
|
|
|
|
|CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low|<p>Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).</p>|`max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT}`|Average||
|
|
|
|
|
|
|
|
|
|
## Feedback
|
|
|
|
|
|
|
|
|
|
Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com)
|
|
|
|
|
|
|
|
|
|
You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback)
|
|
|
|
|
|