TiDB TiKV by HTTP

Overview

This template monitors the TiKV server of a TiDB cluster with Zabbix and works without any external scripts. Most of the metrics are collected in a single request, thanks to Zabbix bulk data collection.

The TiDB TiKV by HTTP template collects metrics with an HTTP agent from the TiKV /metrics endpoint.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • TiDB cluster 4.0.10, 6.5.1

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with the TiKV server of a TiDB cluster. Internal service metrics are collected from the TiKV /metrics endpoint. Before linking the template, set the macros {$TIKV.URL} and {$TIKV.PORT} to match your TiKV server. See the Macros used section for the macros that set trigger thresholds.
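
As a quick check before linking the template, the endpoint can be queried by hand. The sketch below is only an illustration: it assumes the HTTP agent item requests {$TIKV.URL}:{$TIKV.PORT}/metrics over plain HTTP when the URL macro carries no scheme, and the helper names metrics_url and fetch_metrics are hypothetical.

```python
from urllib.request import urlopen


def metrics_url(tikv_url: str = "localhost", tikv_port: int = 20180) -> str:
    """Build the endpoint URL from the {$TIKV.URL} and {$TIKV.PORT} values."""
    # Assumption: plain HTTP is used when the macro value has no scheme.
    base = tikv_url if "://" in tikv_url else f"http://{tikv_url}"
    return f"{base.rstrip('/')}:{tikv_port}/metrics"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposed by the TiKV server."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the URL built from the default macro values.
    print(metrics_url())
```

If fetch_metrics(metrics_url()) returns text containing metric names that start with tikv_, the macro values are probably correct for that host.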

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$TIKV.PORT} | The port of TiKV server metrics web endpoint | 20180 |
| {$TIKV.URL} | TiKV server URL | localhost |
| {$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | Maximum number of coprocessor request errors | 1 |
| {$TIKV.STORE.ERRORS.MAX.WARN} | Maximum number of failure messages | 1 |
| {$TIKV.PENDING_COMMANDS.MAX.WARN} | Maximum number of pending commands | 1 |
| {$TIKV.PENDING_TASKS.MAX.WARN} | Maximum number of tasks currently running by the worker or pending | 1 |

Items

Name Description Type Key and additional info
TiKV: Get instance metrics

Get TiKV instance metrics.

HTTP agent tikv.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

  • Prometheus to JSON
TiKV: Store size

The storage size of TiKV instance.

Dependent item tikv.engine_size

Preprocessing

  • JSON Path: $[?(@.name == "tikv_engine_size_bytes")].value.sum()

TiKV: Get store size metrics

Get capacity metrics of TiKV instance.

Dependent item tikv.store_size.metrics

Preprocessing

  • JSON Path: $[?(@.name == "tikv_store_size_bytes")]

    Custom on fail: Discard value

TiKV: Available size

The available capacity of TiKV instance.

Dependent item tikv.store_size.available

Preprocessing

  • JSON Path: $[?(@.labels.type == "available")].value.first()

TiKV: Capacity size

The capacity size of TiKV instance.

Dependent item tikv.store_size.capacity

Preprocessing

  • JSON Path: $[?(@.labels.type == "capacity")].value.first()

TiKV: Bytes read

The total number of bytes read by the TiKV instance.

Dependent item tikv.engine_flow_bytes.read

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Bytes write

The total number of bytes written by the TiKV instance.

Dependent item tikv.engine_flow_bytes.write

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Storage: commands total, rate

Total number of commands received per second.

Dependent item tikv.storage_command.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_storage_command_total")].value.sum()

  • Change per second
TiKV: CPU util

The CPU usage ratio on TiKV instance.

Dependent item tikv.cpu.util

Preprocessing

  • JSON Path: $[?(@.name == "tikv_thread_cpu_seconds_total")].value.sum()

  • Change per second
  • Custom multiplier: 100

TiKV: RSS memory usage

Resident memory size in bytes.

Dependent item tikv.rss_bytes

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Regions, count

The number of regions collected in TiKV instance.

Dependent item tikv.region_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Regions, leader

The number of leaders in TiKV instance.

Dependent item tikv.region_leader

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Get QPS metrics

Get QPS metrics in TiKV instance.

Dependent item tikv.grpc_msgs.metrics

Preprocessing

  • JSON Path: $[?(@.name == "tikv_grpc_msg_duration_seconds_count")]

    Custom on fail: Discard value

TiKV: Total query, rate

The total QPS in TiKV instance.

Dependent item tikv.grpc_msg.rate

Preprocessing

  • JSON Path: $..value.sum()

  • Change per second
TiKV: Total query errors, rate

The total number of gRPC message handling failures per second.

Dependent item tikv.grpc_msg_fail.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_grpc_msg_fail_total")].value.sum()

    Custom on fail: Discard value

  • Change per second
TiKV: Coprocessor: Errors, rate

Total number of push-down request errors per second.

Dependent item tikv.coprocessor_request_error.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_coprocessor_request_error")].value.sum()

    Custom on fail: Discard value

  • Change per second
TiKV: Get coprocessor requests metrics

Get metrics of coprocessor requests.

Dependent item tikv.coprocessor_requests.metrics

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

TiKV: Coprocessor: Requests, rate

Total number of coprocessor requests per second.

Dependent item tikv.coprocessor_request.rate

Preprocessing

  • JSON Path: $..value.sum()

  • Change per second
TiKV: Coprocessor: Scan keys, rate

Total number of scan keys observed per request per second.

Dependent item tikv.coprocessor_scan_keys_sum.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_coprocessor_scan_keys")].value.sum()

    Custom on fail: Discard value

  • Change per second
TiKV: Coprocessor: RocksDB ops, rate

Total number of RocksDB internal operations from PerfContext per second.

Dependent item tikv.coprocessor_rocksdb_perf.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()

    Custom on fail: Discard value

  • Change per second
TiKV: Coprocessor: Response size, rate

The total size of coprocessor response per second.

Dependent item tikv.coprocessor_response_bytes.rate

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Scheduler: Pending commands

The total number of pending commands. The scheduler receives commands from clients and executes them against the MVCC layer storage engine.

Dependent item tikv.scheduler_contex

Preprocessing

  • JSON Path: $[?(@.name == "tikv_scheduler_contex_total")].value.first()

TiKV: Scheduler: Busy, rate

The total count of too busy schedulers per second.

Dependent item tikv.scheduler_too_busy.rate

Preprocessing

  • JSON Path: $[?(@.name == "tikv_scheduler_too_busy_total")].value.sum()

    Custom on fail: Discard value

  • Change per second
TiKV: Get scheduler metrics

Get metrics of scheduler commands.

Dependent item tikv.scheduler.metrics

Preprocessing

  • JSON Path: $[?(@.name == "tikv_scheduler_stage_total")]

    Custom on fail: Discard value

TiKV: Scheduler: Commands total, rate

Total number of commands per second.

Dependent item tikv.scheduler_commands.rate

Preprocessing

  • JSON Path: $..value.sum()

    Custom on fail: Set value to: 0

  • Change per second
TiKV: Scheduler: Low priority commands total, rate

Total count of low priority commands per second.

Dependent item tikv.commands_pri.low.rate

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Scheduler: Normal priority commands total, rate

Total count of normal priority commands per second.

Dependent item tikv.commands_pri.normal.rate

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Scheduler: High priority commands total, rate

Total count of high priority commands per second.

Dependent item tikv.commands_pri.high.rate

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Snapshot: Pending tasks

The number of tasks currently running by the worker or pending.

Dependent item tikv.worker_pending_task

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Snapshot: Sending

The total amount of raftstore snapshot traffic (sending).

Dependent item tikv.snapshot.sending

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Snapshot: Receiving

The total amount of raftstore snapshot traffic (receiving).

Dependent item tikv.snapshot.receiving

Preprocessing

  • JSON Path: The text is too long. Please see the template.

TiKV: Snapshot: Applying

The total amount of raftstore snapshot traffic (applying).

Dependent item tikv.snapshot.applying

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

TiKV: Uptime

The runtime of each TiKV instance.

Dependent item tikv.uptime

Preprocessing

  • JSON Path: $[?(@.name=="process_start_time_seconds")].value.first()

  • JavaScript: The text is too long. Please see the template.

TiKV: Get failure msg metrics

Get metrics of reported failure messages.

Dependent item tikv.messages.failure.metrics

Preprocessing

  • JSON Path: $[?(@.name == "tikv_server_report_failure_msg_total")]

    Custom on fail: Discard value

TiKV: Server: failure messages total, rate

Total number of reported failure messages per second.

Dependent item tikv.messages.failure.rate

Preprocessing

  • JSON Path: $..value.sum()

    Custom on fail: Discard value

  • Change per second
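
The items above follow one pattern: the master item converts the Prometheus text to JSON once, and each dependent item selects its values with a JSONPath. The Python sketch below is a rough analogue of that pattern, not Zabbix's implementation. The parser is deliberately simplified (no escaped quotes in labels, no timestamps), the function names are hypothetical, and the sample values are invented.

```python
import re

# Matches "metric_name{label="v",...} value"; the label block is optional.
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

# Invented sample in Prometheus text format.
SAMPLE = """\
# TYPE tikv_engine_size_bytes gauge
tikv_engine_size_bytes{db="kv",type="default"} 1000
tikv_engine_size_bytes{db="kv",type="write"} 500
tikv_store_size_bytes{type="available"} 4000
tikv_store_size_bytes{type="capacity"} 10000
"""


def prometheus_to_json(text):
    """Turn exposition lines into dicts with name, value and labels."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # simplification: ignore anything we cannot parse
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        rows.append({"name": name, "value": value, "labels": labels})
    return rows


def sum_by_name(rows, name):
    """Analogue of $[?(@.name == "<name>")].value.sum()."""
    return sum(float(r["value"]) for r in rows if r["name"] == name)


def first_by_label(rows, key, wanted):
    """Analogue of $[?(@.labels.<key> == "<wanted>")].value.first()."""
    for r in rows:
        if r["labels"].get(key) == wanted:
            return float(r["value"])
    raise LookupError(f"no row with {key}={wanted!r}")


rows = prometheus_to_json(SAMPLE)
store_rows = [r for r in rows if r["name"] == "tikv_store_size_bytes"]
engine_size = sum_by_name(rows, "tikv_engine_size_bytes")
available = first_by_label(store_rows, "type", "available")
```

Filtering store_rows first mirrors how the Available size and Capacity size items depend on the intermediate store size metrics item rather than on the full metrics payload.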

Triggers

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| TiKV: Too many coprocessor request error | | min(/TiDB TiKV by HTTP/tikv.coprocessor_request_error.rate,5m)>{$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | Warning | |
| TiKV: Too many pending commands | | min(/TiDB TiKV by HTTP/tikv.scheduler_contex,5m)>{$TIKV.PENDING_COMMANDS.MAX.WARN} | Average | |
| TiKV: Too many pending tasks | | min(/TiDB TiKV by HTTP/tikv.worker_pending_task,5m)>{$TIKV.PENDING_TASKS.MAX.WARN} | Average | |
| TiKV: has been restarted | Uptime is less than 10 minutes. | last(/TiDB TiKV by HTTP/tikv.uptime)<10m | Info | Manual close: Yes |
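
The threshold triggers compare the minimum over five minutes with a macro, so a single spike does not fire them: every value in the window has to exceed the threshold. The sketch below illustrates that behaviour in Python. It is a simplification, not Zabbix's evaluation logic: the exact window boundaries are an assumption, an empty window is treated as "not firing", and all function names are hypothetical.

```python
def min_over_window(samples, now, window_s=300):
    """Return the smallest value seen in the last window_s seconds.

    samples is a list of (unix_timestamp, value) pairs.
    Returns None when the window holds no values.
    """
    recent = [value for ts, value in samples if now - window_s < ts <= now]
    return min(recent) if recent else None


def threshold_trigger(samples, now, threshold, window_s=300):
    """Analogue of min(/host/key,5m) > threshold."""
    lowest = min_over_window(samples, now, window_s)
    return lowest is not None and lowest > threshold


def restarted_trigger(uptime_s):
    """Analogue of last(/host/tikv.uptime) < 10m."""
    return uptime_s < 10 * 60


# One dip to 0 inside the window keeps the trigger quiet.
spiky = [(0, 5.0), (60, 0.0), (120, 5.0)]
# Values that stay above the threshold for the whole window fire it.
sustained = [(0, 2.0), (60, 3.0), (120, 2.0)]
```

With the default threshold of 1, threshold_trigger(spiky, 120, 1) is false while threshold_trigger(sustained, 120, 1) is true, which is why raising a macro value is the usual way to quiet a trigger that fires under normal sustained load.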

LLD rule QPS metrics discovery

Name Description Type Key and additional info
QPS metrics discovery

Discovers QPS metrics.

Dependent item tikv.qps.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 1h

Item prototypes for QPS metrics discovery

Name Description Type Key and additional info
TiKV: Query: {#TYPE}, rate

The QPS per command in TiKV instance.

Dependent item tikv.grpc_msg.rate[{#TYPE}]

Preprocessing

  • JSON Path: $[?(@.labels.type == "{#TYPE}")].value.first()

    Custom on fail: Set value to
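
The JavaScript of the discovery rule is not reproduced in this document, so the following is only a guess at its effect: one low-level discovery row per distinct type label, after which the item prototype selects the value for that label. The row shape (name, value, labels) follows the JSONPath expressions used above; the function names and sample values are invented for illustration.

```python
import json

# Invented rows, shaped like the output of the "Get QPS metrics" item.
ROWS = [
    {"name": "tikv_grpc_msg_duration_seconds_count",
     "labels": {"type": "kv_get"}, "value": "120"},
    {"name": "tikv_grpc_msg_duration_seconds_count",
     "labels": {"type": "coprocessor"}, "value": "30"},
    {"name": "tikv_grpc_msg_duration_seconds_count",
     "labels": {"type": "kv_get"}, "value": "120"},
]


def discover_types(rows):
    """Build LLD rows: one {#TYPE} entry per distinct type label."""
    types = sorted({r["labels"]["type"] for r in rows if "type" in r["labels"]})
    return [{"{#TYPE}": t} for t in types]


def prototype_value(rows, lld_type):
    """Analogue of $[?(@.labels.type == "{#TYPE}")].value.first()."""
    for r in rows:
        if r["labels"].get("type") == lld_type:
            return float(r["value"])
    return None  # no matching row for this discovered type


if __name__ == "__main__":
    # Prints the discovery payload as a JSON array.
    print(json.dumps(discover_types(ROWS)))
```

Deduplicating the labels keeps the discovery payload stable between polls, which fits the "Discard unchanged with heartbeat" step on the rule.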

LLD rule Coprocessor metrics discovery

Name Description Type Key and additional info
Coprocessor metrics discovery

Discovers coprocessor metrics.

Dependent item tikv.coprocessor.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 1h

Item prototypes for Coprocessor metrics discovery

Name Description Type Key and additional info
TiKV: Coprocessor: {#REQ_TYPE} metrics

Get metrics of {#REQ_TYPE} requests.

Dependent item tikv.coprocessor_request.metrics[{#REQ_TYPE}]

Preprocessing

  • JSON Path: $[?(@.labels.req == "{#REQ_TYPE}")]

    Custom on fail: Discard value

TiKV: Coprocessor: {#REQ_TYPE} errors, rate

Total number of push-down request errors per second.

Dependent item tikv.coprocessor_request_error.rate[{#REQ_TYPE}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Change per second
TiKV: Coprocessor: {#REQ_TYPE} requests, rate

Total number of coprocessor requests per second.

Dependent item tikv.coprocessor_request.rate[{#REQ_TYPE}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Coprocessor: {#REQ_TYPE} scan keys, rate

Total number of scan keys observed per request per second.

Dependent item tikv.coprocessor_scan_keys.rate[{#REQ_TYPE}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second
TiKV: Coprocessor: {#REQ_TYPE} RocksDB ops, rate

Total number of RocksDB internal operations from PerfContext per second.

Dependent item tikv.coprocessor_rocksdb_perf.rate[{#REQ_TYPE}]

Preprocessing

  • JSON Path: $[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()

    Custom on fail: Discard value

  • Change per second

LLD rule Scheduler metrics discovery

Name Description Type Key and additional info
Scheduler metrics discovery

Discovers scheduler metrics.

Dependent item tikv.scheduler.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 1h

Item prototypes for Scheduler metrics discovery

Name Description Type Key and additional info
TiKV: Scheduler: commands {#STAGE}, rate

Total number of commands on each stage per second.

Dependent item tikv.scheduler_stage.rate[{#STAGE}]

Preprocessing

  • JSON Path: $[?(@.labels.stage == "{#STAGE}")].value.sum()

    Custom on fail: Set value to: 0

  • Change per second
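
The prototype above chains three steps: sum the counters for one stage, fall back to 0 when the JSONPath finds nothing, then convert the running total into a per-second rate. The sketch below mirrors that chain in Python. It assumes the rate is (value - previous value) / (time - previous time) and that a decreasing counter yields no value; the class and function names are hypothetical and the sample numbers are invented.

```python
def stage_total(rows, stage):
    """Analogue of $[?(@.labels.stage == "{#STAGE}")].value.sum()."""
    values = [float(r["value"]) for r in rows
              if r["labels"].get("stage") == stage]
    if not values:
        raise LookupError(f"no samples for stage {stage!r}")
    return sum(values)


def stage_total_or_zero(rows, stage):
    """Analogue of 'Custom on fail: Set value to: 0'."""
    try:
        return stage_total(rows, stage)
    except LookupError:
        return 0.0


class ChangePerSecond:
    """Turn a monotonically growing counter into a per-second rate."""

    def __init__(self):
        self._prev = None  # (timestamp, value) of the previous sample

    def update(self, ts, value):
        prev, self._prev = self._prev, (ts, value)
        if prev is None:
            return None  # the first sample has nothing to compare with
        prev_ts, prev_value = prev
        if ts <= prev_ts or value < prev_value:
            return None  # counter reset or bad clock: emit nothing
        return (value - prev_value) / (ts - prev_ts)


# Invented rows, shaped like the output of the "Get scheduler metrics" item.
ROWS = [
    {"labels": {"stage": "new", "type": "prewrite"}, "value": "10"},
    {"labels": {"stage": "new", "type": "commit"}, "value": "5"},
]
```

Setting the value to 0 instead of discarding it keeps the rate item populated for stages that have not produced any samples yet.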

LLD rule Server errors discovery

Name Description Type Key and additional info
Server errors discovery

Discovers server error metrics.

Dependent item tikv.server_report_failure.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 1h

Item prototypes for Server errors discovery

Name Description Type Key and additional info
TiKV: Store_id {#STORE_ID}: failure messages "{#TYPE}", rate

Total number of reported failure messages per second. The metric has two labels: type and store_id. The type label represents the failure type, and store_id represents the destination peer store ID.

Dependent item tikv.messages.failure.rate[{#STORE_ID},{#TYPE}]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

  • Change per second

Trigger prototypes for Server errors discovery

Name Description Expression Severity Dependencies and additional info
TiKV: Store_id {#STORE_ID}: Too many failure messages "{#TYPE}"

Indicates that the remote TiKV cannot be reached.

min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN} Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.