TiDB TiKV by HTTP
Overview
This template monitors the TiKV server of a TiDB cluster with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template TiDB TiKV by HTTP collects metrics with an HTTP agent item from the TiKV /metrics endpoint.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- TiDB cluster 4.0.10, 6.5.1
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
This template works with the TiKV server of a TiDB cluster. Internal service metrics are collected from the TiKV /metrics endpoint. Set the macros {$TIKV.URL} and {$TIKV.PORT} to match your TiKV instance. See the Macros used section for the macros that set trigger thresholds.
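Before linking the template, it can help to confirm that the endpoint answers and returns Prometheus-style text. The following is a minimal sketch, not the template's own logic: the `http://` scheme, the way host and port are joined, and the helper names `metrics_url`, `parse_metrics`, and `fetch_metrics` are assumptions made for illustration.

```python
import re
import urllib.request

# Defaults mirror the template macros; the http:// scheme is an assumption.
TIKV_URL = "localhost"   # {$TIKV.URL}
TIKV_PORT = 20180        # {$TIKV.PORT}


def metrics_url(host: str = TIKV_URL, port: int = TIKV_PORT) -> str:
    """Build the address of the TiKV /metrics endpoint."""
    return f"http://{host}:{port}/metrics"


# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and HELP/TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url: str) -> list:
    """Fetch and parse the endpoint; requires a reachable TiKV instance."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics(metrics_url())` against a running instance should return a non-empty list if the host and port values are correct.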
Macros used
Name | Description | Default |
---|---|---|
{$TIKV.PORT} | The port of TiKV server metrics web endpoint | 20180 |
{$TIKV.URL} | TiKV server URL | localhost |
{$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | Maximum number of coprocessor request errors | 1 |
{$TIKV.STORE.ERRORS.MAX.WARN} | Maximum number of failure messages | 1 |
{$TIKV.PENDING_COMMANDS.MAX.WARN} | Maximum number of pending commands | 1 |
{$TIKV.PENDING_TASKS.MAX.WARN} | Maximum number of tasks currently running by the worker or pending | 1 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Get instance metrics | Get TiKV instance metrics. | HTTP agent | tikv.get_metrics |
TiKV: Store size | The storage size of the TiKV instance. | Dependent item | tikv.engine_size |
TiKV: Get store size metrics | Get capacity metrics of the TiKV instance. | Dependent item | tikv.store_size.metrics |
TiKV: Available size | The available capacity of the TiKV instance. | Dependent item | tikv.store_size.available |
TiKV: Capacity size | The capacity size of the TiKV instance. | Dependent item | tikv.store_size.capacity |
TiKV: Bytes read | The total bytes read in the TiKV instance. | Dependent item | tikv.engine_flow_bytes.read |
TiKV: Bytes write | The total bytes written in the TiKV instance. | Dependent item | tikv.engine_flow_bytes.write |
TiKV: Storage: commands total, rate | Total number of commands received per second. | Dependent item | tikv.storage_command.rate |
TiKV: CPU util | The CPU usage ratio of the TiKV instance. | Dependent item | tikv.cpu.util |
TiKV: RSS memory usage | Resident memory size in bytes. | Dependent item | tikv.rss_bytes |
TiKV: Regions, count | The number of regions collected in the TiKV instance. | Dependent item | tikv.region_count |
TiKV: Regions, leader | The number of leaders in the TiKV instance. | Dependent item | tikv.region_leader |
TiKV: Get QPS metrics | Get QPS metrics of the TiKV instance. | Dependent item | tikv.grpc_msgs.metrics |
TiKV: Total query, rate | The total QPS of the TiKV instance. | Dependent item | tikv.grpc_msg.rate |
TiKV: Total query errors, rate | The total number of gRPC message handling failures per second. | Dependent item | tikv.grpc_msg_fail.rate |
TiKV: Coprocessor: Errors, rate | Total number of push-down request errors per second. | Dependent item | tikv.coprocessor_request_error.rate |
TiKV: Get coprocessor requests metrics | Get metrics of coprocessor requests. | Dependent item | tikv.coprocessor_requests.metrics |
TiKV: Coprocessor: Requests, rate | Total number of coprocessor requests per second. | Dependent item | tikv.coprocessor_request.rate |
TiKV: Coprocessor: Scan keys, rate | Total number of scan keys observed per request per second. | Dependent item | tikv.coprocessor_scan_keys_sum.rate |
TiKV: Coprocessor: RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. | Dependent item | tikv.coprocessor_rocksdb_perf.rate |
TiKV: Coprocessor: Response size, rate | The total size of coprocessor responses per second. | Dependent item | tikv.coprocessor_response_bytes.rate |
TiKV: Scheduler: Pending commands | The total number of pending commands. The scheduler receives commands from clients and executes them against the MVCC layer storage engine. | Dependent item | tikv.scheduler_contex |
TiKV: Scheduler: Busy, rate | The total count of too-busy schedulers per second. | Dependent item | tikv.scheduler_too_busy.rate |
TiKV: Get scheduler metrics | Get metrics of scheduler commands. | Dependent item | tikv.scheduler.metrics |
TiKV: Scheduler: Commands total, rate | Total number of commands per second. | Dependent item | tikv.scheduler_commands.rate |
TiKV: Scheduler: Low priority commands total, rate | Total count of low priority commands per second. | Dependent item | tikv.commands_pri.low.rate |
TiKV: Scheduler: Normal priority commands total, rate | Total count of normal priority commands per second. | Dependent item | tikv.commands_pri.normal.rate |
TiKV: Scheduler: High priority commands total, rate | Total count of high priority commands per second. | Dependent item | tikv.commands_pri.high.rate |
TiKV: Snapshot: Pending tasks | The number of tasks currently run by the worker or pending. | Dependent item | tikv.worker_pending_task |
TiKV: Snapshot: Sending | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.sending |
TiKV: Snapshot: Receiving | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.receiving |
TiKV: Snapshot: Applying | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.applying |
TiKV: Uptime | The runtime of the TiKV instance. | Dependent item | tikv.uptime |
TiKV: Get failure msg metrics | Get metrics of reporting failure messages. | Dependent item | tikv.messages.failure.metrics |
TiKV: Server: failure messages total, rate | Total number of reporting failure messages per second. | Dependent item | tikv.messages.failure.rate |
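Many items above are described as per-second rates, while Prometheus-style endpoints usually expose ever-increasing counters. The sketch below shows one way such a counter can be turned into a rate from two consecutive samples. It assumes a change-per-second style calculation; the template's actual preprocessing steps are not listed in this document, and the function name `per_second_rate` is hypothetical.

```python
from typing import Optional


def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> Optional[float]:
    """Return the per-second increase between two counter samples.

    Returns None when no rate can be derived: the clock did not advance,
    or the counter went backwards (for example after a TiKV restart).
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed
```

Skipping the sample when the counter decreases avoids reporting a large negative rate right after a restart.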
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiKV: Too many coprocessor request error | | min(/TiDB TiKV by HTTP/tikv.coprocessor_request_error.rate,5m)>{$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | Warning | |
TiKV: Too many pending commands | | min(/TiDB TiKV by HTTP/tikv.scheduler_contex,5m)>{$TIKV.PENDING_COMMANDS.MAX.WARN} | Average | |
TiKV: Too many pending tasks | | min(/TiDB TiKV by HTTP/tikv.worker_pending_task,5m)>{$TIKV.PENDING_TASKS.MAX.WARN} | Average | |
TiKV: has been restarted | Uptime is less than 10 minutes. | last(/TiDB TiKV by HTTP/tikv.uptime)<10m | Info | Manual close: Yes |
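The threshold triggers above use min() over a 5-minute window, so a trigger fires only when every value collected in that window exceeds the macro value; a single low sample keeps it quiet. The sketch below imitates that logic outside Zabbix. The function names and the (timestamp, value) sample layout are assumptions for illustration, and window boundary handling may differ slightly from Zabbix itself.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def min_over_window(samples: Sequence[Sample], now: float,
                    window_s: float) -> Optional[float]:
    """Smallest value seen in the last window_s seconds, or None if empty."""
    recent = [value for ts, value in samples if now - window_s <= ts <= now]
    return min(recent) if recent else None


def threshold_trigger_fires(samples: Sequence[Sample], now: float,
                            threshold: float, window_s: float = 300) -> bool:
    """Imitates min(item,5m) > threshold."""
    lowest = min_over_window(samples, now, window_s)
    return lowest is not None and lowest > threshold


def restart_trigger_fires(uptime_s: float, limit_s: float = 600) -> bool:
    """Imitates last(uptime) < 10m."""
    return uptime_s < limit_s
```

With the default threshold of 1, a brief spike does not raise a problem; the value has to stay above 1 for the whole window.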
LLD rule QPS metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
QPS metrics discovery | Discovers QPS metrics. | Dependent item | tikv.qps.discovery |
Item prototypes for QPS metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Query: {#TYPE}, rate | The QPS per command in the TiKV instance. | Dependent item | tikv.grpc_msg.rate[{#TYPE}] |
LLD rule Coprocessor metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Coprocessor metrics discovery | Discovers coprocessor metrics. | Dependent item | tikv.coprocessor.discovery |
Item prototypes for Coprocessor metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Coprocessor: {#REQ_TYPE} metrics | Get metrics of {#REQ_TYPE} requests. | Dependent item | tikv.coprocessor_request.metrics[{#REQ_TYPE}] |
TiKV: Coprocessor: {#REQ_TYPE} errors, rate | Total number of push-down request errors per second. | Dependent item | tikv.coprocessor_request_error.rate[{#REQ_TYPE}] |
TiKV: Coprocessor: {#REQ_TYPE} requests, rate | Total number of coprocessor requests per second. | Dependent item | tikv.coprocessor_request.rate[{#REQ_TYPE}] |
TiKV: Coprocessor: {#REQ_TYPE} scan keys, rate | Total number of scan keys observed per request per second. | Dependent item | tikv.coprocessor_scan_keys.rate[{#REQ_TYPE}] |
TiKV: Coprocessor: {#REQ_TYPE} RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. | Dependent item | tikv.coprocessor_rocksdb_perf.rate[{#REQ_TYPE}] |
LLD rule Scheduler metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler metrics discovery | Discovers scheduler metrics. | Dependent item | tikv.scheduler.discovery |
Item prototypes for Scheduler metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Scheduler: commands {#STAGE}, rate | Total number of commands on each stage per second. | Dependent item | tikv.scheduler_stage.rate[{#STAGE}] |
LLD rule Server errors discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Server errors discovery | Discovers server error metrics. | Dependent item | tikv.server_report_failure.discovery |
Item prototypes for Server errors discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Store_id {#STORE_ID}: failure messages "{#TYPE}", rate | Total number of reporting failure messages. The metric has two labels: type and store_id. type represents the failure type, and store_id represents the destination peer store id. | Dependent item | tikv.messages.failure.rate[{#STORE_ID},{#TYPE}] |
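The discovery rule turns each distinct combination of the type and store_id labels into one set of LLD macros, which the item prototype then expands into a concrete item. The sketch below shows the general idea on already-parsed samples. The metric name `tikv_server_report_failure_msg_total`, the sample layout, and the function names are assumptions; only the two label names and the macro names come from the table above.

```python
import json
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]  # (metric name, labels, value)

# Assumed metric name; the label and macro names follow the prototype above.
FAILURE_METRIC = "tikv_server_report_failure_msg_total"
LABEL_TO_MACRO = {"store_id": "{#STORE_ID}", "type": "{#TYPE}"}


def discover(samples: List[Sample], metric: str = FAILURE_METRIC,
             label_to_macro: Dict[str, str] = LABEL_TO_MACRO) -> List[Dict[str, str]]:
    """Return one LLD row per distinct label combination of the metric."""
    rows = []
    seen = set()
    for name, labels, _value in samples:
        # Ignore other metrics and samples missing a required label.
        if name != metric or not all(lbl in labels for lbl in label_to_macro):
            continue
        key = tuple(labels[lbl] for lbl in label_to_macro)
        if key in seen:  # keep each combination only once
            continue
        seen.add(key)
        rows.append({macro: labels[lbl] for lbl, macro in label_to_macro.items()})
    return rows


def to_lld_json(rows: List[Dict[str, str]]) -> str:
    """Serialize discovery rows as a JSON array of macro objects."""
    return json.dumps(rows, sort_keys=True)
```

For a row with store_id 1 and an illustrative type value of unreachable, the prototype key pattern would expand to tikv.messages.failure.rate[1,unreachable].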
Trigger prototypes for Server errors discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiKV: Store_id {#STORE_ID}: Too many failure messages "{#TYPE}" | Indicates that the remote TiKV cannot be connected. | min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN} | Warning | |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.