# TiDB TiKV by HTTP

## Overview

This template monitors the TiKV server of a TiDB cluster with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template `TiDB TiKV by HTTP` collects metrics by HTTP agent from the TiKV /metrics endpoint.

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- TiDB cluster 4.0.10, 6.5.1

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

This template works with the TiKV server of a TiDB cluster.
Internal service metrics are collected from the TiKV /metrics endpoint.
Don't forget to change the macros {$TIKV.URL} and {$TIKV.PORT}.
Also, see the Macros section for a list of macros used to set trigger values.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$TIKV.PORT}|The port of the TiKV server metrics web endpoint.|`20180`|
|{$TIKV.URL}|TiKV server URL.|`localhost`|
|{$TIKV.COPOCESSOR.ERRORS.MAX.WARN}|Maximum number of coprocessor request errors.|`1`|
|{$TIKV.STORE.ERRORS.MAX.WARN}|Maximum number of failure messages.|`1`|
|{$TIKV.PENDING_COMMANDS.MAX.WARN}|Maximum number of pending commands.|`1`|
|{$TIKV.PENDING_TASKS.MAX.WARN}|Maximum number of tasks currently running by the worker or pending.|`1`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|TiKV: Get instance metrics|Get TiKV instance metrics.|HTTP agent|tikv.get_metrics<br>**Preprocessing**<br>Check for not supported value<br>⛔️Custom on fail: Discard value|
| |The storage size of the TiKV instance.|Dependent item|tikv.engine_size<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_engine_size_bytes")].value.sum()`|
| |Get capacity metrics of the TiKV instance.|Dependent item|tikv.store_size.metrics<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_store_size_bytes")]`<br>⛔️Custom on fail: Discard value|
| |The available capacity of the TiKV instance.|Dependent item|tikv.store_size.available<br>**Preprocessing**<br>JSON Path: `$[?(@.labels.type == "available")].value.first()`|
| |The capacity size of the TiKV instance.|Dependent item|tikv.store_size.capacity<br>**Preprocessing**<br>JSON Path: `$[?(@.labels.type == "capacity")].value.first()`|
| |The total bytes read in the TiKV instance.|Dependent item|tikv.engine_flow_bytes.read<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The total bytes written in the TiKV instance.|Dependent item|tikv.engine_flow_bytes.write<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Total number of commands received per second.|Dependent item|tikv.storage_command.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_storage_command_total")].value.sum()`|
| |The CPU usage ratio on the TiKV instance.|Dependent item|tikv.cpu.util<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_thread_cpu_seconds_total")].value.sum()`<br>Custom multiplier: `100`|
| |Resident memory size in bytes.|Dependent item|tikv.rss_bytes<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The number of regions collected in the TiKV instance.|Dependent item|tikv.region_count<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The number of leaders in the TiKV instance.|Dependent item|tikv.region_leader<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Get QPS metrics of the TiKV instance.|Dependent item|tikv.grpc_msgs.metrics<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_grpc_msg_duration_seconds_count")]`<br>⛔️Custom on fail: Discard value|
| |The total QPS in the TiKV instance.|Dependent item|tikv.grpc_msg.rate<br>**Preprocessing**<br>JSON Path: `$..value.sum()`|
| |The total number of gRPC message handling failures per second.|Dependent item|tikv.grpc_msg_fail.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_grpc_msg_fail_total")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |Total number of push down request errors per second.|Dependent item|tikv.coprocessor_request_error.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_coprocessor_request_error")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |Get metrics of coprocessor requests.|Dependent item|tikv.coprocessor_requests.metrics<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
| |Total number of coprocessor requests per second.|Dependent item|tikv.coprocessor_request.rate<br>**Preprocessing**<br>JSON Path: `$..value.sum()`|
| |Total number of scan keys observed per request per second.|Dependent item|tikv.coprocessor_scan_keys_sum.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_coprocessor_scan_keys")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |Total number of RocksDB internal operations from PerfContext per second.|Dependent item|tikv.coprocessor_rocksdb_perf.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |The total size of coprocessor responses per second.|Dependent item|tikv.coprocessor_response_bytes.rate<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The total number of pending commands. The scheduler receives commands from clients and executes them against the MVCC layer storage engine.|Dependent item|tikv.scheduler_contex<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_scheduler_contex_total")].value.first()`|
| |The total count of too busy schedulers per second.|Dependent item|tikv.scheduler_too_busy.rate<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_scheduler_too_busy_total")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |Get metrics of scheduler commands.|Dependent item|tikv.scheduler.metrics<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_scheduler_stage_total")]`<br>⛔️Custom on fail: Discard value|
| |Total number of commands per second.|Dependent item|tikv.scheduler_commands.rate<br>**Preprocessing**<br>JSON Path: `$..value.sum()`<br>⛔️Custom on fail: Set value to: `0`|
| |Total count of low priority commands per second.|Dependent item|tikv.commands_pri.low.rate<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Total count of normal priority commands per second.|Dependent item|tikv.commands_pri.normal.rate<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Total count of high priority commands per second.|Dependent item|tikv.commands_pri.high.rate<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The number of tasks currently running by the worker or pending.|Dependent item|tikv.worker_pending_task<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The total amount of raftstore snapshot traffic.|Dependent item|tikv.snapshot.sending<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The total amount of raftstore snapshot traffic.|Dependent item|tikv.snapshot.receiving<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |The total amount of raftstore snapshot traffic.|Dependent item|tikv.snapshot.applying<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
| |The runtime of each TiKV instance.|Dependent item|tikv.uptime<br>**Preprocessing**<br>JSON Path: `$[?(@.name=="process_start_time_seconds")].value.first()`<br>JavaScript: `The text is too long. Please see the template.`|
| |Get metrics of reporting failure messages.|Dependent item|tikv.messages.failure.metrics<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_server_report_failure_msg_total")]`<br>⛔️Custom on fail: Discard value|
| |Total number of reporting failure messages per second.|Dependent item|tikv.messages.failure.rate<br>**Preprocessing**<br>JSON Path: `$..value.sum()`<br>⛔️Custom on fail: Discard value|
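
The JSON Path steps listed for the items above all follow the same pattern: pick the records of one metric (or one label value) and then take `.value.sum()` or `.value.first()`. The Python sketch below imitates that pattern outside Zabbix so the expressions are easier to reason about. It is not the template's implementation: the record layout (`name`, `labels`, `value`) is inferred from the JSON Path expressions, the helper names are hypothetical, and the sample metric lines are made up for illustration.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Turn Prometheus text into a list of {name, labels, value} records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        records.append({"name": name, "labels": labels, "value": float(value)})
    return records


def sum_by_name(records, name):
    """Rough equivalent of $[?(@.name == "<name>")].value.sum()."""
    return sum(r["value"] for r in records if r["name"] == name)


def first_by_label(records, key, wanted):
    """Rough equivalent of $[?(@.labels.<key> == "<wanted>")].value.first()."""
    for r in records:
        if r["labels"].get(key) == wanted:
            return r["value"]
    return None


# Made-up sample data; real TiKV output contains many more series.
SAMPLE = """\
# TYPE tikv_engine_size_bytes gauge
tikv_engine_size_bytes{db="kv",type="default"} 1000
tikv_engine_size_bytes{db="kv",type="write"} 500
tikv_store_size_bytes{type="available"} 4096
tikv_store_size_bytes{type="capacity"} 8192
"""

records = parse_metrics(SAMPLE)
engine_size = sum_by_name(records, "tikv_engine_size_bytes")

# Like tikv.store_size.metrics, narrow to one metric first, then pick by label.
store = [r for r in records if r["name"] == "tikv_store_size_bytes"]
available = first_by_label(store, "type", "available")
capacity = first_by_label(store, "type", "capacity")
```

The two-stage filtering mirrors how `tikv.store_size.available` and `tikv.store_size.capacity` depend on `tikv.store_size.metrics`: the intermediate item narrows the data to one metric, so the dependent items only need to match on a label.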

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
| |Uptime is less than 10 minutes.|`last(/TiDB TiKV by HTTP/tikv.uptime)<10m`|Info|**Manual close**: Yes|

### LLD rule QPS metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|QPS metrics discovery|Discovery of QPS metrics.|Dependent item|tikv.qps.discovery<br>**Preprocessing**<br>JavaScript: `The text is too long. Please see the template.`<br>Discard unchanged with heartbeat: `1h`|
| |The QPS per command in TiKV instance.|Dependent item|tikv.grpc_msg.rate[{#TYPE}]<br>**Preprocessing**<br>JSON Path: `$[?(@.labels.type == "{#TYPE}")].value.first()`<br>⛔️Custom on fail: Set value to|
| |Discovery coprocessor metrics.|Dependent item|tikv.coprocessor.discovery<br>**Preprocessing**<br>JavaScript: `The text is too long. Please see the template.`<br>Discard unchanged with heartbeat: `1h`|
| |Get metrics of {#REQ_TYPE} requests.|Dependent item|tikv.coprocessor_request.metrics[{#REQ_TYPE}]<br>**Preprocessing**<br>JSON Path: `$[?(@.labels.req == "{#REQ_TYPE}")]`<br>⛔️Custom on fail: Discard value|
| |Total number of push down request error per second.|Dependent item|tikv.coprocessor_request_error.rate[{#REQ_TYPE}]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
| |Total number of coprocessor requests per second.|Dependent item|tikv.coprocessor_request.rate[{#REQ_TYPE}]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Total number of scan keys observed per request per second.|Dependent item|tikv.coprocessor_scan_keys.rate[{#REQ_TYPE}]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
| |Total number of RocksDB internal operations from PerfContext per second.|Dependent item|tikv.coprocessor_rocksdb_perf.rate[{#REQ_TYPE}]<br>**Preprocessing**<br>JSON Path: `$[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()`<br>⛔️Custom on fail: Discard value|
| |Discovery scheduler metrics.|Dependent item|tikv.scheduler.discovery<br>**Preprocessing**<br>JavaScript: `The text is too long. Please see the template.`<br>Discard unchanged with heartbeat: `1h`|
| |Total number of commands on each stage per second.|Dependent item|tikv.scheduler_stage.rate[{#STAGE}]<br>**Preprocessing**<br>JSON Path: `$[?(@.labels.stage == "{#STAGE}")].value.sum()`<br>⛔️Custom on fail: Set value to: `0`|
| |Discovery server errors metrics.|Dependent item|tikv.server_report_failure.discovery<br>**Preprocessing**<br>JavaScript: `The text is too long. Please see the template.`<br>Discard unchanged with heartbeat: `1h`|
| |Total number of reporting failure messages. The metric has two labels: type and store_id. type represents the failure type, and store_id represents the destination peer store id.|Dependent item|tikv.messages.failure.rate[{#STORE_ID},{#TYPE}]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`|
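
The discovery rules in this section create one set of item prototypes per distinct label value, for example `{#TYPE}`, `{#REQ_TYPE}`, `{#STAGE}`, or the `{#STORE_ID}`/`{#TYPE}` pair. The template's own discovery JavaScript is not reproduced in this document, so the Python sketch below only illustrates the general idea under stated assumptions: the record layout (`name`, `labels`, `value`) is inferred from the JSON Path expressions, the `discover` helper is hypothetical, and the label values in the sample are made up.

```python
import json


def discover(records, metric_name, label_to_macro):
    """Build one LLD row per distinct label combination of a metric.

    label_to_macro maps a label name to the LLD macro that should carry
    its value, e.g. {"store_id": "{#STORE_ID}", "type": "{#TYPE}"}.
    """
    seen = set()
    rows = []
    for record in records:
        if record["name"] != metric_name:
            continue
        combo = tuple(record["labels"].get(label, "") for label in label_to_macro)
        if combo in seen:
            continue  # this label combination was already reported
        seen.add(combo)
        rows.append(dict(zip(label_to_macro.values(), combo)))
    return rows


# Made-up sample records for illustration only.
RECORDS = [
    {"name": "tikv_server_report_failure_msg_total",
     "labels": {"type": "unreachable", "store_id": "4"}, "value": 3.0},
    {"name": "tikv_server_report_failure_msg_total",
     "labels": {"type": "unreachable", "store_id": "4"}, "value": 5.0},
    {"name": "tikv_server_report_failure_msg_total",
     "labels": {"type": "unreachable", "store_id": "5"}, "value": 1.0},
]

rows = discover(
    RECORDS,
    "tikv_server_report_failure_msg_total",
    {"store_id": "{#STORE_ID}", "type": "{#TYPE}"},
)
print(json.dumps(rows))
```

Each row then yields one item such as `tikv.messages.failure.rate[{#STORE_ID},{#TYPE}]` with the macros substituted. Deduplicating the combinations matters because several series can share the same label values that the macros are built from.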

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
| |Indicates that the remote TiKV cannot be connected.|`min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN}`|Warning||

## Feedback

Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).

You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).