Ceph by Zabbix agent 2

Overview

This template monitors a Ceph cluster through Zabbix without requiring any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Ceph by Zabbix agent 2 collects metrics by polling zabbix-agent2.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • Ceph 14.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

  1. Set up and configure zabbix-agent2 compiled with the Ceph monitoring plugin.
  2. Set the {$CEPH.CONNSTRING} macro to a connection string in the form <protocol(host:port)> or to a named session.
  3. Set the user name and password in the host macros ({$CEPH.USER}, {$CEPH.API.KEY}) if you want to override parameters from the Zabbix agent configuration file (see the configuration sketch below).

Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
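
The same parameters can be defined on the agent side instead of in host macros. A minimal sketch, assuming the agent 2 plugin's Sessions.* parameter convention and using "ceph1" as a placeholder session name (adjust the values to your environment):

```
# /etc/zabbix/zabbix_agent2.conf (sketch; values are placeholders)
# Named session "ceph1"; reference it from {$CEPH.CONNSTRING} as "ceph1".
Plugins.Ceph.Sessions.ceph1.Uri=https://localhost:8003
Plugins.Ceph.Sessions.ceph1.User=zabbix
Plugins.Ceph.Sessions.ceph1.ApiKey=zabbix_pass
# Uncomment if the restful module uses a self-signed certificate:
# Plugins.Ceph.InsecureSkipVerify=true
```

On the Ceph side, the plugin talks to the ceph-mgr restful module, which has to be enabled and given an API key. A sketch of the usual steps (the key name "zabbix" is a placeholder; the last command prints the API key to use as {$CEPH.API.KEY}):

```
ceph mgr module enable restful
ceph restful create-self-signed-cert
ceph restful create-key zabbix
```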

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$CEPH.USER} | | zabbix |
| {$CEPH.API.KEY} | | zabbix_pass |
| {$CEPH.CONNSTRING} | | https://localhost:8003 |

Items

| Name | Description | Type | Key and additional info |
|------|-------------|------|--------------------------|
| Ceph: Get overall cluster status | | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Ceph: Get OSD stats | | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Ceph: Get OSD dump | | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Ceph: Get df | | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Ceph: Ping | | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]<br>Preprocessing: Discard unchanged with heartbeat: 30m |
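
Most of the metrics below are dependent items that parse the output of these master items with a JSONPath preprocessing step. As a rough illustration of the mechanism (the field names match the JSONPaths listed below, but the exact payload shape depends on the plugin version), the ceph.status master item returns a JSON object similar to:

```json
{
  "overall_status": 0,
  "num_mon": 3,
  "num_osd": 12,
  "num_osd_in": 12,
  "num_osd_up": 12,
  "num_pg": 128,
  "min_mon_release_name": "nautilus",
  "pg_states": {
    "active": 128,
    "clean": 128,
    "peering": 0
  }
}
```

Each dependent item, such as ceph.num_mon below, extracts a single field (JSONPath $.num_mon) from this object, so the cluster is queried once per interval rather than once per metric.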

Ceph: Number of Monitors

The number of Monitors configured in a Ceph cluster.

Dependent item ceph.num_mon

Preprocessing

  • JSON Path: $.num_mon

  • Discard unchanged with heartbeat: 30m

Ceph: Overall cluster status

The overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN, or 2 - HEALTH_ERR.

Dependent item ceph.overall_status

Preprocessing

  • JSON Path: $.overall_status

  • Discard unchanged with heartbeat: 10m

Ceph: Minimum Mon release version

The minimum monitor release version name (min_mon_release_name).

Dependent item ceph.min_mon_release_name

Preprocessing

  • JSON Path: $.min_mon_release_name

  • Discard unchanged with heartbeat: 1h

Ceph: Ceph Read bandwidth

The global read bytes per second.

Dependent item ceph.rd_bytes.rate

Preprocessing

  • JSON Path: $.rd_bytes

  • Change per second
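
The Change per second preprocessing step divides the difference between two consecutive raw values by the elapsed time. A quick worked example: if rd_bytes grows from 1,000,000 to 31,000,000 between two polls taken 60 seconds apart, the stored value is (31,000,000 - 1,000,000) / 60 = 500,000 bytes per second. The same step is applied to the write bandwidth and read/write operations items that follow.
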
Ceph: Ceph Write bandwidth

The global write bytes per second.

Dependent item ceph.wr_bytes.rate

Preprocessing

  • JSON Path: $.wr_bytes

  • Change per second

Ceph: Ceph Read operations per sec

The global read operations per second.

Dependent item ceph.rd_ops.rate

Preprocessing

  • JSON Path: $.rd_ops

  • Change per second

Ceph: Ceph Write operations per sec

The global write operations per second.

Dependent item ceph.wr_ops.rate

Preprocessing

  • JSON Path: $.wr_ops

  • Change per second

Ceph: Total bytes available

The total bytes available in a Ceph cluster.

Dependent item ceph.total_avail_bytes

Preprocessing

  • JSON Path: $.total_avail_bytes

Ceph: Total bytes

The total (RAW) capacity of a Ceph cluster in bytes.

Dependent item ceph.total_bytes

Preprocessing

  • JSON Path: $.total_bytes

Ceph: Total bytes used

The total bytes used in a Ceph cluster.

Dependent item ceph.total_used_bytes

Preprocessing

  • JSON Path: $.total_used_bytes

Ceph: Total number of objects

The total number of objects in a Ceph cluster.

Dependent item ceph.total_objects

Preprocessing

  • JSON Path: $.total_objects

Ceph: Number of Placement Groups

The total number of Placement Groups in a Ceph cluster.

Dependent item ceph.num_pg

Preprocessing

  • JSON Path: $.num_pg

  • Discard unchanged with heartbeat: 10m

Ceph: Number of Placement Groups in Temporary state

The total number of Placement Groups in a pg_temp state.

Dependent item ceph.num_pg_temp

Preprocessing

  • JSON Path: $.num_pg_temp

Ceph: Number of Placement Groups in Active state

The total number of Placement Groups in an active state.

Dependent item ceph.pg_states.active

Preprocessing

  • JSON Path: $.pg_states.active

Ceph: Number of Placement Groups in Clean state

The total number of Placement Groups in a clean state.

Dependent item ceph.pg_states.clean

Preprocessing

  • JSON Path: $.pg_states.clean

Ceph: Number of Placement Groups in Peering state

The total number of Placement Groups in a peering state.

Dependent item ceph.pg_states.peering

Preprocessing

  • JSON Path: $.pg_states.peering

Ceph: Number of Placement Groups in Scrubbing state

The total number of Placement Groups in a scrubbing state.

Dependent item ceph.pg_states.scrubbing

Preprocessing

  • JSON Path: $.pg_states.scrubbing

Ceph: Number of Placement Groups in Undersized state

The total number of Placement Groups in an undersized state.

Dependent item ceph.pg_states.undersized

Preprocessing

  • JSON Path: $.pg_states.undersized

Ceph: Number of Placement Groups in Backfilling state

The total number of Placement Groups in a backfill state.

Dependent item ceph.pg_states.backfilling

Preprocessing

  • JSON Path: $.pg_states.backfilling

Ceph: Number of Placement Groups in degraded state

The total number of Placement Groups in a degraded state.

Dependent item ceph.pg_states.degraded

Preprocessing

  • JSON Path: $.pg_states.degraded

Ceph: Number of Placement Groups in inconsistent state

The total number of Placement Groups in an inconsistent state.

Dependent item ceph.pg_states.inconsistent

Preprocessing

  • JSON Path: $.pg_states.inconsistent

Ceph: Number of Placement Groups in Unknown state

The total number of Placement Groups in an unknown state.

Dependent item ceph.pg_states.unknown

Preprocessing

  • JSON Path: $.pg_states.unknown

Ceph: Number of Placement Groups in remapped state

The total number of Placement Groups in a remapped state.

Dependent item ceph.pg_states.remapped

Preprocessing

  • JSON Path: $.pg_states.remapped

Ceph: Number of Placement Groups in recovering state

The total number of Placement Groups in a recovering state.

Dependent item ceph.pg_states.recovering

Preprocessing

  • JSON Path: $.pg_states.recovering

Ceph: Number of Placement Groups in backfill_toofull state

The total number of Placement Groups in a backfill_toofull state.

Dependent item ceph.pg_states.backfill_toofull

Preprocessing

  • JSON Path: $.pg_states.backfill_toofull

Ceph: Number of Placement Groups in backfill_wait state

The total number of Placement Groups in a backfill_wait state.

Dependent item ceph.pg_states.backfill_wait

Preprocessing

  • JSON Path: $.pg_states.backfill_wait

Ceph: Number of Placement Groups in recovery_wait state

The total number of Placement Groups in a recovery_wait state.

Dependent item ceph.pg_states.recovery_wait

Preprocessing

  • JSON Path: $.pg_states.recovery_wait

Ceph: Number of Pools

The total number of pools in a Ceph cluster.

Dependent item ceph.num_pools

Preprocessing

  • JSON Path: $.num_pools

Ceph: Number of OSDs

The number of the known storage daemons in a Ceph cluster.

Dependent item ceph.num_osd

Preprocessing

  • JSON Path: $.num_osd

  • Discard unchanged with heartbeat: 10m

Ceph: Number of OSDs in state: UP

The total number of the online storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_up

Preprocessing

  • JSON Path: $.num_osd_up

  • Discard unchanged with heartbeat: 10m

Ceph: Number of OSDs in state: IN

The total number of the participating storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_in

Preprocessing

  • JSON Path: $.num_osd_in

  • Discard unchanged with heartbeat: 10m

Ceph: Ceph OSD avg fill

The average fill of OSDs.

Dependent item ceph.osd_fill.avg

Preprocessing

  • JSON Path: $.osd_fill.avg

Ceph: Ceph OSD max fill

The percentage of the most filled OSD.

Dependent item ceph.osd_fill.max

Preprocessing

  • JSON Path: $.osd_fill.max

Ceph: Ceph OSD min fill

The percentage fill of the minimum filled OSD.

Dependent item ceph.osd_fill.min

Preprocessing

  • JSON Path: $.osd_fill.min

Ceph: Ceph OSD max PGs

The maximum number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.max

Preprocessing

  • JSON Path: $.osd_pgs.max

Ceph: Ceph OSD min PGs

The minimum number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.min

Preprocessing

  • JSON Path: $.osd_pgs.min

Ceph: Ceph OSD avg PGs

The average number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.avg

Preprocessing

  • JSON Path: $.osd_pgs.avg

Ceph: Ceph OSD Apply latency Avg

The average apply latency of OSDs.

Dependent item ceph.osd_latency_apply.avg

Preprocessing

  • JSON Path: $.osd_latency_apply.avg

Ceph: Ceph OSD Apply latency Max

The maximum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.max

Preprocessing

  • JSON Path: $.osd_latency_apply.max

Ceph: Ceph OSD Apply latency Min

The minimum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.min

Preprocessing

  • JSON Path: $.osd_latency_apply.min

Ceph: Ceph OSD Commit latency Avg

The average commit latency of OSDs.

Dependent item ceph.osd_latency_commit.avg

Preprocessing

  • JSON Path: $.osd_latency_commit.avg

Ceph: Ceph OSD Commit latency Max

The maximum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.max

Preprocessing

  • JSON Path: $.osd_latency_commit.max

Ceph: Ceph OSD Commit latency Min

The minimum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.min

Preprocessing

  • JSON Path: $.osd_latency_commit.min

Ceph: Ceph backfill full ratio

The backfill full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_backfillfull_ratio

Preprocessing

  • JSON Path: $.osd_backfillfull_ratio

  • Discard unchanged with heartbeat: 10m

Ceph: Ceph full ratio

The full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_full_ratio

Preprocessing

  • JSON Path: $.osd_full_ratio

  • Discard unchanged with heartbeat: 10m

Ceph: Ceph nearfull ratio

The near full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_nearfull_ratio

Preprocessing

  • JSON Path: $.osd_nearfull_ratio

  • Discard unchanged with heartbeat: 10m

Triggers

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|-----------------------------------|
| Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken. This covers any error raised, including authentication and configuration issues. | last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 | Average | |
| Ceph: Cluster in ERROR state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 | Average | Manual close: Yes |
| Ceph: Cluster in WARNING state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 | Warning | Manual close: Yes<br>Depends on: Ceph: Cluster in ERROR state |
| Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. | last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 | Info | Manual close: Yes |
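
In these expressions, last(/Ceph by Zabbix agent 2/<key>) refers to the most recent value of the item with that key, last(...,#2) to the value before it, and min(...,15m) to the minimum over the last 15 minutes. For example, the WARNING trigger fires when the latest ceph.overall_status value equals 1.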

LLD rule OSD

| Name | Description | Type | Key and additional info |
|------|-------------|------|--------------------------|
| OSD | | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
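
Low-level discovery creates one set of the item and trigger prototypes below per discovered OSD. As an illustrative sketch, the discovery key is expected to return LLD macros of this general shape (the actual output may carry additional macros):

```json
[
  { "{#OSDNAME}": "0" },
  { "{#OSDNAME}": "1" },
  { "{#OSDNAME}": "2" }
]
```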

Item prototypes for OSD

Name Description Type Key and additional info
Ceph: [osd.{#OSDNAME}] OSD in

Dependent item ceph.osd[{#OSDNAME},in]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.in

  • Discard unchanged with heartbeat: 10m

Ceph: [osd.{#OSDNAME}] OSD up

Dependent item ceph.osd[{#OSDNAME},up]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.up

  • Discard unchanged with heartbeat: 10m

Ceph: [osd.{#OSDNAME}] OSD PGs

Dependent item ceph.osd[{#OSDNAME},num_pgs]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.num_pgs

    Custom on fail: Discard value

Ceph: [osd.{#OSDNAME}] OSD fill

Dependent item ceph.osd[{#OSDNAME},fill]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_fill

    Custom on fail: Discard value

Ceph: [osd.{#OSDNAME}] OSD latency apply

The time taken to flush an update to disks.

Dependent item ceph.osd[{#OSDNAME},latency_apply]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_apply

    Custom on fail: Discard value

Ceph: [osd.{#OSDNAME}] OSD latency commit

The time taken to commit an operation to the journal.

Dependent item ceph.osd[{#OSDNAME},latency_commit]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_commit

    Custom on fail: Discard value

Trigger prototypes for OSD

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|-----------------------------------|
| Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. The OSD daemon may have been stopped, or peer OSDs may be unable to reach the OSD over the network. | last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 | Average | |
| Ceph: OSD osd.{#OSDNAME} is full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 | Average | |
| Ceph: Ceph OSD osd.{#OSDNAME} is near full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 | Warning | Depends on: Ceph: OSD osd.{#OSDNAME} is full |

LLD rule Pool

| Name | Description | Type | Key and additional info |
|------|-------------|------|--------------------------|
| Pool | | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |

Item prototypes for Pool

Name Description Type Key and additional info
Ceph: [{#POOLNAME}] Pool Used

The total bytes used in a pool.

Dependent item ceph.pool["{#POOLNAME}",bytes_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].bytes_used

Ceph: [{#POOLNAME}] Max available

The maximum available space in the given pool.

Dependent item ceph.pool["{#POOLNAME}",max_avail]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].max_avail

Ceph: [{#POOLNAME}] Pool RAW Used

The bytes used in the pool, including copies made.

Dependent item ceph.pool["{#POOLNAME}",stored_raw]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].stored_raw

Ceph: [{#POOLNAME}] Pool Percent Used

The percentage of the storage used per pool.

Dependent item ceph.pool["{#POOLNAME}",percent_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].percent_used

Ceph: [{#POOLNAME}] Pool objects

The number of objects in the pool.

Dependent item ceph.pool["{#POOLNAME}",objects]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].objects

Ceph: [{#POOLNAME}] Pool Read bandwidth

The read rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",rd_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_bytes

  • Change per second

Ceph: [{#POOLNAME}] Pool Write bandwidth

The write rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",wr_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_bytes

  • Change per second

Ceph: [{#POOLNAME}] Pool Read operations

The read rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",rd_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_ops

  • Change per second

Ceph: [{#POOLNAME}] Pool Write operations

The write rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",wr_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_ops

  • Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.