GridGain by JMX

Overview

Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

  • GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

  1. Enable and configure JMX access to GridGain In-Memory Computing Platform. See the documentation for instructions. By default, the JMX tree hierarchy includes a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude the level with the classloader name. You can choose which Cache and Data Region metrics to collect by following the official guide.
  2. Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.
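
The JVM options involved in step 1 can be assembled as in the minimal sketch below. The com.sun.management.jmxremote properties are the standard JVM settings for remote JMX. The port 9999 and the helper name jmx_jvm_options are assumptions made for illustration only. A real deployment also needs credentials and SSL settings that follow the GridGain documentation and your own security policy.

```python
def jmx_jvm_options(port: int = 9999, authenticate: bool = True) -> list[str]:
    """Build JVM options for remote JMX access (illustrative helper, not part of the template)."""
    return [
        "-Dcom.sun.management.jmxremote",
        # The port is an assumed example value; use whatever the Zabbix Java gateway connects to.
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        # Drops the classloader level from the MBean tree, as described in step 1.
        "-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false",
    ]


print(" ".join(jmx_jvm_options()))
```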

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$GRIDGAIN.PASSWORD} | | <secret> |
| {$GRIDGAIN.USER} | | zabbix |
| {$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. | .* |
| {$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. | Macro too long. Please see the template. |
| {$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. | .* |
| {$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. | ^(sysMemPlc\|TxLog)$ |
| {$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. | .* |
| {$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. | CHANGE_IF_NEEDED |
| {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. | 1000 |
| {$GRIDGAIN.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. | 10000 |
| {$GRIDGAIN.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. | 60000 |
| {$GRIDGAIN.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. | 1000 |
| {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. | 10 |
| {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. | 80 |
| {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. | 66 |
| {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. | 90 |
| {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. | 80 |
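
The MATCHES and NOT_MATCHES macro pairs work as an include filter followed by an exclude filter. The sketch below illustrates that logic with the default data region values. Python's re module stands in for the PCRE engine Zabbix uses, which behaves the same for these simple patterns. The region names other than sysMemPlc and TxLog are hypothetical.

```python
import re


def passes_lld_filter(name: str, matches: str, not_matches: str) -> bool:
    """Keep a discovered entity if it matches MATCHES and does not match NOT_MATCHES."""
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


# Default data region filter values listed for the template macros.
MATCHES = r".*"
NOT_MATCHES = r"^(sysMemPlc|TxLog)$"

# "default" and "persistent_region" are made-up region names for the example.
regions = ["default", "sysMemPlc", "TxLog", "persistent_region"]
kept = [r for r in regions if passes_lld_filter(r, MATCHES, NOT_MATCHES)]
print(kept)
```

Because NOT_MATCHES is anchored with ^ and $, only exact names are excluded; a region called TxLogArchive would still be discovered.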

LLD rule GridGain kernal metrics

Name Description Type Key and additional info
GridGain kernal metrics JMX agent jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.
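
The discovery key above passes a JMX ObjectName pattern: a domain, a set of key properties that must match, and a trailing ,* that allows additional properties. The sketch below is a simplified model of that matching, written only to show which beans such a pattern selects. It ignores quoted values and wildcards inside the domain or values, and the instance name node1 is hypothetical.

```python
def parse_object_name(name: str) -> tuple[str, dict[str, str]]:
    """Split 'domain:k1=v1,k2=v2' into the domain and its key properties."""
    domain, _, props = name.partition(":")
    pairs = dict(p.split("=", 1) for p in props.split(",") if p != "*")
    return domain, pairs


def matches_pattern(pattern: str, name: str) -> bool:
    """Simplified ObjectName pattern match (no quoting, no value wildcards)."""
    p_domain, p_props = parse_object_name(pattern)
    n_domain, n_props = parse_object_name(name)
    if p_domain != n_domain:
        return False
    # Every property named in the pattern must be present with the same value.
    if any(n_props.get(key) != value for key, value in p_props.items()):
        return False
    # A trailing ",*" lets the bean carry extra properties such as igniteInstanceName.
    return pattern.endswith(",*") or len(n_props) == len(p_props)


PATTERN = "org.apache:group=Kernal,name=IgniteKernal,*"
bean = "org.apache:igniteInstanceName=node1,group=Kernal,name=IgniteKernal"
print(matches_pattern(PATTERN, bean))
```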

Item prototypes for GridGain kernal metrics

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime

Uptime of GridGain instance.

JMX agent jmx["{#JMXOBJ}",UpTime]

Preprocessing

  • Custom multiplier: 0.001

GridGain [{#JMXIGNITEINSTANCENAME}]: Version

Version of GridGain instance.

JMX agent jmx["{#JMXOBJ}",FullVersion]

Preprocessing

  • Regular expression: (.*)-\d+ \1

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID

Unique identifier for this node within grid.

JMX agent jmx["{#JMXOBJ}",LocalNodeId]

Preprocessing

  • Discard unchanged with heartbeat: 3h
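
The three preprocessing steps used by these item prototypes can be modelled as in the sketch below. It is a rough approximation of the behaviour, not Zabbix's implementation. It assumes UpTime is reported in milliseconds (which the 0.001 multiplier suggests), and the version string 1.2.3-45 is a made-up input.

```python
import re


def custom_multiplier(value: float, factor: float = 0.001) -> float:
    """Custom multiplier step: scale the raw value, e.g. milliseconds to seconds."""
    return value * factor


def regex_step(value: str, pattern: str = r"(.*)-\d+", output: str = r"\1") -> str:
    """Regular expression step: return the output template filled from the first match."""
    match = re.search(pattern, value)
    if match is None:
        raise ValueError("pattern did not match")  # treated as a failed step in this sketch
    return match.expand(output)


class DiscardUnchangedWithHeartbeat:
    """Drop repeated values unless the heartbeat period has elapsed since the last stored one."""

    def __init__(self, heartbeat_s: float):
        self.heartbeat_s = heartbeat_s
        self.last_value = None
        self.last_time = None

    def process(self, value, now_s: float):
        """Return the value if it should be stored, otherwise None."""
        unchanged = self.last_time is not None and value == self.last_value
        if unchanged and now_s - self.last_time < self.heartbeat_s:
            return None
        self.last_value, self.last_time = value, now_s
        return value


throttle = DiscardUnchangedWithHeartbeat(heartbeat_s=3 * 3600)  # 3h, as in the template
print(regex_step("1.2.3-45"), throttle.process("node-id", 0), throttle.process("node-id", 60))
```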

Trigger prototypes for GridGain kernal metrics

Name Description Expression Severity Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted

Uptime is less than 10 minutes.

last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m Info Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data

Zabbix has not received data for items for the last 10 minutes.

nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 Warning Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed

The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.

last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 Info Manual close: Yes
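
The logic of the restart and version-change expressions above can be restated as a short sketch. It assumes the uptime has already been converted to seconds by the item's preprocessing, and the version strings in the usage line are hypothetical.

```python
def has_been_restarted(uptime_s: float) -> bool:
    """last(UpTime) < 10m: the instance has been up for less than ten minutes."""
    return uptime_s < 10 * 60


def version_changed(history: list[str]) -> bool:
    """last(#1) <> last(#2) and length(last()) > 0, with history ordered oldest to newest."""
    if len(history) < 2:
        return False  # not enough data to compare; a real trigger would stay in its prior state
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0


print(has_been_restarted(120), version_changed(["1.0.0", "1.0.1"]))
```

The length check keeps an empty value from being reported as a version change.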

LLD rule Cluster metrics

Name Description Type Key and additional info
Cluster metrics JMX agent jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

Item prototypes for Cluster metrics

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline

Total baseline nodes that are registered in the baseline topology.

JMX agent jmx["{#JMXOBJ}",TotalBaselineNodes]

Preprocessing

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline

The number of nodes that are currently active in the baseline topology.

JMX agent jmx["{#JMXOBJ}",ActiveBaselineNodes]

Preprocessing

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client

The number of client nodes in the cluster.

JMX agent jmx["{#JMXOBJ}",TotalClientNodes]

Preprocessing

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total

Total number of nodes.

JMX agent jmx["{#JMXOBJ}",TotalNodes]

Preprocessing

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server

The number of server nodes in the cluster.

JMX agent jmx["{#JMXOBJ}",TotalServerNodes]

Preprocessing

  • Discard unchanged with heartbeat: 3h

Trigger prototypes for Cluster metrics

Name Description Expression Severity Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology

One or more server nodes left the topology. Acknowledge to close the problem manually.

change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 Warning Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology

One or more server nodes were added to the topology. Acknowledge to close the problem manually.

change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 Info Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology

One or more server nodes left the topology. Acknowledge to close the problem manually.

last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) Info Manual close: Yes
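
The change() function used by the first two trigger prototypes is the difference between the latest and the previous value. A minimal sketch of how its sign separates the "left" and "added" events, with made-up node counts:

```python
def change(history: list[float]) -> float:
    """Difference between the latest and the previous value (history is oldest to newest)."""
    return history[-1] - history[-2]


def classify_topology_event(server_nodes: list[float]) -> str:
    """Map the sign of change(TotalServerNodes) to the corresponding trigger."""
    delta = change(server_nodes)
    if delta < 0:
        return "server node left"
    if delta > 0:
        return "server node added"
    return "no change"


print(classify_topology_event([4, 3]))
```

Since only two consecutive values are compared, the condition holds for a single collection interval, which is presumably why these triggers allow manual close.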

LLD rule Local node metrics

Name Description Type Key and additional info
Local node metrics JMX agent jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

Item prototypes for Local node metrics

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current

Number of cancelled jobs that are still running.

JMX agent jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current

Number of jobs rejected after the most recent collision resolution operation.

JMX agent jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current

Number of queued jobs currently waiting to be executed.

JMX agent jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current

Number of currently active jobs concurrently executing on the node.

JMX agent jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate

Total number of jobs handled by the node per second.

JMX agent jmx["{#JMXOBJ}",TotalExecutedJobs]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate

Total number of jobs cancelled by the node per second.

JMX agent jmx["{#JMXOBJ}",TotalCancelledJobs]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate

Number of jobs this node rejects during collision resolution operations, per second.

JMX agent jmx["{#JMXOBJ}",TotalRejectedJobs]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current

Current PME duration in milliseconds.

JMX agent jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current

Current number of live threads.

JMX agent jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used

Current heap size that is used for object allocation.

JMX agent jmx["{#JMXOBJ}",HeapMemoryUsed]
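
Several items above turn a cumulative JMX counter into a rate with the Change per second step, which is evaluated as (value - prev_value) / (time - prev_time). The sketch below models that calculation; the sample numbers are invented.

```python
from typing import Optional


def change_per_second(prev_value: float, prev_time: float,
                      value: float, time: float) -> Optional[float]:
    """Return the per-second rate between two samples, or None if no rate can be stored."""
    if time <= prev_time:
        return None  # guard against a zero or negative interval
    if value < prev_value:
        # The counter went backwards (for example after a node restart),
        # so the difference is discarded and the next sample is awaited.
        return None
    return (value - prev_value) / (time - prev_time)


# 60 more executed jobs over 60 seconds.
print(change_per_second(prev_value=100, prev_time=0, value=160, time=60))
```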

Trigger prototypes for Local node metrics

Name Description Expression Severity Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high

Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.

min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} Warning
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.

min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN} Warning Depends on:
  • GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.

min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH} High
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high

Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.

min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN} Warning Depends on:
  • GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
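
These expressions apply min() over a time window, so a problem is raised only when every sample in the window exceeds the threshold. The sketch below shows this for the two PME triggers, using the default macro values and invented samples; the dependency is modelled by reporting only the higher severity.

```python
from typing import Optional


def pme_problem(samples_ms: list[float],
                warn_ms: float = 10000,   # {$GRIDGAIN.PME.DURATION.MAX.WARN}
                high_ms: float = 60000    # {$GRIDGAIN.PME.DURATION.MAX.HIGH}
                ) -> Optional[str]:
    """Severity of the PME duration problem for the samples collected in the last 5m."""
    lowest = min(samples_ms)
    if lowest > high_ms:
        return "High"      # the Warning trigger depends on this one and stays suppressed
    if lowest > warn_ms:
        return "Warning"
    return None


print(pme_problem([15000, 12000, 18000]))
```

A single short sample inside the window is enough to keep the trigger from firing, which filters out brief spikes.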

LLD rule TCP discovery SPI

Name Description Type Key and additional info
TCP discovery SPI JMX agent jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

Item prototypes for TCP discovery SPI

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator

Current coordinator UUID.

JMX agent jmx["{#JMXOBJ}",Coordinator]

Preprocessing

  • Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left

Nodes left count.

JMX agent jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined

Nodes join count.

JMX agent jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed

Nodes failed count.

JMX agent jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue

Message worker queue current size.

JMX agent jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate

Number of times node tries to (re)establish connection to another node per second.

JMX agent jmx["{#JMXOBJ}",ReconnectCount]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages

The number of messages processed per second.

JMX agent jmx["{#JMXOBJ}",TotalProcessedMessages]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate

The number of messages received per second.

JMX agent jmx["{#JMXOBJ}",TotalReceivedMessages]

Preprocessing

  • Change per second

Trigger prototypes for TCP discovery SPI

Name Description Expression Severity Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed

The GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Acknowledge to close the problem manually.

last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 Warning Manual close: Yes

LLD rule TCP Communication SPI metrics

Name Description Type Key and additional info
TCP Communication SPI metrics JMX agent jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

Item prototypes for TCP Communication SPI metrics

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue

Outbound messages queue size.

JMX agent jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate

The number of messages received per second.

JMX agent jmx["{#JMXOBJ}",ReceivedMessagesCount]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate

The number of messages sent per second.

JMX agent jmx["{#JMXOBJ}",SentMessagesCount]

Preprocessing

  • Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate

Gets maximum number of reconnect attempts used when establishing connection with remote nodes per second.

JMX agent jmx["{#JMXOBJ}",ReconnectCount,maxNumbers]

Preprocessing

  • Change per second

LLD rule Transaction metrics

Name Description Type Key and additional info
Transaction metrics JMX agent jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

Item prototypes for Transaction metrics

Name Description Type Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys

The number of keys locked on the node.

JMX agent jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current

The number of active transactions for which this node is the initiator.

JMX agent jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current

The number of active transactions holding at least one key lock.

JMX agent jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate

The number of transactions rolled back per second.

JMX agent jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate

The number of transactions which were committed per second.

JMX agent jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name Description Type Key and additional info
Cache metrics JMX agent jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Cache metrics

Name Description Type Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate

The number of gets to the cache per second.

JMX agent jmx["{#JMXOBJ}",CacheGets]

Preprocessing

  • Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate

The number of puts to the cache per second.

JMX agent jmx["{#JMXOBJ}",CachePuts]

Preprocessing

  • Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate

The number of removals from the cache per second.

JMX agent jmx["{#JMXOBJ}",CacheRemovals]

Preprocessing

  • Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct

Percentage of successful hits.

JMX agent jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct

Percentage of accesses that failed to find anything.

JMX agent jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate

The number of transaction commits per second.

JMX agent jmx["{#JMXOBJ}",CacheTxCommits]

Preprocessing

  • Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate

The number of transaction rollbacks per second.

JMX agent jmx["{#JMXOBJ}",CacheTxRollbacks]

Preprocessing

  • Change per second
Cache group [{#JMXGROUP}]: Cache size

The number of non-null values in the cache as a long value.

JMX agent jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries

The number of entries in heap memory.

JMX agent jmx["{#JMXOBJ}",HeapEntriesCount]

Preprocessing

  • Change per second

Trigger prototypes for Cache metrics

Name Description Expression Severity Dependencies and additional info
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 Average
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) Warning Depends on:
  • Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap

All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Acknowledge to close the problem manually.

last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) Info Manual close: Yes

LLD rule Data region metrics

Name Description Type Key and additional info
Data region metrics JMX agent jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Data region metrics

Name Description Type Key and additional info
Data region {#JMXNAME}: Allocation, rate

Allocation rate (pages per second) averaged across rateTimeInterval.

JMX agent jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes

Total size of memory allocated in bytes.

JMX agent jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages

Number of pages in memory not yet synchronized with persistent storage.

JMX agent jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate

Eviction rate (pages per second).

JMX agent jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max

Maximum memory region size defined by its data region.

JMX agent jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size

Offheap size in bytes.

JMX agent jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size

Total used offheap size in bytes.

JMX agent jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor

The percentage of the used space.

JMX agent jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate

Rate at which pages in memory are replaced with pages from persistent storage (pages per second).

JMX agent jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size

Used checkpoint buffer size in bytes.

JMX agent jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size

Total size in bytes for checkpoint buffer.

JMX agent jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name Description Expression Severity Dependencies and additional info
Data region {#JMXNAME}: Node started to evict pages

You store more data than the region can accommodate. Data has started to move to disk, which can make requests slower. Acknowledge to close the problem manually.

min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 Info Manual close: Yes
Data region {#JMXNAME}: Data region utilization is too high

Data region utilization is high. Increase data region size or delete any data.

min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} Warning Depends on:
  • Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high

Data region utilization is high. Increase data region size or delete any data.

min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} High
Data region {#JMXNAME}: Pages replace rate more than 0

There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.

min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 Warning
Data region {#JMXNAME}: Checkpoint buffer utilization is too high

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} Warning Depends on:
  • Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} High
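
The utilization triggers above divide the lowest used size seen in the last 5 minutes by the latest total size and compare the percentage with a macro. A minimal sketch of that arithmetic, using the default thresholds and invented byte counts:

```python
from typing import Optional


def utilization_severity(used_samples: list[float], total_last: float,
                         warn_pct: float = 80, high_pct: float = 90) -> Optional[str]:
    """Severity for min(used, 5m) / last(total) * 100 against WARN and HIGH thresholds."""
    if total_last == 0:
        return None  # the ratio is undefined; this sketch simply reports no problem
    pct = min(used_samples) / total_last * 100
    if pct > high_pct:
        return "High"      # the Warning trigger depends on the High one
    if pct > warn_pct:
        return "Warning"
    return None


# Data region defaults: 80 / 90. Checkpoint buffer defaults: 66 / 80.
print(utilization_severity([850, 900], 1000))
print(utilization_severity([70, 75], 100, warn_pct=66, high_pct=80))
```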

LLD rule Cache groups

Name Description Type Key and additional info
Cache groups JMX agent jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Cache groups

Name Description Type Key and additional info
Cache group [{#JMXNAME}]: Backups

Count of backups configured for cache group.

JMX agent jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions

Count of partitions for cache group.

JMX agent jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches

List of caches.

JMX agent jmx["{#JMXOBJ}",Caches]

Preprocessing

  • Discard unchanged with heartbeat: 3h

Cache group [{#JMXNAME}]: Local node partitions, moving

Count of partitions with state MOVING for this cache group located on this node.

JMX agent jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting

Count of partitions with state RENTING for this cache group located on this node.

JMX agent jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting

Count of entries remaining to evict in RENTING partitions located on this node for this cache group.

JMX agent jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning

Count of partitions with state OWNING for this cache group located on this node.

JMX agent jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min

Minimum number of partition copies for all partitions of this cache group.

JMX agent jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max

Maximum number of partition copies for all partitions of this cache group.

JMX agent jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name Description Expression Severity Dependencies and additional info
Cache group [{#JMXNAME}]: One or more backups are unavailable min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) Warning
Cache group [{#JMXNAME}]: List of caches has changed

List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.

last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0 Info Manual close: Yes
Cache group [{#JMXNAME}]: Rebalance in progress

Acknowledge to close the problem manually.

max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 Info Manual close: Yes
Cache group [{#JMXNAME}]: There is no copy for partitions max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 Warning

LLD rule Thread pool metrics

Name Description Type Key and additional info
Thread pool metrics JMX agent jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for Thread pool metrics

Name Description Type Key and additional info
Thread pool [{#JMXNAME}]: Queue size

Current size of the execution queue.

JMX agent jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size

Current number of threads in the pool.

JMX agent jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max

The maximum allowed number of threads.

JMX agent jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core

The core number of threads.

JMX agent jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name Description Expression Severity Dependencies and additional info
Thread pool [{#JMXNAME}]: Too many messages in queue

Number of messages in the queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.

min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} Average
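
The threshold in this trigger is a user macro with context: a value defined for a specific thread pool name takes precedence, and the macro without context is used otherwise. The sketch below models that lookup for static contexts only (regex contexts are not handled). The pool names and the 5000 override are hypothetical.

```python
def resolve_macro(macros: dict[str, str], name: str, context: str) -> str:
    """Resolve {$NAME:"context"}: prefer the context-specific value, else fall back to {$NAME}."""
    with_context = f'{{${name}:"{context}"}}'
    plain = f"{{${name}}}"
    return macros.get(with_context, macros[plain])


host_macros = {
    "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}": "1000",                # template default
    '{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"examplePool"}': "5000",  # hypothetical per-pool override
}

print(resolve_macro(host_macros, "GRIDGAIN.THREAD.QUEUE.MAX.WARN", "examplePool"))
print(resolve_macro(host_macros, "GRIDGAIN.THREAD.QUEUE.MAX.WARN", "otherPool"))
```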

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.