# Apache Kafka by JMX

## Overview

This template is designed for the effortless deployment of Apache Kafka monitoring by Zabbix via JMX and doesn't require any external scripts.

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- Apache Kafka 2.6.0

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

Metrics are collected by JMX.

1. Enable and configure JMX access to Apache Kafka. See the Apache Kafka documentation for [instructions](https://kafka.apache.org/documentation/#remote_jmx).
2. Set the user name and password in the host macros {$KAFKA.USER} and {$KAFKA.PASSWORD}.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$KAFKA.USER}|User name for JMX access.|`zabbix`|
|{$KAFKA.PASSWORD}|Password for JMX access.|`zabbix`|
|{$KAFKA.TOPIC.MATCHES}|Filter of discoverable topics (regular expression).|`.*`|
|{$KAFKA.TOPIC.NOT_MATCHES}|Filter to exclude discovered topics (regular expression).|`__consumer_offsets`|
|{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN}|The minimum network processor average idle percentage used in the trigger expression.|`30`|
|{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN}|The minimum request handler average idle percentage used in the trigger expression.|`30`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kafka: Leader election per second|Number of leader elections per second.|JMX agent|jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"]|
|Kafka: Unclean leader election per second|Number of “unclean” elections per second.|JMX agent|jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] **Preprocessing**|
| |A value of 1 indicates that the broker is the active controller for the cluster.|JMX agent|jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] **Preprocessing**: Discard unchanged with heartbeat: `1h`|
| |The number of ineligible pending replica deletes.|JMX agent|jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"]|
|Kafka: Pending replica deletes|The number of pending replica deletes.|JMX agent|jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"]|
|Kafka: Ineligible pending topic deletes|The number of ineligible pending topic deletes.|JMX agent|jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"]|
|Kafka: Pending topic deletes|The number of pending topic deletes.|JMX agent|jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"]|
|Kafka: Offline log directory count|The number of offline log directories (for example, after a hardware failure).|JMX agent|jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]|
|Kafka: Offline partitions count|Number of partitions that don't have an active leader.|JMX agent|jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]|
|Kafka: Bytes out per second|The rate at which data is fetched and read from the broker by consumers.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] **Preprocessing**|
| |The rate at which data sent from producers is consumed by the broker.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] **Preprocessing**|
| |The rate at which individual messages are consumed by the broker.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] **Preprocessing**|
| |The rate at which bytes are rejected by the broker.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] **Preprocessing**|
| |Number of client fetch request failures per second.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] **Preprocessing**|
| |Number of failed produce requests per second.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] **Preprocessing**|
| |Indicates the percentage of time that the request handler (IO) threads are not in use.|JMX agent|jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] **Preprocessing**: Custom multiplier: `100`|
| |Average time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"]|
|Kafka: Fetch-Consumer response send time, p95|The 95th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"]|
|Kafka: Fetch-Consumer response send time, p99|The 99th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"]|
|Kafka: Fetch-Follower response send time, mean|Average time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"]|
|Kafka: Fetch-Follower response send time, p95|The 95th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"]|
|Kafka: Fetch-Follower response send time, p99|The 99th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"]|
|Kafka: Produce response send time, mean|Average time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"]|
|Kafka: Produce response send time, p95|The 95th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"]|
|Kafka: Produce response send time, p99|The 99th percentile of the time taken, in milliseconds, to send the response.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"]|
|Kafka: Fetch-Consumer request total time, mean|Average time, in milliseconds, to serve the Fetch-Consumer request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"]|
|Kafka: Fetch-Consumer request total time, p95|The 95th percentile of the time, in milliseconds, to serve the Fetch-Consumer request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"]|
|Kafka: Fetch-Consumer request total time, p99|The 99th percentile of the time, in milliseconds, to serve the Fetch-Consumer request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"]|
|Kafka: Fetch-Follower request total time, mean|Average time, in milliseconds, to serve the Fetch-Follower request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"]|
|Kafka: Fetch-Follower request total time, p95|The 95th percentile of the time, in milliseconds, to serve the Fetch-Follower request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"]|
|Kafka: Fetch-Follower request total time, p99|The 99th percentile of the time, in milliseconds, to serve the Fetch-Follower request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"]|
|Kafka: Produce request total time, mean|Average time, in milliseconds, to serve the Produce request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"]|
|Kafka: Produce request total time, p95|The 95th percentile of the time, in milliseconds, to serve the Produce request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"]|
|Kafka: Produce request total time, p99|The 99th percentile of the time, in milliseconds, to serve the Produce request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"]|
|Kafka: UpdateMetadata request total time, mean|Average time, in milliseconds, to serve the UpdateMetadata request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"]|
|Kafka: UpdateMetadata request total time, p95|The 95th percentile of the time, in milliseconds, to serve the UpdateMetadata request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"]|
|Kafka: UpdateMetadata request total time, p99|The 99th percentile of the time, in milliseconds, to serve the UpdateMetadata request.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"]|
|Kafka: Temporary memory size in bytes (Fetch), max|The maximum amount of temporary memory used for converting message formats and decompressing messages.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"]|
|Kafka: Temporary memory size in bytes (Fetch), min|The minimum amount of temporary memory used for converting message formats and decompressing messages.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"]|
|Kafka: Temporary memory size in bytes (Produce), max|The maximum amount of temporary memory used for converting message formats and decompressing messages.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"]|
|Kafka: Temporary memory size in bytes (Produce), avg|The average amount of temporary memory used for converting message formats and decompressing messages.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"]|
|Kafka: Temporary memory size in bytes (Produce), min|The minimum amount of temporary memory used for converting message formats and decompressing messages.|JMX agent|jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"]|
|Kafka: Network processor average idle percent|The average percentage of time that the network processors are idle.|JMX agent|jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] **Preprocessing**: Custom multiplier: `100`|
| |Number of requests waiting in producer purgatory.|JMX agent|jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"]|
|Kafka: Requests in fetch purgatory|Number of requests waiting in fetch purgatory.|JMX agent|jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"]|
|Kafka: Replication maximum lag|The maximum lag, in messages, between the leader replica and the follower replicas.|JMX agent|jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"]|
|Kafka: Under minimum ISR partition count|The number of partitions under the minimum In-Sync Replica (ISR) count.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"]|
|Kafka: Under replicated partitions|The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas - the number of ISR > 0).|JMX agent|jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"]|
|Kafka: ISR expands per second|The rate at which the number of ISRs in the broker increases.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] **Preprocessing**|
| |Rate of replicas leaving the ISR pool.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] **Preprocessing**|
| |The number of replicas for which this broker is the leader.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"]|
|Kafka: Partition count|The number of partitions on the broker.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"]|
|Kafka: Number of reassigning partitions|The number of reassigning leader partitions on a broker.|JMX agent|jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"]|
|Kafka: Request queue size|The size of the delay queue.|JMX agent|jmx["kafka.server:type=Request","queue-size"]|
|Kafka: Version|The current version of the broker.|JMX agent|jmx["kafka.server:type=app-info","version"] **Preprocessing**: Discard unchanged with heartbeat: `1h`|
| |The service uptime expressed in seconds.|JMX agent|jmx["kafka.server:type=app-info","start-time-ms"] **Preprocessing**: JavaScript: `The text is too long. Please see the template.`|
| |Latency in milliseconds for ZooKeeper requests from the broker.|JMX agent|jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"]|
|Kafka: ZooKeeper connection status|Connection status of the broker's ZooKeeper session.|JMX agent|jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] **Preprocessing**: Discard unchanged with heartbeat: `1h`|
| |ZooKeeper client disconnects per second.|JMX agent|jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] **Preprocessing**|
| |ZooKeeper client session expirations per second.|JMX agent|jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] **Preprocessing**|
| |ZooKeeper client read-only connections per second.|JMX agent|jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] **Preprocessing**|
| |ZooKeeper client sync connections per second.|JMX agent|jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] **Preprocessing**|
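
Several items above read the cumulative `Count` attribute of a Kafka JMX meter, or an idle ratio that the template multiplies by 100, so they depend on preprocessing to become a per-second rate or a percentage. The Python snippet below is a minimal sketch of what such steps do to raw samples, assuming the usual Zabbix semantics: "Change per second" as (value - previous value) / (time - previous time), "Custom multiplier" as plain multiplication, and "Discard unchanged with heartbeat" as dropping repeated values until the heartbeat period has passed. It is not Zabbix source code, and all names are hypothetical. The `uptime_seconds` helper is likewise an assumption, because the template's actual JavaScript for the `start-time-ms` item is not reproduced in this README.

```python
"""Hypothetical sketch of Zabbix-style preprocessing applied to raw JMX samples.

Illustration only: this is not Zabbix's implementation, and the function
names are made up for this example.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    value: float  # raw value read from the JMX attribute
    clock: float  # collection time, in seconds since the epoch


def change_per_second(prev: Optional[Sample], cur: Sample) -> Optional[float]:
    """Turn a cumulative counter (e.g. a meter's "Count") into a rate.

    Returns None when there is no previous sample, when time did not advance,
    or when the counter went backwards (e.g. after a broker restart), so this
    sketch never emits a negative rate.
    """
    if prev is None or cur.clock <= prev.clock or cur.value < prev.value:
        return None
    return (cur.value - prev.value) / (cur.clock - prev.clock)


def custom_multiplier(value: float, factor: float = 100.0) -> float:
    """Scale a value, e.g. an idle ratio in [0, 1] to a percentage."""
    return value * factor


def discard_unchanged_with_heartbeat(
    prev_kept: Optional[Sample], cur: Sample, heartbeat: float = 3600.0
) -> bool:
    """Return True if the sample should be stored.

    An unchanged value is dropped unless `heartbeat` seconds have passed
    since the last stored sample (1h = 3600 s, as used by the Version item).
    """
    if prev_kept is None or cur.value != prev_kept.value:
        return True
    return cur.clock - prev_kept.clock >= heartbeat


def uptime_seconds(start_time_ms: float, now_ms: float) -> float:
    """Assumed uptime derivation from app-info "start-time-ms".

    Computes (now - start time) in seconds, clamped at zero.
    """
    return max(0.0, (now_ms - start_time_ms) / 1000.0)
```

For example, a meter whose `Count` grows from 1000 to 1600 over 60 seconds gives `change_per_second(Sample(1000.0, 0.0), Sample(1600.0, 60.0))` equal to `10.0` per second.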

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
| |Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability.|`last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0`|Average||
|Kafka: There are offline log directories|The offline log directory count metric indicates the number of log directories that are offline (for example, due to a hardware failure), so that the broker can no longer store incoming messages.|`last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0`|Warning||
|Kafka: One or more partitions have no leader|Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available.|`last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0`|Warning||
|Kafka: Request handler average idle percent is too low|The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is.|`max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN}`|Average||
|Kafka: Network processor average idle percent is too low|The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is.|`max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN}`|Average||
|Kafka: Failed to fetch info data|Zabbix has not received data for items for the last 15 minutes.|`nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1`|Warning||
|Kafka: There are partitions under the min ISR|The Under min ISR partitions metric displays the number of partitions where the number of In-Sync Replicas (ISR) is less than the specified minimum number of in-sync replicas. The two most common causes of under-min ISR partitions are that one or more brokers are unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind.|`last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0`|Average||
|Kafka: There are under replicated partitions|The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition is also considered under-replicated if the correct number of replicas exists, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers are unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind.|`last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0`|Average||
|Kafka: Version has changed|The Kafka version has changed. Acknowledge to close the problem manually.|`last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0`|Info|**Manual close**: Yes|
|Kafka: has been restarted|Uptime is less than 10 minutes.|`last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m`|Info|**Manual close**: Yes|
|Kafka: Broker is not connected to ZooKeeper||`find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0`|Average||

### LLD rule Topic Metrics (write)

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Topic Metrics (write)||JMX agent|jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"]|

### Item prototypes for Topic Metrics (write)

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Kafka {#JMXTOPIC}: Messages in per second|The rate at which individual messages are consumed by the broker, per topic.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] **Preprocessing**|
| |The rate at which data sent from producers is consumed by the broker, per topic.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] **Preprocessing**|
| |The rate at which data is fetched and read from the broker by consumers (by topic).|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] **Preprocessing**|
| |Rejected bytes rate by topic.|JMX agent|jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] **Preprocessing**|
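
The discovery rule matches the per-topic `MessagesInPerSec` beans (`topic=*`), and the `{$KAFKA.TOPIC.MATCHES}` and `{$KAFKA.TOPIC.NOT_MATCHES}` macros decide which topics receive item prototypes. The Python sketch below illustrates that flow under stated assumptions: the topic is taken from each bean's `topic=` property and becomes `{#JMXTOPIC}`, both macros are applied as unanchored regular expressions (Python's `re.search` stands in for Zabbix's regex engine), and the helper names and example topics are hypothetical.

```python
"""Hypothetical sketch of topic discovery filtering and prototype key expansion.

Assumes unanchored regex matching for the LLD filter macros; helper names
are illustrative, not part of Zabbix or the template.
"""
import re
from typing import Iterable, List

# Default values of {$KAFKA.TOPIC.MATCHES} and {$KAFKA.TOPIC.NOT_MATCHES}.
TOPIC_MATCHES = r".*"
TOPIC_NOT_MATCHES = r"__consumer_offsets"

# Item prototype key from the table above; {#JMXTOPIC} is filled per topic.
PROTOTYPE_KEY = (
    'jmx["kafka.server:type=BrokerTopicMetrics,'
    'name=MessagesInPerSec,topic={#JMXTOPIC}","Count"]'
)


def topic_from_object_name(object_name: str) -> str:
    """Extract the topic=... property from a JMX ObjectName string."""
    _domain, _, props = object_name.partition(":")
    for prop in props.split(","):
        key, _, value = prop.partition("=")
        if key == "topic":
            return value
    raise ValueError(f"no topic property in {object_name!r}")


def filter_topics(
    topics: Iterable[str],
    matches: str = TOPIC_MATCHES,
    not_matches: str = TOPIC_NOT_MATCHES,
) -> List[str]:
    """Keep topics that match `matches` and do not match `not_matches`."""
    keep = re.compile(matches)
    drop = re.compile(not_matches)
    return [t for t in topics if keep.search(t) and not drop.search(t)]


def expand_keys(topics: Iterable[str]) -> List[str]:
    """Substitute each discovered topic into the prototype key."""
    return [PROTOTYPE_KEY.replace("{#JMXTOPIC}", t) for t in topics]
```

With the default macro values, a broker exposing beans for the hypothetical topics `orders` and `__consumer_offsets` would get items only for `orders`; setting `{$KAFKA.TOPIC.MATCHES}` to a narrower pattern such as `^pay` would restrict discovery further.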