Envoy Proxy by HTTP

Overview

This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Envoy Proxy by HTTP collects metrics with the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Envoy Proxy 1.20.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). See the Envoy statistics overview: https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview

Don't forget to set the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH} to match your instance. Also, see the Macros used section for the list of macros that set trigger values.
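Before linking the template, it can help to confirm that the endpoint answers from the host that will run the check. The following is a minimal sketch, assuming the HTTP agent item requests the value of {$ENVOY.URL} followed by {$ENVOY.METRICS.PATH}; the function names `build_metrics_url` and `fetch_metrics` are hypothetical helpers, not part of Zabbix or Envoy.

```python
from urllib.request import urlopen

# Defaults copied from the "Macros used" table.
ENVOY_URL = "http://localhost:9901"
ENVOY_METRICS_PATH = "/stats/prometheus"


def build_metrics_url(base_url: str, metrics_path: str) -> str:
    """Join the two macro values into one URL (assumed concatenation)."""
    return base_url.rstrip("/") + "/" + metrics_path.lstrip("/")


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text served by the Envoy admin endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics(build_metrics_url(ENVOY_URL, ENVOY_METRICS_PATH))` on the monitored host should return text containing lines that start with `envoy_server_`, provided the admin interface listens on that address.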

NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.

Macros used

Name	Description	Default
{$ENVOY.URL}	Instance URL.	`http://localhost:9901`
{$ENVOY.METRICS.PATH}	The path from which Zabbix scrapes metrics in Prometheus format.	`/stats/prometheus`
{$ENVOY.CERT.MIN}	Minimum number of days before certificate expiration used for trigger expression.	`7`

Items

Name	Description	Type	Key and additional info
Envoy Proxy: Get node metrics	Get server metrics.	HTTP agent	envoy.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Envoy Proxy: Server state	State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS).	Dependent item	envoy.server.state Preprocessing Prometheus pattern: `VALUE(envoy_server_state)` Discard unchanged with heartbeat: `3h`
Envoy Proxy: Server live	1 if the server is not currently draining, 0 otherwise.	Dependent item	envoy.server.live Preprocessing Prometheus pattern: `VALUE(envoy_server_live)` Discard unchanged with heartbeat: `3h`
Envoy Proxy: Uptime	Current server uptime in seconds.	Dependent item	envoy.server.uptime Preprocessing Prometheus pattern: `VALUE(envoy_server_uptime)` ⛔️Custom on fail: Discard value
Envoy Proxy: Certificate expiration, day before	Number of days until the next certificate being managed will expire.	Dependent item	envoy.server.days_until_first_cert_expiring Preprocessing Prometheus pattern: `VALUE(envoy_server_days_until_first_cert_expiring)`
Envoy Proxy: Server concurrency	Number of worker threads.	Dependent item	envoy.server.concurrency Preprocessing Prometheus pattern: `VALUE(envoy_server_concurrency)`
Envoy Proxy: Memory allocated	Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart.	Dependent item	envoy.server.memory_allocated Preprocessing Prometheus pattern: `VALUE(envoy_server_memory_allocated)`
Envoy Proxy: Memory heap size	Current reserved heap size in bytes. New Envoy process heap size on hot restart.	Dependent item	envoy.server.memory_heap_size Preprocessing Prometheus pattern: `VALUE(envoy_server_memory_heap_size)`
Envoy Proxy: Memory physical size	Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart.	Dependent item	envoy.server.memory_physical_size Preprocessing Prometheus pattern: `VALUE(envoy_server_memory_physical_size)`
Envoy Proxy: Filesystem, flushed by timer rate	Total number of times internal flush buffers are written to a file due to flush timeout per second.	Dependent item	envoy.filesystem.flushed_by_timer.rate Preprocessing Prometheus pattern: `VALUE(envoy_filesystem_flushed_by_timer)` Change per second
Envoy Proxy: Filesystem, write completed rate	Total number of times a file was written per second.	Dependent item	envoy.filesystem.write_completed.rate Preprocessing Prometheus pattern: `VALUE(envoy_filesystem_write_completed)` Change per second
Envoy Proxy: Filesystem, write failed rate	Total number of times an error occurred during a file write operation per second.	Dependent item	envoy.filesystem.write_failed.rate Preprocessing Prometheus pattern: `VALUE(envoy_filesystem_write_failed)` Change per second
Envoy Proxy: Filesystem, reopen failed rate	Total number of times a file failed to be opened per second.	Dependent item	envoy.filesystem.reopen_failed.rate Preprocessing Prometheus pattern: `VALUE(envoy_filesystem_reopen_failed)` Change per second
Envoy Proxy: Connections, total	Total connections of both new and old Envoy processes.	Dependent item	envoy.server.total_connections Preprocessing Prometheus pattern: `VALUE(envoy_server_total_connections)`
Envoy Proxy: Connections, parent	Total connections of the old Envoy process on hot restart.	Dependent item	envoy.server.parent_connections Preprocessing Prometheus pattern: `VALUE(envoy_server_parent_connections)`
Envoy Proxy: Clusters, warming	Number of currently warming (not active) clusters.	Dependent item	envoy.cluster_manager.warming_clusters Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_warming_clusters)`
Envoy Proxy: Clusters, active	Number of currently active (warmed) clusters.	Dependent item	envoy.cluster_manager.active_clusters Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_active_clusters)`
Envoy Proxy: Clusters, added rate	Total clusters added (either via static config or CDS) per second.	Dependent item	envoy.cluster_manager.cluster_added.rate Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_cluster_added)` Change per second
Envoy Proxy: Clusters, modified rate	Total clusters modified (via CDS) per second.	Dependent item	envoy.cluster_manager.cluster_modified.rate Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_cluster_modified)` Change per second
Envoy Proxy: Clusters, removed rate	Total clusters removed (via CDS) per second.	Dependent item	envoy.cluster_manager.cluster_removed.rate Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_cluster_removed)` Change per second
Envoy Proxy: Clusters, updates rate	Total cluster updates per second.	Dependent item	envoy.cluster_manager.cluster_updated.rate Preprocessing Prometheus pattern: `VALUE(envoy_cluster_manager_cluster_updated)` Change per second
Envoy Proxy: Listeners, active	Number of currently active listeners.	Dependent item	envoy.listener_manager.total_listeners_active Preprocessing Prometheus pattern: `SUM(envoy_listener_manager_total_listeners_active)`
Envoy Proxy: Listeners, draining	Number of currently draining listeners.	Dependent item	envoy.listener_manager.total_listeners_draining Preprocessing Prometheus pattern: `SUM(envoy_listener_manager_total_listeners_draining)`
Envoy Proxy: Listener, warming	Number of currently warming listeners.	Dependent item	envoy.listener_manager.total_listeners_warming Preprocessing Prometheus pattern: `SUM(envoy_listener_manager_total_listeners_warming)`
Envoy Proxy: Listener manager, initialized	A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers.	Dependent item	envoy.listener_manager.workers_started Preprocessing Prometheus pattern: `VALUE(envoy_listener_manager_workers_started)` Discard unchanged with heartbeat: `3h`
Envoy Proxy: Listeners, create failure	Total failed listener object additions to workers per second.	Dependent item	envoy.listener_manager.listener_create_failure.rate Preprocessing Prometheus pattern: `VALUE(envoy_listener_manager_listener_create_failure)` Change per second
Envoy Proxy: Listeners, create success	Total listener objects successfully added to workers per second.	Dependent item	envoy.listener_manager.listener_create_success.rate Preprocessing Prometheus pattern: `VALUE(envoy_listener_manager_listener_create_success)` Change per second
Envoy Proxy: Listeners, added	Total listeners added (either via static config or LDS) per second.	Dependent item	envoy.listener_manager.listener_added.rate Preprocessing Prometheus pattern: `VALUE(envoy_listener_manager_listener_added)` Change per second
Envoy Proxy: Listeners, stopped	Total listeners stopped per second.	Dependent item	envoy.listener_manager.listener_stopped.rate Preprocessing Prometheus pattern: `VALUE(envoy_listener_manager_listener_stopped)` Change per second
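The dependent items above all derive from the single "Get node metrics" response through two kinds of preprocessing: a Prometheus pattern (`VALUE(...)` or `SUM(...)`) and, for rate items, "Change per second". The sketch below imitates those steps in Python to make the data flow concrete. It is an illustration under assumptions, not Zabbix's implementation: the sample text is made up, and the helper names `samples`, `value`, `total`, and `change_per_second` are hypothetical.

```python
import re
from typing import List, Optional

# Made-up excerpt in Prometheus text format, for illustration only.
SAMPLE = """\
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 1200
# TYPE envoy_listener_manager_total_listeners_active gauge
envoy_listener_manager_total_listeners_active{} 2
"""

# metric name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')


def samples(text: str, metric: str) -> List[float]:
    """Collect every sample value whose metric name equals `metric`."""
    found = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match and match.group(1) == metric:
            found.append(float(match.group(3)))
    return found


def value(text: str, metric: str) -> float:
    """Rough analogue of VALUE(metric): expects exactly one matching sample."""
    found = samples(text, metric)
    if len(found) != 1:
        raise ValueError(f"expected one sample for {metric}, got {len(found)}")
    return found[0]


def total(text: str, metric: str) -> float:
    """Rough analogue of SUM(metric): adds up all matching samples."""
    return sum(samples(text, metric))


def change_per_second(prev: float, prev_ts: float,
                      cur: float, cur_ts: float) -> Optional[float]:
    """Rate between two readings; None when no rate can be derived
    (counter went backwards, e.g. after a restart, or no time elapsed)."""
    if cur < prev or cur_ts <= prev_ts:
        return None
    return (cur - prev) / (cur_ts - prev_ts)
```

Because every item is derived from the same response, one HTTP request per polling interval feeds the whole table, which is the bulk collection mentioned in the overview.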

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Envoy Proxy: Server state is not live		`last(/Envoy Proxy by HTTP/envoy.server.state) > 0`	Average
Envoy Proxy: Service has been restarted	Uptime is less than 10 minutes.	`last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m`	Info	Manual close: Yes
Envoy Proxy: Failed to fetch metrics data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1`	Warning	Manual close: Yes
Envoy Proxy: SSL certificate expires soon	Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire.	`last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN}`	Warning
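To make the four trigger conditions easier to reason about, here is a minimal Python sketch that applies the same comparisons to one set of latest values. It only approximates the expressions in the table (in particular, `nodata()` is reduced to "seconds since the last uptime value"), and the `Snapshot` class and `firing_triggers` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List

CERT_MIN_DAYS = 7        # default of {$ENVOY.CERT.MIN}
TEN_MINUTES = 10 * 60    # the 10m used by the uptime and nodata triggers


@dataclass
class Snapshot:
    server_state: float                # envoy.server.state, 0 means live
    uptime_seconds: float              # envoy.server.uptime
    days_until_cert_expiry: float      # envoy.server.days_until_first_cert_expiring
    seconds_since_last_uptime: float   # age of the newest uptime value


def firing_triggers(s: Snapshot, cert_min_days: int = CERT_MIN_DAYS) -> List[str]:
    """Return the names of the triggers whose condition holds for `s`."""
    fired = []
    if s.server_state > 0:
        fired.append("Server state is not live")
    if s.uptime_seconds < TEN_MINUTES:
        fired.append("Service has been restarted")
    if s.seconds_since_last_uptime >= TEN_MINUTES:
        fired.append("Failed to fetch metrics data")
    if s.days_until_cert_expiry < cert_min_days:
        fired.append("SSL certificate expires soon")
    return fired
```

For example, a healthy instance that restarted five minutes ago and has a certificate expiring in three days would yield "Service has been restarted" and "SSL certificate expires soon".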

LLD rule Cluster metrics discovery

Name	Description	Type	Key and additional info
Cluster metrics discovery		Dependent item	envoy.lld.cluster Preprocessing Prometheus to JSON: `envoy_cluster_membership_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
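The discovery rule turns the samples of one metric into a list of low-level discovery rows, one per cluster, each carrying the `{#CLUSTER_NAME}` macro used by the item prototypes. The template's JavaScript step is not reproduced in this document, so the following Python sketch only illustrates the general idea. It assumes the cluster name is exposed in a label called `envoy_cluster_name`; that label name, the sample text, and the `discover` helper are assumptions for illustration.

```python
import json
import re
from typing import Dict, List

# Made-up excerpt; the label name envoy_cluster_name is an assumption.
SAMPLE = """\
envoy_cluster_membership_total{envoy_cluster_name="service1"} 2
envoy_cluster_membership_total{envoy_cluster_name="service2"} 1
envoy_cluster_membership_healthy{envoy_cluster_name="service1"} 2
"""

# label="value" pairs, allowing escaped characters inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def discover(text: str, metric: str, label: str, macro: str) -> List[Dict[str, str]]:
    """Build one discovery row per distinct value of `label` on `metric`."""
    seen: List[str] = []
    for line in text.splitlines():
        if not line.startswith(metric + "{"):
            continue  # other metrics do not drive this discovery rule
        labels_part = line[len(metric) + 1: line.rfind("}")]
        labels = dict(LABEL_RE.findall(labels_part))
        name = labels.get(label)
        if name is not None and name not in seen:
            seen.append(name)
    return [{macro: name} for name in seen]


rows = discover(SAMPLE, "envoy_cluster_membership_total",
                "envoy_cluster_name", "{#CLUSTER_NAME}")
print(json.dumps(rows))
```

Each row then instantiates the item prototypes once, so a proxy with two clusters ends up with two copies of every cluster item. The listener and HTTP discovery rules follow the same pattern with a different source metric and macro.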

Item prototypes for Cluster metrics discovery

Name	Description	Type	Key and additional info
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total	Current cluster membership total.	Dependent item	envoy.cluster.membership_total["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy	Current cluster healthy total (inclusive of both health checking and outlier detection).	Dependent item	envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy	Current cluster unhealthy.	Calculated	envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded	Current cluster degraded total.	Dependent item	envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total	Current cluster total connections.	Dependent item	envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active	Current cluster total active connections.	Dependent item	envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate	Current cluster request total per second.	Dependent item	envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate	Current cluster requests that timed out waiting for a response per second.	Dependent item	envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate	Total upstream requests completed per second.	Dependent item	envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate	Aggregate HTTP response codes per second.	Dependent item	envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate	Aggregate HTTP response codes per second.	Dependent item	envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate	Aggregate HTTP response codes per second.	Dependent item	envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate	Aggregate HTTP response codes per second.	Dependent item	envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending	Total active requests pending a connection pool connection.	Dependent item	envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active	Total active requests.	Dependent item	envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate	Total sent connection bytes per second.	Dependent item	envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate	Total received connection bytes per second.	Dependent item	envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second

Trigger prototypes for Cluster metrics discovery

Name	Description	Expression	Severity	Dependencies and additional info
Envoy Proxy: There are unhealthy clusters		`last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0`	Average
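The "Membership, unhealthy" prototype is the only calculated item in this rule, and the trigger above depends on it. Its formula is not shown in this document; a natural reading, assumed here, is the difference between the total and healthy membership counts. The sketch below spells out that assumption. The function names are hypothetical.

```python
def membership_unhealthy(total: int, healthy: int) -> int:
    """Assumed formula: members that are not currently healthy."""
    return total - healthy


def has_unhealthy_members(total: int, healthy: int) -> bool:
    """Mirrors the trigger condition: last(membership_unhealthy) > 0."""
    return membership_unhealthy(total, healthy) > 0
```

Under this reading, a cluster with three members of which two are healthy reports one unhealthy member and fires the trigger, while a fully healthy cluster reports zero and stays quiet.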

LLD rule Listeners metrics discovery

Name	Description	Type	Key and additional info
Listeners metrics discovery		Dependent item	envoy.lld.listeners Preprocessing Prometheus to JSON: `envoy_listener_downstream_cx_active` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Listeners metrics discovery

Name	Description	Type	Key and additional info
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active	Total active connections.	Dependent item	envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate	Total connections per second.	Dependent item	envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing	Sockets currently undergoing listener filter processing.	Dependent item	envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`

LLD rule HTTP metrics discovery

Name	Description	Type	Key and additional info
HTTP metrics discovery		Dependent item	envoy.lld.http Preprocessing Prometheus to JSON: `envoy_http_downstream_rq_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP metrics discovery

Name	Description	Type	Key and additional info
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate	Total requests per second.	Dependent item	envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active	Total active requests.	Dependent item	envoy.http.downstream_rq_active["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate	Total requests closed due to a timeout on the request path per second.	Dependent item	envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate	Total connections per second.	Dependent item	envoy.http.downstream_cx_total["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active	Total active connections.	Dependent item	envoy.http.downstream_cx_active["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.`
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate	Total bytes received per second.	Dependent item	envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate	Total bytes sent per second.	Dependent item	envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.
