Envoy Proxy by HTTP

Overview

This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Envoy Proxy by HTTP collects metrics with the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • Envoy Proxy 1.20.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). For details, see https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview

Don't forget to change the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH}. Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
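
Before linking the template, it can help to confirm that the metrics endpoint answers at all. The following is a minimal sketch, not part of the template: the function names are hypothetical, and it assumes the request URL is simply {$ENVOY.URL} followed by {$ENVOY.METRICS.PATH}, shown here with the default macro values.

```python
import urllib.request


def build_metrics_url(base_url: str, metrics_path: str) -> str:
    """Join the instance URL and the metrics path with exactly one slash."""
    return base_url.rstrip("/") + "/" + metrics_path.lstrip("/")


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw Prometheus-format text served at url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Defaults taken from the Macros section of this document.
DEFAULT_URL = build_metrics_url("http://localhost:9901", "/stats/prometheus")
```

Calling fetch_metrics(DEFAULT_URL) on the Envoy host should return text containing lines such as envoy_server_uptime; an error or empty response suggests the admin interface is not reachable at that address.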

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$ENVOY.URL} | Instance URL. | http://localhost:9901 |
| {$ENVOY.METRICS.PATH} | The path from which Zabbix scrapes metrics in Prometheus format. | /stats/prometheus |
| {$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. | 7 |

Items

Name Description Type Key and additional info
Envoy Proxy: Get node metrics

Get server metrics.

HTTP agent envoy.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

Envoy Proxy: Server state

State of the server.

Live - (default) Server is live and serving traffic.

Draining - Server is draining listeners in response to external health checks failing.

Pre initializing - Server has not yet completed cluster manager initialization.

Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS).

Dependent item envoy.server.state

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_state)

  • Discard unchanged with heartbeat: 3h

Envoy Proxy: Server live

1 if the server is not currently draining, 0 otherwise.

Dependent item envoy.server.live

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_live)

  • Discard unchanged with heartbeat: 3h

Envoy Proxy: Uptime

Current server uptime in seconds.

Dependent item envoy.server.uptime

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_uptime)

    Custom on fail: Discard value

Envoy Proxy: Certificate expiration, day before

Number of days until the next certificate being managed will expire.

Dependent item envoy.server.days_until_first_cert_expiring

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_days_until_first_cert_expiring)

Envoy Proxy: Server concurrency

Number of worker threads.

Dependent item envoy.server.concurrency

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_concurrency)

Envoy Proxy: Memory allocated

Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart.

Dependent item envoy.server.memory_allocated

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_allocated)

Envoy Proxy: Memory heap size

Current reserved heap size in bytes. New Envoy process heap size on hot restart.

Dependent item envoy.server.memory_heap_size

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_heap_size)

Envoy Proxy: Memory physical size

Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart.

Dependent item envoy.server.memory_physical_size

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_physical_size)

Envoy Proxy: Filesystem, flushed by timer rate

Total number of times internal flush buffers are written to a file due to flush timeout per second.

Dependent item envoy.filesystem.flushed_by_timer.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_flushed_by_timer)

  • Change per second
Envoy Proxy: Filesystem, write completed rate

Total number of times a file was written per second.

Dependent item envoy.filesystem.write_completed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_write_completed)

  • Change per second
Envoy Proxy: Filesystem, write failed rate

Total number of times an error occurred during a file write operation per second.

Dependent item envoy.filesystem.write_failed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_write_failed)

  • Change per second
Envoy Proxy: Filesystem, reopen failed rate

Total number of times a file failed to be opened per second.

Dependent item envoy.filesystem.reopen_failed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_reopen_failed)

  • Change per second
Envoy Proxy: Connections, total

Total connections of both new and old Envoy processes.

Dependent item envoy.server.total_connections

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_total_connections)

Envoy Proxy: Connections, parent

Total connections of the old Envoy process on hot restart.

Dependent item envoy.server.parent_connections

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_parent_connections)

Envoy Proxy: Clusters, warming

Number of currently warming (not active) clusters.

Dependent item envoy.cluster_manager.warming_clusters

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_warming_clusters)

Envoy Proxy: Clusters, active

Number of currently active (warmed) clusters.

Dependent item envoy.cluster_manager.active_clusters

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_active_clusters)

Envoy Proxy: Clusters, added rate

Total clusters added (either via static config or CDS) per second.

Dependent item envoy.cluster_manager.cluster_added.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_added)

  • Change per second
Envoy Proxy: Clusters, modified rate

Total clusters modified (via CDS) per second.

Dependent item envoy.cluster_manager.cluster_modified.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_modified)

  • Change per second
Envoy Proxy: Clusters, removed rate

Total clusters removed (via CDS) per second.

Dependent item envoy.cluster_manager.cluster_removed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_removed)

  • Change per second
Envoy Proxy: Clusters, updates rate

Total cluster updates per second.

Dependent item envoy.cluster_manager.cluster_updated.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_updated)

  • Change per second
Envoy Proxy: Listeners, active

Number of currently active listeners.

Dependent item envoy.listener_manager.total_listeners_active

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_active)

Envoy Proxy: Listeners, draining

Number of currently draining listeners.

Dependent item envoy.listener_manager.total_listeners_draining

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_draining)

Envoy Proxy: Listener, warming

Number of currently warming listeners.

Dependent item envoy.listener_manager.total_listeners_warming

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_warming)

Envoy Proxy: Listener manager, initialized

A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers.

Dependent item envoy.listener_manager.workers_started

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_workers_started)

  • Discard unchanged with heartbeat: 3h

Envoy Proxy: Listeners, create failure

Total failed listener object additions to workers per second.

Dependent item envoy.listener_manager.listener_create_failure.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_create_failure)

  • Change per second
Envoy Proxy: Listeners, create success

Total listener objects successfully added to workers per second.

Dependent item envoy.listener_manager.listener_create_success.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_create_success)

  • Change per second
Envoy Proxy: Listeners, added

Total listeners added (either via static config or LDS) per second.

Dependent item envoy.listener_manager.listener_added.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_added)

  • Change per second
Envoy Proxy: Listeners, stopped

Total listeners stopped per second.

Dependent item envoy.listener_manager.listener_stopped.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_stopped)

  • Change per second
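
Each dependent item above extracts one number from the text returned by envoy.get_metrics. The sketch below illustrates roughly what the VALUE(...) and SUM(...) Prometheus patterns do. It is a simplified stand-in for Zabbix preprocessing, not its implementation: the helper names and the sample payload are made up for illustration, and the parser ignores edge cases such as timestamps or braces inside label values.

```python
import re
from typing import List

# Matches 'metric_name{labels} value' or 'metric_name value'.
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def samples(text: str, metric: str) -> List[float]:
    """Collect every sample value of metric, skipping comment lines."""
    values = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match and match.group(1) == metric:
            values.append(float(match.group(2)))
    return values


def value(text: str, metric: str) -> float:
    """Like VALUE(metric): expect exactly one matching sample."""
    found = samples(text, metric)
    if len(found) != 1:
        raise ValueError("expected one sample of %s, got %d" % (metric, len(found)))
    return found[0]


def total(text: str, metric: str) -> float:
    """Like SUM(metric): add up all matching samples."""
    return sum(samples(text, metric))


# Illustrative payload only; the numbers are invented.
SAMPLE = """\
# TYPE envoy_server_state gauge
envoy_server_state{} 0
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 4217
# TYPE envoy_listener_manager_total_listeners_active gauge
envoy_listener_manager_total_listeners_active{} 2
"""
```

With this payload, value(SAMPLE, "envoy_server_uptime") yields the uptime sample and total(SAMPLE, "envoy_listener_manager_total_listeners_active") adds up however many samples of that metric are present.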

Triggers

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| Envoy Proxy: Server state is not live | | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 | Average | |
| Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. | last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m | Info | Manual close: Yes |
| Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 | Warning | Manual close: Yes |
| Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. | last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} | Warning | |
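
The four trigger conditions can be read as simple comparisons on the latest item values. The sketch below restates them in Python for illustration only; it does not reproduce the Zabbix expression engine. In particular, nodata() depends on item timestamps, represented here by a hypothetical seconds_since_last_value argument. The numeric state codes (0 Live, 1 Draining, 2 Pre initializing, 3 Initializing) follow the order of the Server state item description and are an assumption about how Envoy encodes envoy_server_state; the `> 0` comparison in the first trigger is consistent with 0 meaning Live.

```python
from typing import List

# Assumed encoding of envoy_server_state, see the note above.
STATE_NAMES = {0: "Live", 1: "Draining", 2: "Pre initializing", 3: "Initializing"}


def fired_triggers(state: int, uptime_s: float, cert_days: float,
                   seconds_since_last_value: float,
                   cert_min_days: float = 7) -> List[str]:
    """Return the names of the triggers whose condition currently holds."""
    problems = []
    if state > 0:  # last(envoy.server.state) > 0
        problems.append("Server state is not live (%s)" % STATE_NAMES.get(state, "unknown"))
    if uptime_s < 10 * 60:  # last(envoy.server.uptime) < 10m
        problems.append("Service has been restarted")
    if seconds_since_last_value >= 10 * 60:  # approximates nodata(..., 10m) = 1
        problems.append("Failed to fetch metrics data")
    if cert_days < cert_min_days:  # compared against {$ENVOY.CERT.MIN}, default 7
        problems.append("SSL certificate expires soon")
    return problems
```

For example, a draining server that restarted two minutes ago with a certificate expiring in three days would report the first, second, and fourth problems.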

LLD rule Cluster metrics discovery

Name Description Type Key and additional info
Cluster metrics discovery Dependent item envoy.lld.cluster

Preprocessing

  • Prometheus to JSON: envoy_cluster_membership_total

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
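
Conceptually, this discovery rule turns the labelled envoy_cluster_membership_total samples into one discovery row per cluster. The template's actual JavaScript step is not reproduced in this document, so the sketch below only illustrates the idea in Python. It assumes the cluster name is carried in a label called envoy_cluster_name and that {#CLUSTER_NAME} is filled from that label; both are assumptions to verify against the template and your Envoy output.

```python
import json
import re

_SAMPLE = re.compile(r'^envoy_cluster_membership_total\{([^}]*)\}\s+\S+')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def discover_clusters(text: str) -> str:
    """Return an LLD-style JSON array with one entry per distinct cluster."""
    names = []
    for line in text.splitlines():
        match = _SAMPLE.match(line)
        if not match:
            continue
        labels = dict(_LABEL.findall(match.group(1)))
        name = labels.get("envoy_cluster_name")  # assumed label name
        if name and name not in names:
            names.append(name)
    return json.dumps([{"{#CLUSTER_NAME}": name} for name in names])
```

Each resulting row would then drive one set of item prototypes, with {#CLUSTER_NAME} substituted into the item keys listed in the next section.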

Item prototypes for Cluster metrics discovery

Name Description Type Key and additional info
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total

Current cluster membership total.

Dependent item envoy.cluster.membership_total["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy

Current cluster healthy total (inclusive of both health checking and outlier detection).

Dependent item envoy.cluster.membership_healthy["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy

Current cluster unhealthy total.

Calculated envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded

Current cluster degraded total.

Dependent item envoy.cluster.membership_degraded["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total

Current cluster total connections.

Dependent item envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active

Current cluster total active connections.

Dependent item envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate

Current cluster request total per second.

Dependent item envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate

Current cluster requests that timed out waiting for a response per second.

Dependent item envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate

Total upstream requests completed per second.

Dependent item envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate

Aggregate HTTP 2xx response codes per second.

Dependent item envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate

Aggregate HTTP 3xx response codes per second.

Dependent item envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate

Aggregate HTTP 4xx response codes per second.

Dependent item envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate

Aggregate HTTP 5xx response codes per second.

Dependent item envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending

Total active requests pending a connection pool connection.

Dependent item envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active

Total active requests.

Dependent item envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate

Total sent connection bytes per second.

Dependent item envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate

Total received connection bytes per second.

Dependent item envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
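
The rate items above end their preprocessing with Change per second, which converts an ever-increasing counter into a per-second rate. A rough sketch of that calculation follows. The class name is hypothetical, and the handling of counter resets (returning nothing when the new value is lower than the previous one) is a simplified reading of how Zabbix documents the step.

```python
from typing import Optional


class ChangePerSecond:
    """Turn successive counter samples into a per-second rate."""

    def __init__(self) -> None:
        self._prev_value: Optional[float] = None
        self._prev_time: Optional[float] = None

    def update(self, value: float, timestamp: float) -> Optional[float]:
        """Return (value - prev_value) / (time - prev_time), or None if unavailable."""
        prev_value, prev_time = self._prev_value, self._prev_time
        self._prev_value, self._prev_time = value, timestamp
        if prev_value is None or prev_time is None:
            return None  # first sample: nothing to compare against
        if value < prev_value or timestamp <= prev_time:
            return None  # counter reset (e.g. Envoy restart) or no elapsed time
        return (value - prev_value) / (timestamp - prev_time)
```

Because a restart resets Envoy's counters, the first sample after a restart produces no rate in this sketch; the next one does.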

Trigger prototypes for Cluster metrics discovery

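| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| Envoy Proxy: There are unhealthy clusters | | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 | Average | |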
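
The unhealthy membership item is a calculated item, and its formula is not shown in this document. A plausible reading, used in the sketch below purely as an assumption, is total members minus healthy members; the trigger prototype then fires when that number is above zero. The function names are illustrative.

```python
def unhealthy_members(membership_total: int, membership_healthy: int) -> int:
    """Assumed formula: members that are not counted as healthy."""
    return membership_total - membership_healthy


def has_unhealthy_members(membership_total: int, membership_healthy: int) -> bool:
    """Mirror of the trigger prototype condition: last(unhealthy) > 0."""
    return unhealthy_members(membership_total, membership_healthy) > 0
```

Check the calculated item's formula in the template itself before relying on this interpretation.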

LLD rule Listeners metrics discovery

Name Description Type Key and additional info
Listeners metrics discovery Dependent item envoy.lld.listeners

Preprocessing

  • Prometheus to JSON: envoy_listener_downstream_cx_active

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
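
The Discard unchanged with heartbeat: 3h step keeps identical discovery results from being reprocessed on every poll: a value is passed on only when it differs from the last one passed on, or when the heartbeat interval has elapsed. A small sketch of that behaviour, with a hypothetical class name and simplified timing:

```python
class HeartbeatThrottle:
    """Pass a value on only if it changed or the heartbeat interval elapsed."""

    def __init__(self, heartbeat_s: float = 3 * 3600) -> None:
        self.heartbeat_s = heartbeat_s
        self._last_value = None
        self._last_time = None

    def accept(self, value, timestamp: float) -> bool:
        """Return True if the value should be kept, False if discarded."""
        unchanged = self._last_time is not None and value == self._last_value
        if unchanged and timestamp - self._last_time < self.heartbeat_s:
            return False  # same value, heartbeat not yet due
        self._last_value, self._last_time = value, timestamp
        return True
```

The same step is used on state-like items such as envoy.server.state, where the value rarely changes between polls.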

Item prototypes for Listeners metrics discovery

Name Description Type Key and additional info
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active

Total active connections.

Dependent item envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate

Total connections per second.

Dependent item envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing

Sockets currently undergoing listener filter processing.

Dependent item envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

LLD rule HTTP metrics discovery

Name Description Type Key and additional info
HTTP metrics discovery Dependent item envoy.lld.http

Preprocessing

  • Prometheus to JSON: envoy_http_downstream_rq_total

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for HTTP metrics discovery

Name Description Type Key and additional info
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate

Total requests per second.

Dependent item envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active

Total active requests.

Dependent item envoy.http.downstream_rq_active["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate

Total requests closed due to a timeout on the request path per second.

Dependent item envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate

Total connections per second.

Dependent item envoy.http.downstream_cx_total["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active

Total active connections.

Dependent item envoy.http.downstream_cx_active["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate

Total bytes received per second.

Dependent item envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate

Total bytes sent per second.

Dependent item envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.