AWS ECS Cluster by HTTP

Overview

This template monitors an AWS ECS cluster by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. NOTE: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page (https://aws.amazon.com/cloudwatch/pricing/).
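The "one go" collection relies on GetMetricData accepting many metric queries in a single request. The Python sketch below shows how such a request body could be assembled for one cluster. It is not the template's own script: the metric selection in CLUSTER_METRICS, the 300-second period, the 60-minute window, and the helper name build_get_metric_data_body are assumptions for illustration only.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical selection of AWS/ECS metrics; the template's real list may differ.
CLUSTER_METRICS = [("CPUUtilization", "Average"), ("MemoryUtilization", "Average")]


def build_get_metric_data_body(cluster_name, metrics, end, period=300, window_minutes=60):
    """Bundle several AWS/ECS metrics into one GetMetricData request body."""
    queries = []
    for index, (metric_name, stat) in enumerate(metrics):
        queries.append({
            # Each Id must be unique in the request and start with a lowercase letter.
            "Id": f"m{index}",
            # The Label is returned with each result series, so dependent
            # items can select a series by name.
            "Label": metric_name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ECS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                },
                "Period": period,
                "Stat": stat,
            },
        })
    start = end - timedelta(minutes=window_minutes)
    return {
        "MetricDataQueries": queries,
        "StartTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "EndTime": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    body = build_get_metric_data_body("my-cluster", CLUSTER_METRICS,
                                      datetime.now(timezone.utc))
    print(json.dumps(body, indent=2))
```

Adding a metric only adds one more entry to MetricDataQueries; the number of HTTP requests stays the same.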

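Additional information about the metrics and used API methods:

  • Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

  • DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html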

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • AWS ECS Cluster by HTTP

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

The template collects AWS ECS metrics and uses script items to make HTTP requests to the CloudWatch API.
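HTTP requests to AWS APIs that are authenticated with an access key pair must be signed with AWS Signature Version 4. The Python sketch below illustrates that signing step for a JSON-protocol POST; it is not the template's own JavaScript. The host, content type, and X-Amz-Target in the usage lines are example values taken from the ECS API (CloudWatch endpoints and protocols differ), and the helper names sign_post and signing_key are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key with the HMAC-SHA256 chain date -> region -> service."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_post(access_key, secret_key, region, service, host,
              body, content_type, target, now):
    """Return headers for a SigV4-signed POST to '/' with a JSON body."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    headers = {
        "content-type": content_type,
        "host": host,
        "x-amz-date": amz_date,
        "x-amz-target": target,
    }
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{name}:{headers[name]}\n" for name in names)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # Method, path, (empty) query string, headers, signed header list, payload hash.
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash])
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signature = hmac.new(signing_key(secret_key, date_stamp, region, service),
                         string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers


if __name__ == "__main__":
    # Example values in place of {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.REGION}.
    print(sign_post("AKIDEXAMPLE", "example-secret", "us-west-1", "ecs",
                    "ecs.us-west-1.amazonaws.com", '{"cluster": "my-cluster"}',
                    "application/x-amz-json-1.1",
                    "AmazonEC2ContainerServiceV20141113.ListServices",
                    datetime.now(timezone.utc)))
```

The secret key itself never travels over the wire; only the access key ID, the credential scope, and the derived signature are sent.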

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
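The wildcard actions in this policy cover the calls the template relies on. The Python sketch below checks that coverage with fnmatch. It is a simplification: it only looks at Allow statements and ignores Deny, Resource, and Condition, and REQUIRED_ACTIONS is an assumed list of example actions, not one taken from the template.

```python
import fnmatch
import json

# The policy from this section, verbatim.
POLICY = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
""")

# Assumed examples of API actions such a collector calls.
REQUIRED_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:DescribeAlarms",
    "ecs:ListServices",
    "ecs:DescribeServices",
]


def allowed_patterns(policy):
    """Collect Action patterns from Allow statements only."""
    patterns = []
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        patterns.extend([actions] if isinstance(actions, str) else actions)
    return patterns


def missing_actions(policy, required):
    """Return required actions not covered by any Allow pattern (IAM action names are case-insensitive)."""
    patterns = [p.lower() for p in allowed_patterns(policy)]
    return [action for action in required
            if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)]


print(missing_actions(POLICY, REQUIRED_ACTIONS))  # []
```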

Set the following macros: "{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}", "{$AWS.REGION}", and "{$AWS.ECS.CLUSTER.NAME}".

For more information about managing access keys, see the official AWS documentation.

Refer to the Macros section for a list of macros used for LLD filters.

Macros used

Name | Description | Default
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty then no proxy is used. |
{$AWS.ACCESS.KEY.ID} | Access key ID. |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
{$AWS.REGION} | Amazon ECS Region code. | us-west-1
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .*
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. | .*
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. | CHANGE_IF_NEEDED
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. | 70
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. | 70
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. | 80
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. | 80
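The MATCHES / NOT_MATCHES macro pairs act as an include filter and an exclude filter on discovered names. The Python sketch below approximates that logic with unanchored re.search. Zabbix uses a PCRE-based regex engine, so edge cases may differ from Python's re; the function name passes_lld_filter and the sample alarm names are hypothetical.

```python
import re


def passes_lld_filter(name, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep a discovered name if it matches `matches` and does not match `not_matches`."""
    return (re.search(matches, name) is not None
            and re.search(not_matches, name) is None)


alarms = ["prod-cpu-high", "prod-memory-high", "test-cpu-high"]

# With the default macro values, every name is kept.
print([a for a in alarms if passes_lld_filter(a)])
# ['prod-cpu-high', 'prod-memory-high', 'test-cpu-high']

# Keep only production alarms, excluding memory alarms.
print([a for a in alarms if passes_lld_filter(a, matches="^prod-", not_matches="memory")])
# ['prod-cpu-high']
```

The default NOT_MATCHES value, CHANGE_IF_NEEDED, is simply a string that is unlikely to occur in a real name, so it excludes nothing until you replace it.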

Items

Name Description Type Key and additional info
AWS ECS Cluster: Get cluster metrics

Get cluster metrics.

Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

Script aws.ecs.get_metrics

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

AWS ECS Cluster: Get cluster services

Get cluster services.

Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

Script aws.ecs.get_cluster_services

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

AWS ECS Cluster: Get alarms data

Get alarms data.

DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html

Script aws.ecs.get_alarms

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value

AWS ECS Cluster: Get metrics check

Data collection check.

Dependent item aws.ecs.metrics.check

Preprocessing

  • JSON Path: $.error

    Custom on fail: Set value to

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster: Get alarms check

Data collection check.

Dependent item aws.ecs.alarms.check

Preprocessing

  • JSON Path: $.error

    Custom on fail: Set value to

  • Discard unchanged with heartbeat: 3h
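The two check items extract the error field of the master item's JSON, fall back to a fixed value when it is missing, and store repeats only every 3 hours. The Python sketch below emulates that chain and the matching trigger condition length(last(...))>0. It assumes that the blank "Set value to" means an empty string; the class and function names are hypothetical.

```python
import json

HEARTBEAT_SECONDS = 3 * 3600  # "Discard unchanged with heartbeat: 3h"


def extract_error(raw):
    """Emulate 'JSON Path: $.error' with 'Custom on fail: Set value to' an empty string (assumed)."""
    try:
        value = json.loads(raw)["error"]
    except (ValueError, KeyError, TypeError):
        return ""
    return value if isinstance(value, str) else json.dumps(value)


class DiscardUnchangedWithHeartbeat:
    """Drop a value equal to the previous one unless the heartbeat interval has elapsed."""

    def __init__(self, heartbeat):
        self.heartbeat = heartbeat
        self.last_value = None
        self.last_stored_at = None

    def accept(self, value, now):
        unchanged = self.last_stored_at is not None and value == self.last_value
        if unchanged and now - self.last_stored_at < self.heartbeat:
            return False
        self.last_value = value
        self.last_stored_at = now
        return True


def check_trigger_fires(last_value):
    # Mirrors length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0.
    return len(last_value) > 0
```

An empty string means "no error", so the trigger stays quiet while the heartbeat keeps the item from looking stale.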

AWS ECS Cluster: Container Instance Count

The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.

Dependent item aws.ecs.container_instance_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster: Task Count

The number of tasks running in the cluster.

Dependent item aws.ecs.task_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster: Service Count

The number of services in the cluster.

Dependent item aws.ecs.service_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster: CPU Reserved

The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.cpu_reserved

Preprocessing

  • JSON Path: $.[?(@.Label == "CpuReserved")].Values.first().first()

    Custom on fail: Discard value
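The JSON Path above picks the result series labelled CpuReserved and returns its first value. The Python sketch below emulates that expression on a made-up result list shaped like GetMetricData's MetricDataResults; whether the master item passes results through in exactly this shape is an assumption, and first_value_for_label is a hypothetical helper.

```python
# Illustrative MetricDataResults fragment; labels and numbers are made up.
SAMPLE_RESULTS = [
    {"Id": "m0", "Label": "CpuReserved", "Values": [2048.0, 1024.0], "StatusCode": "Complete"},
    {"Id": "m1", "Label": "MemoryReserved", "Values": [], "StatusCode": "Complete"},
]


def first_value_for_label(results, label):
    """Emulate $.[?(@.Label == "<label>")].Values.first().first().

    Raises LookupError when nothing matches, which corresponds to the
    'Custom on fail: Discard value' branch of the item.
    """
    matching = [entry["Values"] for entry in results if entry.get("Label") == label]
    if not matching or not matching[0]:
        raise LookupError(f"no datapoints for {label!r}")
    # GetMetricData returns values newest-first by default (ScanBy=TimestampDescending),
    # so the first value is the most recent datapoint.
    return matching[0][0]


print(first_value_for_label(SAMPLE_RESULTS, "CpuReserved"))  # 2048.0
```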

AWS ECS Cluster: CPU Utilization

The percentage of CPU units used by the cluster.

Dependent item aws.ecs.cpu_utilization

Preprocessing

  • JSON Path: $.CPUUtilization

    Custom on fail: Discard value

AWS ECS Cluster: Memory Utilization

The percentage of memory used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.memory_utilization

Preprocessing

  • JSON Path: $.MemoryUtilization

    Custom on fail: Discard value

AWS ECS Cluster: Network rx bytes

The number of bytes received by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.network.rx

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster: Network tx bytes

The number of bytes transmitted by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.network.tx

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

Triggers

Name Description Expression Severity Dependencies and additional info
AWS ECS Cluster: Failed to get metrics data

length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0 Warning
AWS ECS Cluster: Failed to get alarms data

length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0 Warning
AWS ECS Cluster: High CPU utilization

The CPU utilization is too high. The system might be slow to respond.

min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} Warning
AWS ECS Cluster: High memory utilization

The system is running out of free memory.

min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} Warning
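Using min() over 15 minutes means the utilization triggers fire only when every value in that window is above the threshold, so a single spike is not enough. The Python sketch below approximates that evaluation for (timestamp, value) samples; the window boundary handling and the "no data" result are simplifications, not Zabbix's exact semantics, and the function name is hypothetical.

```python
def min_over_window_exceeds(samples, now, window_seconds, threshold):
    """Approximate min(/host/key,15m)>threshold for (timestamp, value) samples.

    Returns None when the window holds no data, since min() then has
    nothing to evaluate.
    """
    window = [value for ts, value in samples if now - window_seconds < ts <= now]
    if not window:
        return None
    return min(window) > threshold


FIFTEEN_MINUTES = 15 * 60
cpu = [(300, 80.0), (600, 72.0), (840, 90.0)]
print(min_over_window_exceeds(cpu, now=900, window_seconds=FIFTEEN_MINUTES, threshold=70))  # True
cpu.append((870, 65.0))  # one dip below the threshold clears the condition
print(min_over_window_exceeds(cpu, now=900, window_seconds=FIFTEEN_MINUTES, threshold=70))  # False
```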

LLD rule Cluster Alarms discovery

Name Description Type Key and additional info
Cluster Alarms discovery

Discovers cluster alarms.

Dependent item aws.ecs.alarms.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
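The discovery JavaScript is not reproduced here, but its output has to provide the {#ALARM_NAME} and {#ALARM_DESCRIPTION} macros used by the prototypes. The Python sketch below shows one hypothetical way to derive such rows from a DescribeAlarms response; the sample alarms are made up, and the template's actual script may filter or shape alarms differently.

```python
# Illustrative DescribeAlarms response fragment; alarm names are made up.
SAMPLE_RESPONSE = {
    "MetricAlarms": [
        {"AlarmName": "ecs-cpu-high", "AlarmDescription": "CPU above 80%",
         "StateValue": "OK", "StateReason": "Threshold not crossed"},
        {"AlarmName": "ecs-memory-high", "StateValue": "ALARM",
         "StateReason": "Threshold Crossed"},
    ]
}


def alarms_to_lld(response):
    """Turn DescribeAlarms MetricAlarms into LLD rows with the macros used by the prototypes."""
    return [
        {
            "{#ALARM_NAME}": alarm["AlarmName"],
            # AlarmDescription is optional in the API response.
            "{#ALARM_DESCRIPTION}": alarm.get("AlarmDescription", ""),
        }
        for alarm in response.get("MetricAlarms", [])
    ]


print(alarms_to_lld(SAMPLE_RESPONSE))
```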

Item prototypes for Cluster Alarms discovery

Name Description Type Key and additional info
AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: Get metrics

Get alarm metrics about the state and its reason.

Dependent item aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()

    Custom on fail: Discard value

AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item aws.ecs.alarm.state_reason["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.StateReason

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM).

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item aws.ecs.alarm.state["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.StateValue

    Custom on fail: Set value to: 3

  • JavaScript: The text is too long. Please see the template.

Trigger prototypes for Cluster Alarms discovery

Name Description Expression Severity Dependencies and additional info
AWS ECS Cluster Alarms: "{#ALARM_NAME}" has 'Alarm' state

Alarm "{#ALARM_NAME}" has 'Alarm' state.
Reason: {ITEM.LASTVALUE2}

last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 Average
AWS ECS Cluster Alarms: "{#ALARM_NAME}" has 'Insufficient data' state

last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 Info
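The alarm state item turns CloudWatch's StateValue strings into the codes 0 (OK), 1 (INSUFFICIENT_DATA), and 2 (ALARM), and the trigger prototypes test those codes. The Python sketch below mirrors that mapping and both trigger expressions. Mapping any unrecognized string to 3 is an assumption modelled on the item's "Custom on fail: Set value to: 3"; the real JavaScript step may behave differently.

```python
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # assumed fallback, modelled on 'Custom on fail: Set value to: 3'


def alarm_state_code(state_value):
    """Map a CloudWatch StateValue string to the item's numeric code."""
    return STATE_CODES.get(state_value, UNKNOWN_STATE)


def alarm_trigger_fires(state_code, state_reason):
    # Mirrors: last(state)=2 and length(last(state_reason))>0
    return state_code == 2 and len(state_reason) > 0


def insufficient_data_trigger_fires(state_code):
    # Mirrors: last(state)=1
    return state_code == 1


print(alarm_trigger_fires(alarm_state_code("ALARM"), "Threshold Crossed"))  # True
```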

LLD rule Cluster Services discovery

Name Description Type Key and additional info
Cluster Services discovery

Discovers the services of the {$AWS.ECS.CLUSTER.NAME} cluster.

Dependent item aws.ecs.services.discovery

Preprocessing

  • JSON Path: $.services

  • Discard unchanged with heartbeat: 3h
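The discovery rule reads the services array from the master item's output. ECS identifies services by ARN, and the service name is the last path segment in both the long ARN format (service/<cluster>/<service>) and the older one (service/<service>). The Python sketch below builds a hypothetical payload of that kind; the exact shape of the template's services entries is an assumption.

```python
def service_name_from_arn(arn):
    """Return the service name from an ECS service ARN.

    Handles the long format  arn:aws:ecs:<region>:<account>:service/<cluster>/<service>
    and the older format     arn:aws:ecs:<region>:<account>:service/<service>.
    """
    return arn.rsplit("/", 1)[-1]


def services_payload(service_arns):
    """Hypothetical shape of the master item's output consumed by 'JSON Path: $.services'."""
    return {"services": [{"{#AWS.ECS.SERVICE.NAME}": service_name_from_arn(arn)}
                         for arn in service_arns]}


arns = [
    "arn:aws:ecs:us-west-1:123456789012:service/my-cluster/web",
    "arn:aws:ecs:us-west-1:123456789012:service/worker",
]
print(services_payload(arns))
```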

Item prototypes for Cluster Services discovery

Name Description Type Key and additional info
AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Running Task

The number of tasks currently in the running state.

Dependent item aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Pending Task

The number of tasks currently in the pending state.

Dependent item aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Desired Task

The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service.

Dependent item aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Task Set

The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.

Dependent item aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: CPU Reserved

The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: CPU Utilization

The percentage of CPU units used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory utilized

The memory being used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Custom multiplier: 1048576
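The custom multiplier 1048576 equals 1024 × 1024, which suggests the source value is reported in mebibytes (MiB) and is converted to bytes before it is stored. The tiny Python sketch below shows the arithmetic; the helper name mib_to_bytes is hypothetical.

```python
MIB = 1024 * 1024  # equals the template's custom multiplier, 1048576


def mib_to_bytes(value_mib):
    """Apply 'Custom multiplier: 1048576' to a value reported in MiB."""
    return value_mib * MIB


print(mib_to_bytes(512))  # 536870912
```

Storing bytes lets Zabbix apply its usual unit scaling (KB, MB, GB) when displaying the value.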

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory utilization

The percentage of memory used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory reserved

The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

  • Custom multiplier: 1048576

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Network rx bytes

The number of bytes received by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Network tx bytes

The number of bytes transmitted by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    Custom on fail: Discard value

AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Get metrics

Get metrics of ECS services.

Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html

Script aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • Check for not supported value

    Custom on fail: Discard value
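Per-service metrics in the AWS/ECS namespace are selected with both the ClusterName and ServiceName dimensions. GetMetricData query Ids must start with a lowercase letter and may contain only letters, digits, and underscores, so service names with hyphens cannot be used as Ids directly. The Python sketch below shows one hypothetical way to build valid Ids and per-service queries; the naming scheme, the helper names, and the 300-second period are assumptions.

```python
import re


def query_id(metric_name, service_name, index):
    """Build a GetMetricData Id: first char lowercase, then letters, digits, underscores."""
    slug = re.sub(r"[^a-z0-9_]", "_", f"{metric_name}_{service_name}".lower())
    # The index keeps Ids unique even if two names collapse to the same slug.
    return f"q{index}_{slug}"


def service_queries(cluster, service, metrics, period=300):
    """Return MetricDataQueries for one service of one cluster."""
    queries = []
    for index, (metric_name, stat) in enumerate(metrics):
        queries.append({
            "Id": query_id(metric_name, service, index),
            "Label": metric_name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ECS",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": cluster},
                        {"Name": "ServiceName", "Value": service},
                    ],
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return queries


print([q["Id"] for q in service_queries("my-cluster", "web-api", [("CPUUtilization", "Average")])])
# ['q0_cpuutilization_web_api']
```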

Trigger prototypes for Cluster Services discovery

Name Description Expression Severity Dependencies and additional info
AWS ECS Cluster Service: High CPU utilization

The CPU utilization is too high. The system might be slow to respond.

min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} Warning
AWS ECS Cluster Service: High memory utilization

The system is running out of free memory.

min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.