# AWS ECS Cluster by HTTP
|
## Overview
|
This template monitors an AWS ECS cluster by HTTP via Zabbix and works without any external scripts.
|
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
|
*NOTE*
|
This template uses GetMetricData CloudWatch API calls to list and retrieve metrics.
|
For more information, please refer to the [CloudWatch pricing](https://aws.amazon.com/cloudwatch/pricing/) page.
|
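For orientation, the following Python sketch shows roughly what a single GetMetricData request body for cluster-level metrics could look like. One call carries one query per metric, which is what allows the values to be collected in one go. The namespace `ECS/ContainerInsights`, the `ClusterName` dimension and the metric names come from the AWS Container Insights documentation; the helper name `build_cluster_queries`, the query `Id` scheme, the 300-second period and the `Average` statistic are illustrative assumptions, not necessarily what the template's script sends. Request signing and the HTTP call itself are left out.

```python
import json
from datetime import datetime, timedelta, timezone

# Cluster-level Container Insights metric names (ClusterName dimension).
CLUSTER_METRICS = [
    "ContainerInstanceCount",
    "TaskCount",
    "ServiceCount",
    "CpuReserved",
    "CpuUtilized",
    "MemoryReserved",
    "MemoryUtilized",
    "NetworkRxBytes",
    "NetworkTxBytes",
]


def build_cluster_queries(cluster_name, period=300, stat="Average"):
    """Build a GetMetricData-style request body with one query per metric.

    Hypothetical helper: the period, statistic and Id scheme are assumptions.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(seconds=2 * period)  # look back two periods
    queries = []
    for name in CLUSTER_METRICS:
        queries.append({
            "Id": "m_" + name.lower(),  # Ids must start with a lowercase letter
            "Label": name,              # e.g. the CPU Reserved item filters on Label == "CpuReserved"
            "MetricStat": {
                "Metric": {
                    "Namespace": "ECS/ContainerInsights",
                    "MetricName": name,
                    "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return {
        "MetricDataQueries": queries,
        "StartTime": start.isoformat(),
        "EndTime": end.isoformat(),
    }


if __name__ == "__main__":
    body = build_cluster_queries("my-cluster")
    print(json.dumps(body["MetricDataQueries"][3], indent=2))
```

Because every query returns a result in the same response, each additional metric adds a query to the request rather than a separate API call.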
|
Additional information about the metrics and API methods used:
|
* Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html
|
## Requirements
|
Zabbix version: 7.0 and higher.
|
## Tested versions
|
This template has been tested on:
|
- AWS ECS Cluster by HTTP
|
## Configuration
|
> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.
|
## Setup
|
The template gets AWS ECS metrics and uses script items to make HTTP requests to the CloudWatch API.
|
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
|
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.
|
```json
{
    "Version":"2012-10-17",
    "Statement":[
        {
            "Action":[
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Effect":"Allow",
            "Resource":"*"
        }
    ]
}
```
|
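If you keep IAM policies in code, a quick sanity check can confirm that the wildcard actions above cover the API calls the template relies on. This is a minimal sketch: `cloudwatch:GetMetricData` and `cloudwatch:DescribeAlarms` are named in this README, while `ecs:ListServices` and `ecs:DescribeServices` are only assumed to be used by the service discovery script. The matching approximates IAM wildcard evaluation (case-insensitive `*` and `?`) and ignores conditions, `NotAction` and explicit denies.

```python
import fnmatch
import json

# The policy from this README, expressed as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ecs:Describe*",
                "ecs:List*",
            ],
            "Effect": "Allow",
            "Resource": "*",
        }
    ],
}

# Assumption: API actions called by the template's script items.
REQUIRED_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:DescribeAlarms",
    "ecs:ListServices",
    "ecs:DescribeServices",
]


def allowed_actions(policy):
    """Collect the Action patterns of all Allow statements."""
    patterns = []
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow":
            actions = stmt["Action"]
            patterns.extend([actions] if isinstance(actions, str) else actions)
    return patterns


def missing_actions(policy, required):
    """Return required actions not matched by any Allow pattern.

    IAM action names are case-insensitive, so both sides are lowercased.
    """
    patterns = [p.lower() for p in allowed_actions(policy)]
    return [
        action for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]


if __name__ == "__main__":
    print(missing_actions(POLICY, REQUIRED_ACTIONS))  # []
    print(json.dumps(POLICY, indent=4))
```

The printed JSON can be saved to a file and passed to `aws iam create-policy --policy-name <name> --policy-document file://<file>`.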
|
Set the following macros: "{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}", "{$AWS.REGION}", and "{$AWS.ECS.CLUSTER.NAME}".
|
For more information about managing access keys, see the [official documentation](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).
|
Refer to the Macros section for a list of macros used for LLD filters.
|
Additional information about the metrics and API methods used:
|
* Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html
|
### Macros used
|
|Name|Description|Default|
|----|-----------|-------|
|{$AWS.PROXY}|<p>Sets the HTTP proxy value. If this macro is empty, no proxy is used.</p>||
|{$AWS.ACCESS.KEY.ID}|<p>Access key ID.</p>||
|{$AWS.SECRET.ACCESS.KEY}|<p>Secret access key.</p>||
|{$AWS.REGION}|<p>Amazon ECS Region code.</p>|`us-west-1`|
|{$AWS.ECS.CLUSTER.NAME}|<p>ECS cluster name.</p>||
|{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}|<p>Filter of discoverable alarms by name.</p>|`.*`|
|{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}|<p>Filter to exclude discovered alarms by name.</p>|`CHANGE_IF_NEEDED`|
|{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}|<p>Filter of discoverable services by name.</p>|`.*`|
|{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}|<p>Filter to exclude discovered services by name.</p>|`CHANGE_IF_NEEDED`|
|{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}|<p>The warning threshold of the cluster CPU utilization expressed in %.</p>|`70`|
|{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}|<p>The warning threshold of the cluster memory utilization expressed in %.</p>|`70`|
|{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}|<p>The warning threshold of the cluster service CPU utilization expressed in %.</p>|`80`|
|{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}|<p>The warning threshold of the cluster service memory utilization expressed in %.</p>|`80`|
|
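Each `MATCHES`/`NOT_MATCHES` macro pair acts as an include/exclude regular-expression filter on discovered names. The sketch below approximates that behaviour in Python, assuming both expressions are searched anywhere in the name (unanchored), as Zabbix does for LLD filters; Python's `re` is close to, but not identical with, the PCRE syntax Zabbix uses. The function name `passes_lld_filter` and the service names are hypothetical. With the defaults (`.*` and `CHANGE_IF_NEEDED`), every name passes unless it literally contains `CHANGE_IF_NEEDED`.

```python
import re


def passes_lld_filter(name, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Approximate a MATCHES / NOT_MATCHES filter pair on a discovered name.

    Both expressions are searched anywhere in the value (re.search),
    not matched against the whole string.
    """
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


services = ["web-frontend", "web-backend", "batch-worker", "test-canary"]

# Keep services whose names start with "web", but drop anything containing "backend".
kept = [s for s in services if passes_lld_filter(s, matches="^web", not_matches="backend")]
print(kept)  # ['web-frontend']
```

Anchors such as `^` and `$` are needed when a filter should match only the start or the end of a name.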
|
### Items
|
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|AWS ECS Cluster: Get cluster metrics|<p>Get cluster metrics.</p><p>Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html</p>|Script|aws.ecs.get_metrics<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Get cluster services|<p>Get cluster services.</p><p>Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html</p>|Script|aws.ecs.get_cluster_services<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Get alarms data|<p>Get alarms data.</p><p>DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html</p>|Script|aws.ecs.get_alarms<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Get metrics check|<p>Data collection check.</p>|Dependent item|aws.ecs.metrics.check<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.error`</p><p>⛔️Custom on fail: Set value to</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster: Get alarms check|<p>Data collection check.</p>|Dependent item|aws.ecs.alarms.check<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.error`</p><p>⛔️Custom on fail: Set value to</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster: Container Instance Count|<p>'The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.'</p>|Dependent item|aws.ecs.container_instance_count<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Task Count|<p>'The number of tasks running in the cluster.'</p>|Dependent item|aws.ecs.task_count<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Service Count|<p>'The number of services in the cluster.'</p>|Dependent item|aws.ecs.service_count<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: CPU Reserved|<p>'The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined CPU reservation in their task definition.'</p>|Dependent item|aws.ecs.cpu_reserved<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.[?(@.Label == "CpuReserved")].Values.first().first()`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: CPU Utilization|<p>Cluster CPU utilization.</p>|Dependent item|aws.ecs.cpu_utilization<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.CPUUtilization`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Memory Utilization|<p>'The memory being used by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined memory reservation in their task definition.'</p>|Dependent item|aws.ecs.memory_utilization<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.MemoryUtilization`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Network rx bytes|<p>'The number of bytes received by the resource that is specified by the dimensions that you're using.</p><p>This metric is only available for containers in tasks using the awsvpc or bridge network modes.'</p>|Dependent item|aws.ecs.network.rx<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster: Network tx bytes|<p>'The number of bytes transmitted by the resource that is specified by the dimensions that you're using.</p><p>This metric is only available for containers in tasks using the awsvpc or bridge network modes.'</p>|Dependent item|aws.ecs.network.tx<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|
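Most cluster items are dependent items that pick their value out of the single `aws.ecs.get_metrics` response, which is how one API call feeds many items. The Python sketch below emulates the documented CPU Reserved path `$.[?(@.Label == "CpuReserved")].Values.first().first()` against a made-up, trimmed result list. The sample payload and the helper names are hypothetical, and the exact shape returned by the template's script may differ. GetMetricData returns data points newest first by default, so the first value is the most recent one.

```python
import json

# Hypothetical master-item value: a trimmed list of GetMetricData results.
MASTER_VALUE = json.dumps([
    {"Id": "m_cpureserved", "Label": "CpuReserved", "Values": [2048.0, 2048.0]},
    {"Id": "m_taskcount", "Label": "TaskCount", "Values": [7.0]},
    {"Id": "m_servicecount", "Label": "ServiceCount", "Values": []},
])


class DiscardValue(Exception):
    """Stands in for the 'Custom on fail: Discard value' preprocessing option."""


def first_value_by_label(raw, label):
    """Emulate $.[?(@.Label == "<label>")].Values.first().first()."""
    results = [r for r in json.loads(raw) if r.get("Label") == label]
    if not results or not results[0].get("Values"):
        raise DiscardValue(label)
    return results[0]["Values"][0]


# Each dependent item runs its own extraction against the same master value.
for label in ("CpuReserved", "TaskCount", "ServiceCount"):
    try:
        print(label, first_value_by_label(MASTER_VALUE, label))
    except DiscardValue:
        print(label, "-> value discarded")
```

An empty `Values` list (no data points in the requested window) leads to a discarded value rather than an error state for the item.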
|
### Triggers
|
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|AWS ECS Cluster: Failed to get metrics data||`length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0`|Warning||
|AWS ECS Cluster: Failed to get alarms data||`length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0`|Warning||
|AWS ECS Cluster: High CPU utilization|<p>The CPU utilization is too high. The system might be slow to respond.</p>|`min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`|Warning||
|AWS ECS Cluster: High memory utilization|<p>The system is running out of free memory.</p>|`min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`|Warning||
|
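The triggers above follow two patterns. The "Failed to get" triggers fire when the `$.error` field extracted by the check items is non-empty; when the field is missing, the custom on-fail value leaves an empty string. The utilization triggers use `min(...,15m)`, so they fire only when every value in the last 15 minutes is above the threshold, and a single dip below it keeps the trigger quiet. The sketch approximates both in Python; window boundaries and Zabbix's handling of missing data are simplified, and all names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def check_value(payload):
    """Emulate JSON Path $.error with 'Custom on fail: Set value to' an empty string."""
    error = payload.get("error")
    return "" if error is None else str(error)


def check_trigger(payload):
    """Approximates length(last(/host/aws.ecs.metrics.check))>0."""
    return len(check_value(payload)) > 0


def high_utilization(samples, now, threshold, window=timedelta(minutes=15)):
    """Approximates min(/host/key,15m)>threshold over (timestamp, value) pairs."""
    recent = [value for ts, value in samples if now - window < ts <= now]
    # With no data in the window the expression cannot be evaluated; treat as not firing.
    return bool(recent) and min(recent) > threshold


now = datetime(2024, 1, 1, 12, 0)
spiky = [(now - timedelta(minutes=m), v) for m, v in [(10, 95.0), (5, 40.0), (0, 92.0)]]
busy = [(now - timedelta(minutes=m), v) for m, v in [(10, 85.0), (5, 78.0), (0, 91.0)]]

print(check_trigger({"error": "AccessDenied"}))  # True
print(check_trigger({}))                          # False
print(high_utilization(spiky, now, 70))           # False: one value dipped to 40
print(high_utilization(busy, now, 70))            # True: all values above 70
```

Using `min` rather than `last` trades a slower reaction for fewer alerts on short CPU or memory spikes.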
|
### LLD rule Cluster Alarms discovery
|
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Cluster Alarms discovery|<p>Discovers instance alarms.</p>|Dependent item|aws.ecs.alarms.discovery<p>**Preprocessing**</p><ul><li><p>JavaScript: `The text is too long. Please see the template.`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
### Item prototypes for Cluster Alarms discovery
|
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: Get metrics|<p>Get alarm metrics about the state and its reason.</p>|Dependent item|aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: State reason|<p>An explanation for the alarm state, in text format.</p><p>Alarm description:</p><p>{#ALARM_DESCRIPTION}</p>|Dependent item|aws.ecs.alarm.state_reason["{#ALARM_NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.StateReason`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster Alarms: ["{#ALARM_NAME}"]: State|<p>The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM).</p><p>Alarm description:</p><p>{#ALARM_DESCRIPTION}</p>|Dependent item|aws.ecs.alarm.state["{#ALARM_NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.StateValue`</p><p>⛔️Custom on fail: Set value to: `3`</p></li><li><p>JavaScript: `The text is too long. Please see the template.`</p></li></ul>|
|
### Trigger prototypes for Cluster Alarms discovery
|
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|AWS ECS Cluster Alarms: "{#ALARM_NAME}" has 'Alarm' state|<p>Alarm "{#ALARM_NAME}" has 'Alarm' state. <br>Reason: {ITEM.LASTVALUE2}</p>|`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0`|Average||
|AWS ECS Cluster Alarms: "{#ALARM_NAME}" has 'Insufficient data' state||`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1`|Info||
|
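The alarm state item turns the `StateValue` string from DescribeAlarms into a number. The template's own JavaScript step is not reproduced in this README, so the sketch below is an approximation built from the item description (0 = OK, 1 = INSUFFICIENT_DATA, 2 = ALARM) and the custom on-fail value `3`; mapping an unrecognised state string to `3` as well is an assumption. It also mirrors the 'Alarm' trigger prototype, which requires state `2` together with a non-empty state reason. Function names and the sample alarm are hypothetical.

```python
import json

# Codes from the item description; UNKNOWN matches "Custom on fail: Set value to: 3".
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN = 3


def alarm_state(alarm):
    """Map an alarm object's StateValue to the numeric item value."""
    # Assumption: unrecognised states are reported the same way as missing ones.
    return STATE_CODES.get(alarm.get("StateValue"), UNKNOWN)


def alarm_trigger_fires(alarm):
    """Mirrors: state = 2 and length(state_reason) > 0."""
    return alarm_state(alarm) == 2 and len(alarm.get("StateReason", "")) > 0


alarm = json.loads(
    '{"AlarmName": "ecs-cpu-high", "StateValue": "ALARM", "StateReason": "sample reason text"}'
)
print(alarm_state(alarm))                                # 2
print(alarm_trigger_fires(alarm))                        # True
print(alarm_state({"StateValue": "INSUFFICIENT_DATA"}))  # 1
print(alarm_state({"AlarmName": "x"}))                   # 3
```

Requiring a non-empty reason means the 'Alarm' trigger waits until both dependent items have received values from the same alarm object.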
|
### LLD rule Cluster Services discovery
|
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Cluster Services discovery|<p>Discovers {$AWS.ECS.CLUSTER.NAME} services.</p>|Dependent item|aws.ecs.services.discovery<p>**Preprocessing**</p><ul><li><p>JSON Path: `$.services`</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|
### Item prototypes for Cluster Services discovery
|
|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Running Task|<p>The number of tasks currently in the `running` state.</p>|Dependent item|aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Pending Task|<p>The number of tasks currently in the `pending` state.</p>|Dependent item|aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Desired Task|<p>The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service.</p>|Dependent item|aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Task Set|<p>The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.</p>|Dependent item|aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Discard unchanged with heartbeat: `3h`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: CPU Reserved|<p>"The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined CPU reservation in their task definition."</p>|Dependent item|aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: CPU Utilization|<p>"The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined CPU reservation in their task definition."</p>|Dependent item|aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory utilized|<p>'The memory being used by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined memory reservation in their task definition.'</p>|Dependent item|aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1048576`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory utilization|<p>'The memory being used by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined memory reservation in their task definition.'</p>|Dependent item|aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Memory reserved|<p>'The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using.</p><p>This metric is only collected for tasks that have a defined memory reservation in their task definition.'</p>|Dependent item|aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li><li><p>Custom multiplier: `1048576`</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Network rx bytes|<p>'The number of bytes received by the resource that is specified by the dimensions that you're using.</p><p>This metric is only available for containers in tasks using the awsvpc or bridge network modes.'</p>|Dependent item|aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Network tx bytes|<p>'The number of bytes transmitted by the resource that is specified by the dimensions that you're using.</p><p>This metric is only available for containers in tasks using the awsvpc or bridge network modes.'</p>|Dependent item|aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>JSON Path: `The text is too long. Please see the template.`</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|AWS ECS Cluster Service: ["{#AWS.ECS.SERVICE.NAME}"]: Get metrics|<p>Get metrics of ECS services.</p><p>Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html</p>|Script|aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]<p>**Preprocessing**</p><ul><li><p>Check for not supported value</p><p>⛔️Custom on fail: Discard value</p></li></ul>|
|
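The Memory utilized and Memory reserved prototypes apply a custom multiplier of `1048576` (1024 * 1024), which converts values reported in megabytes (MiB) into bytes. Utilization percentages, such as the ones compared against `{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}` and `{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`, are commonly derived as utilized divided by reserved. The sketch below assumes that formula rather than quoting the template's script, and returns `None` when nothing is reserved, since these metrics are only collected for tasks with a defined reservation. Function names are hypothetical.

```python
MIB = 1048576  # the "Custom multiplier" used by the Memory utilized/reserved items


def mib_to_bytes(value_mib):
    """Convert a Container Insights memory value reported in MiB to bytes."""
    return value_mib * MIB


def utilization_percent(utilized, reserved):
    """Assumed formula: utilized / reserved * 100, or None when nothing is reserved."""
    if reserved <= 0:
        return None
    return utilized / reserved * 100.0


print(mib_to_bytes(512))              # 536870912
print(utilization_percent(384, 512))  # 75.0
print(utilization_percent(100, 0))    # None
```

Storing memory in bytes lets Zabbix apply its usual unit scaling (KB, MB, GB) when displaying the values.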
|
### Trigger prototypes for Cluster Services discovery
|
|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
|AWS ECS Cluster Service: High CPU utilization|<p>The CPU utilization is too high. The system might be slow to respond.</p>|`min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`|Warning||
|AWS ECS Cluster Service: High memory utilization|<p>The system is running out of free memory.</p>|`min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`|Warning||
|
## Feedback
|
Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com).
|
You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback).