# AWS ECS Cluster by HTTP

## Overview

This template monitors an AWS ECS cluster by HTTP via Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

*NOTE*
This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics.
For more information, please refer to the [CloudWatch pricing](https://aws.amazon.com/cloudwatch/pricing/) page.

Additional information about the metrics and used API methods:

* Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- AWS ECS Cluster by HTTP

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

```json
{
    "Version":"2012-10-17",
    "Statement":[
        {
            "Action":[
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ecs:Describe*",
                "ecs:List*"
            ],
            "Effect":"Allow",
            "Resource":"*"
        }
    ]
}
```

Set the following macros: "{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}".

For more information about managing access keys, see the [official documentation](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).

Refer to the Macros section for a list of macros used for LLD filters.
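
As a rough illustration of the kind of request body a GetMetricData call takes, the sketch below builds a query for one cluster-level metric. The helper name `build_metric_query`, the query ID `m1`, the 5-minute period, and the `Average` statistic are assumptions made for this example; the template's own script item may request different metrics and settings.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(cluster_name, metric_name, period=300, stat="Average"):
    """Build a GetMetricData request body for a single AWS/ECS metric.

    Hypothetical helper: period and stat are illustrative defaults,
    not values taken from the template.
    """
    end = datetime.now(timezone.utc).replace(microsecond=0)
    start = end - timedelta(seconds=period * 2)
    return {
        "StartTime": int(start.timestamp()),
        "EndTime": int(end.timestamp()),
        "MetricDataQueries": [
            {
                "Id": "m1",  # query IDs must start with a lowercase letter
                "Label": metric_name,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ECS",
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "ClusterName", "Value": cluster_name}
                        ],
                    },
                    "Period": period,
                    "Stat": stat,
                },
            }
        ],
    }


# "my-cluster" stands in for the value of {$AWS.ECS.CLUSTER.NAME}.
query = build_metric_query("my-cluster", "CPUUtilization")
```

The request still has to be signed with the access key pair and sent to the CloudWatch endpoint of the configured region; that part is omitted here.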

Additional information about the metrics and used API methods:

* Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$AWS.PROXY}|Sets HTTP proxy value. If this macro is empty then no proxy is used.||
|{$AWS.ACCESS.KEY.ID}|Access key ID.||
|{$AWS.SECRET.ACCESS.KEY}|Secret access key.||
|{$AWS.REGION}|Amazon ECS Region code.|`us-west-1`|
|{$AWS.ECS.CLUSTER.NAME}|ECS cluster name.||
|{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}|Filter of discoverable alarms by name.|`.*`|
|{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}|Filter to exclude discovered alarms by name.|`CHANGE_IF_NEEDED`|
|{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}|Filter of discoverable services by name.|`.*`|
|{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}|Filter to exclude discovered services by name.|`CHANGE_IF_NEEDED`|
|{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}|The warning threshold of the cluster CPU utilization expressed in %.|`70`|
|{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}|The warning threshold of the cluster memory utilization expressed in %.|`70`|
|{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}|The warning threshold of the cluster service CPU utilization expressed in %.|`80`|
|{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}|The warning threshold of the cluster service memory utilization expressed in %.|`80`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|AWS ECS Cluster: Get cluster metrics|Get cluster metrics.<br>Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html|Script|aws.ecs.get_metrics<br>**Preprocessing**<br>Check for not supported value<br>⛔️Custom on fail: Discard value|
||Get cluster services.<br>Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html|Script|aws.ecs.get_cluster_services<br>**Preprocessing**<br>Check for not supported value<br>⛔️Custom on fail: Discard value|
||Get alarms data.<br>DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html|Script|aws.ecs.get_alarms<br>**Preprocessing**<br>Check for not supported value<br>⛔️Custom on fail: Discard value|
||Data collection check.|Dependent item|aws.ecs.metrics.check<br>**Preprocessing**<br>JSON Path: `$.error`<br>⛔️Custom on fail: Set value to<br>Discard unchanged with heartbeat: `3h`|
||Data collection check.|Dependent item|aws.ecs.alarms.check<br>**Preprocessing**<br>JSON Path: `$.error`<br>⛔️Custom on fail: Set value to<br>Discard unchanged with heartbeat: `3h`|
||The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.|Dependent item|aws.ecs.container_instance_count<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of tasks running in the cluster.|Dependent item|aws.ecs.task_count<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of services in the cluster.|Dependent item|aws.ecs.service_count<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined CPU reservation in their task definition.|Dependent item|aws.ecs.cpu_reserved<br>**Preprocessing**<br>JSON Path: `$.[?(@.Label == "CpuReserved")].Values.first().first()`<br>⛔️Custom on fail: Discard value|
||Cluster CPU utilization.|Dependent item|aws.ecs.cpu_utilization<br>**Preprocessing**<br>JSON Path: `$.CPUUtilization`<br>⛔️Custom on fail: Discard value|
||The memory being used by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined memory reservation in their task definition.|Dependent item|aws.ecs.memory_utilization<br>**Preprocessing**<br>JSON Path: `$.MemoryUtilization`<br>⛔️Custom on fail: Discard value|
||The number of bytes received by the resource that is specified by the dimensions that you're using.<br>This metric is only available for containers in tasks using the awsvpc or bridge network modes.|Dependent item|aws.ecs.network.rx<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of bytes transmitted by the resource that is specified by the dimensions that you're using.<br>This metric is only available for containers in tasks using the awsvpc or bridge network modes.|Dependent item|aws.ecs.network.tx<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
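
To make the JSON Path preprocessing of the dependent items more concrete, here is a minimal Python sketch of what an expression such as `$.[?(@.Label == "CpuReserved")].Values.first().first()` selects. The function name and the sample payload are hypothetical; the payload only mimics the `Label`/`Values` shape that the expression implies.

```python
def first_value_by_label(results, label):
    """Return the first value of the first entry whose Label equals `label`.

    Returns None when no entry matches or the entry has no values, which
    corresponds to the case where the item discards the value.
    """
    for entry in results:
        if entry.get("Label") == label:
            values = entry.get("Values") or []
            return values[0] if values else None
    return None


# Hypothetical sample shaped like a list of metric results.
sample = [
    {"Label": "CpuReserved", "Values": [512.0, 480.0]},
    {"Label": "MemoryReserved", "Values": [1024.0]},
]

cpu_reserved = first_value_by_label(sample, "CpuReserved")
```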

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
||The CPU utilization is too high. The system might be slow to respond.|`min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`|Warning||
|AWS ECS Cluster: High memory utilization|The system is running out of free memory.|`min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`|Warning||

### LLD rule Cluster Alarms discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Cluster Alarms discovery|Discovery instance alarms.|Dependent item|aws.ecs.alarms.discovery<br>**Preprocessing**<br>JavaScript: `The text is too long. Please see the template.`<br>Discard unchanged with heartbeat: `3h`|
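
The `{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}` and `{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}` macros are regular expressions applied to discovered names. The sketch below shows the intended effect under the assumption that an entity is kept only when its name matches the first pattern and does not match the second; Python's `re` module stands in for the regex engine Zabbix uses, and the function name is hypothetical.

```python
import re


def passes_lld_filter(name, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Return True if `name` would be kept by a MATCHES / NOT_MATCHES pair.

    Defaults mirror the macro defaults listed in the Macros table.
    """
    return (
        re.search(matches, name) is not None
        and re.search(not_matches, name) is None
    )


# With the defaults, every alarm is discovered.
keep_default = passes_lld_filter("cpu-high")
# Excluding alarms whose names start with "test-".
keep_custom = passes_lld_filter("test-alarm", not_matches=r"^test-")
```

The same logic applies to the `SERVICE` pair of filter macros.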

### Item prototypes for Cluster Alarms discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
||Get alarm metrics about the state and its reason.|Dependent item|aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]<br>**Preprocessing**<br>JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`<br>⛔️Custom on fail: Discard value|
||An explanation for the alarm state, in text format.<br>Alarm description:<br>{#ALARM_DESCRIPTION}|Dependent item|aws.ecs.alarm.state_reason["{#ALARM_NAME}"]<br>**Preprocessing**<br>JSON Path: `$.StateReason`<br>⛔️Custom on fail: Discard value<br>Discard unchanged with heartbeat: `3h`|
||The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM).<br>Alarm description:<br>{#ALARM_DESCRIPTION}|Dependent item|aws.ecs.alarm.state["{#ALARM_NAME}"]<br>**Preprocessing**<br>JSON Path: `$.StateValue`<br>⛔️Custom on fail: Set value to: `3`<br>JavaScript: `The text is too long. Please see the template.`|
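
The alarm state item stores a number rather than the CloudWatch state string. A minimal sketch of that mapping is shown below. The codes 0, 1, and 2 and the fallback value 3 come from the item description and its preprocessing steps; treating any unrecognized or missing state as 3 is an assumption of this example, as is the function name.

```python
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # value set when StateValue cannot be extracted


def alarm_state_code(alarm):
    """Map an alarm's StateValue string to the numeric code stored by the item."""
    return STATE_CODES.get(alarm.get("StateValue"), UNKNOWN_STATE)


# Hypothetical alarm objects with only the field used here.
codes = [
    alarm_state_code({"StateValue": "OK"}),
    alarm_state_code({"StateValue": "ALARM"}),
    alarm_state_code({}),
]
```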
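
### Trigger prototypes for Cluster Alarms discovery

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
||Alarm "{#ALARM_NAME}" has 'Alarm' state.<br>Reason: {ITEM.LASTVALUE2}||||

### LLD rule Cluster Services discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
||Discovery of {$AWS.ECS.CLUSTER.NAME} services.|Dependent item|aws.ecs.services.discovery<br>**Preprocessing**<br>JSON Path: `$.services`<br>Discard unchanged with heartbeat: `3h`|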
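
Several items and both discovery rules use the "Discard unchanged with heartbeat: `3h`" preprocessing step. The sketch below illustrates the idea: a value is kept when it differs from the last kept value, or when the heartbeat interval has passed since a value was last kept. The class and method names are hypothetical, and timestamps are plain seconds to keep the example self-contained.

```python
class HeartbeatThrottle:
    """Keep a value only if it changed or the heartbeat interval elapsed."""

    def __init__(self, heartbeat_seconds):
        self.heartbeat = heartbeat_seconds
        self.last_value = None
        self.last_kept_at = None

    def keep(self, value, now):
        first = self.last_kept_at is None
        changed = first or value != self.last_value
        expired = not first and now - self.last_kept_at >= self.heartbeat
        if changed or expired:
            self.last_value = value
            self.last_kept_at = now
            return True
        return False


throttle = HeartbeatThrottle(3 * 3600)  # 3h, as in the template
decisions = [
    throttle.keep("svc-a", 0),            # first value is always kept
    throttle.keep("svc-a", 60),           # unchanged within 3h: discarded
    throttle.keep("svc-b", 120),          # changed: kept
    throttle.keep("svc-b", 120 + 10800),  # unchanged, but 3h elapsed: kept
]
```

This is why an unchanged service or alarm list does not trigger a new discovery run more often than once every three hours.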

### Item prototypes for Cluster Services discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
||The number of tasks currently in the `running` state.|Dependent item|aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Discard unchanged with heartbeat: `3h`|
||The number of tasks currently in the `pending` state.|Dependent item|aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Discard unchanged with heartbeat: `3h`|
||The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service.|Dependent item|aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Discard unchanged with heartbeat: `3h`|
||The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.|Dependent item|aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Discard unchanged with heartbeat: `3h`|
||The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined CPU reservation in their task definition.|Dependent item|aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined CPU reservation in their task definition.|Dependent item|aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The memory being used by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined memory reservation in their task definition.|Dependent item|aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Custom multiplier: `1048576`|
||The memory being used by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined memory reservation in their task definition.|Dependent item|aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using.<br>This metric is only collected for tasks that have a defined memory reservation in their task definition.|Dependent item|aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value<br>Custom multiplier: `1048576`|
||The number of bytes received by the resource that is specified by the dimensions that you're using.<br>This metric is only available for containers in tasks using the awsvpc or bridge network modes.|Dependent item|aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||The number of bytes transmitted by the resource that is specified by the dimensions that you're using.<br>This metric is only available for containers in tasks using the awsvpc or bridge network modes.|Dependent item|aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>JSON Path: `The text is too long. Please see the template.`<br>⛔️Custom on fail: Discard value|
||Get metrics of ECS services.<br>Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html|Script|aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]<br>**Preprocessing**<br>Check for not supported value<br>⛔️Custom on fail: Discard value|
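
The `aws.ecs.services.memory_utilized` and `aws.ecs.services.memory_reserved` item prototypes apply a custom multiplier of `1048576`, which is 1024 × 1024. Assuming the source values are reported in mebibytes (an assumption of this example, not something the template text states), the multiplier converts them to bytes:

```python
BYTES_PER_MIB = 1024 * 1024  # 1048576, the multiplier listed for both items


def to_bytes(value_in_mib):
    """Apply the custom multiplier to a value assumed to be in MiB."""
    return value_in_mib * BYTES_PER_MIB


memory_reserved_bytes = to_bytes(512)
```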

### Trigger prototypes for Cluster Services discovery

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
||The CPU utilization is too high. The system might be slow to respond.|`min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`|Warning||
|AWS ECS Cluster Service: High memory utilization|The system is running out of free memory.|`min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`|Warning||

## Feedback

Please report any issues with the template at [`https://support.zabbix.com`](https://support.zabbix.com)

You can also provide feedback, discuss the template, or ask for help at [`ZABBIX forums`](https://www.zabbix.com/forum/zabbix-suggestions-and-feedback)