# GitLab by HTTP

## Overview

This template is designed to monitor GitLab with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template `GitLab by HTTP` collects metrics by an HTTP agent from the GitLab `/-/metrics` endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.

## Requirements

Zabbix version: 7.0 and higher.

## Tested versions

This template has been tested on:
- GitLab 13.5.3 EE

## Configuration

> Zabbix should be configured according to the instructions in the [Templates out of the box](https://www.zabbix.com/documentation/7.0/manual/config/templates_out_of_the_box) section.

## Setup

This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab `/-/metrics` endpoint.
To access the metrics, the following two methods are available:
1. Explicitly allow the monitoring instance IP address in the GitLab [whitelist configuration](https://docs.gitlab.com/ee/administration/monitoring/ip_whitelist.html).
2. Get a token from the GitLab `Admin -> Monitoring -> Health check` page: http://your.gitlab.address/admin/health_check; use this token in the macro `{$GITLAB.HEALTH.TOKEN}` as a variable path, like: `?token=your_token`.

Remember to change the macro `{$GITLAB.URL}`.
Also, see the Macros section for a list of [macros used](#Macros-used) to set trigger values.

*NOTE.* Some metrics may not be collected depending on your GitLab instance version and configuration. See [GitLab's documentation](https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html) for further information about its metric collection.

### Macros used

|Name|Description|Default|
|----|-----------|-------|
|{$GITLAB.URL}|URL of a GitLab instance.|`http://localhost`|
|{$GITLAB.HEALTH.TOKEN}|The token path for the GitLab health check. Example: `?token=your_token`||
|{$GITLAB.UNICORN.UTILIZATION.MAX.WARN}|The maximum percentage of Unicorn workers utilization for a trigger expression.|`90`|
|{$GITLAB.PUMA.UTILIZATION.MAX.WARN}|The maximum percentage of Puma thread utilization for a trigger expression.|`90`|
|{$GITLAB.HTTP.FAIL.MAX.WARN}|The maximum number of HTTP request failures for a trigger expression.|`2`|
|{$GITLAB.REDIS.FAIL.MAX.WARN}|The maximum number of Redis client exceptions for a trigger expression.|`2`|
|{$GITLAB.UNICORN.QUEUE.MAX.WARN}|The maximum number of Unicorn queued requests for a trigger expression.|`1`|
|{$GITLAB.PUMA.QUEUE.MAX.WARN}|The maximum number of Puma queued requests for a trigger expression.|`1`|
|{$GITLAB.OPEN.FDS.MAX.WARN}|The maximum percentage of used file descriptors for a trigger expression.|`90`|

### Items

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|GitLab: Get instance metrics||HTTP agent|gitlab.get_metrics **Preprocessing**
Check for not supported value
⛔️Custom on fail: Discard value
The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers.
|HTTP agent|gitlab.readiness**Preprocessing**
Check for not supported value
⛔️Custom on fail: Set value to: `{"master_check":[{"status":"failed"}]}`
JSON Path: `$.master_check[0].status`
Boolean to decimal
⛔️Custom on fail: Set value to: `0`
Discard unchanged with heartbeat: `30m`
Checks whether the application server is running. This probe is used to detect whether Rails Controllers are deadlocked due to multi-threading.
|HTTP agent|gitlab.liveness**Preprocessing**
Check for not supported value
⛔️Custom on fail: Set value to: `{"status": "failed"}`
JSON Path: `$.status`
Boolean to decimal
⛔️Custom on fail: Set value to: `0`
Discard unchanged with heartbeat: `30m`
Version of the GitLab instance.
|Dependent item|gitlab.deployments.version**Preprocessing**
JSON Path: `$[?(@.name=="deployments")].labels.version.first()`
Discard unchanged with heartbeat: `3h`
Minimum UNIX timestamp of ruby processes start time.
|Dependent item|gitlab.ruby.process_start_time_seconds.first**Preprocessing**
JSON Path: `$[?(@.name=="ruby_process_start_time_seconds")].value.min()`
Discard unchanged with heartbeat: `3h`
Maximum UNIX timestamp of ruby processes start time.
|Dependent item|gitlab.ruby.process_start_time_seconds.last**Preprocessing**
JSON Path: `$[?(@.name=="ruby_process_start_time_seconds")].value.max()`
Discard unchanged with heartbeat: `3h`
Counter of how many users have logged in since GitLab was started or restarted.
|Dependent item|gitlab.user_session_logins_total**Preprocessing**
JSON Path: `$[?(@.name=="user_session_logins_total")].value.first()`
⛔️Custom on fail: Discard value
Counter of failed CAPTCHA attempts during login.
|Dependent item|gitlab.failed_login_captcha_total**Preprocessing**
JSON Path: `$[?(@.name=="failed_login_captcha_total")].value.first()`
⛔️Custom on fail: Discard value
Counter of successful CAPTCHA attempts during login.
|Dependent item|gitlab.successful_login_captcha_total**Preprocessing**
JSON Path: `$[?(@.name=="successful_login_captcha_total")].value.first()`
⛔️Custom on fail: Discard value
Number of times an upload record could not find its file.
|Dependent item|gitlab.upload_file_does_not_exist**Preprocessing**
JSON Path: `$[?(@.name=="upload_file_does_not_exist")].value.first()`
⛔️Custom on fail: Discard value
Total amount of pipeline processing events.
|Dependent item|gitlab.pipeline.processing_events_total**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Counter of pipelines created.
|Dependent item|gitlab.pipeline.created_total**Preprocessing**
JSON Path: `$[?(@.name=="pipelines_created_total")].value.sum()`
⛔️Custom on fail: Discard value
Counter of completed Auto DevOps pipelines.
|Dependent item|gitlab.pipeline.auto_devops_completed.total**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Counter of completed Auto DevOps pipelines with status "failed".
|Dependent item|gitlab.pipeline.auto_devops_completed_total.failed**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
The sum of the time in seconds it takes to create a CI/CD pipeline.
|Dependent item|gitlab.pipeline.pipeline_creation**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
The count of the time it takes to create a CI/CD pipeline.
|Dependent item|gitlab.pipeline.pipeline_creation.count**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Connections to the main database in use where the owner is still alive.
|Dependent item|gitlab.database.connection_pool_busy**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Current connections to the main database in the pool.
|Dependent item|gitlab.database.connection_pool_connections**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Connections to the main database in use where the owner is not alive.
|Dependent item|gitlab.database.connection_pool_dead**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Connections to the main database not in use.
|Dependent item|gitlab.database.connection_pool_idle**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Total connection pool capacity for the main database.
|Dependent item|gitlab.database.connection_pool_size**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Threads currently waiting on this queue.
|Dependent item|gitlab.database.connection_pool_waiting**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Number of Redis client requests per second. (Instance: queues)
|Dependent item|gitlab.redis.client_requests.queues.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of Redis client requests per second. (Instance: cache)
|Dependent item|gitlab.redis.client_requests.cache.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of Redis client requests per second. (Instance: shared_state)
|Dependent item|gitlab.redis.client_requests.shared_state.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of Redis client exceptions per second. (Instance: queues)
|Dependent item|gitlab.redis.client_exceptions.queues.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of Redis client exceptions per second. (Instance: cache)
|Dependent item|gitlab.redis.client_exceptions.cache.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of Redis client exceptions per second. (Instance: shared_state)
|Dependent item|gitlab.redis.client_exceptions.shared_state.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
The cache read miss count.
|Dependent item|gitlab.cache.misses_total.rate**Preprocessing**
JSON Path: `$[?(@.name=="gitlab_cache_misses_total")].value.sum()`
The count of cache operations.
|Dependent item|gitlab.cache.operations_total.rate**Preprocessing**
JSON Path: `$[?(@.name=="gitlab_cache_operations_total")].value.sum()`
Average CPU time utilization in seconds.
|Dependent item|gitlab.ruby.process_cpu_seconds.rate**Preprocessing**
JSON Path: `$[?(@.name=="ruby_process_cpu_seconds_total")].value.avg()`
⛔️Custom on fail: Discard value
Number of running Ruby threads.
|Dependent item|gitlab.ruby.threads_running**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Average number of opened file descriptors.
|Dependent item|gitlab.ruby.file_descriptors.avg**Preprocessing**
JSON Path: `$[?(@.name=="ruby_file_descriptors")].value.avg()`
Maximum number of opened file descriptors.
|Dependent item|gitlab.ruby.file_descriptors.max**Preprocessing**
JSON Path: `$[?(@.name=="ruby_file_descriptors")].value.max()`
Minimum number of opened file descriptors.
|Dependent item|gitlab.ruby.file_descriptors.min**Preprocessing**
JSON Path: `$[?(@.name=="ruby_file_descriptors")].value.min()`
Maximum number of open file descriptors per process.
|Dependent item|gitlab.ruby.process_max_fds**Preprocessing**
JSON Path: `$[?(@.name=="ruby_process_max_fds")].value.avg()`
Average RSS Memory usage in bytes.
|Dependent item|gitlab.ruby.process_resident_memory_bytes.avg**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Minimum RSS Memory usage in bytes.
|Dependent item|gitlab.ruby.process_resident_memory_bytes.min**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Maximum RSS Memory usage in bytes.
|Dependent item|gitlab.ruby.process_resident_memory_bytes.max**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
Number of requests received into the system.
|Dependent item|gitlab.http.requests.rate**Preprocessing**
JSON Path: `$[?(@.name=="http_requests_total")].value.sum()`
Number of failed requests with HTTP code 5xx.
|Dependent item|gitlab.http.requests.5xx.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Number of failed requests with HTTP code 4xx.
|Dependent item|gitlab.http.requests.4xx.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
Transactions per second (gitlab_transaction_* metrics).
|Dependent item|gitlab.transactions.rate**Preprocessing**
JSON Path: `The text is too long. Please see the template.`
⛔️Custom on fail: Discard value
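
Most dependent items above apply a JSON Path that filters the collected metrics by `name` (and sometimes by label) and then aggregates the `value` fields with `first()`, `sum()`, `min()`, `max()` or `avg()`. The following Python sketch mimics that idea outside of Zabbix. It is only an illustration: the sample payload, the `parse_metrics` and `select` helpers, and the simplified parser (no timestamps, no commas inside label values, numeric values instead of strings) are assumptions, not part of the template.

```python
from statistics import mean

# Hypothetical sample in the Prometheus text format served by /-/metrics.
SAMPLE = """\
# HELP ruby_file_descriptors File descriptors per process
# TYPE ruby_file_descriptors gauge
ruby_file_descriptors{pid="puma_0"} 120
ruby_file_descriptors{pid="puma_1"} 80
http_requests_total{method="get",status="200"} 1500
http_requests_total{method="get",status="500"} 4
"""


def parse_metrics(text):
    """Turn metric lines into dicts with name, labels and value fields."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        name, _, raw_labels = series.partition("{")
        labels = {}
        for pair in raw_labels.rstrip("}").split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key] = val.strip('"')
        rows.append({"name": name, "labels": labels, "value": float(value)})
    return rows


def select(rows, name, **labels):
    """Rough equivalent of $[?(@.name=="...")].value with optional label filters."""
    return [
        row["value"]
        for row in rows
        if row["name"] == name
        and all(row["labels"].get(k) == v for k, v in labels.items())
    ]


rows = parse_metrics(SAMPLE)
fd_max = max(select(rows, "ruby_file_descriptors"))       # like ...value.max()
fd_avg = mean(select(rows, "ruby_file_descriptors"))      # like ...value.avg()
requests_total = sum(select(rows, "http_requests_total"))  # like ...value.sum()
errors_5xx = sum(select(rows, "http_requests_total", status="500"))
```

Aggregating across all matching series is why a single item such as `gitlab.ruby.file_descriptors.max` can summarize several Ruby processes at once.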

### Triggers

|Name|Description|Expression|Severity|Dependencies and additional info|
|----|-----------|----------|--------|--------------------------------|
||The application server is not running or Rails Controllers are deadlocked.|`last(/GitLab by HTTP/gitlab.liveness)=0`|High||
|GitLab: Version has changed|The GitLab version has changed. Acknowledge to close the problem manually.|`last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0`|Info|**Manual close**: Yes|
|GitLab: Too many Redis queues client exceptions|Too many Redis client exceptions during the requests to Redis instance queues.|`min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN}`|Warning||
|GitLab: Too many Redis cache client exceptions|Too many Redis client exceptions during the requests to Redis instance cache.|`min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN}`|Warning||
|GitLab: Too many Redis shared_state client exceptions|Too many Redis client exceptions during the requests to Redis instance shared_state.|`min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN}`|Warning||
|GitLab: Failed to fetch info data|Zabbix has not received metrics data for the last 30 minutes.|`nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1`|Warning|**Manual close**: Yes|
||Too many requests failed on the GitLab instance with a 5xx HTTP code.|`min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN}`|Warning||

### LLD rule Unicorn metrics discovery

|Name|Description|Type|Key and additional info|
|----|-----------|----|-----------------------|
|Unicorn metrics discovery|Discovery of Unicorn-specific metrics when Unicorn is used.
|HTTP agent|gitlab.unicorn.discovery**Preprocessing**
Prometheus to JSON: `unicorn_workers`
⛔️Custom on fail: Discard value
JavaScript: `The text is too long. Please see the template.`
The number of Unicorn workers.
|Dependent item|gitlab.unicorn.unicorn_workers[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='unicorn_workers')].value.sum()`
The number of active Unicorn connections.
|Dependent item|gitlab.unicorn.active_connections[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='unicorn_active_connections')].value.sum()`
The number of queued Unicorn connections.
|Dependent item|gitlab.unicorn.queued_connections[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='unicorn_queued_connections')].value.sum()`
Discovery of Puma-specific metrics when Puma is used.
|HTTP agent|gitlab.puma.discovery**Preprocessing**
Prometheus to JSON: `puma_workers`
JavaScript: `The text is too long. Please see the template.`
Number of puma threads processing a request.
|Dependent item|gitlab.puma.active_connections[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_active_connections')].value.sum()`
Total number of puma workers.
|Dependent item|gitlab.puma.workers[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_workers')].value.sum()`
The number of booted puma workers.
|Dependent item|gitlab.puma.running_workers[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_running_workers')].value.sum()`
The number of old puma workers.
|Dependent item|gitlab.puma.stale_workers[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_stale_workers')].value.sum()`
The number of running puma threads.
|Dependent item|gitlab.puma.running[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_running')].value.sum()`
The number of connections in that puma worker's "todo" set waiting for a worker thread.
|Dependent item|gitlab.puma.queued_connections[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_queued_connections')].value.sum()`
The number of requests the puma worker is capable of taking right now.
|Dependent item|gitlab.puma.pool_capacity[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_pool_capacity')].value.sum()`
The maximum number of puma worker threads.
|Dependent item|gitlab.puma.max_threads[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_max_threads')].value.sum()`
The number of spawned puma threads which are not processing a request.
|Dependent item|gitlab.puma.idle_threads[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_idle_threads')].value.sum()`
The number of workers terminated by PumaWorkerKiller.
|Dependent item|gitlab.puma.killer_terminations_total[{#SINGLETON}]**Preprocessing**
JSON Path: `$[?(@.name=='puma_killer_terminations_total')].value.sum()`
⛔️Custom on fail: Discard value
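
The `gitlab.readiness` and `gitlab.liveness` items listed under Items share one preprocessing chain: replace a not supported value with a "failed" JSON document, extract a status with JSON Path, convert it with "Boolean to decimal", and fall back to `0` if that conversion fails. The Python sketch below walks through the same steps. It is an illustration under assumptions: the `probe_value` helper, the `None` convention for a failed request, and the short word lists (a subset chosen for the example, not the full list Zabbix accepts) are hypothetical, and the "Discard unchanged with heartbeat" step is not modeled.

```python
import json

# Assumed subset of strings that "Boolean to decimal" understands.
TRUE_WORDS = {"true", "yes", "on", "up", "ok"}
FALSE_WORDS = {"false", "no", "off", "down", "err"}


def boolean_to_decimal(text):
    """Map a boolean-like word to 1 or 0; raise if it is not recognized."""
    word = str(text).strip().lower()
    if word in TRUE_WORDS:
        return 1
    if word in FALSE_WORDS:
        return 0
    raise ValueError(f"not a boolean-like value: {text!r}")


def probe_value(body, fallback, extract):
    """Apply the chain: fallback document -> JSON Path -> Boolean to decimal -> 0 on fail."""
    if body is None:  # stands in for "Check for not supported value"
        body = fallback
    status = extract(json.loads(body))  # stands in for the JSON Path step
    try:
        return boolean_to_decimal(status)
    except ValueError:
        return 0  # stands in for "Custom on fail: Set value to: 0"


def liveness_value(body):
    # JSON Path: $.status
    return probe_value(body, '{"status": "failed"}', lambda doc: doc["status"])


def readiness_value(body):
    # JSON Path: $.master_check[0].status
    return probe_value(
        body,
        '{"master_check":[{"status":"failed"}]}',
        lambda doc: doc["master_check"][0]["status"],
    )


alive = liveness_value('{"status": "ok"}')
dead = liveness_value(None)
ready = readiness_value('{"status": "ok", "master_check": [{"status": "ok"}]}')
not_ready = readiness_value(None)
```

Because the word `failed` is not treated as boolean-like in this sketch, the substituted fallback document always ends up as `0`, which is the value the liveness trigger expression `last(/GitLab by HTTP/gitlab.liveness)=0` tests for.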