diff --git a/‎README.md‎
Lines changed: 7 additions & 4 deletions b/‎README.md‎
diff --git a/‎docs/usecase_cloudwatch-alerting-healthy-queue.png‎
17.7 KB b/‎docs/usecase_cloudwatch-alerting-healthy-queue.png‎
diff --git a/‎docs/usecase_cloudwatch-alerting-unhealthy-queue.png‎
18.4 KB b/‎docs/usecase_cloudwatch-alerting-unhealthy-queue.png‎
diff --git a/‎docs/usecase_cloudwatch-alerting.md‎
Lines changed: 55 additions & 0 deletions b/‎docs/usecase_cloudwatch-alerting.md‎
@@ -3,8 +3,10 @@ lambda-alertmanager?
 
 - Provides simple & reliable alerting for your infrastructure.
 - Uses so little resources that it is practically free to run.
-- [Monitors your web properties for being up](docs/usecase_http-monitoring.md), receive alerts from Prometheus,
-  Amazon CloudWatch alarms, alarms via SNS topic or any custom HTTP integration (as JSON).
+- [Monitors your web properties for being up](docs/usecase_http-monitoring.md),
+  [receive alerts from Prometheus](docs/usecase_prometheus-alerting.md),
+  [Amazon CloudWatch alarms](docs/usecase_cloudwatch-alerting.md), alarms via SNS topic or
+  [any custom HTTP integration (as JSON)](docs/setup_custom_integration.md).
 - Runs **entirely** on AWS' reliable infrastructure (after setup nothing for you to manage or fix). The compute part is Lambda,
   but we also use DynamoDB + streams (for state), IAM (for sandboxing AlertManager), API Gateway (for inbound https integrations),
   CloudWatch Events (for scheduling) and SNS (inbound alarm receiving, outbound alert delivery).
@@ -59,8 +61,9 @@ Follow these steps precisely, and you've got yourself a working installation:
 4. [Set up AlertManager](docs/setup_alertmanager.md)
 5. [Set up API Gateway](docs/setup_apigateway.md) (also includes: testing that this works)
 6. (recommended) [Set up AlertManager-canary](docs/setup_alertmanager-canary.md)
-7. (optional) Set up Prometheus integration
+7. (optional) [Set up Prometheus integration](docs/usecase_prometheus-alerting.md)
 8. (optional) [Set up custom integration](docs/setup_custom_integration.md)
+9. (optional) [Set up CloudWatch integration](docs/usecase_cloudwatch-alerting.md)
 
 
 Diagram
@@ -109,7 +112,7 @@ Q: Why use this, [uptimerobot.com](https://uptimerobot.com/) is free?
 
 A: uptimerobot.com is awesome, but:
 
-- It only supports 5 minute rates while lambda-alertmanager supports 1 minute rates.
+- The free option only supports 5 minute rates while lambda-alertmanager supports 1 minute rates.
 - It does mainly HTTP/HTTPS checks, while lambda-alertmanager integrates with Prometheus, Amazon CloudWatch & others as well.
 - It supports free SMS messages (no delivery guarantees), but they have non-free "pro SMS" (better delivery).
   lambda-alertmanager SMSes are all "pro SMS" and free to a certain limit.
 
@@ -0,0 +1,55 @@
+Use case: CloudWatch alerting
+=============================
+
+NOTE: this guide applies to most AWS services - not just SQS. But we'll use SQS as an example.
+
+Let's say that you have queue workers (whether in AWS or outside of AWS) that use AWS's SQS
+(Simple Queue Service). A great way to detect problems is to check whether the queue is backing up.
+
+
+What does a healthy queue look like?
+------------------------------------
+
+A healthy queue does not hold many queued work items for a prolonged amount of time.
+A healthy queue looks like this:
+
+![](usecase_cloudwatch-alerting-healthy-queue.png)
+
+Observations:
+
+- Items are sent to the queue pretty constantly.
+- Visible messages (= messages that are not yet consumed by a worker) should be close to zero at all times.
+
+
+What does an unhealthy queue look like?
+---------------------------------------
+
+An unhealthy queue receives messages faster than they are processed. It looks like this:
+
+![](usecase_cloudwatch-alerting-unhealthy-queue.png)
+
+Observations:
+
+- Items are sent to the queue pretty constantly.
+- Visible messages (= messages that are not yet consumed by a worker) ARE NOT close to zero.
+
+
+Creating a CloudWatch alarm to detect an unhealthy queue
+--------------------------------------------------------
+
+Go to `CloudWatch > Alarms > Create Alarm > SQS Queue Metrics`:
+
+- QueueName = your queue
+- Metrics = `ApproximateNumberOfMessagesVisible`
+- `[ Next ]`
+- Name = `Queue XYZ health`
+- Whenever ApproximateNumberOfMessagesVisible `is >= 5 for 1 consecutive periods`
+- Period = `5 minutes`
+- Action = `state = alarm => send notification to AlertManager-ingest`
+- `[ Create Alarm ]`
+
+Now, when the alarm condition is detected, CloudWatch uses AlertManager to dispatch the alert to you. :)
+
+NOTE: `ApproximateAgeOfOldestMessage` is probably the best metric for detecting an unhealthy queue,
+as it works even in high-bandwidth queues.
+`ApproximateNumberOfMessagesVisible` was used here mainly because it is the easiest to explain.
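
If you'd rather script the alarm than click through the console, the steps in the CloudWatch guide above could be sketched roughly like this in Python with boto3. This is only a sketch under assumptions: the helper names `build_queue_alarm_params` and `create_queue_alarm` are made up for illustration, the `Average` statistic is a guess (the console steps don't name one), and the SNS topic ARN for `AlertManager-ingest` is something you need to look up for your own account.

```python
from typing import Any, Dict


def build_queue_alarm_params(
    queue_name: str,
    sns_topic_arn: str,
    metric_name: str = "ApproximateNumberOfMessagesVisible",
    threshold: float = 5.0,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm call.

    Hypothetical helper: it mirrors the console steps (>= 5 for 1 period
    of 5 minutes, notify the AlertManager-ingest topic on ALARM state).
    """
    return {
        "AlarmName": f"Queue {queue_name} health",
        "Namespace": "AWS/SQS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        # Assumption: the console steps do not say which statistic to use.
        "Statistic": "Average",
        "Period": 300,  # 5 minutes, in seconds
        "EvaluationPeriods": 1,  # "for 1 consecutive periods"
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Actions to run when the alarm enters the ALARM state.
        "AlarmActions": [sns_topic_arn],
    }


def create_queue_alarm(queue_name: str, sns_topic_arn: str) -> None:
    """Create the alarm in your AWS account (needs boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_queue_alarm_params(queue_name, sns_topic_arn)
    )
```

Calling `create_queue_alarm("XYZ", "<ARN of your AlertManager-ingest topic>")` would then create the `Queue XYZ health` alarm. To alarm on `ApproximateAgeOfOldestMessage` instead, pass `metric_name` and a `threshold` in seconds to `build_queue_alarm_params`; what threshold makes sense depends on your workload.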