fix(ci): wait for container removal before docker compose up by samtuckerdavis · Pull Request #104 · Open-Paws/cryptpad-project-management

samtuckerdavis · 2026-04-22T14:27:26Z

Summary

Fixes the production Deploy workflow, which has failed 8+ consecutive times on main with a Docker container-name conflict.
Adds a wait loop after docker compose down / docker rm -f so docker compose up -d doesn't race with an in-flight container removal.

Root cause

Every recent failure (runs 24698290423, 24698091488, 24698090290, ...) dies with one of:

Container cryptpad-cryptpad-1  Error response from daemon: removal of container <id> is already in progress
Container cryptpad-cryptpad-1  Error response from daemon: Conflict. The container name "/cryptpad-cryptpad-1" is already in use ...

Both docker compose down --timeout 30 and docker rm -f return to the shell as soon as the daemon accepts the request; actual removal of the filesystem/network resources can still be in progress. The immediately-following docker compose up -d then collides with its own in-flight teardown. The existing || true and 2>/dev/null || true masked the cleanup errors but not the subsequent up failure.

Fix

After the existing down + rm -f, poll docker ps -a for up to 60s until the container name is actually gone, then run up. If the container is still present after the timeout, abort the deploy with a clear error instead of silently racing.

No change to deploy target (still 64.227.144.36 / /opt/cryptpad).
No change to secrets or auth (still DEPLOY_SSH_KEY).
No checks disabled; the new path adds a stricter fail-closed guard.

Note on verification

This workflow only runs on push to main, so it will not run against this PR. The fix will take effect on merge. Behavior under the failure mode is exercised by the new wait loop on the first deploy after merge. If the stuck container persists beyond 60s, the deploy will now fail fast with a clear error surfacing the daemon state (docker ps -a --filter name=cryptpad-cryptpad-1), rather than a confusing conflict from up.

Test plan

Merge to main and observe the next production deploy run.
Confirm the "Waiting for cryptpad-cryptpad-1 to be removed" lines appear if there's a stuck container, and that up succeeds after the wait.
If the timeout triggers instead, investigate the Docker daemon state on the droplet (not a regression — this surfaces pre-existing breakage more clearly than before).

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Summary by CodeRabbit

Chores
- Improved deployment reliability by adding a verification step that waits for and confirms container removal before bringing services back up; deployment aborts with diagnostics if removal fails.
- Added configuration to exclude vendored third-party and special-purpose files from repository scanning to reduce false positives.

The production deploy has failed 8+ times in a row with either "removal of container ... is already in progress" or "container name /cryptpad-cryptpad-1 is already in use". Root cause: `docker compose down` and `docker rm -f` both return before the Docker daemon finishes the async container removal. The immediately-following `docker compose up -d` then races with that in-flight removal and fails. Fix: after the down/rm step, poll `docker ps -a` for up to 60s until the container name is actually gone, then run `up`. If the container is still present after the timeout, abort the deploy with a clear error instead of masking the problem. No change to deploy target, secrets, or auth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-22T14:27:42Z

AI Code Review\n\nReview unavailable.

coderabbitai · 2026-04-22T14:27:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09d23330-fed8-40f0-8762-345056277068

📥 Commits

Reviewing files that changed from the base of the PR and between fb67604 and a99a48a.

📒 Files selected for processing (1)

.wokeignore

✅ Files skipped from review due to trivial changes (1)

.wokeignore

📝 Walkthrough

Walkthrough

Added a polling/validation step to the SSH deploy workflow that waits up to 60 seconds for cryptpad-cryptpad-1 to be removed before running docker compose up -d. Also added a .wokeignore file listing scanner ignore patterns for vendored and upstream files.

Changes

Cohort / File(s)	Summary
CI/CD Deployment Validation `.github/workflows/deploy.yml`	Inserted a post-removal wait-and-check loop that polls `docker ps -a` for `cryptpad-cryptpad-1` (30 attempts, 2s sleep). If the container remains after timeout, logs matching `docker ps` output and exits 1; otherwise continues to `docker compose up -d`.
Static Analysis Ignore Rules `.wokeignore`	Added scanner ignore patterns excluding vendored third-party directories under `www/lib/`, specific scripts, upstream CryptPad source paths (`lib/`, `scripts/tests/`, `src/common/`), `CHANGELOG.md`, and `.claude/`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately summarizes the main fix: adding a wait-and-check mechanism for container removal before running docker compose up, which is the core change in the deploy workflow.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/prod-deploy-container-removal-race

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy.yml:
- Around line 48-58: The current loop and guard use a piped command like `docker
ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'` which can treat a
failing `docker ps` as "not found"; change both uses to first run `docker ps -a
--format '{{.Names}}'` and capture its exit status and output into a variable,
immediately fail the script if `docker ps` returns non-zero, and only then run
`grep -Fxq 'cryptpad-cryptpad-1'` against the captured output; update the
waiting loop condition (the `while`/`break` logic) and the final guard `if` that
checks presence of `cryptpad-cryptpad-1` to use the captured output variable and
the docker exit check so daemon/query failures are detected and the job aborts
rather than being misinterpreted as "container removed".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eadbc37c-e03e-4ecd-a79d-c5265237061c

📥 Commits

Reviewing files that changed from the base of the PR and between 8554cf7 and fb67604.

📒 Files selected for processing (1)

.github/workflows/deploy.yml

coderabbitai · 2026-04-22T14:30:45Z

+              if ! docker ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'; then
+                break
+              fi
+              echo "[$(date)] Waiting for cryptpad-cryptpad-1 to be removed ($i/30)..."
+              sleep 2
+            done
+            if docker ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'; then
+              echo "[$(date)] ERROR: cryptpad-cryptpad-1 still present after 60s; aborting deploy." >&2
+              docker ps -a --filter name=cryptpad-cryptpad-1 >&2
+              exit 1
+            fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Reproducing conditional behavior similar to Line 48/54:" mock_docker_ps() { return 125; } # simulate daemon/query error if ! mock_docker_ps | grep -Fxq 'cryptpad-cryptpad-1'; then echo "Observed: condition evaluates as absent even though query failed (buggy behavior)." else echo "Unexpected." fi

Repository: Open-Paws/cryptpad-project-management

Length of output: 219

🏁 Script executed:

fd -type f -name "deploy.yml" | head -5

Repository: Open-Paws/cryptpad-project-management

Length of output: 251

🏁 Script executed:

fd -t f "deploy.yml"

Repository: Open-Paws/cryptpad-project-management

Length of output: 63

🏁 Script executed:

git ls-files ".github/workflows/" | head -20

Repository: Open-Paws/cryptpad-project-management

Length of output: 314

🏁 Script executed:

wc -l .github/workflows/deploy.yml

Repository: Open-Paws/cryptpad-project-management

Length of output: 112

🏁 Script executed:

cat -n .github/workflows/deploy.yml

Repository: Open-Paws/cryptpad-project-management

Length of output: 2623

docker ps failures can be misclassified as "container removed".

At Line 48 and Line 54, the pipeline can pass on daemon/query failure (not just true absence), which may allow docker compose up to run despite an unhealthy Docker state. If docker ps fails, grep receives empty input and returns non-match, which ! converts to success in the loop condition or masks the failure in the guard condition.

Make the docker ps call explicit and fail immediately if it errors by capturing its output separately before grepping:

Proposed fix

+ container_name='cryptpad-cryptpad-1' for i in $(seq 1 30); do - if ! docker ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'; then + if ! names="$(docker ps -a --format '{{.Names}}')"; then + echo "[$(date)] ERROR: failed to query Docker daemon while waiting for ${container_name} removal." >&2 + exit 1 + fi + if ! grep -Fxq "${container_name}" <<<"${names}"; then break fi - echo "[$(date)] Waiting for cryptpad-cryptpad-1 to be removed ($i/30)..." + echo "[$(date)] Waiting for ${container_name} to be removed ($i/30)..." sleep 2 done - if docker ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'; then - echo "[$(date)] ERROR: cryptpad-cryptpad-1 still present after 60s; aborting deploy." >&2 - docker ps -a --filter name=cryptpad-cryptpad-1 >&2 + if ! names="$(docker ps -a --format '{{.Names}}')"; then + echo "[$(date)] ERROR: failed to query Docker daemon before final container presence check." >&2 + exit 1 + fi + if grep -Fxq "${container_name}" <<<"${names}"; then + echo "[$(date)] ERROR: ${container_name} still present after 60s; aborting deploy." >&2 + docker ps -a --filter "name=^/${container_name}$" >&2 exit 1 fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy.yml around lines 48 - 58, The current loop and guard use a piped command like `docker ps -a --format '{{.Names}}' | grep -Fxq 'cryptpad-cryptpad-1'` which can treat a failing `docker ps` as "not found"; change both uses to first run `docker ps -a --format '{{.Names}}'` and capture its exit status and output into a variable, immediately fail the script if `docker ps` returns non-zero, and only then run `grep -Fxq 'cryptpad-cryptpad-1'` against the captured output; update the waiting loop condition (the `while`/`break` logic) and the final guard `if` that checks presence of `cryptpad-cryptpad-1` to use the captured output variable and the docker exit check so daemon/query failures are detected and the job aborts rather than being misinterpreted as "container removed".

The `check` workflow runs woke on the whole tree and flags findings in vendored third-party libraries (mermaid, pdf.js, asciidoctor, etc.) and upstream CryptPad source files. Those are not owned in this fork and rewording them would drift from upstream. Add a `.wokeignore` that the action honours to scope the scan to code actually authored here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-22T15:16:24Z

AI Code Review\n\nReview unavailable.

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

samtuckerdavis merged commit b449606 into main Apr 22, 2026
7 checks passed

samtuckerdavis deleted the fix/prod-deploy-container-removal-race branch April 22, 2026 16:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(ci): wait for container removal before docker compose up#104

fix(ci): wait for container removal before docker compose up#104
samtuckerdavis merged 2 commits intomainfrom
fix/prod-deploy-container-removal-race

samtuckerdavis commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

samtuckerdavis commented Apr 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Note on verification

Test plan

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Apr 22, 2026

AI Code Review\n\nReview unavailable.

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 22, 2026

AI Code Review\n\nReview unavailable.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

samtuckerdavis commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading