diff --git a/.flake8 b/.flake8 index 44266c4..3b3ce69 100644 --- a/.flake8 +++ b/.flake8 @@ -9,4 +9,4 @@ extend-ignore = E203, N812, N817 #============================ Repo-specific config =========================== per-file-ignores = - test/*:D103,D403 \ No newline at end of file + test/*:D102,D103,D403 \ No newline at end of file diff --git a/.github/agents/adversarial-lens.agent.md b/.github/agents/adversarial-lens.agent.md new file mode 100644 index 0000000..6db042f --- /dev/null +++ b/.github/agents/adversarial-lens.agent.md @@ -0,0 +1,30 @@ +# Adversarial Lens + +You are the **adversarial lens** — a red-team perspective. Assume the +proposal or code is wrong. Your job is to break it. + +You always run **after** the constructive lenses. You will receive their +outputs (plans or findings) so you can focus on what they missed. Do not +duplicate issues already reported — find the gaps. + +## Planning Mode + +When reviewing an implementation plan produced by other lenses, actively try +to find failure modes they overlooked. Look for race conditions, deadlocks, +ABA problems, platform bugs, edge cases, missing error handling, reference +counting errors, and assumptions that may not hold. Start from skepticism and +only endorse what survives scrutiny. + +## Review Mode + +When reviewing code after constructive reviewers, focus on gaps in their +coverage. Construct pathological inputs, race windows, resource exhaustion +scenarios, and edge cases. Look for: +- Code sections covered by NO existing finding +- Issue categories not represented in existing findings +- Cross-component interactions no single perspective would catch +- Unchecked assumptions and untested preconditions +- Silent divergences with no test coverage +- Fragile coupling where changing one thing silently breaks another + +Only clear findings that survive scrutiny. 
diff --git a/.github/agents/conservative-lens.agent.md b/.github/agents/conservative-lens.agent.md new file mode 100644 index 0000000..99e20b1 --- /dev/null +++ b/.github/agents/conservative-lens.agent.md @@ -0,0 +1,27 @@ +# Conservative Lens + +You are the **conservative lens** — a perspective that minimizes the +changeset. + +## Planning Mode + +When producing an implementation plan, touch as few lines as possible. Prefer +surgical edits over refactors. Reuse existing patterns and infrastructure. +Resist new dependencies or abstractions. Each step should justify why it is +necessary and confirm that no smaller change achieves the same goal. + +## Rebuttal Mode + +You have been given a point of disagreement between planners. You will see +your original recommendation alongside the competing alternatives. Argue +concisely and specifically for why your approach is the best choice and why +each alternative is inferior. Ground your argument in concrete trade-offs, not +abstract preferences. One turn only — make it count. + +## Review Mode + +When reviewing code or a plan, focus on scope creep and unnecessary change. +Look for gratuitous refactors, new abstractions that could be avoided, added +dependencies that duplicate existing functionality, and changes to code that +did not need to be touched. Flag anything that increases the blast radius +beyond what the task requires. diff --git a/.github/agents/correctness-lens.agent.md b/.github/agents/correctness-lens.agent.md new file mode 100644 index 0000000..a1eaf52 --- /dev/null +++ b/.github/agents/correctness-lens.agent.md @@ -0,0 +1,18 @@ +# Correctness Lens + +You are the **correctness lens** — a perspective focused exclusively on +functional correctness. + +## Planning Mode + +When producing an implementation plan, ensure every step preserves existing +invariants and introduces no logic errors. Verify state transitions, boundary +conditions, and error propagation at each step. 
Flag any step where correctness +depends on an unstated assumption. + +## Review Mode + +When reviewing code or a plan, focus exclusively on functional correctness. +Look for logic errors, off-by-one mistakes, incorrect state transitions, +broken invariants, missing error handling at system boundaries, and test gaps. +Ignore style. diff --git a/.github/agents/security-lens.agent.md b/.github/agents/security-lens.agent.md new file mode 100644 index 0000000..3ce5dcd --- /dev/null +++ b/.github/agents/security-lens.agent.md @@ -0,0 +1,18 @@ +# Security Lens + +You are the **security lens** — a perspective focused exclusively on security. + +## Planning Mode + +When producing an implementation plan, evaluate every step for its impact on +trust boundaries, attack surface, and secrets handling. Prefer designs that +make vulnerabilities structurally impossible over those that require discipline +to remain safe. Flag any step that introduces input handling, deserialization, +or privilege changes without validation. + +## Review Mode + +When reviewing code or a plan, focus exclusively on security. Look for +injection flaws, buffer overflows, race conditions exploitable by an attacker, +unsafe deserialization, credential leaks, missing input validation at trust +boundaries, and OWASP Top 10 issues. Ignore style. diff --git a/.github/agents/speed-lens.agent.md b/.github/agents/speed-lens.agent.md new file mode 100644 index 0000000..8b97af2 --- /dev/null +++ b/.github/agents/speed-lens.agent.md @@ -0,0 +1,26 @@ +# Speed Lens + +You are the **speed lens** — a perspective obsessed with performance. + +## Planning Mode + +When producing an implementation plan, optimize for minimal latency and +overhead at every step. Inline aggressively, avoid unnecessary abstractions, +and prefer lock-free and wait-free primitives. Tolerate complexity if it buys +measurable speed. Justify each step with its performance rationale, and flag +any step where a simpler but slower alternative exists. 
+ +## Rebuttal Mode + +You have been given a point of disagreement between planners. You will see +your original recommendation alongside the competing alternatives. Argue +concisely and specifically for why your approach is the best choice and why +each alternative is inferior. Ground your argument in concrete trade-offs, not +abstract preferences. One turn only — make it count. + +## Review Mode + +When reviewing code or a plan, focus exclusively on performance. Look for +unnecessary allocations, redundant work, hot-path overhead, abstraction cost, +cache-unfriendly access patterns, and missed opportunities for batching or +parallelism. Ignore style unless it has a performance consequence. diff --git a/.github/agents/synthesis-lens.agent.md b/.github/agents/synthesis-lens.agent.md new file mode 100644 index 0000000..dae3623 --- /dev/null +++ b/.github/agents/synthesis-lens.agent.md @@ -0,0 +1,37 @@ +# Synthesis Lens + +You are the **synthesis lens** — a senior engineer who reconciles outputs from +multiple competing perspectives into one coherent result. + +## Planning Mode + +When synthesizing implementation plans, preserve the strongest ideas from each +perspective. Where planners disagree, make an explicit trade-off decision and +justify it. The final plan must be a numbered step-by-step implementation +sequence with clear rationale. Flag any unresolved risks. + +When rebuttals are provided alongside the original plans, engage with the +arguments each lens makes. Do not ignore or average them — evaluate the +competing cases on their merits and pick one option per disagreement. +Justify each choice by referencing the specific argument that was most +convincing. + +If a disagreement cannot be resolved — for example, the rebuttals are equally +compelling, or the trade-off depends on priorities only the user can set — +do not guess. 
Instead, present the unresolved disagreement in a structured +format: + +> **Unresolved: {short title}** +> - **Option A ({lens name}):** {summary of position and key argument} +> - **Option B ({lens name}):** {summary of position and key argument} +> - **Why it matters:** {impact of the choice} + +The orchestrator will escalate these to the user for a decision. + +## Review Mode + +When synthesizing review findings, merge duplicates (keeping the most detailed +version and noting which perspectives flagged each issue). Resolve conflicts +where reviewers disagree by noting both sides and flagging the trade-off. +Classify findings by severity: critical, high, medium, low. Present a unified +report ordered by severity. diff --git a/.github/agents/usability-lens.agent.md b/.github/agents/usability-lens.agent.md new file mode 100644 index 0000000..31132c3 --- /dev/null +++ b/.github/agents/usability-lens.agent.md @@ -0,0 +1,26 @@ +# Usability Lens + +You are the **usability lens** — a perspective that prioritizes clean, +readable, maintainable code. + +## Planning Mode + +When producing an implementation plan, favor clear abstractions, good naming, +and small focused functions. Accept modest performance cost for clarity. Each +step should explain how it keeps the code understandable and easy to modify. +Flag any step that introduces unnecessary complexity. + +## Rebuttal Mode + +You have been given a point of disagreement between planners. You will see +your original recommendation alongside the competing alternatives. Argue +concisely and specifically for why your approach is the best choice and why +each alternative is inferior. Ground your argument in concrete trade-offs, not +abstract preferences. One turn only — make it count. + +## Review Mode + +When reviewing code or a plan, focus on readability, naming, API ergonomics, +and long-term maintainability. 
Look for unclear logic, misleading names, +excessive complexity, poor abstractions, duplicated logic, and violations of +project conventions. Ignore micro-optimizations unless they harm clarity. diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 957dd4f..3bbe3fa 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -22,8 +22,14 @@ User code (@when) ▼ Python layer (behaviors.py) ├── @when decorator: captures closure vars, schedules behavior - ├── Behaviors: runtime lifecycle, worker pool, scheduler thread - └── Cown: typed data wrapper with acquire/release semantics + │ directly from the caller's thread (no central queue hop) + ├── Behaviors: runtime lifecycle, worker pool, terminator + │ (no scheduler thread; scheduling and release run on the + │ threads that need them — caller and worker) + ├── Cown: typed data wrapper with acquire/release semantics + └── Noticeboard: notice_write/update/delete/read, noticeboard() + (global key-value store, snapshot-per-behavior; mutators + are serialized through one dedicated noticeboard thread) │ ▼ Transpiler (transpiler.py) @@ -33,13 +39,17 @@ User code (@when) ▼ Worker (worker.py) └── Sub-interpreter event loop: receives behavior capsules via - message queue, executes them, sends release messages + boc_worker queue, executes them, then releases cowns and + decrements the terminator on the worker thread itself │ ▼ C extensions - ├── _core.c: CownCapsule, BehaviorCapsule, Request (two-phase - │ locking), lock-free MPSC message queues (16 queues, - │ tag-based) + ├── _core.c: CownCapsule, BehaviorCapsule, BOCBehavior + + │ BOCRequest array (two-phase locking), C-level + │ terminator, lock-free MPSC message queues (16 + │ queues, tag-based), global Noticeboard (mutex- + │ protected, up to 64 entries with thread-local + │ snapshot cache + monotonic version counter) └── _math.c: dense double-precision Matrix ``` @@ -47,26 +57,41 @@ User code (@when) 1. 
`@when(cown_a, cown_b)` → transpiler extracts the decorated function and its captured variables → exported as `__behavior__N` in a generated module. -2. The scheduler enqueues a `Request` that performs two-phase locking (2PL) over - the sorted cown IDs. -3. When all cowns are acquired, the behavior capsule is sent to a worker via - `send("boc_worker", capsule)`. -4. The worker executes `__behavior__N` with exclusive access to the cowns, then - sends `("release", bid)` back to the scheduler. -5. The scheduler releases the cowns, allowing waiting behaviors to proceed. The - result is stored in the `Cown` returned by `@when`. +2. `whencall` (caller's thread) increments the C terminator and calls + `_core.behavior_schedule`, which performs two-phase locking (2PL) over + the sorted cown IDs in C, releasing the GIL across the lock-free link + loops. +3. When all cowns are acquired, `behavior_resolve_one` enqueues the + `BehaviorCapsule` directly to the `boc_worker` queue — no central + scheduler hop. +4. A worker pops the capsule, executes `__behavior__N` with exclusive access + to the cowns, then on the same worker thread calls + `behavior_release_all` (MCS unlink + handoff to the next behavior + waiting on each cown) and `terminator_dec`. +5. Releasing a cown may resolve the next waiting behavior, which is + dispatched directly to a worker without touching any central queue. + The result is stored in the `Cown` returned by `@when`. +6. `wait()` blocks on `terminator_wait` until the C-level count reaches + zero; `stop()` then drains the workers and the noticeboard thread. 
## Public API | Symbol | Purpose | |--------|---------| -| `Cown[T]` | Typed wrapper for concurrently-owned data | +| `Cown[T]` | Typed wrapper for concurrently-owned data (with `.value` and `.exception` properties) | | `@when(*cowns)` | Schedule a behavior with exclusive access to the listed cowns | +| `whencall(thunk, args, captures)` | Lower-level form of `@when` used by the transpiler; schedules a named thunk against cowns and capture values | | `send(tag, contents)` | Send a message to a tag (lock-free) | | `receive(tags, timeout, after)` | Selective receive; blocks or times out | | `drain(tags)` | Clear all queued messages for the given tag(s) | | `set_tags(tags)` | Pre-assign tags to queues; clears all messages | | `TIMEOUT` | Sentinel returned by `receive` on timeout | +| `noticeboard()` | Read a per-behavior cached snapshot of the global key-value store | +| `notice_read(key, default)` | Convenience: read a single key from the snapshot | +| `notice_write(key, value)` | Non-blocking write to the noticeboard | +| `notice_update(key, fn, default)` | Atomic read-modify-write; returning `REMOVED` deletes the entry | +| `notice_delete(key)` | Non-blocking delete of a single noticeboard entry | +| `REMOVED` | Sentinel returned by a `notice_update` fn to delete the entry | | `wait(timeout)` | Block until all behaviors complete; stops the runtime | | `start(workers, export_dir, module)` | Manually start the runtime (auto-called on first `@when`) | | `Matrix` | Dense 2D matrix of doubles with C-backed arithmetic | @@ -89,18 +114,90 @@ User code (@when) | `setup.py` | C extension build configuration | | `pyproject.toml` | Project metadata, dependencies, entry points | | `.flake8` | Linting rules (Google style, 120 chars, double quotes) | +| `.copilot/` | Scratch directory for temporary files (gitignored) | + +## Scratch and Temporary Files + +Use the `.copilot/` directory at the repo root for **all** temporary files: +diffs saved for review, scratch scripts, 
generated transpiler output, ad-hoc +notes, intermediate command output, etc. The directory is gitignored. + +- **Do not use `/tmp`** or any other system temp location. Keeping scratch + files inside the repo means they survive across tool calls in the same + session and are easy for you to find again. +- **Look in `.copilot/` first** when searching for prior scratch artifacts + (saved diffs, exported modules, plan notes). Standard search tools + respect `.gitignore`, so you may need to pass include flags to see + these files. +- Create the directory with `mkdir -p .copilot` if it does not yet exist. ## Build and Test +**Always activate a project virtual environment first.** The repository keeps +several side-by-side venvs (one per Python version / build flavor) because the +underlying `XIData` API changes between Python releases and the C extension +needs version-specific testing. At the time of writing the following venvs +exist at the repo root: + +| venv | Python flavor | +|------|---------------| +| `.env312` | CPython 3.12 | +| `.env313d` | CPython 3.13 (debug build) | +| `.env313t` | CPython 3.13 (free-threaded) | +| `.env314` | CPython 3.14 (default for most work) | +| `.env315` | CPython 3.15 | +| `.env315t` | CPython 3.15 (free-threaded) | + +**At the start of a session, ask the user which venv to use** before running +any `pip`, `pytest`, `python`, or other project command. Do not assume +`.env314`; the user may be debugging a version-specific issue and have a +different venv in mind. If the user does not specify, suggest `.env314` as the +default but wait for confirmation. + ```bash +source .env314/bin/activate # or whichever venv the user picked pip install -e .[test] # editable install with test deps pytest -vv # run full suite pip install -e .[linting] # linting deps flake8 src/ test/ # lint check ``` +Never run `pip`, `pytest`, `python`, or any project command outside the +activated venv. 
If you need to validate a fix against more than one Python +version, re-install and re-run the suite in each relevant venv. + C extensions are compiled by setuptools from `_core.c` and `_math.c` during -`pip install`. +`pip install`. Re-installing in a fresh venv triggers a rebuild against that +interpreter's headers. + +## Inspecting Transpiler Output + +`@when` is implemented by an AST transformer (`src/bocpy/transpiler.py`) that +extracts each decorated function into a top-level `__behavior__N` definition +and rewrites the call site as `whencall('__behavior__N', cowns, captures)`. +The captures tuple is built **at schedule time**, so loop variables are +snapshotted by value — no `x=x` default-arg idiom is needed (and adding one +breaks the behavior because the transpiler treats every signature name as a +behavior parameter and discards the default). + +When debugging behavior dispatch, capture resolution, parameter-count +mismatches, or anything else that depends on the transpiler's output, use the +`export_module.py` tool at the repo root: + +```bash +python export_module.py path/to/your_module.py -o .copilot/export.py +cat .copilot/export.py +``` + +The exported module is exactly what worker interpreters import. It shows: + +- which names are behavior parameters vs. captures, +- the order of arguments to each `whencall(...)`, +- how class/function references are resolved at module level, and +- how `__file__` and module dunders are rewritten. + +Reach for this tool whenever a `@when` body misbehaves in a way that is not +explained by the source as written. --- @@ -138,6 +235,27 @@ After implementing a change, run the **review-loop** skill to get an independent review. This may be skipped for trivial changesets, but you do not decide what qualifies as trivial — ask for approval first. +For a complete pre-merge audit, use **branch-review** instead, which runs three +constructive reviewer lenses plus an adversarial gap-analysis pass over the +branch diff. 
+ +For non-trivial design work (multi-subsystem changes, architecture decisions), +use **multi-perspective-plan** to draft and stress-test the plan with competing +lens subagents before any code is written. + +Other skills available in `.github/skills/`: + +- **commenting-c-and-python** — the C and Python doc/comment conventions used + across `_core.c`, `_math.c`, `behaviors.py`, `transpiler.py`, and the + `__init__.pyi` stub. +- **testing-with-boc** — how to write pytest tests against `@when`, `Cown`, + noticeboard, and `Cown.exception`, including the `send`/`receive` assertion + pattern. +- **testing-message-queue** — how to write tests for the lock-free MPSC queue + (`send`/`receive`/`set_tags`/`drain`). +- **version-bump** — the files to update in lock-step when releasing a new + version. + I am still your collaborator. If you are unsure how to address a reviewer's comment, ask me rather than guessing. diff --git a/.github/skills/branch-review/SKILL.md b/.github/skills/branch-review/SKILL.md index 6e3b581..4397d46 100644 --- a/.github/skills/branch-review/SKILL.md +++ b/.github/skills/branch-review/SKILL.md @@ -1,6 +1,6 @@ --- name: branch-review -description: "Multi-perspective code review for a branch before merging. Use when: reviewing a branch, preparing a PR, pre-merge review, auditing a feature branch, or when /branch-review is invoked. Spawns four reviewer subagents with competing priorities (correctness, security, maintainability, adversarial) then synthesizes their findings into a unified review report." +description: "Multi-perspective code review for a branch before merging. Use when: reviewing a branch, preparing a PR, pre-merge review, auditing a feature branch, or when /branch-review is invoked. Spawns three constructive reviewer subagents (correctness, security, usability), then runs an adversarial gap analysis to find what they missed, and synthesizes all findings into a unified review report." argument-hint: "Branch name or merge target (e.g. 
'main' or 'feature/foo -> main')" --- @@ -62,23 +62,22 @@ Assemble a context block that every reviewer will receive. It must include: Keep the context block identical across all four reviewers to ensure a fair comparison. -### 3. Spawn Four Reviewer Subagents +### 3. Spawn Three Constructive Reviewer Lens Subagents -Launch four subagents **in parallel**. Each receives the context block plus a -persona directive. Each must return findings in the severity-tagged format -defined above. +Launch three subagents **in parallel**, each using a named lens agent operating +in **review mode**. Each receives the context block and must return findings in +the severity-tagged format defined above. -| # | Persona | Directive | -|---|---------|-----------| -| 1 | **Correctness** | Focus exclusively on functional correctness. Look for logic errors, off-by-one mistakes, incorrect state transitions, broken invariants, missing error handling at system boundaries, and test gaps. Ignore style. | -| 2 | **Security** | Focus exclusively on security. Look for injection flaws, buffer overflows, race conditions exploitable by an attacker, unsafe deserialization, credential leaks, missing input validation at trust boundaries, and OWASP Top 10 issues. Ignore style. | -| 3 | **Maintainability** | Focus on code quality and long-term health. Look for unclear naming, excessive complexity, duplicated logic, missing or misleading comments, poor abstractions, and violations of project conventions (flake8 rules, comment style, etc.). Ignore performance unless it causes a correctness issue. | -| 4 | **Adversarial** | Assume the code is wrong and try to break it. Construct pathological inputs, race windows, resource exhaustion scenarios, and edge cases. Challenge every assumption. Only clear findings that survive scrutiny. 
| +| # | Agent | Focus | +|---|-------|-------| +| 1 | `correctness-lens` | Logic errors, broken invariants, test gaps | +| 2 | `security-lens` | Injection, overflows, trust boundary violations | +| 3 | `usability-lens` | Naming, complexity, conventions, maintainability | Each subagent prompt must include: - The shared context block -- The persona directive (from the table above) +- An instruction to operate in **review mode** - These instructions: > Review the diff and changed files from the perspective described above. @@ -95,9 +94,48 @@ Each subagent prompt must include: > Do NOT fabricate issues. Only report genuine problems. > Order findings by severity (critical first). -### 4. Deduplicate and Synthesize +### 4. Adversarial Gap Analysis -After all four reviewers return: +After the three constructive reviewers return, spawn a fresh `adversarial-lens` +subagent operating in **review mode**. This step runs **sequentially** — the +adversarial reviewer receives the existing findings so it can focus on what the +others missed. + +The adversarial subagent prompt must include: + +- The shared context block +- The full list of findings from the three constructive reviewers +- These instructions: + + > You are the adversarial reviewer. The findings below were produced by three + > constructive reviewers (correctness, security, usability). Your job is to + > find what they missed. 
+ > + > Focus on: + > - Code sections covered by NO existing finding (overlooked areas) + > - Issue categories not represented in the existing findings + > - Cross-component interactions no single lens would catch + > - Unchecked assumptions and untested preconditions + > - Silent divergences with no test coverage + > - Fragile coupling where changing one thing silently breaks another + > + > For each issue found, report it in this exact format: + > + > **[SEVERITY] Short title** + > - **Location:** file path and line number(s) + > - **Problem:** what is wrong and why it matters + > - **Suggestion:** concrete fix or remediation + > + > where SEVERITY is one of: critical, high, medium, low. + > + > If the existing findings are comprehensive and you find no gaps, state + > explicitly: "No additional issues found." + > Do NOT duplicate issues already reported. Only report NEW problems. + > Order findings by severity (critical first). + +### 5. Deduplicate and Synthesize + +After all four reviewers (three constructive + adversarial) have returned: 1. **Merge duplicates.** If multiple reviewers flag the same issue, keep the most detailed version and note which perspectives flagged it (higher @@ -110,27 +148,42 @@ After all four reviewers return: or construct a minimal reproduction. Mark any finding you cannot verify as **[unverified]**. -### 5. Present the Report +### 6. Present the Report -Present a single unified review report to the user: +Present a single unified review report to the user with these sections, in +order: 1. **Summary** — one-paragraph overview: number of findings by severity, overall assessment (e.g., "ready to merge with minor fixes" or "has blocking issues"). -2. **Findings** — listed by severity (critical first). Each finding includes: - - The severity tag and title - - Location (as a file link with line numbers) - - Problem description - - Suggested fix - - Which reviewer perspectives flagged it -3. 
**Trade-offs** — any unresolved disagreements between reviewers, with both + +2. **Positive observations** — bullet list of things the reviewers agreed were + done well (design choices, test quality, documentation, etc.). Keep it brief + but genuine — this provides signal about what to preserve during remediation. + +3. **Findings — Critical / High** — a Markdown table with columns: + `#`, `Severity`, `Title`, `Location`, `Flagged by`, `Status`. + Below the table, expand each row with the full problem description and + suggested fix. + +4. **Findings — Medium** — same table + expansion format. + +5. **Findings — Low** — same table + expansion format. + +6. **Trade-offs** — any unresolved disagreements between reviewers, with both sides stated. -4. **Action prompt** — ask the user which findings to address. Options: - - Fix all + +7. **Remediation plan** — a numbered, ordered list of concrete steps to address + the findings. Group related fixes into a single step where sensible. Each + step should name the finding(s) it addresses and briefly describe what to + do. Order by priority: blocking issues first, then medium, then low. + +8. **Action prompt** — ask the user which findings to address. Options: + - Fix all (follow the remediation plan) - Select specific findings by number - Dismiss specific findings - Ask for clarification -### 6. Apply Fixes +### 7. Apply Fixes For each approved finding: @@ -140,7 +193,7 @@ For each approved finding: If a fix is ambiguous or touches architecture, ask the user for guidance. -### 7. Check Exit or Re-review +### 8. 
Check Exit or Re-review After all approved fixes are applied: diff --git a/.github/skills/multi-perspective-plan/SKILL.md b/.github/skills/multi-perspective-plan/SKILL.md index 048878e..e25eea0 100644 --- a/.github/skills/multi-perspective-plan/SKILL.md +++ b/.github/skills/multi-perspective-plan/SKILL.md @@ -1,6 +1,6 @@ --- name: multi-perspective-plan -description: "Multi-perspective planning with adversarial review loop. Use when: planning complex changes, designing architecture, evaluating implementation strategies, drafting implementation plans, or when /plan is invoked. Spawns three planner subagents, synthesizes their outputs, then iteratively hardens the plan through an adversarial review loop until it passes scrutiny." +description: "Multi-perspective planning with rebuttal rounds and adversarial review loop. Use when: planning complex changes, designing architecture, evaluating implementation strategies, drafting implementation plans, or when /plan is invoked. Spawns three planner subagents, runs rebuttals on disagreements, synthesizes their outputs, then iteratively hardens the plan through an adversarial review loop until it passes scrutiny." argument-hint: "Describe the change or feature to plan" --- @@ -25,22 +25,22 @@ subagent can work from the same facts. Read the relevant source files and tests. Summarize the current state in a brief context block that will be included in every subagent prompt. -### 2. Spawn Three Planner Subagents +### 2. Spawn Three Planner Lens Subagents -Launch three subagents **in parallel**. Each receives the same context block -plus a persona directive. Each must return a concrete, step-by-step +Launch three subagents **in parallel**, each using a named lens agent. Each +receives the same context block and must return a concrete, step-by-step implementation plan (not just commentary). -| # | Persona | Directive | -|---|---------|-----------| -| 1 | **Speed** | Obsessed with performance. 
Minimize latency and overhead at all costs. Inline aggressively, avoid abstractions, prefer lock-free and wait-free primitives. Tolerate complexity if it buys speed. | -| 2 | **Usability** | Prioritize clean, readable, maintainable code. Favor clear abstractions, good naming, and small functions. Accept modest performance cost for clarity. | -| 3 | **Conservative** | Minimize the changeset. Touch as few lines as possible. Prefer surgical edits over refactors. Reuse existing patterns. Resist new dependencies or abstractions. | +| # | Agent | Focus | +|---|-------|-------| +| 1 | `speed-lens` | Performance — minimize latency and overhead | +| 2 | `usability-lens` | Clarity — clean, readable, maintainable code | +| 3 | `conservative-lens` | Scope — minimal changeset, surgical edits | Each subagent prompt must include: - The shared context block -- The persona directive (from the table above) +- An instruction to operate in **planning mode** - A request for a **numbered step-by-step plan** with rationale per step - A request for **risks and mitigations** specific to their perspective @@ -53,33 +53,57 @@ analysis noting: - Points of disagreement (trade-offs to resolve) - Any gaps none of the planners addressed -### 4. Synthesize +### 4. Rebuttals (If Disagreements Exist) -Send all three plans **plus your analysis** to a fourth subagent with the -directive: +If step 3 identified points of disagreement, run a rebuttal round. -> You are a senior engineer synthesizing three competing implementation plans -> into one final plan. Preserve the strongest ideas from each perspective. -> Where planners disagree, make an explicit trade-off decision and justify it. -> The final plan must be a numbered step-by-step implementation sequence with -> clear rationale. Flag any unresolved risks. +For **each disagreement**, identify which lenses hold competing positions. Then +spawn those lenses **in parallel** as fresh subagents operating in **rebuttal +mode**. 
Each subagent receives: + +- The specific point of disagreement +- Its own original recommendation +- The competing recommendation(s) from the other lens(es) +- An instruction to argue concisely for why its approach is best and why the + alternatives are inferior — one turn only + +Collect the rebuttals. If there are **no disagreements**, skip this step +entirely. + +### 5. Synthesize + +Send all three original plans, **your analysis from step 3**, and **any +rebuttals from step 4** to a `synthesis-lens` subagent operating in **planning +mode**. + +The subagent must produce a numbered step-by-step implementation sequence with +clear rationale. For each disagreement, it must pick one option and justify the +choice by engaging with the rebuttal arguments — not ignoring or averaging +them. Flag any unresolved risks. + +If the synthesis agent reports any **unresolved disagreements** (trade-offs it +could not resolve), **stop and present them to the user**. For each unresolved +item, show: + +- The competing options with their lens attribution +- The key argument from each side's rebuttal +- Why the choice matters + +Wait for the user to decide before proceeding. Incorporate the user's decisions +into the plan. The output of this step is the **draft plan**. -### 5. Adversarial Review Loop +### 6. Adversarial Review Loop Iteratively harden the draft plan by running adversarial reviews until the plan passes scrutiny. Each iteration proceeds as follows: -#### 5a. Spawn Adversarial Reviewer +#### 6a. Spawn Adversarial Reviewer -Launch a fresh subagent with the directive: +Launch a fresh `adversarial-lens` subagent operating in **planning mode** with +the following prompt structure: -> You are an adversarial reviewer. Assume this plan is wrong. Actively try to -> break the design. Look for race conditions, deadlocks, ABA problems, platform -> bugs, edge cases, missing error handling, reference counting errors, and -> failure modes. 
Start from skepticism and only endorse what survives scrutiny. -> > **Plan to review:** > {include the full draft plan} > @@ -101,12 +125,12 @@ Launch a fresh subagent with the directive: > - Do NOT fabricate issues. Only report genuine problems. > - Order findings by severity (critical first). -#### 5b. Evaluate Findings +#### 6b. Evaluate Findings After the adversarial reviewer returns: - If the reviewer reports **"LGTM"** (no issues found), the plan is final. - Proceed to step 6. + Proceed to step 7. - If the reviewer reports findings, address them: - For **critical** and **high** findings: revise the plan to fix or mitigate each issue. Update the draft plan in-place. @@ -114,23 +138,23 @@ After the adversarial reviewer returns: add as a documented risk in the plan. - For **low** findings: note and move on. -#### 5c. Check for Stuck State +#### 6c. Check for Stuck State If after addressing findings you are **unsure how to proceed** — for example, the adversarial reviewer raises a concern that conflicts with a core requirement, or two mitigations are mutually exclusive — **stop and ask the user** for guidance. Present the specific dilemma and the options you see. -#### 5d. Repeat +#### 6d. Repeat -Go back to step 5a with the revised plan. Use a fresh subagent each time (no +Go back to step 6a with the revised plan. Use a fresh subagent each time (no memory of previous passes). **Bound:** If the loop has run **3 times** without reaching LGTM, present the current plan to the user with all remaining unresolved findings and ask how to proceed. -### 6. Present +### 7. Present Present the final plan to the user for approval. Clearly attribute which ideas came from which perspective where relevant. 
Note any risks that survived the diff --git a/.github/skills/review-loop/SKILL.md b/.github/skills/review-loop/SKILL.md index fc41490..c601b21 100644 --- a/.github/skills/review-loop/SKILL.md +++ b/.github/skills/review-loop/SKILL.md @@ -42,6 +42,10 @@ Gather the full content of the target so it can be passed to the reviewer. ### 2. Spawn Reviewer Subagent +If a specific lens is requested (e.g., `correctness-lens`, `security-lens`, +`adversarial-lens`, etc.), use that named lens agent operating in **review +mode**. Otherwise, use a generic reviewer. + Launch a subagent with the following prompt structure: > You are a code reviewer performing a thorough review of the following target. diff --git a/.github/skills/testing-with-boc/SKILL.md b/.github/skills/testing-with-boc/SKILL.md index 4834c66..46c5b3c 100644 --- a/.github/skills/testing-with-boc/SKILL.md +++ b/.github/skills/testing-with-boc/SKILL.md @@ -11,6 +11,13 @@ functions that run once all required **cowns** (concurrently-owned data) are available. Testing BOC programs requires specific patterns because behaviors execute asynchronously on worker interpreters. +> **Design guidance:** if you find yourself reaching for `time.sleep`, +> `threading.Event`, polling loops, or `wait_for_*` helpers inside a +> behavior or in test code that drives one, stop and read +> `thinking-in-boc` first. The right answer is almost always to express +> the dependency through the cown graph, not through a classical +> synchronization primitive. 
+ ## Key Concepts | Concept | Description | @@ -47,6 +54,50 @@ def fixed(x): # 1 param matches 1 @when arg return x.value * factor # factor captured from enclosing scope ``` +### Do not use the `def _(c, x=x)` loop-capture idiom + +A common Python idiom for snapshotting a loop variable is to bind it as a +default argument: + +```python +for i, c in enumerate(cowns): + @when(c) + def _(c, i=i): # unnecessary AND breaks @when + send("done", i) +``` + +**You don't need this with `@when`.** The transpiler rewrites the call site as +`whencall('__behavior__N', (c,), (i,))`, snapshotting captures into a tuple at +schedule time. There is no late-binding hazard to defend against — just +reference the loop variable directly: + +```python +for i, c in enumerate(cowns): + @when(c) + def _(c): + send("done", i) # i is captured by value at schedule time +``` + +Adding `i=i` to the signature actively breaks the behavior. The transpiler +treats every name in the signature as a behavior parameter and discards the +default, so the worker sees a function with an extra positional arg that the +runtime never supplies. See the "Inspecting Transpiler Output" section of +`.github/copilot-instructions.md` for how to use `export_module.py` to +confirm exactly which names are parameters and which are captures. + +If you do want a fresh scope per iteration (e.g. to avoid sharing mutable +state between iterations), use a helper function: + +```python +def _schedule(c, i): # fresh scope per iteration + @when(c) + def _(c): + send("done", i) + +for i, c in enumerate(cowns): + _schedule(c, i) +``` + ### Critical rule: classes and functions must be declared at module level Behaviors run in separate sub-interpreters. The transpiler exports the module so @@ -93,7 +144,7 @@ body after scheduling a behavior. Instead, use `send` to ship the result out of the behavior and `receive` in the test to collect and verify it. 
```python -from bocpy import Cown, when, send, receive, TIMEOUT, wait +from bocpy import Cown, when, send, receive, drain, TIMEOUT, wait RECEIVE_TIMEOUT = 10 @@ -107,21 +158,32 @@ class TestExample: Uses a timeout so that if a behavior never fires (e.g. due to a parameter-count mismatch in @when) the test fails quickly instead - of hanging forever. + of hanging forever. The "assert" queue is always drained before + returning so leftover messages from a failing test do not leak + into subsequent tests in CI. """ failed = None - for _ in range(count): - result = receive("assert", RECEIVE_TIMEOUT) - assert result[0] != TIMEOUT, ( - "Timed out waiting for an 'assert' message from a behavior. " - "Check that every @when arg count matches the decorated " - "function's parameter count." - ) - _, (actual, expected) = result - if actual != expected: - failed = (actual, expected) + timed_out = False + try: + for _ in range(count): + result = receive("assert", RECEIVE_TIMEOUT) + if result[0] == TIMEOUT: + timed_out = True + break + _, (actual, expected) = result + if failed is None and actual != expected: + failed = (actual, expected) + finally: + drain("assert") + + assert not timed_out, ( + "Timed out waiting for an 'assert' message from a behavior. " + "Check that every @when arg count matches the decorated " + "function's parameter count." + ) if failed is not None: - assert failed[0] != failed[1] + actual, expected = failed + assert actual == expected, f"expected {expected!r}, got {actual!r}" def test_double(self): x = Cown(3) @@ -275,7 +337,10 @@ def test_cown_grouping(self): ## Pattern 5 — Exception Propagation -If a behavior raises, the exception is captured in the returned cown's `.value`. +If a behavior raises, the exception is captured in the returned cown's `.value` +**and** the cown's `.exception` flag is set to `True`. This lets downstream +behaviors distinguish a thrown exception from a value that just happens to be +an `Exception` instance returned normally. 
```python def test_exception_in_behavior(self): @@ -287,13 +352,152 @@ def test_exception_in_behavior(self): @when(bad) def _(b): + send("assert", (b.exception, True)) send("assert", (isinstance(b.value, ZeroDivisionError), True)) - b.value = None # clear so it doesn't propagate further + b.value = None # writing .value clears the exception flag + + self.receive_asserts(2) + + +def test_returned_exception_is_not_flagged(self): + """An Exception object *returned* from a behavior is just a value.""" + x = Cown(1) + + @when(x) + def returns_exc(x): + return ValueError("not really an error") + + @when(returns_exc) + def _(r): + send("assert", (r.exception, False)) + send("assert", (isinstance(r.value, ValueError), True)) + + self.receive_asserts(2) +``` + +Notes: + +- Writing `cown.value = ...` from inside a behavior **clears** `.exception`. +- `cown.exception` is also writable inside a behavior, in case you want to + manually mark or unmark a cown as carrying an error. +- Always assert on `.exception` before `isinstance(.value, Exception)` — + otherwise a behavior that legitimately returns an `Exception` will be + indistinguishable from one that raised. + +## Pattern 6 — Noticeboard + +The noticeboard is a global key-value store (up to 64 keys) that behaviors can +read and write **without** acquiring any cowns. Writes are non-blocking; reads +return a snapshot taken once per behavior execution. + +| Function | Purpose | +|----------|---------| +| `notice_write(key, value)` | Non-blocking write. | +| `notice_update(key, fn, default=None)` | Atomic read-modify-write. `fn` and `default` must be picklable. Returning `REMOVED` deletes the entry. | +| `notice_delete(key)` | Non-blocking delete. | +| `noticeboard()` | Read-only mapping — snapshot of the noticeboard, cached for the duration of the current behavior. | +| `notice_read(key, default=None)` | Convenience: one key from the snapshot. 
| + +### Key rule: snapshot per behavior + +Within a single behavior, `noticeboard()` and `notice_read()` always return +data from the **same** snapshot — even if other behaviors write in the +meantime. To see a write made by another behavior, schedule a follow-up +behavior (typically by chaining via a cown returned from `@when`). + +```python +def test_noticeboard_roundtrip(self): + x = Cown(0) + + @when(x) + def step1(x): + notice_write("greeting", "hello") + + # The chain on `step1` ensures step2 runs *after* the write has been + # applied and step2's snapshot sees it. + @when(x, step1) + def step2(x, _): + send("assert", (notice_read("greeting"), "hello")) self.receive_asserts() ``` -## Pattern 6 — Parameterized Tests +### Atomic update + +`notice_update` runs `fn(current_value)` on the scheduler and writes the +result back atomically. Lambdas and closures are **not** picklable — use a +module-level function (optionally wrapped with `functools.partial`) or an +`operator` function. + +```python +from functools import partial +from operator import add + +def _bump(n, by): + return n + by + +class TestCounter: + @classmethod + def teardown_class(cls): + wait() + + def test_atomic_increment(self): + x = Cown(0) + + @when(x) + def init(x): + notice_write("count", 0) + + @when(x, init) + def bump(x, _): + notice_update("count", partial(_bump, by=5)) + notice_update("count", partial(add, 3)) + + @when(x, bump) + def check(x, _): + send("assert", (notice_read("count"), 8)) + + self.receive_asserts() +``` + +### Delete via `REMOVED` + +Returning the `REMOVED` sentinel from a `notice_update` callback deletes the +entry. `notice_delete(key)` is the direct form.
+ +```python +def _drop_if_zero(n): + return REMOVED if n == 0 else n - 1 + +def test_remove_via_update(self): + x = Cown(0) + + @when(x) + def init(x): + notice_write("lives", 1) + + @when(x, init) + def tick(x, _): + notice_update("lives", _drop_if_zero) # 1 -> 0 + notice_update("lives", _drop_if_zero) # 0 -> REMOVED + + @when(x, tick) + def check(x, _): + send("assert", ("lives" in noticeboard(), False)) + + self.receive_asserts() +``` + +### Common noticeboard pitfalls + +| Pitfall | Fix | +|---------|-----| +| Reading a value back inside the **same** behavior that wrote it | The snapshot was taken at the start of the behavior. Chain a follow-up `@when` to observe the write. | +| Passing a lambda or closure to `notice_update` | They are not picklable. Use a module-level function with `functools.partial`, or an `operator` function. | +| Asserting in the test body that `noticeboard()` contains a key | Read inside a behavior and `send` the result out — `noticeboard()` and `notice_read()` outside any behavior return a snapshot that is never refreshed. | +| Writing more than 64 distinct keys | Excess writes are dropped with a logged warning — they do **not** raise. Keep tests within the limit (and `notice_delete` keys you no longer need). | + +## Pattern 7 — Parameterized Tests Use `@pytest.mark.parametrize` to sweep inputs. Each invocation gets its own cowns so tests are isolated. @@ -311,7 +515,7 @@ def test_fibonacci(self, n): self.receive_asserts() ``` -## Pattern 7 — Testing `send`/`receive` Messaging Directly +## Pattern 8 — Testing `send`/`receive` Messaging Directly For code that uses the lower-level messaging API without behaviors: @@ -335,7 +539,7 @@ def test_timeout_with_after_callback(): assert value == 42 ``` -## Pattern 8 — Complex Objects in Cowns +## Pattern 9 — Complex Objects in Cowns Mutable objects (e.g., class instances) work inside cowns. Behaviors mutate them in-place under exclusive access. 
diff --git a/.github/skills/thinking-in-boc/SKILL.md b/.github/skills/thinking-in-boc/SKILL.md new file mode 100644 index 0000000..f12f86b --- /dev/null +++ b/.github/skills/thinking-in-boc/SKILL.md @@ -0,0 +1,270 @@ +--- +name: thinking-in-boc +description: "Think in Behavior-Oriented Concurrency, not threads-and-locks. Use when: writing or reviewing any bocpy code (library, examples, tests), about to reach for time.sleep / threading.Event / atomic counters / polling loops / wait_for_* helpers, designing how a downstream behavior observes an upstream one, scheduling work to run after the next worker is free, or building loop / tail-recursion patterns. Catches the reflex to apply classical synchronization to a problem that wants a cown." +--- + +# Thinking in Behavior-Oriented Concurrency + +This skill is a corrective. The default reflex when synchronizing concurrent +work is to reach for **threads-and-locks** primitives: shared state, a mutex, +a condition variable, a busy-wait loop, an atomic counter, an event flag, a +`Future`. In BOC, those answers are almost always wrong — not because they +break, but because they bypass the very mechanism that makes BOC safe and +fast. + +The BOC question is not *"what synchronization primitive do I need here?"* + +It is: ***"what cown is this work ordered against, and what behavior should +run when that cown is free?"*** + +Read this skill any time you catch yourself writing one of the smells below. + +## The smells + +If you find yourself typing any of these inside, or in code that interacts +with, a BOC program — **stop and re-derive the design.** + +| Smell | What you almost certainly meant | +|-------|---------------------------------| +| `time.sleep(...)` in a polling loop | Schedule a behavior on the cown the predicate depends on. | +| `while not <flag>: ...` busy-wait | Same — make `<flag>` a cown and `@when(flag)` a behavior. | +| `threading.Event` / `Condition` / `Lock` | A cown plus a behavior chain.
| +| `wait_for__version(target)` polling | `@when(downstream_cowns)` — let the cown graph order it. | +| `atomic_counter` from Python | A `Cown(int)` mutated inside `@when(counter)`. | +| `Future`, `Queue.get()`, "ferry one value out" | `return` the value from a behavior; `@when(that_behavior)` reads it. | +| `time.sleep(0)` "yield" | `@when()` — the empty-cown behavior runs when a worker is free. | +| `if work_remaining: do_work(); else: stop` in a thread loop | A **behavior loop**: the behavior re-schedules itself with `@when(state)` on the same cown until done. | + +The smells are signals that you are managing concurrency *outside* the +runtime. The runtime cannot help you make that correct or fast. + +## The replacements + +There are only a handful of BOC patterns. Almost every problem decomposes +into one of them. + +### 1. Sequencing on data — `@when(cown)` + +A behavior runs when its cowns are free. That is the entire ordering +mechanism. If `step2` must observe `step1`'s effect on `x`, both behaviors +take `x`: + +```python +@when(x) +def step1(x): + x.value = "ready" + +@when(x) +def step2(x): + assert x.value == "ready" +``` + +You did not need a lock. You did not need an event. You did not need to +poll. The runtime acquired `x` for `step1`, released it, and only then gave +it to `step2`. + +### 2. Fan-in / barrier — `@when(cowns)` vs `@when(a, b, c)` + +There are two distinct shapes for "this behavior depends on multiple +cowns" and choosing the right one matters. + +**Use `@when(a, b, c)`** — separate positional arguments — when you know +**at write-time exactly which cowns** the behavior needs and they have +distinct roles. 
The decorated function takes one named parameter per +cown: + +```python +@when(account_a, account_b) +def transfer(src, dst): # two roles, two names + dst.value += src.value + src.value = 0 +``` + +**Use `@when(cowns)`** — a single list/tuple argument — when the **set +is dynamic or homogeneous** (its size is determined at runtime, or the +cowns play the same role). The decorated function takes **one parameter** +which is the list itself: + +```python +cowns = [Cown(i) for i in range(N)] +for c in cowns: + @when(c) + def producer(c): + ... # writes whatever it writes + +@when(cowns) # one list arg, not *cowns +def consumer(cowns): + total = sum(c.value for c in cowns) # cowns IS the list +``` + +This is the classical N-way barrier, expressed as data dependence: the +runtime acquires every cown in the list before the behavior runs, so the +consumer cannot start until every producer behavior has returned. **Do +not** spread the list with `*` — `@when` accepts the list directly, and +spreading would force you to know `N` at write-time, defeating the point. + +Mixing the two forms — `@when(anchor, cowns)` — is also valid: the +behavior takes one named parameter (`anchor`) plus one list parameter. + +### 3. Happens-after — chain on the prior behavior's result cown + +`@when` returns a `Cown` holding the behavior's result. Pass that cown to a +later `@when` to enforce happens-after across unrelated data: + +```python +@when(x) +def writer(x): + notice_write("k", x.value) + notice_sync() # commit before returning + +@when(x, writer) # waits for writer to finish +def reader(x, _): + assert notice_read("k") == x.value +``` + +### 4. Run when *any* worker is free — `@when()` + +`@when()` with no arguments schedules a behavior with no data dependencies. +It runs as soon as a worker is available. 
Use this when you want some work +to happen in the background and you do not need to coordinate with any +particular cown — for example, sending a report after forks have been +released: + +```python +@when(left, right, hunger) +def take_bite(left, right, hunger): + left.value.use(); right.value.use() + hunger.value -= 1 + if hunger.value == 0: + # forks released when this behavior returns; the report goes + # out from a fresh behavior so it does not delay the release. + @when() + def _(): + send("report", ("full", index)) +``` + +`@when()` is also the BOC equivalent of "tail-call this on the worker +pool" — it lets the current behavior return promptly while the follow-up +work waits its turn. + +### 5. Behavior loops — tail-recursive self-scheduling + +To process work in chunks until done, do **not** write a `while` loop +inside one behavior — that pins one worker for the duration. Instead, the +behavior does one chunk and then **schedules the next iteration** on the +same cown: + +```python +def step(state: Cown[State]): + @when(state) + def _(state): + if state.value.done: + send("done", state.value.result) + return + + state.value.do_one_chunk() + step(state) # tail-schedule next iteration +``` + +This is the BOC analogue of tail recursion. Each iteration releases the +cown between chunks, so: + +- other behaviors waiting on `state` can interleave between iterations, +- the worker is returned to the pool between chunks, and +- work is naturally bounded by data availability — no busy-wait. + +`prime_factor.py` (`sieve_check` → `sieve_work` → `sieve_check`) is the +canonical example in this repository. + +### 6. Flushing your own queued mutations — `notice_sync()` + +The noticeboard mutator runs on its own thread. `notice_write` / +`notice_update` / `notice_delete` are fire-and-forget. 
If a *subsequent +behavior* must observe your noticeboard mutation, call `notice_sync()` at +the end of the writing behavior: + +```python +@when(x) +def writer(x): + notice_write("k", v) + notice_sync() # block until commit + +@when(x, writer) # now reader sees v +def reader(x, _): + assert notice_read("k") == v +``` + +`notice_sync()` flushes **only the calling thread's** prior writes. For +cross-producer ordering, lean on `@when(cowns)` (pattern 2) — let the cown +graph do the synchronization, and let each writer's `notice_sync()` make +its own commit visible before it releases its cown. + +### 7. Single-assignment rendezvous — the behavior's own result cown + +`@when` returns a `Cown` holding whatever the behavior returns. That cown +*is* your rendezvous — there is no need to allocate a separate `Cown(None)` +and assign into it: + +```python +@when(x) +def compute(x): + return expensive(x.value) # the result lives in `compute` + +@when(compute) +def consume(result): # result is a Cown + send("answer", result.value) # unwrap with .value +``` + +This replaces `Future` / `Queue` for one-shot results. For streaming use +the message queue (`send` / `receive`) directly. + +## The BOC checklist + +Before writing **any** synchronization, ask: + +1. **What cown does this work depend on?** If the answer is "none" you may + want `@when()`. If the answer is "X" you want `@when(X)`. If you know + at write-time exactly which cowns you need, prefer the explicit form + `@when(X, Y, Z)` — it is faster than the list form because the runtime + can resolve each dependency by position rather than iterating a + sequence. Only fall back to `@when([X, Y, Z])` (one list arg) when the + set is dynamic or homogeneous. +2. **Who reads my output?** Their `@when(...)` should include the cown I + wrote to, or my behavior's result-cown. +3. 
**Am I about to loop in one behavior?** If the loop body has any + release-friendly point, lift it into a behavior loop (pattern 5) so + other work can interleave. +4. **Am I about to poll, sleep, or block?** Find the cown the predicate + depends on. Make the polling code a behavior on that cown. +5. **Am I about to use a `threading.*` primitive inside a behavior?** + Almost certainly the wrong layer. Threads-and-locks primitives belong + only at the BOC runtime boundary (test setup, `wait()`, `receive()` for + assertions, the runtime's own internals). + +## When the classical answer *is* right + +Classical synchronization is correct in three places: + +1. **Outside the runtime, talking to it.** The test thread blocking on + `receive("assert")` for assertion messages is a thread-level wait, and + that is fine — it is the boundary between the test harness and the + behavior graph. +2. **`wait()` itself.** The library uses condvars internally to block the + main thread until the runtime drains. Do not reinvent this. +3. **C-level runtime internals.** `_core.c` uses mutexes and condvars + because it *implements* BOC. User Python code should not. + +If you are not in one of those three places and you are reaching for a +classical primitive, walk back through the checklist. + +## Self-correction prompt + +If you have already written code that uses `time.sleep`, `wait_for_*`, an +event flag, or a polling loop in a behavior or in code that schedules +behaviors, treat it as a defect. Ask: + +> *Which cown carries the dependency I am polling on? Why is the +> consuming work not a behavior on that cown?* + +Rewrite to remove the classical primitive. The result is almost always +shorter, faster, and provably free of races. 
diff --git a/.github/workflows/pr_gate.yml b/.github/workflows/pr_gate.yml index 336353e..f3ffa65 100644 --- a/.github/workflows/pr_gate.yml +++ b/.github/workflows/pr_gate.yml @@ -62,7 +62,7 @@ jobs: run: pip install -e .[test] --verbose - name: Python test - run: pytest -vv + run: pytest -vv -s windows: runs-on: windows-latest @@ -86,7 +86,7 @@ jobs: run: pip install -e .[test] --verbose - name: Python test - run: pytest -vv + run: pytest -vv -s macos-arm64: runs-on: macos-latest @@ -110,7 +110,7 @@ jobs: run: pip install -e .[test] --verbose - name: Python test - run: pytest -vv + run: pytest -vv -s macos-x86_64: runs-on: macos-15-intel @@ -134,7 +134,7 @@ jobs: run: pip install -e .[test] --verbose - name: Python test - run: pytest -vv + run: pytest -vv -s free-threaded: runs-on: ubuntu-latest @@ -168,7 +168,7 @@ jobs: with: max_attempts: 3 timeout_minutes: 10 - command: pytest -vv + command: pytest -vv -s asan: runs-on: ubuntu-latest @@ -218,4 +218,4 @@ jobs: UBSAN_OPTIONS: halt_on_error=1:print_stacktrace=1 run: | source "$RUNNER_TEMP/asan-venv/bin/activate" - pytest -vv \ No newline at end of file + pytest -vv -s \ No newline at end of file diff --git a/.gitignore b/.gitignore index 898929f..3d045ce 100644 --- a/.gitignore +++ b/.gitignore @@ -208,4 +208,5 @@ __marimo__/ # venvs .env* -.vscode \ No newline at end of file +.vscode +.copilot \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index b9c7b52..fa78877 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,95 @@ +## 2026-04-17 - Version 0.4.0 +Noticeboard, distributed scheduler, and a relocated examples package. + +**New Features** + +- **Noticeboard** — a shared key-value store (up to 64 keys) that + behaviors can read and write without acquiring cowns. Writes + (`notice_write`, `notice_delete`) are non-blocking; reads + (`noticeboard`, `notice_read`) return a cached snapshot taken once + per behavior execution. 
Atomic read-modify-write is available via + `notice_update`, which accepts a picklable function and an optional + default. Returning the `REMOVED` sentinel from the update function + deletes the entry. Mutations are serialized through a single + dedicated noticeboard thread so the C-level read-modify-write stays + consistent without forcing behaviors to take a mutex. +- **`notice_sync`** — new public API that blocks until the caller's + prior `notice_write` / `notice_update` / `notice_delete` mutations + have been committed, providing a read-your-writes barrier for code + that hands work off to a subsequent behavior. +- **`noticeboard_version`** — new public API returning a global, + monotonic version counter that increments on every successful + noticeboard commit. Useful as a cheap change-detection hint without + taking a full snapshot. +- **Distributed scheduler** — the central scheduler thread has been + removed. Two-phase locking, request linking, and dispatch now run + in C (`BehaviorCapsule.schedule`) directly on the caller's thread, + and cown release runs on the worker thread that just executed the + behavior. Waiters are tracked with an MCS-style intrusive linked + list per cown, so resolving a behavior hands off straight to the + next waiter without bouncing through any central queue. The + C-level terminator is now the only pending counter. +- **`Cown.exception` property** — new boolean property on `Cown` that + indicates whether the held value is the result of an unhandled + exception. Workers now call `set_exception` instead of `set_result` + when a behavior raises. +- **Prime factor example** (`examples/prime_factor.py`, entry point + `bocpy-prime-factor`) — demonstrates parallel factorisation using + Pollard's rho algorithm with early termination coordinated via the + noticeboard. 
+- **Benchmark harness** (`examples/benchmark.py`, entry point + `bocpy-bench`) — a new micro-benchmark suite covering scheduling + throughput, message-queue latency, and noticeboard contention. + +**Bug Fixes** + +- **Transpiler aliased imports** — `visit_Import` and `visit_ImportFrom` + now track the alias name (`import X as Y` / `from X import Y as Z`) + instead of the original name, preventing spurious "name not found" + errors and duplicate `whencall` injection. +- **Global variable capture** — `@when` closure capture now falls back + to `frame.f_globals` when a name is not found in any local scope, + fixing `NameError` for module-level variables used inside behaviors. + +**Improvements** + +- **C mutex abstraction** — platform-specific mutex and condition-variable + code (`SRWLock`/`pthread`/C11 `mtx_t`) is now wrapped behind a + unified `BOCMutex`/`BOCCond` inline API, reducing `#ifdef` clutter + and simplifying future platform work. +- **Matrix docstrings** — all `Matrix` C methods now carry built-in + docstrings visible to `help()` and Sphinx autodoc. +- **Worker noticeboard hygiene** — workers clear the per-thread + noticeboard cache before each behavior and on shutdown, preventing + stale reads across behaviors. +- **Examples package relocated** — example scripts moved from + `src/bocpy/examples/` to a top-level `examples/` directory, mapped + back into the `bocpy.examples` package via + `[tool.setuptools.package-dir]`. Console-script entry points are + unchanged. +- **Filtered PyPI README** — `setup.py` now strips + HTML-comment-delimited (`<!-- ... -->`) regions from + `README.md` before publishing, so unsupported content (e.g. Mermaid + diagrams) does not appear as raw text on PyPI. The project metadata + switches to `dynamic = ["readme"]` to enable this. +- **Documentation refresh** — `README.md`, `sphinx/source/index.rst`, + and `sphinx/source/api.rst` have been substantially expanded to + cover the noticeboard, the distributed scheduler model, and the new + public APIs.
+- **New `thinking-in-boc` skill** — guidance for writing BOC code + without reaching for classical synchronization primitives. + +**Tests** + +- **`test/test_noticeboard.py`** — new suite covering snapshot + semantics, `notice_update` atomicity, `REMOVED`, `notice_sync`, + and version-counter monotonicity. +- **`test/test_scheduling_stress.py`** — new stress suite for the + distributed scheduler covering 2PL ordering, duplicate-cown + handling, exception propagation, and high-fan-out workloads. +- **`test/test_transpiler.py`** — new direct tests for AST extraction, + capture rewriting, aliased imports, and module export. + ## 2026-04-02 - Version 0.3.1 CownCapsule serialization support for nested cowns. diff --git a/CITATION.cff b/CITATION.cff index c0bf736..2169b20 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -5,6 +5,6 @@ authors: given-names: "Matthew Alastair" orcid: "https://orcid.org/0000-0002-1019-8036" title: "bocpy" -version: 0.3.1 -date-released: 2026-04-02 +version: 0.4.0 +date-released: 2026-04-17 url: "https://github.com/microsoft/bocpy" \ No newline at end of file diff --git a/README.md b/README.md index 2cba001..e94c8c3 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ perspective, those functions work like normal. Importantly, the programmer's tas shifts from solving concurrent data access problems to organizing data flow through functions. The resulting programs are easier to understand, easier to support, easier to extend, and unlock multi-core performance due to the ability to schedule behaviors -to run efficiently across multiple processes. +to run efficiently across multiple sub-interpreters. BOC has been implemented in several languages, including as a foundational aspect of the research language Verona, and now has been implemented in Python. @@ -27,10 +27,61 @@ you have problems with your particular platform/version combination, please file an issue on [this repository](https://github.com/microsoft/bocpy/issues). 
> [!NOTE] -> the `bocpy` library depends on the Cross-Interpreter Data mechanism, which was -> introduced in Python 3.7. We explicitly test and provide wheels for all versions -> of Python that have not been end-of-lifed (3.10.19 as of time of writing is the -> oldest version we support). The library may not work in older versions of Python. +> We provide wheels for Python 3.10 and newer, but `bocpy` only achieves true +> parallelism on Python 3.12+, where each sub-interpreter has its own GIL. On +> 3.10 and 3.11 behaviors still run, but they are serialised by the global GIL. +> The library may not work on Python versions older than 3.10. + +### Python version support + + +```mermaid +gitGraph + commit id: "3.10" + commit id: "3.11" + commit id: "3.12" tag: "true parallelism" + commit id: "3.13" + branch "free-threaded" + commit id: "3.13t" + checkout main + commit id: "3.14" + checkout "free-threaded" + commit id: "3.14t" + checkout main + commit id: "3.15" + checkout "free-threaded" + commit id: "3.15t" +``` + + +The mainline (`main`) branch in the diagram is the standard CPython build: + +- **3.10 / 3.11** — wheels are published and `@when` works, but every + sub-interpreter still shares one process-wide GIL, so behaviors execute one + at a time. Use these versions for portability rather than performance. +- **3.12+** — each sub-interpreter gets its own GIL ([PEP 684][pep684]), so + worker behaviors run in parallel across cores. This is where bocpy delivers + on its concurrency story. +- **3.14** is the current default development and CI target; **3.15** is + validated as it stabilises. + +The `free-threaded` branch tracks the no-GIL CPython builds (informally +"3.13t", "3.14t", "3.15t" — see [PEP 703][pep703]). bocpy runs **unmodified** +on these interpreters today: we don't re-enable the GIL, and the cown / 2PL +protocol gives the same data-race-free guarantees you get on the GIL build. 
+The catch is overhead — on free-threaded Python, the sub-interpreter and +`XIData` machinery is pure ceremony, since plain threads in the main +interpreter would already run in parallel. + +[Issue #5][issue5] tracks adding an alternative direct-threading backend that +detects a free-threaded interpreter at runtime and skips the +sub-interpreter / transpiler / `XIData` path entirely, while keeping the +public `Cown` / `@when` API unchanged. We're holding off on that work until +the free-threaded build and the relevant CPython APIs stabilise. + +[pep684]: https://peps.python.org/pep-0684/ +[pep703]: https://peps.python.org/pep-0703/ +[issue5]: https://github.com/microsoft/bocpy/issues/5 A behavior can be thought of as a function which depends on zero or more concurrently-owned data objects (which we call **cowns**). As a programmer, you @@ -158,7 +209,7 @@ wait() You can view the full example [here](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/sketches.py) -The underlying BOC scheduler ensures that this operates without deadlock, by +The BOC runtime ensures that this operates without deadlock, by construction. ### Examples @@ -176,6 +227,16 @@ We provide a few examples to show different ways of using BOC in a program: 5. [`bocpy-boids`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/boids.py): An agent-based bird flocking example demonstrating the `Matrix` class to do distributed computation over cores. Note: you'll need to install `pyglet` first in order to run the `bocpy-boids` example. +6. [`bocpy-primes`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/primes.py) + and [`bocpy-prime-factor`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/prime_factor.py): + parallel prime sieve and Pollard's rho factorisation, the latter coordinating + early termination via the noticeboard. +7. 
[`bocpy-calculator`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/calculator.py): + a small Erlang-style calculator service driven by `send`/`receive`. +8. [`bocpy-cooking-threads`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/cooking_threads.py): + the cooking example written with plain threads, for comparison with `bocpy-cooking-boc`. +9. [`bocpy-sketches`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/sketches.py): + the cheese-and-spam sketch shown above as a runnable script. ## Why BOC for Python? @@ -187,21 +248,28 @@ can operate over its data without a need to change this familiar programming mod Even in a free-threading context, BOC will reduce contention on locks and provide programs which are data-race free by construction. Our initial research and experiments with BOC have shown near linear scaling over cores, with up to 32 concurrent worker -processes. +sub-interpreters. ### This library -Our implementation is built on top of the subinterpreters mechanism and the -Cross-Interpreter Data, or `XIData`, API. As of Python 3.12, sub-interpreters have -their own GIL and thus run in parallel, and thus BOC will also run fully in -parallel. +Our implementation is built on top of the sub-interpreters mechanism and the +Cross-Interpreter Data (`XIData`) API. As of Python 3.12 each sub-interpreter +has its own GIL, so behaviors scheduled by `bocpy` run truly in parallel. -In addition the providing the `when` function decorator, the library also exposes +In addition to the `when` function decorator, the library also exposes low-level Erlang-style `send` and selective `receive` functions which enable -lock-free communication across threads and subinterpreters. See the +lock-free communication across threads and sub-interpreters. 
See the [`bocpy-primes`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/primes.py) and [`bocpy-calculator`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/calculator.py) examples for the usage of these lower-level functions. +For cross-behavior data sharing that does not warrant a `Cown`, the library +also provides a small **noticeboard** — a global key-value store of up to 64 +entries. Behaviors can `notice_write`, `notice_update` (atomic +read-modify-write) and `notice_delete` keys without acquiring any cowns, and +read a frozen snapshot via `noticeboard()` / `notice_read()`. The +[`bocpy-prime-factor`](https://github.com/microsoft/bocpy/blob/main/src/bocpy/examples/prime_factor.py) +example uses it to coordinate early termination across worker behaviors. + ### Additional Info BOC is built on a solid foundation of serious scholarship and engineering. For further reading, please see: 1. [When Concurrency Matters: Behaviour-Oriented Concurrency](https://dl.acm.org/doi/10.1145/3622852) @@ -209,7 +277,7 @@ BOC is built on a solid foundation of serious scholarship and engineering. For f 3. [OOPSLA23 Talk](https://www.youtube.com/watch?v=iX8TJWonbGU) > **Trademarks** This project may contain trademarks or logos for projects, products, or services. -> Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s +> Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's > Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this -> project must not cause confusion or imply Microsoft sponsorship. Any use of third-party -> trademarks or logos are subject to those third-party’s policies. +> project must not cause confusion or imply Microsoft sponsorship. Any use of third-party +> trademarks or logos are subject to those third-party's policies. 
diff --git a/src/bocpy/examples/README.md b/examples/README.md similarity index 85% rename from src/bocpy/examples/README.md rename to examples/README.md index ad118e7..779ba8d 100644 --- a/src/bocpy/examples/README.md +++ b/examples/README.md @@ -45,6 +45,15 @@ tens of thousands of behaviors and cowns every second. Lots of thanks to [Ben Eater's Boids repo](https://github.com/beneater/boids.git), which proved a helpful starting point. +## [Prime Factor](prime_factor.py) +This example generates a semiprime (a product of two primes) and then factors it +in parallel using multiple search lanes. Each lane is a chain of small behaviors +that check the [noticeboard](http://microsoft.github.io/bocpy/sphinx/api.html#bocpy.noticeboard) +for a result before doing a batch of trial divisions. When any lane finds a +factor it writes to the noticeboard, and the remaining lanes see the result on +their next check and stop early. Demonstrates the "behavior loop" pattern and +cross-behavior coordination via the noticeboard. 
+ ## Send/Receive In addition to exposing the higher-level behavior primitives (*i.e.*, `when`, `Cown`, `wait`), the library also exposes the lower-level functions diff --git a/src/bocpy/examples/__init__.py b/examples/__init__.py similarity index 100% rename from src/bocpy/examples/__init__.py rename to examples/__init__.py diff --git a/src/bocpy/examples/assets/cheese.txt b/examples/assets/cheese.txt similarity index 100% rename from src/bocpy/examples/assets/cheese.txt rename to examples/assets/cheese.txt diff --git a/src/bocpy/examples/assets/menu.txt b/examples/assets/menu.txt similarity index 100% rename from src/bocpy/examples/assets/menu.txt rename to examples/assets/menu.txt diff --git a/src/bocpy/examples/bank.py b/examples/bank.py similarity index 100% rename from src/bocpy/examples/bank.py rename to examples/bank.py diff --git a/examples/benchmark.py b/examples/benchmark.py new file mode 100644 index 0000000..615ec64 --- /dev/null +++ b/examples/benchmark.py @@ -0,0 +1,1118 @@ +"""Chain-ring microbenchmark for the BOC runtime. + +This benchmark measures *BOC runtime scaling* (scheduler, 2PL, message +queue, sub-interpreter crossings, return-cown allocation) in isolation +from any application-specific serial work. It is **not** a measure of +how well your own application will scale: real applications carry +serial costs (data structure construction, scheduling logic, +result drainage) that this benchmark deliberately eliminates. + +A few load-bearing caveats baked into the design: + +* Each behavior allocates a fresh return ``Cown`` (the auto-generated + one returned by ``@when``). At thousands of behaviors per second + this is a real, version-dependent constant in every sample. +* ``ChainState`` crosses the interpreter boundary via XIData on every + reschedule; for tiny payloads, marshaling can rival the useful work. 
+* The ``group-size`` sweep varies acquired-set cardinality and CPU work + together (the inner loop multiplies every window slot into + ``window[0]``, ``iters * group_size`` matrix multiplies per + behavior). It is not an isolated 2PL-cost knob. +""" + +import argparse +import json +import os +import socket +import statistics +import subprocess +import sys +import time +from dataclasses import asdict, dataclass, field +from datetime import datetime +from typing import Optional + +from bocpy import (Cown, Matrix, noticeboard, notice_write, receive, send, + start, wait, when) + +# Sentinels for the parent/child JSON protocol. Uppercase so the +# transpiler keeps them as module-level constants in the worker export. +SENTINEL_BEGIN = "---BOCPY-BENCH-BEGIN---" +SENTINEL_END = "---BOCPY-BENCH-END---" +SCHEMA_VERSION = 1 + + +# --------------------------------------------------------------------------- +# Behavior code (chain workload) +# --------------------------------------------------------------------------- + + +class ChainState: + """Per-chain mutable state carried inside a ``Cown[ChainState]``. + + Holds ints only. The chain's ring of ``Cown[Matrix]`` lives in the + noticeboard under ``f"ring_{ring_id}"`` so it is materialized once + per worker (and cached for the lifetime of ``NB_VERSION``) instead + of being marshaled through XIData on every reschedule. + """ + + def __init__(self, chain_id: int, ring_id: int, head_idx: int, + iters: int, stride: int, ring_size: int): + """Initialize a chain state. + + :param chain_id: A unique id within the workload. + :param ring_id: Index of the ring this chain runs on. Must + correspond to a ``f"ring_{ring_id}"`` entry already + written to the noticeboard. + :param head_idx: Initial head position on the ring. + :param iters: Inner-loop matrix multiplications per window slot. + :param stride: Step between successive windows. + :param ring_size: Number of cowns on the ring. 
+ """ + self.chain_id = chain_id + self.ring_id = ring_id + self.head_idx = head_idx + self.count = 0 + self.iters = iters + self.stride = stride + self.ring_size = ring_size + + +def next_window(cs: "ChainState", group_size: int) -> list: + """Compute the next sliding window of cowns for a chain. + + Reads the chain's ring from the noticeboard. Must be called from + inside a behavior so that ``noticeboard()`` returns the cached + snapshot for the current ``NB_VERSION``. + + :param cs: The chain state. + :param group_size: Number of adjacent cowns in the window. + :return: ``list[Cown[Matrix]]`` for the next acquired set. + """ + ring = noticeboard()[f"ring_{cs.ring_id}"] + return [ring[(cs.head_idx + i * cs.stride) % cs.ring_size] + for i in range(group_size)] + + +def schedule_step(state_cown: Cown, window_list: list, group_size: int) -> None: + """Schedule one chain step with the given window. + + The static ``@when`` decorator inside this helper is rewritten by + the transpiler into a ``whencall`` invocation, so this function + works correctly when called from a worker sub-interpreter (where + the Python ``when`` decorator is not wired up). + + :param state_cown: The chain's state cown. + :param window_list: Adjacent cowns to acquire for this step. + :param group_size: Window size, captured into the behavior. + """ + @when(state_cown, window_list) + def _step(state, window): + cs = state.value + # When ``cr_null`` is set, skip the matmul loop entirely. The + # behavior still acquires its window of cowns, mutates + # ``ChainState``, and reschedules itself — so the measured + # throughput reflects pure BOC runtime overhead (2PL, queue + # ops, sub-interpreter crossings, return-cown allocation) + # with the application work removed. + if not noticeboard().get("cr_null", False): + # The inner loop's first slot multiplies window[0] by itself. + # Intentional — it keeps the per-behavior multiply count + # exactly `iters * group_size`. 
+ for _ in range(cs.iters): + for c in window: + window[0].value = window[0].value @ c.value + + cs.count += 1 + cs.head_idx = (cs.head_idx + cs.stride) % cs.ring_size + if not noticeboard().get("cr_stop", False): + # Pass the already-acquired `state` cown wrapper directly + # rather than the closure-captured `state_cown` to keep the + # capture set minimal. + schedule_step(state, next_window(cs, group_size), group_size) + + +# --------------------------------------------------------------------------- +# Configuration and result types (plain data only; no Cowns) +# --------------------------------------------------------------------------- + + +@dataclass +class BenchConfig: + """Plain-data benchmark configuration. + + Holds only ints / floats / strings / lists of the same so that an + instance can stay live in ``main()``'s frame across ``wait()`` + without ``stop_workers`` finding any bare Cowns to acquire. + """ + + workers: int = 1 + duration: float = 5.0 + warmup: float = 1.0 + iters: int = 2000 + group_size: int = 2 + stride: int = 1 + rings: Optional[int] = None + chains_per_ring: Optional[int] = None + ring_size: int = 128 + payload_rows: int = 16 + payload_cols: int = 16 + repeats: int = 1 + null_payload: bool = False + + +@dataclass +class RepeatResult: + """Plain-data result for a single repeat of one sweep point.""" + + repeat_index: int + completed_behaviors: int + elapsed_s: float + throughput: float + wall_clock_ns_start: int + + +@dataclass +class PointResult: + """Plain-data result for a single sweep point.""" + + inputs: dict + repeats: list = field(default_factory=list) + throughput_mean: Optional[float] = None + throughput_stdev: Optional[float] = None + throughput_min: Optional[float] = None + throughput_max: Optional[float] = None + error: Optional[dict] = None + + +# --------------------------------------------------------------------------- +# Sizing / validation helpers (parent-side, no BOC required) +# 
--------------------------------------------------------------------------- + + +def derive_sizes(cfg: BenchConfig) -> BenchConfig: + """Auto-size ``rings`` and ``chains_per_ring`` if not overridden. + + :param cfg: An input config (mutated and returned). + :return: The same config with ``rings`` / ``chains_per_ring`` set. + """ + if cfg.chains_per_ring is None: + cfg.chains_per_ring = max( + 1, cfg.ring_size // (cfg.group_size * cfg.stride * 2)) + if cfg.rings is None: + cfg.rings = max(cfg.workers * 4 // cfg.chains_per_ring, + cfg.workers * 2) + return cfg + + +def validate_config(cfg: BenchConfig) -> Optional[str]: + """Validate a fully-derived config; return an error string or None. + + Hard errors only. Soft warnings (``duration < 1.0``, oversubscribed + workers) are emitted by the caller rather than failing here. + + :param cfg: A config with ``rings`` and ``chains_per_ring`` set. + :return: An error message, or ``None`` if the config is valid. + """ + if cfg.group_size * cfg.stride * 2 > cfg.ring_size: + return (f"group_size*stride*2 ({cfg.group_size}*{cfg.stride}*2) " + f"> ring_size ({cfg.ring_size}); chains would collide") + if cfg.workers < 1: + return f"workers must be >= 1, got {cfg.workers}" + if cfg.iters < 1: + return f"iters must be >= 1, got {cfg.iters}" + if cfg.payload_rows < 1 or cfg.payload_cols < 1: + return "payload dimensions must be >= 1" + if cfg.duration <= 0 or cfg.warmup < 0: + return "duration must be > 0 and warmup must be >= 0" + return None + + +def emit_soft_warnings(cfg: BenchConfig, cpu_count: int) -> None: + """Print soft warnings for unusual configs to stderr. + + :param cfg: The fully-derived config. + :param cpu_count: Detected CPU count for oversubscription check. 
+ """ + if cfg.duration < 1.0: + print(f"warning: duration={cfg.duration}s is short; results will " + "be noisy", file=sys.stderr) + if cfg.workers > cpu_count: + print(f"warning: workers={cfg.workers} exceeds cpu_count=" + f"{cpu_count}; oversubscribed", file=sys.stderr) + + +# --------------------------------------------------------------------------- +# Workload construction +# --------------------------------------------------------------------------- + + +def build_workload(cfg: BenchConfig): + """Build per-ring cowns and per-chain state cowns. + + Each ring is published to the noticeboard under ``f"ring_{r}"``. + Workers read it back via ``noticeboard()`` inside ``_step``; the + noticeboard's per-worker version-cache means the ring is + materialized once per worker per ``NB_VERSION`` instead of being + marshaled through XIData on every reschedule. + + :param cfg: A fully-derived config. + :return: A ``(rings, state_cowns)`` tuple. ``rings`` is + ``list[list[Cown[Matrix]]]``; ``state_cowns`` is + ``list[Cown[ChainState]]``. Both containers are invisible to + ``stop_workers`` (it does not recurse into containers). + """ + rings = [] + state_cowns = [] + chain_id = 0 + for r in range(cfg.rings): + ring = [Cown(Matrix.uniform(0.0, 1.0, + (cfg.payload_rows, cfg.payload_cols))) + for _ in range(cfg.ring_size)] + rings.append(ring) + notice_write(f"ring_{r}", ring) + # Spread chains evenly across the ring so adjacent chains' + # initial windows don't overlap. 
+        spacing = max(1, cfg.ring_size // cfg.chains_per_ring)
+        for k in range(cfg.chains_per_ring):
+            head = (k * spacing) % cfg.ring_size
+            cs = ChainState(chain_id=chain_id, ring_id=r, head_idx=head,
+                            iters=cfg.iters, stride=cfg.stride,
+                            ring_size=cfg.ring_size)
+            state_cowns.append(Cown(cs))
+            chain_id += 1
+    return rings, state_cowns
+
+
+# ---------------------------------------------------------------------------
+# Snapshot helpers (used by the measurement flow)
+# ---------------------------------------------------------------------------
+
+
+def schedule_snap(state_cowns: list) -> None:
+    """Schedule the final snapshot + publish behaviors.
+
+    ``snap`` is scheduled before ``cr_stop`` is published. The helper
+    is also structured so that the bare ``snap`` and ``_publish``
+    return-cown locals fall out of scope at its return boundary,
+    satisfying the no-bare-Cowns-in-main rule before ``wait()`` runs.
+
+    :param state_cowns: Every chain's state cown.
+    """
+    @when(state_cowns)
+    def snap(states):
+        return sum(s.value.count for s in states)
+
+    notice_write("cr_stop", True)
+
+    @when(snap)
+    def _publish(s):
+        send("snap", s.value)
+
+
+def emit_chain_snapshot(state_cown: Cown, tag: str) -> None:
+    """Send a chain's ``(count, head_idx)`` over the queue under ``tag``.
+
+    Used by tests that need to inspect chain progress directly. The
+    helper lives in this module so the ``@when`` decorator runs through
+    the transpiler that registered ``schedule_step``.
+
+    :param state_cown: The chain's state cown.
+    :param tag: The tag to ``send`` the snapshot under.
+ """ + @when(state_cown) + def _emit(s): + send(tag, (s.value.count, s.value.head_idx)) + + +# --------------------------------------------------------------------------- +# Single-point measurement (in-process; one BOC start/wait cycle) +# --------------------------------------------------------------------------- + + +def run_single_point_body(cfg: BenchConfig, repeat_index: int) -> RepeatResult: + """Run one measurement in a fresh BOC runtime; return plain data. + + :param cfg: The fully-derived config. + :param repeat_index: Index of this repeat for reporting. + :return: A ``RepeatResult`` with no Cown references. + """ + # Start the runtime first: ``build_workload`` writes rings to the + # noticeboard, and noticeboard writes require the runtime to be + # running. + start(worker_count=cfg.workers) + rings, state_cowns = build_workload(cfg) + # Publish the null-payload toggle so worker behaviors can read it + # from their per-behavior noticeboard snapshot. Written before the + # warmup sleep so the noticeboard thread has flushed it well + # before t_measure_start. + notice_write("cr_null", cfg.null_payload) + payload_bytes = cfg.payload_rows * cfg.payload_cols * 8 + total_bytes = cfg.rings * cfg.ring_size * payload_bytes + print(f"workload: rings={cfg.rings} ring_size={cfg.ring_size} " + f"chains={cfg.rings * cfg.chains_per_ring} " + f"payload={cfg.payload_rows}x{cfg.payload_cols} " + f"(~{total_bytes / 1024:.1f} KiB matrix data)", + file=sys.stderr) + + try: + # Kick off one chain per (ring, chain-slot) pair. Recompute the + # head positions exactly the way `build_workload` chose them: + # we cannot read `cs_cown.value` from the main thread because + # Cowns are released to the runtime on construction. 
+ spacing = max(1, cfg.ring_size // cfg.chains_per_ring) + chain_idx = 0 + for r in range(cfg.rings): + for k in range(cfg.chains_per_ring): + cs_cown = state_cowns[chain_idx] + head = (k * spacing) % cfg.ring_size + window = [rings[r][(head + i * cfg.stride) % cfg.ring_size] + for i in range(cfg.group_size)] + schedule_step(cs_cown, window, cfg.group_size) + chain_idx += 1 + + time.sleep(cfg.warmup) + wall_clock_ns_start = time.time_ns() + t_measure_start = time.perf_counter() + time.sleep(cfg.duration) + + schedule_snap(state_cowns) + msg = receive(["snap"], 60.0 + cfg.duration) + t_snap_received = time.perf_counter() + if msg is None or msg[0] != "snap": + raise RuntimeError("snap behavior did not publish in time") + _, total = msg + elapsed_s = t_snap_received - t_measure_start + finally: + # Drop bare-Cown locals before wait(). + del rings + del state_cowns + wait() + + throughput = total / elapsed_s if elapsed_s > 0 else 0.0 + return RepeatResult(repeat_index=repeat_index, + completed_behaviors=int(total), + elapsed_s=elapsed_s, + throughput=throughput, + wall_clock_ns_start=wall_clock_ns_start) + + +# --------------------------------------------------------------------------- +# Subprocess orchestration +# --------------------------------------------------------------------------- + + +def cfg_to_argv(cfg: BenchConfig) -> list: + """Render a ``BenchConfig`` as CLI args for a child invocation. + + :param cfg: The config to serialize. + :return: A list of CLI arguments suitable for child invocation. 
+ """ + args = [ + "--workers", str(cfg.workers), + "--duration", str(cfg.duration), + "--warmup", str(cfg.warmup), + "--iters", str(cfg.iters), + "--group-size", str(cfg.group_size), + "--stride", str(cfg.stride), + "--ring-size", str(cfg.ring_size), + "--payload-rows", str(cfg.payload_rows), + "--payload-cols", str(cfg.payload_cols), + "--repeats", "1", + "--sweep-axis", "none", + ] + if cfg.rings is not None: + args += ["--rings", str(cfg.rings)] + if cfg.chains_per_ring is not None: + args += ["--chains-per-ring", str(cfg.chains_per_ring)] + if cfg.null_payload: + args += ["--null-payload"] + return args + + +def run_in_subprocess(cfg: BenchConfig, repeat_index: int, + git_sha: Optional[str]) -> RepeatResult: + """Run one repeat in a fresh subprocess and return its result. + + On non-zero exit / timeout / missing sentinel, raises + ``RuntimeError`` with a stderr-tail diagnostic so the caller can + record an ``error`` entry on the point. + + :param cfg: A fully-derived config with ``repeats`` ignored. + :param repeat_index: Index into the parent's ``repeats[]`` list. + :param git_sha: Optional git sha to forward to the child. 
+ """ + env = dict(os.environ) + if git_sha is not None: + env["BOCPY_BENCH_GIT_SHA"] = git_sha + + cmd = [sys.executable, "-m", "bocpy.examples.benchmark", + "--json-stdout"] + cfg_to_argv(cfg) + timeout = max(cfg.duration * 3 + 30, cfg.duration + cfg.warmup + 60) + try: + proc = subprocess.run(cmd, env=env, capture_output=True, + text=True, timeout=timeout, check=False) + except subprocess.TimeoutExpired as ex: + raise RuntimeError( + f"subprocess timed out after {timeout}s; " + f"stderr tail: {(ex.stderr or '')[-400:]!r}") + + if proc.returncode != 0: + raise RuntimeError( + f"subprocess exited {proc.returncode}; " + f"stderr tail: {proc.stderr[-400:]!r}") + + payload = _extract_sentinel_payload(proc.stdout) + if payload is None: + raise RuntimeError( + "child produced no sentinel-framed JSON; " + f"stderr tail: {proc.stderr[-400:]!r}") + + return RepeatResult( + repeat_index=repeat_index, + completed_behaviors=int(payload["completed_behaviors"]), + elapsed_s=float(payload["elapsed_s"]), + throughput=float(payload["throughput"]), + wall_clock_ns_start=int(payload["wall_clock_ns_start"])) + + +def _extract_sentinel_payload(stdout: str) -> Optional[dict]: + """Find and parse exactly one sentinel-framed JSON object. + + :param stdout: The captured child stdout. + :return: The parsed payload, or ``None`` if no valid frame. + """ + begin = stdout.find(SENTINEL_BEGIN) + end = stdout.find(SENTINEL_END) + if begin < 0 or end < 0 or end < begin: + return None + inner = stdout[begin + len(SENTINEL_BEGIN):end].strip() + try: + return json.loads(inner) + except json.JSONDecodeError: + return None + + +# --------------------------------------------------------------------------- +# Sweep orchestration (parent side) +# --------------------------------------------------------------------------- + + +def cfg_for_axis(base: BenchConfig, axis: str, value) -> BenchConfig: + """Clone ``base`` with one axis varied to ``value``. + + :param base: The base config. 
+ :param axis: One of ``workers``, ``iters``, ``group-size``, + ``payload``, ``none``. + :param value: The axis value (an ``int`` for most axes; a + ``(rows, cols)`` tuple for ``payload``). + :return: A fresh ``BenchConfig`` with that axis applied. + """ + cfg = BenchConfig(**asdict(base)) + # Reset auto-sized fields so each point recomputes. + cfg.rings = base.rings + cfg.chains_per_ring = base.chains_per_ring + if axis == "workers": + cfg.workers = int(value) + cfg.rings = None + cfg.chains_per_ring = None + elif axis == "iters": + cfg.iters = int(value) + elif axis == "group-size": + cfg.group_size = int(value) + cfg.chains_per_ring = None + cfg.rings = None + elif axis == "payload": + cfg.payload_rows, cfg.payload_cols = value + elif axis == "none": + pass + else: + raise ValueError(f"unknown axis: {axis}") + return derive_sizes(cfg) + + +def summarize_repeats(reps: list) -> dict: + """Compute mean/stdev/min/max across repeats with the null-stdev rule. + + With fewer than 2 repeats, ``stdev`` / ``min`` / ``max`` are + emitted as JSON null rather than zero, to avoid false zero-height + error bars in downstream plots. + + :param reps: A list of ``RepeatResult``. + :return: A dict with mean, stdev, min, max. + """ + if not reps: + return {"mean": None, "stdev": None, "min": None, "max": None} + throughputs = [r.throughput for r in reps] + if len(throughputs) < 2: + return {"mean": throughputs[0], "stdev": None, + "min": None, "max": None} + return { + "mean": statistics.fmean(throughputs), + "stdev": statistics.stdev(throughputs), + "min": min(throughputs), + "max": max(throughputs), + } + + +def run_sweep(axis: str, values: list, base: BenchConfig, + git_sha: Optional[str], output_path: str, + metadata: dict) -> dict: + """Run a sweep, flushing JSON to disk after every point. + + :param axis: Sweep axis name. + :param values: Per-axis values in order. + :param base: Base configuration. + :param git_sha: Optional git sha to forward to children. 
+ :param output_path: Destination JSON file. + :param metadata: Initial metadata dict (will be updated with + ``finished_at`` at end). + :return: The final results dict (also written to disk). + """ + points = [] + fixed = asdict(base) + fixed.pop("workers", None) if axis == "workers" else None + rendered_values = [list(v) if isinstance(v, tuple) else v for v in values] + sweep_meta = {"axis": axis, "values": rendered_values, "fixed": fixed} + + interrupted = False + for value in values: + cfg = cfg_for_axis(base, axis, value) + err = validate_config(cfg) + inputs = asdict(cfg) + if err is not None: + point = PointResult(inputs=inputs, + error={"message": err, "stderr_tail": ""}) + points.append(asdict(point)) + print(f"point {axis}={value}: validation error: {err}", + file=sys.stderr) + _flush_results(output_path, metadata, sweep_meta, points) + continue + + repeats: list = [] + try: + for r in range(base.repeats): + print(f"point {axis}={value} repeat {r + 1}/{base.repeats}: " + "spawning child...", file=sys.stderr) + try: + rep = run_in_subprocess(cfg, r, git_sha) + repeats.append(rep) + print(f" -> {rep.throughput:.1f} behaviors/s " + f"({rep.completed_behaviors} in " + f"{rep.elapsed_s:.2f}s)", file=sys.stderr) + except RuntimeError as ex: + point = PointResult( + inputs=inputs, + repeats=[asdict(r) for r in repeats], + error={"message": str(ex), "stderr_tail": ""}) + points.append(asdict(point)) + _flush_results(output_path, metadata, sweep_meta, points) + repeats = None # marker: already appended + break + except KeyboardInterrupt: + interrupted = True + metadata["interrupted"] = True + if repeats: + point = PointResult( + inputs=inputs, + repeats=[asdict(r) for r in repeats], + error={"message": "interrupted", "stderr_tail": ""}) + points.append(asdict(point)) + _flush_results(output_path, metadata, sweep_meta, points) + break + + if repeats is None: + continue + + summary = summarize_repeats(repeats) + point = PointResult( + inputs=inputs, + 
repeats=[asdict(r) for r in repeats],
+            throughput_mean=summary["mean"],
+            throughput_stdev=summary["stdev"],
+            throughput_min=summary["min"],
+            throughput_max=summary["max"])
+        points.append(asdict(point))
+        _flush_results(output_path, metadata, sweep_meta, points)
+
+    metadata["finished_at"] = datetime.now().isoformat(timespec="seconds")
+    metadata["interrupted"] = interrupted or metadata.get("interrupted", False)
+    final = _flush_results(output_path, metadata, sweep_meta, points)
+    return final
+
+
+def _flush_results(path: str, metadata: dict, sweep_meta: dict,
+                   points: list) -> dict:
+    """Atomic write of the results JSON; falls back to in-place on Windows.
+
+    :param path: Destination file path.
+    :param metadata: Top-level metadata dict.
+    :param sweep_meta: Sweep description dict.
+    :param points: List of point dicts.
+    :return: The full results document that was written.
+    """
+    document = {
+        "schema_version": SCHEMA_VERSION,
+        "metadata": metadata,
+        "sweep": sweep_meta,
+        "points": points,
+    }
+    serialized = json.dumps(document, indent=2, default=_json_default)
+    os.makedirs(os.path.dirname(os.path.abspath(path)) or ".", exist_ok=True)
+    tmp = path + ".tmp"
+    with open(tmp, "w", encoding="utf-8") as f:
+        f.write(serialized)
+    delays = (0.05, 0.1, 0.2)
+    for attempt, delay in enumerate(delays):
+        try:
+            os.replace(tmp, path)
+            return document
+        except PermissionError:
+            if attempt == len(delays) - 1:
+                print(f"warning: atomic rename failed after {len(delays)} "
+                      "attempts; falling back to in-place overwrite",
+                      file=sys.stderr)
+                with open(path, "w", encoding="utf-8") as f:
+                    f.write(serialized)
+                try:
+                    os.unlink(tmp)
+                except OSError:
+                    pass
+                return document
+            time.sleep(delay)
+    return document
+
+
+def _json_default(obj):
+    """Coerce non-JSON-native objects (e.g. sets) for serialization.
+
+    :param obj: An object json.dumps could not serialize natively.
+    :return: A JSON-serializable representation.
+ """ + if isinstance(obj, (set, frozenset)): + return list(obj) + raise TypeError(f"object of type {type(obj).__name__} is not " + "JSON-serializable") + + +# --------------------------------------------------------------------------- +# Metadata +# --------------------------------------------------------------------------- + + +def collect_metadata(argv: list, git_sha: Optional[str]) -> dict: + """Collect metadata for the top of the results JSON. + + :param argv: The parent's ``sys.argv``. + :param git_sha: The git sha (or None). + :return: A metadata dict. + """ + try: + bocpy_version = _read_bocpy_version() + except Exception: + bocpy_version = None + + free_threaded = bool(getattr(sys, "_is_gil_enabled", + lambda: True)() is False) + return { + "hostname": socket.gethostname(), + "platform": sys.platform, + "cpu_count": os.cpu_count() or 0, + "python_version": sys.version.split()[0], + "python_implementation": sys.implementation.name, + "free_threaded": free_threaded, + "bocpy_version": bocpy_version, + "git_sha": git_sha, + "started_at": datetime.now().isoformat(timespec="seconds"), + "finished_at": None, + "argv": list(argv), + "interrupted": False, + } + + +def _read_bocpy_version() -> Optional[str]: + """Best-effort read of bocpy's version from importlib.metadata. + + :return: Version string or None on failure. + """ + try: + from importlib.metadata import version + return version("bocpy") + except Exception: + return None + + +def _git_sha() -> Optional[str]: + """Read git sha if available; cheap-and-fail-quietly. + + :return: A 12-char abbreviated sha, or None. 
+ """ + cached = os.environ.get("BOCPY_BENCH_GIT_SHA") + if cached: + return cached + try: + out = subprocess.run( + ["git", "rev-parse", "--short=12", "HEAD"], + capture_output=True, text=True, timeout=5, check=False) + if out.returncode == 0: + return out.stdout.strip() or None + except (FileNotFoundError, subprocess.TimeoutExpired): + pass + return None + + +# --------------------------------------------------------------------------- +# ASCII table renderer +# --------------------------------------------------------------------------- + + +def render_table(document: dict) -> str: + """Render a compact ASCII summary table from a results document. + + :param document: A loaded results JSON. + :return: A multi-line string ready to print. + """ + axis = document["sweep"]["axis"] + points = document["points"] + interrupted = document.get("metadata", {}).get("interrupted", False) + + lines = [] + show_speedup = axis == "workers" + baseline = None + if show_speedup and points: + first = points[0] + if interrupted or first.get("error") is not None \ + or first.get("throughput_mean") is None: + show_speedup = False + lines.append("note: speedup/efficiency suppressed (baseline " + "missing, errored, or interrupted run)") + else: + baseline = first["throughput_mean"] + + headers = [axis, "throughput", "stdev"] + if show_speedup: + headers += ["speedup", "efficiency"] + rows = [] + for pt in points: + if pt.get("error") is not None: + row = [_axis_label(axis, pt), "ERROR", "-"] + if show_speedup: + row += ["-", "-"] + rows.append(row) + continue + mean = pt.get("throughput_mean") + stdev = pt.get("throughput_stdev") + row = [ + _axis_label(axis, pt), + f"{mean:.1f}" if mean is not None else "-", + f"{stdev:.1f}" if stdev is not None else "-", + ] + if show_speedup: + speedup = (mean / baseline) if mean and baseline else None + workers = pt["inputs"]["workers"] + efficiency = (speedup / workers) if speedup and workers else None + row += [ + f"{speedup:.2f}x" if speedup is 
not None else "-",
+                f"{efficiency:.0%}" if efficiency is not None else "-",
+            ]
+        rows.append(row)
+
+    widths = [max(len(h), max((len(r[i]) for r in rows), default=0))
+              for i, h in enumerate(headers)]
+    sep = "-+-".join("-" * w for w in widths)
+    lines.append(" | ".join(h.ljust(widths[i]) for i, h in enumerate(headers)))
+    lines.append(sep)
+    for r in rows:
+        lines.append(" | ".join(r[i].ljust(widths[i]) for i in range(len(r))))
+    return "\n".join(lines)
+
+
+def _axis_label(axis: str, pt: dict) -> str:
+    """Render the axis cell value for a point row.
+
+    :param axis: Sweep axis name.
+    :param pt: A point dict.
+    :return: A string for the axis column.
+    """
+    inputs = pt.get("inputs", {})
+    if axis == "workers":
+        return str(inputs.get("workers"))
+    if axis == "iters":
+        return str(inputs.get("iters"))
+    if axis == "group-size":
+        return str(inputs.get("group_size"))
+    if axis == "payload":
+        return f"{inputs.get('payload_rows')}x{inputs.get('payload_cols')}"
+    return "-"
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def parse_payload_token(token: str) -> tuple:
+    """Parse a payload token of the form ``"<rows>x<cols>"``.
+
+    :param token: The CLI token.
+    :return: A ``(rows, cols)`` tuple.
+    """
+    if "x" not in token:
+        raise argparse.ArgumentTypeError(
+            f"payload value {token!r} must look like '<rows>x<cols>'")
+    rs, cs = token.split("x", 1)
+    try:
+        rows, cols = int(rs), int(cs)
+    except ValueError:
+        raise argparse.ArgumentTypeError(
+            f"payload value {token!r}: rows and cols must be integers")
+    if rows < 1 or cols < 1:
+        raise argparse.ArgumentTypeError(
+            f"payload value {token!r}: rows and cols must be >= 1")
+    return (rows, cols)
+
+
+def parse_sweep_values(axis: str, raw: Optional[str]) -> list:
+    """Parse ``--sweep-values`` per-axis at argparse time.
+
+    :param axis: The sweep axis.
+    :param raw: The raw CSV string, or None.
+ :return: A list of values appropriate for the axis. + """ + if axis == "none": + if raw: + raise argparse.ArgumentTypeError( + "--sweep-values must be empty when --sweep-axis is 'none'") + return [None] + if raw is None: + return _default_sweep_values(axis) + tokens = [t.strip() for t in raw.split(",") if t.strip()] + if not tokens: + return _default_sweep_values(axis) + if axis in ("workers", "iters", "group-size"): + out = [] + for t in tokens: + try: + out.append(int(t)) + except ValueError: + raise argparse.ArgumentTypeError( + f"--sweep-values: token {t!r} is not an integer " + f"(axis={axis})") + return out + if axis == "payload": + return [parse_payload_token(t) for t in tokens] + raise argparse.ArgumentTypeError(f"unknown axis: {axis}") + + +def _default_sweep_values(axis: str) -> list: + """Return the documented default sweep values for an axis. + + :param axis: The sweep axis name. + :return: A list of default values. + """ + cpu = os.cpu_count() or 1 + if axis == "workers": + return sorted(set([1, 2, 4, 8, min(16, cpu)])) + if axis == "iters": + return [250, 500, 1000, 2000, 4000, 8000] + if axis == "group-size": + return [1, 2, 4, 8] + if axis == "payload": + return [(4, 4), (8, 8), (16, 16), (32, 32), (64, 64)] + return [] + + +def build_arg_parser() -> argparse.ArgumentParser: + """Build the CLI argument parser. + + :return: A configured ``argparse.ArgumentParser``. 
+ """ + p = argparse.ArgumentParser( + prog="bocpy.examples.benchmark", + description="Microbenchmark for the BOC runtime.") + p.add_argument("--workers", type=int, default=None) + p.add_argument("--sweep-axis", + choices=("workers", "iters", "group-size", "payload", + "none"), + default="workers") + p.add_argument("--sweep-values", default=None) + p.add_argument("--duration", type=float, default=5.0) + p.add_argument("--warmup", type=float, default=None) + p.add_argument("--iters", type=int, default=2000) + p.add_argument("--group-size", type=int, default=2, dest="group_size") + p.add_argument("--stride", type=int, default=1) + p.add_argument("--rings", type=int, default=None) + p.add_argument("--chains-per-ring", type=int, default=None, + dest="chains_per_ring") + p.add_argument("--ring-size", type=int, default=128, dest="ring_size") + p.add_argument("--payload-rows", type=int, default=16, + dest="payload_rows") + p.add_argument("--payload-cols", type=int, default=16, + dest="payload_cols") + p.add_argument("--repeats", type=int, default=1) + p.add_argument("--null-payload", dest="null_payload", + action="store_true", default=False, + help="Skip the matmul inner loop in each behavior. " + "Throughput then reflects pure BOC runtime " + "overhead with the application work removed.") + p.add_argument("--output", default=None) + p.add_argument("--table", dest="table", action="store_true", default=None) + p.add_argument("--no-table", dest="table", action="store_false") + p.add_argument("--quiet", action="store_true") + p.add_argument("--json-stdout", action="store_true", + help="Run a single point and print sentinel-framed " + "JSON to stdout (subprocess internal).") + p.add_argument("--print-table", default=None, + help="Print a table from an existing JSON file and exit.") + return p + + +def args_to_base_cfg(args) -> BenchConfig: + """Build a base ``BenchConfig`` from parsed CLI args. + + :param args: The parsed argparse namespace. 
+ :return: A ``BenchConfig`` (not yet derived). + """ + workers = args.workers if args.workers is not None else 1 + warmup = args.warmup + if warmup is None: + warmup = min(1.0, args.duration * 0.1) + return BenchConfig( + workers=workers, + duration=args.duration, + warmup=warmup, + iters=args.iters, + group_size=args.group_size, + stride=args.stride, + rings=args.rings, + chains_per_ring=args.chains_per_ring, + ring_size=args.ring_size, + payload_rows=args.payload_rows, + payload_cols=args.payload_cols, + repeats=args.repeats, + null_payload=args.null_payload, + ) + + +def child_main(args) -> int: + """Run a single point and emit a sentinel-framed JSON object. + + Used by ``run_in_subprocess``. The child does **not** run the + cross-worker validation gate — that runs once in the parent before + any sweep child is spawned. + + :param args: The parsed argparse namespace. + :return: Process exit code. + """ + cfg = derive_sizes(args_to_base_cfg(args)) + err = validate_config(cfg) + if err is not None: + print(f"benchmark: invalid config: {err}", file=sys.stderr) + return 2 + emit_soft_warnings(cfg, os.cpu_count() or 1) + rep = run_single_point_body(cfg, repeat_index=0) + payload = { + "inputs": asdict(cfg), + "completed_behaviors": rep.completed_behaviors, + "elapsed_s": rep.elapsed_s, + "throughput": rep.throughput, + "wall_clock_ns_start": rep.wall_clock_ns_start, + } + sys.stdout.write("\n" + SENTINEL_BEGIN + "\n") + sys.stdout.write(json.dumps(payload, default=_json_default)) + sys.stdout.write("\n" + SENTINEL_END + "\n") + sys.stdout.flush() + return 0 + + +def parent_main(args) -> int: + """Run a sweep across the requested axis. + + :param args: The parsed argparse namespace. + :return: Process exit code. 
+ """ + base = args_to_base_cfg(args) + try: + sweep_values = parse_sweep_values(args.sweep_axis, args.sweep_values) + except argparse.ArgumentTypeError as ex: + print(f"benchmark: {ex}", file=sys.stderr) + return 2 + + # Pre-spawn validation across every sweep point. + cpu = os.cpu_count() or 1 + derived_points = [] + for value in sweep_values: + cfg = cfg_for_axis(base, args.sweep_axis, value) + err = validate_config(cfg) + if err is not None: + print(f"benchmark: sweep point {args.sweep_axis}={value} " + f"invalid: {err}", file=sys.stderr) + return 2 + emit_soft_warnings(cfg, cpu) + derived_points.append(cfg) + + git_sha = _git_sha() + + # Wall-clock estimate for sweep duration. + startup_slack = 5.0 + est = sum((cfg.duration + cfg.warmup + startup_slack) * base.repeats + for cfg in derived_points) + print(f"sweep estimate: {len(derived_points)} points " + f"x {base.repeats} repeats ~ {est:.0f}s wall clock", + file=sys.stderr) + + output_path = args.output or _default_output_path() + metadata = collect_metadata(sys.argv, git_sha) + document = run_sweep(args.sweep_axis, sweep_values, base, + git_sha, output_path, metadata) + + if args.table is None: + show_table = sys.stdout.isatty() + else: + show_table = args.table + if show_table and not args.quiet: + print(render_table(document)) + if not args.quiet: + print(f"results: {output_path}", file=sys.stderr) + return 0 + + +def _default_output_path() -> str: + """Compute the default output path under ``results/``. + + Uses ``%Y%m%dT%H%M%S`` rather than ``isoformat()`` so the filename + is valid on Windows (no colons). + + :return: A path string. + """ + ts = datetime.now().strftime("%Y%m%dT%H%M%S") + host = socket.gethostname().replace(os.sep, "_") + return os.path.join("results", f"benchmark-{host}-{ts}.json") + + +def main() -> int: + """CLI entry point. + + :return: Process exit code. 
+ """ + if sys.version_info < (3, 12): + sys.exit("bocpy benchmarks require Python 3.12+ for " + "sub-interpreter parallelism") + + parser = build_arg_parser() + args = parser.parse_args() + + if args.print_table is not None: + with open(args.print_table, encoding="utf-8") as f: + document = json.load(f) + print(render_table(document)) + return 0 + + if args.json_stdout: + return child_main(args) + + return parent_main(args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/src/bocpy/examples/boids.py b/examples/boids.py similarity index 62% rename from src/bocpy/examples/boids.py rename to examples/boids.py index 004727c..930ee1f 100644 --- a/src/bocpy/examples/boids.py +++ b/examples/boids.py @@ -6,7 +6,7 @@ import math from typing import Mapping, NamedTuple -from bocpy import Cown, Matrix, receive, send, wait, when +from bocpy import Cown, Matrix, receive, send, start, wait, when class BoundingBox(NamedTuple("BoundingBox", [("left", int), ("top", int), ("right", int), ("bottom", int)])): @@ -351,12 +351,15 @@ def main(): class Boids(pyglet.window.Window): """Pyglet window that renders a boids simulation.""" - def __init__(self, width: int, height: int, num_boids: int): + def __init__(self, width: int, height: int, num_boids: int, + show_overlay: bool = True): """Initialize the window and create boids. :param width: Window width in pixels. :param height: Window height in pixels. :param num_boids: The number of boids to simulate. + :param show_overlay: Whether to render the boid count and + behavior-rate overlay in the bottom-left corner. 
""" pyglet.window.Window.__init__(self, width, height, "Boids") pyglet.gl.glClearColor(1, 1, 1, 1) @@ -365,14 +368,21 @@ def __init__(self, width: int, height: int, num_boids: int): self.simulation = Simulation(num_boids, width, height) self.num_behaviors = 0 self.samples = deque() - - self.num_boids_label = pyglet.text.Label(f"#boids: {num_boids}", - font_size=24, x=5, y=5, - color=(100, 100, 100, 255)) - - self.behaviors_label = pyglet.text.Label("behavior/s: ", - font_size=24, x=5, y=50, - color=(100, 100, 100, 255)) + self.show_overlay = show_overlay + + if show_overlay: + self.num_boids_label = pyglet.text.Label( + f"#boids: {num_boids}", + font_size=24, x=5, y=5, + color=(100, 100, 100, 255)) + + self.behaviors_label = pyglet.text.Label( + "behavior/s: ", + font_size=24, x=5, y=50, + color=(100, 100, 100, 255)) + else: + self.num_boids_label = None + self.behaviors_label = None self.triangles: pyglet.shapes.Triangle = [] for _ in range(num_boids): @@ -386,8 +396,9 @@ def on_draw(self): """Clear the window and draw all boid triangles.""" self.clear() self.batch.draw() - self.num_boids_label.draw() - self.behaviors_label.draw() + if self.show_overlay: + self.num_boids_label.draw() + self.behaviors_label.draw() def on_close(self): wait() @@ -409,7 +420,7 @@ def update(self, delta_time: float): if len(self.samples) > 10: self.samples.popleft() - if len(self.samples) > 3: + if len(self.samples) > 3 and self.behaviors_label is not None: behavior_rate = sum(self.samples) / len(self.samples) self.behaviors_label.text = f"behavior/s: {behavior_rate:.0f}" @@ -431,10 +442,200 @@ def update(self, delta_time: float): parser.add_argument("--boids", "-b", type=int, default=300) parser.add_argument("--width", type=int, default=1200) parser.add_argument("--height", type=int, default=800) + parser.add_argument("--mode", choices=("window", "video"), + default="window", + help="window: interactive (default); " + "video: render and pipe frames to ffmpeg.") + 
parser.add_argument("--duration", type=float, default=30.0, + help="Seconds to simulate in video mode.") + parser.add_argument("--output", "-o", default="boids.mp4", + help="Output path for video mode.") + parser.add_argument("--fps", type=int, default=30, + help="Simulation/render rate. In video mode this is " + "the encoded frame rate; in window mode this is " + "the scheduled tick rate. The simulation " + "integrates one step per tick, so this value " + "controls on-screen speed in both modes.") + parser.add_argument("--workers", type=int, default=None, + help="Number of BOC worker sub-interpreters. " + "Defaults to bocpy's default (CPU count - 1).") args = parser.parse_args() + # Validate at the boundary; downstream code (Matrix sizing, hash modulo, + # 1.0/fps) assumes positive values and would crash or silently misbehave. + if args.boids <= 0: + parser.error("--boids must be positive") + if args.width <= 0 or args.height <= 0: + parser.error("--width and --height must be positive") + if args.duration <= 0: + parser.error("--duration must be positive") + if args.fps <= 0: + parser.error("--fps must be positive") + if args.workers is not None and args.workers <= 0: + parser.error("--workers must be positive") + + # Start the BOC runtime explicitly so --workers takes effect for every + # mode. + start(worker_count=args.workers) + + if args.mode == "video": + import subprocess + + # Create the window first so we can query the actual framebuffer + # dimensions (which may differ from logical size on HiDPI displays). + # The overlay (boid count / behavior rate) is suppressed in video + # mode so the rendered output stays clean. + boids = Boids(args.width, args.height, args.boids, + show_overlay=False) + + # Allow graceful close: override on_close to set a flag and return + # True so pyglet does not destroy the window mid-frame. The loop + # below honors the flag; the finally block tears the window down. 
+        # Use a bocpy-prefixed attribute name to avoid colliding with any
+        # underscore-prefixed pyglet internals.
+        boids.bocpy_video_closing = False
+
+        def _on_close():
+            boids.bocpy_video_closing = True
+            return True
+
+        boids.on_close = _on_close
+
+        # Determine the real framebuffer size (HiDPI-correct).
+        boids.switch_to()
+        boids.dispatch_events()
+        boids.clear()
+        boids.batch.draw()
+        first_buf = pyglet.image.get_buffer_manager().get_color_buffer()
+        fb_width = first_buf.width
+        fb_height = first_buf.height
+        if (fb_width, fb_height) != (args.width, args.height):
+            print(f"note: framebuffer is {fb_width}x{fb_height} "
+                  f"(window requested {args.width}x{args.height}); "
+                  f"encoding at framebuffer resolution.")
+
+        # Validate frame count BEFORE spawning ffmpeg so we don't leak the
+        # subprocess if the duration/fps combination produces no frames.
+        num_frames = int(args.duration * args.fps)
+        if num_frames == 0:
+            print(f"error: --duration {args.duration} is too short for "
+                  f"--fps {args.fps} (no frames would be written).")
+            boids.close()
+            wait()
+            return
+
+        try:
+            ff = subprocess.Popen(
+                [
+                    "ffmpeg", "-y", "-loglevel", "warning",
+                    "-f", "rawvideo", "-pix_fmt", "rgba",
+                    "-s", f"{fb_width}x{fb_height}",
+                    "-r", str(args.fps),
+                    "-i", "-",
+                    "-vf", "vflip",
+                    "-c:v", "libx264", "-pix_fmt", "yuv420p",
+                    args.output,
+                ],
+                stdin=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+            )
+        except FileNotFoundError:
+            print("error: ffmpeg not found on PATH; install ffmpeg or use "
+                  "--mode window.")
+            boids.close()
+            wait()
+            return
+        except OSError as exc:
+            # Other startup failures (read-only output dir, ENOMEM, etc.)
+            # also need cleanup to avoid leaking the window/runtime.
+ print(f"error: failed to start ffmpeg: {exc}") + boids.close() + wait() + return + + dt = 1.0 / args.fps + frames_written = 0 + ff_stderr: bytes | None = b"" + try: + for _ in range(num_frames): + if boids.bocpy_video_closing: + break + + boids.switch_to() + boids.dispatch_events() + if boids.bocpy_video_closing: + break + + boids.update(dt) + boids.clear() + boids.batch.draw() + if boids.show_overlay: + boids.num_boids_label.draw() + boids.behaviors_label.draw() + + buf = pyglet.image.get_buffer_manager().get_color_buffer() + # Defensive: framebuffer size must remain stable for the + # encoder. If it changes (window manager fiddling, monitor + # move) we abort rather than emit garbled frames. + if (buf.width, buf.height) != (fb_width, fb_height): + print(f"error: framebuffer size changed mid-record " + f"({fb_width}x{fb_height} -> " + f"{buf.width}x{buf.height}); stopping.") + break + + data = buf.get_image_data().get_data("RGBA", buf.width * 4) + try: + ff.stdin.write(data) + except BrokenPipeError: + print("error: ffmpeg pipe closed unexpectedly.") + break + frames_written += 1 + boids.flip() + except KeyboardInterrupt: + print("(interrupted)") + finally: + try: + try: + if ff.stdin is not None: + ff.stdin.close() + except OSError: + pass + try: + _, ff_stderr = ff.communicate(timeout=30) + except subprocess.TimeoutExpired: + print("warning: ffmpeg did not exit within 30s; killing.") + ff.kill() + try: + _, ff_stderr = ff.communicate(timeout=5) + except subprocess.TimeoutExpired: + pass + finally: + # Always release the pyglet window and BOC runtime, even if + # ffmpeg cleanup raised something unexpected. + try: + boids.close() + finally: + wait() + + if ff.returncode != 0: + if ff.returncode is None: + # We tried to kill ffmpeg but it never reaped within 5s after + # SIGKILL. The output file (if any) is almost certainly + # truncated and missing the libx264 moov atom. 
+ print("error: ffmpeg was killed and did not exit; " + "output file is likely truncated.") + else: + print(f"error: ffmpeg exited with status {ff.returncode}.") + if ff_stderr: + print(ff_stderr.decode("utf-8", errors="replace"), end="") + return + + print(f"Wrote {args.output} ({frames_written} frames)" + f"{' (interrupted)' if boids.bocpy_video_closing else ''}") + return + boids = Boids(args.width, args.height, args.boids) - pyglet.clock.schedule_interval(boids.update, 1/30) + pyglet.clock.schedule_interval(boids.update, 1 / args.fps) pyglet.app.run() diff --git a/src/bocpy/examples/calculator.py b/examples/calculator.py similarity index 100% rename from src/bocpy/examples/calculator.py rename to examples/calculator.py diff --git a/src/bocpy/examples/cooking_boc.py b/examples/cooking_boc.py similarity index 100% rename from src/bocpy/examples/cooking_boc.py rename to examples/cooking_boc.py diff --git a/src/bocpy/examples/cooking_threads.py b/examples/cooking_threads.py similarity index 100% rename from src/bocpy/examples/cooking_threads.py rename to examples/cooking_threads.py diff --git a/src/bocpy/examples/dining_philosophers.py b/examples/dining_philosophers.py similarity index 100% rename from src/bocpy/examples/dining_philosophers.py rename to examples/dining_philosophers.py diff --git a/src/bocpy/examples/fibonacci.py b/examples/fibonacci.py similarity index 100% rename from src/bocpy/examples/fibonacci.py rename to examples/fibonacci.py diff --git a/examples/prime_factor.py b/examples/prime_factor.py new file mode 100644 index 0000000..c7bae66 --- /dev/null +++ b/examples/prime_factor.py @@ -0,0 +1,238 @@ +"""Parallel prime factorisation with early termination via the noticeboard.""" + +import argparse +from functools import partial +import logging +import math +import random + +from bocpy import (Cown, notice_read, notice_update, notice_write, + receive, send, wait, when) + + +# -- Helpers for notice_update (must be module-level and picklable) 
----------- + + +def _merge_sieve(existing, new_primes): + """Extend *existing* with the tail of *new_primes* beyond its end. + + Because the sieve is built by extending upward, *new_primes* is a + sorted run of consecutive primes that is either: + 1. entirely contained in *existing* (another lane already found them), + 2. overlapping the end (the prefix is known, the suffix is new), or + 3. strictly continuing *existing* (all new). + In every case we just need to append the primes past *existing*[-1]. + """ + if not new_primes: + return existing + if not existing: + return new_primes + + cutoff = existing[-1] + # Binary search for the first new prime beyond the existing sieve + lo, hi = 0, len(new_primes) + while lo < hi: + mid = (lo + hi) // 2 + if new_primes[mid] <= cutoff: + lo = mid + 1 + else: + hi = mid + + if lo >= len(new_primes): + return existing + return existing + new_primes[lo:] + + +# -- Sieve phase: find primes from random candidates ------------------------- + + +class SieveLane: + """Progress state for one sieve lane.""" + + def __init__(self, lane_id: int, remaining: int, batch: int, lo: int, hi: int): + """Initialise a sieve lane. + + :param lane_id: Numeric identifier for this lane. + :param remaining: How many candidates this lane should still test. + :param batch: How many candidates to generate and test per behavior. + :param lo: Lower bound for random candidates (inclusive). + :param hi: Upper bound for random candidates (inclusive). 
+ """ + self.lane_id = lane_id + self.remaining = remaining + self.batch = batch + self.lo = lo + self.hi = hi + self.found = [] + + +def sieve_check(lane: Cown[SieveLane]): + """Check whether this sieve lane has more work to do.""" + @when(lane) + def _(lane): + if lane.value.remaining <= 0: + send("sieve_done", lane.value.found) + return + + sieve_work(lane) + + +def sieve_work(lane: Cown[SieveLane]): + """Generate a batch of candidates and test for primality.""" + @when(lane) + def _(lane): + info = lane.value + sieve = list(notice_read("sieve") or [2, 3]) + new_sieve_primes = [] + count = min(info.batch, info.remaining) + + for _ in range(count): + c = random.randrange(info.lo, info.hi) | 1 + limit = int(math.isqrt(c)) + 1 + + # Extend the local sieve until it covers sqrt(c) + n = sieve[-1] + 2 + while sieve[-1] < limit: + if all(n % p != 0 for p in sieve if p * p <= n): + sieve.append(n) + new_sieve_primes.append(n) + n += 2 + + # Test c against the sieve + is_prime = True + for p in sieve: + if p * p > c: + break + if c % p == 0: + is_prime = False + break + + if is_prime: + info.found.append(c) + + info.remaining -= count + if new_sieve_primes: + notice_update("sieve", + partial(_merge_sieve, new_primes=new_sieve_primes), + default=[2, 3]) + + sieve_check(lane) + + +# -- Factor phase: Pollard's rho with parallel random walks ------------------ + + +class RhoLane: + """State for one Pollard's rho random walk.""" + + def __init__(self, lane_id: int, n: int, batch: int): + """Initialise a rho lane with a random starting point and constant. + + :param lane_id: Numeric identifier for this lane. + :param n: The number being factored. + :param batch: Iterations per work behavior. 
+ """ + self.lane_id = lane_id + self.c = random.randrange(1, n) + self.x = random.randrange(2, n) + self.y = self.x + self.batch = batch + + +def rho_check(lane: Cown[RhoLane], n: int): + """Check the noticeboard for a result before continuing the walk.""" + @when(lane) + def _(lane): + if notice_read("factor") is not None: + return + + rho_work(lane, n) + + +def rho_work(lane: Cown[RhoLane], n: int): + """Run a batch of Pollard's rho iterations using Floyd's cycle detection.""" + @when(lane) + def _(lane): + info = lane.value + x, y, c = info.x, info.y, info.c + + for _ in range(info.batch): + x = (x * x + c) % n + y = (y * y + c) % n + y = (y * y + c) % n + d = math.gcd(abs(x - y), n) + if d != 1 and d != n: + notice_write("factor", d) + send("result", d) + print(f" lane {info.lane_id} found factor {d}") + return + if d == n: + # Cycle with trivial gcd — restart with new constants + info.c = random.randrange(1, n) + info.x = random.randrange(2, n) + info.y = info.x + rho_check(lane, n) + return + + info.x = x + info.y = y + rho_check(lane, n) + + +# -- Main -------------------------------------------------------------------- + + +def main(): + """Sieve for primes, build a semiprime, then factor it in parallel.""" + parser = argparse.ArgumentParser("Prime Factor") + parser.add_argument("--lanes", "-n", type=int, default=4, + help="number of parallel search lanes") + parser.add_argument("--candidates", "-c", type=int, default=2000, + help="number of random candidates to sieve") + parser.add_argument("--batch", "-b", type=int, default=100, + help="candidates tested per work behavior") + parser.add_argument("--bits", type=int, default=16, + help="bit-size of candidate numbers") + parser.add_argument("--loglevel", "-l", type=str, default=logging.WARNING) + args = parser.parse_args() + + logging.basicConfig(level=args.loglevel) + + # Phase 1 — parallel sieve to find primes from random candidates + lo = 1 << (args.bits - 1) + hi = (1 << args.bits) - 1 + per_lane = 
args.candidates // args.lanes + print(f"sieving {args.candidates} candidates ({args.bits}-bit) " + f"across {args.lanes} lanes ...") + + for i in range(args.lanes): + lane = Cown(SieveLane(i, per_lane, args.batch, lo, hi)) + sieve_check(lane) + + primes = [] + for _ in range(args.lanes): + _, found = receive("sieve_done") + primes.extend(found) + + print(f"found {len(primes)} primes") + + # Phase 2 — pick two primes, form a semiprime, and factor it + p, q = random.sample(primes, 2) + n = p * q + + print(f"factoring {n} (= {p} x {q})") + print(f"Pollard's rho with {args.lanes} parallel walks, batch={args.batch}") + + for i in range(args.lanes): + lane = Cown(RhoLane(i, n, args.batch)) + rho_check(lane, n) + + _, factor = receive("result") + other = n // factor + print(f"result: {n} = {factor} x {other}") + + wait() + + +if __name__ == "__main__": + main() diff --git a/src/bocpy/examples/primes.py b/examples/primes.py similarity index 100% rename from src/bocpy/examples/primes.py rename to examples/primes.py diff --git a/src/bocpy/examples/sketches.py b/examples/sketches.py similarity index 100% rename from src/bocpy/examples/sketches.py rename to examples/sketches.py diff --git a/pyproject.toml b/pyproject.toml index 3945c41..300fb3b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,14 +4,15 @@ build-backend = "setuptools.build_meta" [project] name = "bocpy" -version = "0.3.1" +version = "0.4.0" authors = [ {name = "bocpy Team", email="bocpy@microsoft.com"} ] description = "bocpy is a Python extension that adds Behavior-oriented concurrency built on top of cross-interpreter data." 
-readme = "README.md" +dynamic = ["readme"] keywords = ["behavior-oriented", "concurrency", "subinterpreters"] license = "MIT" +requires-python = ">=3.10" classifiers = [ "Development Status :: 4 - Beta", "Programming Language :: Python :: 3", @@ -22,7 +23,7 @@ classifiers = [ [project.urls] homepage = "https://microsoft.github.io/bocpy/" source = "https://github.com/microsoft/bocpy" -documentation = "https://microsoft.github.io/bocpy/docs" +documentation = "https://microsoft.github.io/bocpy/sphinx/index.html" issues = "https://github.com/microsoft/bocpy/issues" [project.optional-dependencies] @@ -35,12 +36,21 @@ boids = ["pyglet"] bocpy-bank = "bocpy.examples.bank:main" bocpy-boids = "bocpy.examples.boids:main" bocpy-calculator = "bocpy.examples.calculator:main" +bocpy-bench = "bocpy.examples.benchmark:main" bocpy-cooking-boc = "bocpy.examples.cooking_boc:main" bocpy-cooking-threads = "bocpy.examples.cooking_threads:main" bocpy-dining-philosophers = "bocpy.examples.dining_philosophers:main" bocpy-fibonacci = "bocpy.examples.fibonacci:main" +bocpy-prime-factor = "bocpy.examples.prime_factor:main" bocpy-primes = "bocpy.examples.primes:main" bocpy-sketches = "bocpy.examples.sketches:main" +[tool.setuptools] +packages = ["bocpy", "bocpy.examples"] + +[tool.setuptools.package-dir] +"" = "src" +"bocpy.examples" = "examples" + [tool.setuptools.package-data] "bocpy.examples" = ["assets/*.txt"] diff --git a/setup.py b/setup.py index 6d39c73..223bd72 100644 --- a/setup.py +++ b/setup.py @@ -1,6 +1,22 @@ +import re +from pathlib import Path + from setuptools import Extension, setup +# Load the README and strip any sections marked as PyPI-skip. GitHub still +# renders the original; PyPI's long_description gets the filtered version so +# unsupported content (e.g. Mermaid code blocks) does not appear as raw text. 
+_readme = Path(__file__).parent.joinpath("README.md").read_text(encoding="utf-8") +_readme = re.sub( + r".*?\n?", + "", + _readme, + flags=re.DOTALL, +) + setup( + long_description=_readme, + long_description_content_type="text/markdown", ext_modules=[ Extension( name="bocpy._core", diff --git a/sphinx/source/api.rst b/sphinx/source/api.rst index 822802e..ac2c8f5 100644 --- a/sphinx/source/api.rst +++ b/sphinx/source/api.rst @@ -19,6 +19,17 @@ Behaviors .. autofunction:: start +Noticeboard +----------- + +.. autofunction:: notice_write +.. autofunction:: notice_update +.. autofunction:: notice_delete +.. autofunction:: noticeboard +.. autofunction:: notice_read +.. autodata:: REMOVED + + Math ---- @@ -34,3 +45,4 @@ Messaging .. autofunction:: send .. autofunction:: receive .. autofunction:: set_tags +.. autofunction:: drain diff --git a/sphinx/source/conf.py b/sphinx/source/conf.py index c0ae6e4..e7fcb65 100644 --- a/sphinx/source/conf.py +++ b/sphinx/source/conf.py @@ -14,7 +14,7 @@ project = 'bocpy' copyright = '2026, Microsoft' author = 'Microsoft' -release = '0.3.1' +release = '0.4.0' # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration diff --git a/sphinx/source/index.rst b/sphinx/source/index.rst index 6323a1a..aaccde1 100644 --- a/sphinx/source/index.rst +++ b/sphinx/source/index.rst @@ -1,11 +1,90 @@ -.. bocpy documentation master file, created by - sphinx-quickstart on Thu Feb 12 23:16:09 2026. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. - bocpy documentation =================== +`bocpy `_ is a Python library implementing +**Behavior-Oriented Concurrency (BOC)**. 
Programmers wrap shared data in +**cowns** (concurrently-owned objects) and schedule **behaviors** with the +``@when`` decorator; the runtime runs each behavior once all of its required +cowns are available, with deadlock freedom guaranteed by construction. On +Python 3.12 and newer, behaviors execute in parallel across worker +sub-interpreters that each have their own GIL. + +For a hands-on introduction, see the +`BOC tutorial <https://microsoft.github.io/bocpy/>`_, the +`project README <https://github.com/microsoft/bocpy#readme>`_, and the +`runnable examples <https://github.com/microsoft/bocpy/tree/main/examples>`_. +The :ref:`api` page below documents every public symbol. + +A taste of BOC +-------------- + +The snippet below — a trimmed version of +`bocpy-bank <https://github.com/microsoft/bocpy/blob/main/examples/bank.py>`_ +— shows the core concepts: data wrapped in :class:`Cown`\s, a behavior +scheduled with :func:`when` that takes exclusive access to two cowns at once, +and :func:`wait` blocking the main thread until all behaviors have completed. +The runtime acquires ``src`` and ``dst`` in a deadlock-free order, so +``transfer`` can safely mutate both accounts. + +.. code-block:: python + + from bocpy import Cown, wait, when + + + class Account: + def __init__(self, name, balance): + self.name = name + self.balance = balance + + + def transfer(src: Cown[Account], dst: Cown[Account], amount: float): + # `@when` schedules `_` to run once both cowns are available. + # Inside, src.value and dst.value can be mutated safely — + # no other behavior can touch either account at the same time.
+ @when(src, dst) + def _(src, dst): + print(f" transfer: {src.value.name} -> {dst.value.name} ({amount})") + if src.value.balance >= amount: + src.value.balance -= amount + dst.value.balance += amount + + @when(dst) + def _(dst): + print(f" {dst.value.name} now has {dst.value.balance}") + + + alice = Cown(Account("Alice", 100)) + bob = Cown(Account("Bob", 0)) + + print("scheduling first transfer") + transfer(alice, bob, 40) + print("scheduling second transfer") + transfer(bob, alice, 10) + print("main thread reaches wait()") + + wait() # block until every scheduled behavior has finished + print("all behaviors complete") + +Running it prints something like: + +.. code-block:: console + + $ python bank.py + scheduling first transfer + scheduling second transfer + main thread reaches wait() + transfer: Alice -> Bob (40) + Bob now has 40 + transfer: Bob -> Alice (10) + Alice now has 70 + all behaviors complete + +Note how the ``scheduling …`` lines all print *before* any behavior body +runs: ``@when`` returns immediately, and the runtime only fires each +behavior once its cowns are free. The two transfers serialise on the +``Alice``/``Bob`` cowns, so their effects are interleaved in a deadlock-free, +data-race-free order chosen by the runtime. + .. 
toctree:: :maxdepth: 2 :caption: Contents: diff --git a/src/bocpy/__init__.py b/src/bocpy/__init__.py index 14b6a8c..5501a15 100644 --- a/src/bocpy/__init__.py +++ b/src/bocpy/__init__.py @@ -2,8 +2,13 @@ from ._core import drain, receive, send, set_tags, TIMEOUT from ._math import Matrix -from .behaviors import Behaviors, Cown, start, wait, when, whencall, WORKER_COUNT +from .behaviors import (Behaviors, Cown, notice_delete, notice_read, + notice_sync, notice_update, notice_write, noticeboard, + noticeboard_version, REMOVED, + start, wait, when, whencall, WORKER_COUNT) -__all__ = ["Matrix", "send", "receive", "set_tags", "TIMEOUT", "start", - "wait", "when", "whencall", "Behaviors", "Cown", "WORKER_COUNT", - "drain"] +__all__ = ["Behaviors", "Cown", "Matrix", "REMOVED", "TIMEOUT", + "WORKER_COUNT", "drain", "notice_delete", "notice_read", + "notice_sync", "notice_update", "notice_write", "noticeboard", + "noticeboard_version", "receive", + "send", "set_tags", "start", "wait", "when", "whencall"] diff --git a/src/bocpy/__init__.pyi b/src/bocpy/__init__.pyi index 2541a44..fc21f25 100644 --- a/src/bocpy/__init__.pyi +++ b/src/bocpy/__init__.pyi @@ -1,9 +1,12 @@ -from typing import Any, Callable, Generic, Iterator, Optional, Sequence, TypeVar, Union +from typing import Any, Callable, Generic, Iterator, Mapping, Optional, Sequence, TypeVar, Union TIMEOUT: str """Sentinel value returned by :func:`receive` when a timeout occurs.""" +REMOVED: object +"""Sentinel returned by a ``notice_update`` fn to delete the entry.""" + def drain(tags: Union[str, Sequence[str]]) -> None: """Drain all messages associated with one or more tags. 
@@ -359,12 +362,20 @@ class Cown(Generic[T]): def release(self): """Releases the cown.""" + @property + def exception(self) -> bool: + """Whether the held value is the result of an unhandled exception.""" + + @exception.setter + def exception(self, value: bool): + """Set or clear the exception flag.""" + @property def acquired(self) -> bool: """Whether the cown is currently acquired.""" def __lt__(self, other: "Cown") -> bool: - """Order by the underying capsule for deterministic ordering.""" + """Order by the underlying capsule for deterministic ordering.""" def __eq__(self, other: "Cown") -> bool: """Equality based on the wrapped capsule.""" @@ -379,14 +390,196 @@ class Cown(Generic[T]): """Debug representation.""" +def notice_write(key: str, value: Any) -> None: + """Write a value to the noticeboard. + + The write is fire-and-forget: the value is serialized immediately and + handed to a dedicated noticeboard thread, which applies it under + mutex. + + **No ordering guarantee.** A subsequent behavior — even one that + chains directly off the writer through a shared cown — is *not* + guaranteed to observe this write. Treat the noticeboard as + eventually consistent shared state, never as a synchronization + channel between behaviors. + + The noticeboard supports up to 64 distinct keys. Writes beyond the + limit are not applied; the noticeboard thread catches the resulting + error and logs a warning. No exception propagates to the caller. + + :param key: The noticeboard key (max 63 UTF-8 bytes). + :type key: str + :param value: The value to store. + :type value: Any + """ + + +def notice_update(key: str, fn: Callable[[Any], Any], default: Any = None) -> None: + """Atomically update a noticeboard entry. + + Reads the current value for *key* (or *default* if absent), applies + *fn* to it, and writes the result back. The read-modify-write is + atomic because the single-threaded noticeboard mutator performs all + three steps without interleaving. 
Like :func:`notice_write`, the + call is fire-and-forget and carries **no ordering guarantee** with + respect to other behaviors. + + Both *fn* and *default* must be picklable. Lambdas and closures + are **not** picklable; use ``functools.partial`` with a module-level + function or an ``operator`` function instead. + + If *fn* returns the ``REMOVED`` sentinel, the entry is deleted from + the noticeboard instead of being updated. + + .. warning:: + + *fn* and *default* are pickled and sent to the noticeboard + thread for execution. Anyone who can call :func:`notice_update` + can therefore execute arbitrary Python on that thread. bocpy + treats all runtime code as equally trusted; audit callers if + that assumption does not hold. + + .. warning:: + + More generally: bocpy worker sub-interpreters share the C-level + runtime (terminator, MCS request queues, message queues, + noticeboard) with the primary interpreter via ungated entry + points such as :py:func:`bocpy._core.terminator_inc`, + :py:func:`bocpy._core.terminator_dec`, and + :py:meth:`bocpy._core.BehaviorCapsule.release_all`. These are + intentionally callable from sub-interpreters because behavior + bodies legitimately schedule nested ``@when`` calls. Any + sub-interpreter running untrusted Python is therefore part of + the trusted computing base: it can drive the terminator + negative, schedule unbounded behaviors, or unlink an arbitrary + behavior from the MCS queue. Only run code you trust inside + behavior bodies. + + :param key: The noticeboard key (max 63 UTF-8 bytes). + :type key: str + :param fn: A picklable callable taking the current value, returning the new. + :type fn: Callable[[Any], Any] + :param default: Value used when *key* does not yet exist. + :type default: Any + """ + + +def notice_delete(key: str) -> None: + """Delete a single noticeboard entry. + + The deletion is fire-and-forget: the request is sent to the + noticeboard thread, which removes the entry under mutex. 
If the + key does not exist, the operation is a no-op. Like + :func:`notice_write`, this carries **no ordering guarantee** with + respect to other behaviors. + + :param key: The noticeboard key to delete (max 63 UTF-8 bytes). + :type key: str + """ + + +def noticeboard() -> Mapping[str, Any]: + """Return a cached snapshot of the noticeboard. + + Must be called from within a ``@when`` behavior. The first call within a + behavior captures all entries under mutex and caches the data. + Subsequent calls in the same behavior return a view of the same + cached data. + + The returned mapping is read-only. + + Calling from outside a behavior (e.g. the main thread) will return a + snapshot that is never refreshed for that thread. + + :return: A read-only mapping of keys to their stored values. + :rtype: Mapping[str, Any] + """ + + +def notice_read(key: str, default: Any = None) -> Any: + """Read a single key from the noticeboard. + + Must be called from within a ``@when`` behavior. Convenience wrapper + that takes a snapshot and returns one value. + + Calling from outside a behavior (e.g. the main thread) will return a + snapshot that is never refreshed for that thread. + + :param key: The noticeboard key to read. + :type key: str + :param default: Value returned when key is absent. + :type default: Any + :return: The stored value, or *default* if the key does not exist. + :rtype: Any + """ + + +def noticeboard_version() -> int: + """Return the current noticeboard version counter. + + The counter is incremented every time the noticeboard is + successfully written, updated, or cleared. Two reads returning the + same value mean no commit happened between them; a strictly larger + value means at least one commit happened. + + The counter is global (across all threads and interpreters) and + monotonic. Useful as a *hint* for detecting noticeboard changes + without taking a full snapshot. + + .. note:: + + This is *not* a synchronization primitive. 
Because + :func:`notice_write`, :func:`notice_update`, and + :func:`notice_delete` are fire-and-forget, the version may not + have advanced yet when a behavior that depends on a write + observes the noticeboard. For strict read-your-writes ordering, + use :func:`notice_sync`. + + :return: The current noticeboard version. + :rtype: int + """ + + +def notice_sync(timeout: Optional[float] = 30.0) -> int: + """Block until the caller's prior noticeboard mutations are committed. + + Because :func:`notice_write`, :func:`notice_update`, and + :func:`notice_delete` are fire-and-forget, a behavior that wants + read-your-writes ordering against a *subsequent* behavior must call + ``notice_sync()`` after its writes. By the time this returns, every + write/update/delete posted from the calling thread before the call + has been applied to the noticeboard. + + The barrier carries **no ordering guarantee** with respect to + writes posted from other threads or behaviors interleaved with the + caller's; it only flushes the caller's own queued mutations. + + :param timeout: Maximum seconds to wait. ``None`` waits forever. + Defaults to 30 seconds. + :type timeout: Optional[float] + :raises TimeoutError: If the barrier does not complete within + *timeout* seconds. + :raises RuntimeError: If the runtime is not started. + :return: The :func:`noticeboard_version` after the flush. + :rtype: int + """ + + def wait(timeout: Optional[float] = None): """Block until all behaviors complete, with optional timeout. + On a successful return the runtime is **stopped**: workers are + joined, the noticeboard thread exits, the export tempdir is removed, + and the terminator is closed. The next ``@when`` call (or explicit + :func:`start`) will spin up a fresh runtime. + Note that holding on to references to Cown objects such that they are deallocated after wait() is called results in undefined behavior. :param timeout: Maximum number of seconds to wait, or ``None`` to - wait indefinitely. 
+ wait indefinitely. The timeout bounds only the quiescence and + noticeboard-drain phases; worker shutdown and tempdir cleanup + run to completion regardless. :type timeout: Optional[float] """ @@ -415,7 +608,11 @@ def when(*cowns): def start(**kwargs): - """Start the behavior scheduler and worker pool. + """Start the bocpy runtime and worker pool. + + Spawns the worker sub-interpreters and the dedicated noticeboard + thread. Scheduling and release run on the caller and worker + threads themselves — there is no central scheduler thread. :param worker_count: The number of worker interpreters to start. If ``None``, defaults to the number of available cores minus one. diff --git a/src/bocpy/_core.c b/src/bocpy/_core.c index 279c953..2b2224b 100644 --- a/src/bocpy/_core.c +++ b/src/bocpy/_core.c @@ -43,6 +43,43 @@ void atomic_store(atomic_int_least64_t *ptr, int_least64_t value) { *ptr = value; } +// ----- atomic_intptr_t siblings --------------------------------------------- +// The MSVC polyfill defines `atomic_intptr_t` and `atomic_int_least64_t` as +// distinct typedefs; the plain `atomic_load` / `atomic_store` / etc. above +// only accept `atomic_int_least64_t *`. Without these siblings, code that +// touches an `atomic_intptr_t` field (e.g. BOCRequest::next, BOCCown::last, +// BOCRecycleQueue::head, BOCQueue::tag, NB_NOTICEBOARD_TID) would silently +// pass a mistyped pointer to the int64 polyfill on Windows. On POSIX C11 the +// same names are aliased to the generic atomic_* macros (which already +// dispatch on type via _Generic), so user code below is platform-uniform. +// +// All Interlocked*Pointer intrinsics on x86/x64 are full barriers; the +// pointer-width matches `intptr_t` on both Win32 and Win64 (CPython itself +// requires a sane intptr_t == void* relationship). 
+static inline intptr_t atomic_load_intptr(atomic_intptr_t *ptr) { return *ptr; } + +static inline void atomic_store_intptr(atomic_intptr_t *ptr, intptr_t value) { + *ptr = value; +} + +static inline intptr_t atomic_exchange_intptr(atomic_intptr_t *ptr, + intptr_t value) { + return (intptr_t)InterlockedExchangePointer((PVOID volatile *)ptr, + (PVOID)value); +} + +static inline bool atomic_compare_exchange_strong_intptr(atomic_intptr_t *ptr, + intptr_t *expected, + intptr_t desired) { + intptr_t prev = (intptr_t)InterlockedCompareExchangePointer( + (PVOID volatile *)ptr, (PVOID)desired, (PVOID)*expected); + if (prev == *expected) { + return true; + } + *expected = prev; + return false; +} + // All Interlocked* intrinsics on x86/x64 are full barriers, so the // memory_order argument is accepted but ignored. // Note: atomic_load_explicit is a plain volatile read. On x86/x64 this @@ -57,8 +94,43 @@ void atomic_store(atomic_int_least64_t *ptr, int_least64_t value) { #define thread_local __declspec(thread) -typedef SRWLOCK BOCParkMutex; -typedef CONDITION_VARIABLE BOCParkCond; +typedef SRWLOCK BOCMutex; +typedef CONDITION_VARIABLE BOCCond; + +static inline void boc_mtx_init(BOCMutex *m) { InitializeSRWLock(m); } + +static inline void mtx_destroy(BOCMutex *m) { (void)m; } + +static inline void mtx_lock(BOCMutex *m) { AcquireSRWLockExclusive(m); } + +static inline void mtx_unlock(BOCMutex *m) { ReleaseSRWLockExclusive(m); } + +static inline void cnd_init(BOCCond *c) { InitializeConditionVariable(c); } + +static inline void cnd_destroy(BOCCond *c) { (void)c; } + +static inline void cnd_signal(BOCCond *c) { WakeConditionVariable(c); } + +static inline void cnd_broadcast(BOCCond *c) { WakeAllConditionVariable(c); } + +static inline void cnd_wait(BOCCond *c, BOCMutex *m) { + SleepConditionVariableSRW(c, m, INFINITE, 0); +} + +/// @brief Wait on a condition variable for at most @p seconds. 
+/// @param c The condition variable +/// @param m The mutex (must be held by caller) +/// @return true if signalled (or spurious wake), false if the timeout expired +static inline bool cnd_timedwait_s(BOCCond *c, BOCMutex *m, double seconds) { + if (seconds < 0) + seconds = 0; + DWORD ms = (DWORD)(seconds * 1000.0); + BOOL ok = SleepConditionVariableSRW(c, m, ms, 0); + if (!ok && GetLastError() == ERROR_TIMEOUT) { + return false; + } + return true; +} void thrd_sleep(const struct timespec *duration, struct timespec *remaining) { const DWORD MS_PER_NS = 1000000; @@ -68,21 +140,98 @@ void thrd_sleep(const struct timespec *duration, struct timespec *remaining) { } #elif defined __APPLE__ +#include #include #include #define thrd_sleep nanosleep #define thread_local _Thread_local -typedef pthread_mutex_t BOCParkMutex; -typedef pthread_cond_t BOCParkCond; +typedef pthread_mutex_t BOCMutex; +typedef pthread_cond_t BOCCond; + +static inline void boc_mtx_init(BOCMutex *m) { pthread_mutex_init(m, NULL); } + +static inline void mtx_destroy(BOCMutex *m) { pthread_mutex_destroy(m); } + +static inline void mtx_lock(BOCMutex *m) { pthread_mutex_lock(m); } + +static inline void mtx_unlock(BOCMutex *m) { pthread_mutex_unlock(m); } + +static inline void cnd_init(BOCCond *c) { pthread_cond_init(c, NULL); } + +static inline void cnd_destroy(BOCCond *c) { pthread_cond_destroy(c); } + +static inline void cnd_signal(BOCCond *c) { pthread_cond_signal(c); } + +static inline void cnd_broadcast(BOCCond *c) { pthread_cond_broadcast(c); } + +static inline void cnd_wait(BOCCond *c, BOCMutex *m) { + pthread_cond_wait(c, m); +} + +/// @brief Wait on a condition variable for at most @p seconds. 
+/// @param c The condition variable +/// @param m The mutex (must be held by caller) +/// @return true if signalled (or spurious wake), false if the timeout expired +static inline bool cnd_timedwait_s(BOCCond *c, BOCMutex *m, double seconds) { + if (seconds < 0) + seconds = 0; + struct timespec ts; + clock_gettime(CLOCK_REALTIME, &ts); + double total = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9 + seconds; + ts.tv_sec = (time_t)total; + ts.tv_nsec = (long)((total - (double)ts.tv_sec) * 1e9); + if (ts.tv_nsec >= 1000000000L) { + ts.tv_sec += 1; + ts.tv_nsec -= 1000000000L; + } + int rc = pthread_cond_timedwait(c, m, &ts); + return rc != ETIMEDOUT; +} #else // Linux +#include #include #include -typedef mtx_t BOCParkMutex; -typedef cnd_t BOCParkCond; +typedef mtx_t BOCMutex; +typedef cnd_t BOCCond; + +static inline void boc_mtx_init(BOCMutex *m) { mtx_init(m, mtx_plain); } + +/// @brief Wait on a condition variable for at most @p seconds. +/// @param c The condition variable +/// @param m The mutex (must be held by caller) +/// @return true if signalled (or spurious wake), false if the timeout expired +static inline bool cnd_timedwait_s(BOCCond *c, BOCMutex *m, double seconds) { + if (seconds < 0) + seconds = 0; + struct timespec ts; + clock_gettime(CLOCK_REALTIME, &ts); + double total = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9 + seconds; + ts.tv_sec = (time_t)total; + ts.tv_nsec = (long)((total - (double)ts.tv_sec) * 1e9); + if (ts.tv_nsec >= 1000000000L) { + ts.tv_sec += 1; + ts.tv_nsec -= 1000000000L; + } + int rc = cnd_timedwait(c, m, &ts); + return rc != thrd_timedout; +} + +#endif +#ifndef _WIN32 +// On POSIX the C11 atomic_* macros dispatch on type via _Generic, so the +// `atomic_load(&intptr_var)` form Just Works. The `_intptr` siblings are +// aliased to the generic forms purely so the source reads the same on +// every platform; on Windows they expand to dedicated InterlockedXxxPointer +// shims (see polyfill block above). 
+#define atomic_load_intptr(ptr) atomic_load(ptr) +#define atomic_store_intptr(ptr, val) atomic_store((ptr), (val)) +#define atomic_exchange_intptr(ptr, val) atomic_exchange((ptr), (val)) +#define atomic_compare_exchange_strong_intptr(ptr, expected, desired) \ + atomic_compare_exchange_strong((ptr), (expected), (desired)) #endif // Forward declaration — BOCQueue is defined below. @@ -120,6 +269,10 @@ static inline void boc_park_broadcast(BOCQueue *q); /// @param q The queue to park on static inline void boc_park_wait(BOCQueue *q); +/// @brief Returns the current time as double-precision seconds. +/// @return the current time +static double boc_now_s(void); + #if PY_VERSION_HEX >= 0x030D0000 #define Py_BUILD_CORE #include @@ -379,9 +532,9 @@ typedef struct boc_queue { /// @brief Number of threads parked on this queue's condvar atomic_int_least64_t waiters; /// @brief Mutex protecting condvar signal/wait - BOCParkMutex park_mutex; + BOCMutex park_mutex; /// @brief Condition variable for parking receivers - BOCParkCond park_cond; + BOCCond park_cond; } BOCQueue; /// @brief A tag for a BOC message. @@ -405,77 +558,193 @@ static BOCRecycleQueue *BOC_RECYCLE_QUEUE_TAIL = NULL; static atomic_intptr_t BOC_RECYCLE_QUEUE_HEAD = 0; // --------------------------------------------------------------------------- -// Platform condvar implementation +// Noticeboard // --------------------------------------------------------------------------- -#ifdef _WIN32 - -static inline void boc_park_init(BOCQueue *q) { - InitializeSRWLock(&q->park_mutex); - InitializeConditionVariable(&q->park_cond); -} - -static inline void boc_park_destroy(BOCQueue *q) { - // Windows SRWLOCK and CONDITION_VARIABLE have no destroy function. - (void)q; -} +#define NB_MAX_ENTRIES 64 +#define NB_KEY_SIZE 64 + +// Forward declarations needed by NoticeboardEntry and the noticeboard +// helpers below. 
The full definitions of BOCCown and its refcount helpers +// appear further down the file (the noticeboard predates the cown +// machinery in source order, but the new pin-tracking support added for +// the snapshot cache needs the cown refcount macros). +typedef struct boc_cown BOCCown; +static int_least64_t cown_incref(BOCCown *cown); +static int_least64_t cown_decref(BOCCown *cown); +#define COWN_INCREF(c) cown_incref((c)) +#define COWN_DECREF(c) cown_decref(c) -static inline void boc_park_lock(BOCQueue *q) { - AcquireSRWLockExclusive(&q->park_mutex); -} +// CownCapsule forward declaration so the noticeboard pin helper can fish +// the underlying BOCCown out of a Python CownCapsule. The struct body is +// defined alongside the type's PyTypeObject further down. +typedef struct cown_capsule_object { + PyObject_HEAD BOCCown *cown; +} CownCapsuleObject; -static inline void boc_park_unlock(BOCQueue *q) { - ReleaseSRWLockExclusive(&q->park_mutex); -} +/// @brief A single noticeboard entry +typedef struct nb_entry { + /// @brief The key for this entry (null-terminated UTF-8) + char key[NB_KEY_SIZE]; + /// @brief The serialized cross-interpreter data + XIDATA_T *value; + /// @brief Whether the value was pickled during serialization + bool pickled; + /// @brief BOCCowns referenced by @ref value, pinned by this entry + /// @details Allocated with @c PyMem_RawMalloc; each pointer holds one + /// strong reference (COWN_INCREF). When the entry is overwritten, + /// deleted, or cleared, every pointer is COWN_DECREFed and the array + /// is freed. This is the noticeboard's mechanism for keeping the + /// underlying BOCCowns alive across the 1-pickle / N-unpickle cycle: + /// pickling no longer adds a pin (see @ref CownCapsule_reduce). 
+ BOCCown **pinned_cowns; + /// @brief Number of entries in @ref pinned_cowns + int pinned_count; +} NoticeboardEntry; + +/// @brief Global noticeboard for cross-behavior key-value storage +typedef struct noticeboard { + /// @brief The stored entries + NoticeboardEntry entries[NB_MAX_ENTRIES]; + /// @brief The number of entries currently stored + int count; + /// @brief Mutex protecting the noticeboard + BOCMutex mutex; +} Noticeboard; + +static Noticeboard NB; + +/// @brief Monotonic version counter for the noticeboard +/// @details Incremented under @ref Noticeboard::mutex on every successful +/// write, delete, or clear. Threads use this to lazily invalidate their +/// thread-local snapshot cache without taking the noticeboard mutex on +/// every read. Exposed to Python via @ref _core_noticeboard_version for +/// users who want to detect noticeboard changes without taking a full +/// snapshot. +static atomic_int_least64_t NB_VERSION = 0; + +/// @brief Thread-local snapshot cache for the current behavior +static thread_local PyObject *NB_SNAPSHOT_CACHE = NULL; + +/// @brief Version of the noticeboard at the time the cached snapshot was built +/// @details Captured under @ref Noticeboard::mutex during the rebuild. A +/// reader that finds @ref NB_VERSION equal to this value can reuse the +/// cached dict without rebuilding. +static thread_local int_least64_t NB_SNAPSHOT_VERSION = -1; + +/// @brief Whether the cached snapshot has been version-checked this behavior +/// @details Cleared by @ref _core_noticeboard_cache_clear at every behavior +/// boundary (see @c worker.py). Set to @c true on the first snapshot call +/// of a behavior. Subsequent calls within the same behavior return the +/// cached dict without consulting @ref NB_VERSION at all, preserving the +/// no-polling invariant: the noticeboard cannot be used as a synchronous +/// communication channel between behaviors. 
+static thread_local bool NB_VERSION_CHECKED = false; + +/// @brief Read-only proxy wrapping the cached snapshot dict +/// @details A @c types.MappingProxyType created over @ref NB_SNAPSHOT_CACHE +/// once per rebuild and returned to callers in place of the dict. Prevents +/// user code from mutating the cached snapshot, which would otherwise +/// corrupt every subsequent reader on the same thread until the next +/// @ref NB_VERSION bump. +static thread_local PyObject *NB_SNAPSHOT_PROXY = NULL; + +/// @brief Thread identity of the noticeboard mutator thread, or 0 if unset +/// @details Set by @ref _core_set_noticeboard_thread at runtime startup +/// and checked by @ref _core_noticeboard_write_direct and +/// @ref _core_noticeboard_delete to enforce the invariant that only the +/// noticeboard thread mutates the noticeboard. This eliminates the TOCTOU +/// window in the Python-level read-modify-write performed by +/// @c noticeboard_update. +static atomic_intptr_t NB_NOTICEBOARD_TID = 0; -static inline void boc_park_signal(BOCQueue *q) { - WakeConditionVariable(&q->park_cond); -} +// --------------------------------------------------------------------------- +// notice_sync() — opt-in barrier for the noticeboard thread. +// +// The noticeboard thread runs independently of the behavior dispatch path, so +// notice_write/_update/_delete are fire-and-forget. Callers that need +// read-your-writes ordering use notice_sync(): +// 1. notice_sync_request() atomically allocates a monotonic sequence +// number and returns it. +// 2. The caller posts ("sync", N) on the boc_noticeboard tag. +// 3. The noticeboard-thread arm calls notice_sync_complete(N), which +// stores N into NB_SYNC_PROCESSED (monotonic, max-of) and broadcasts +// NB_SYNC_COND. +// 4. The caller blocks in notice_sync_wait(my_seq, timeout) on +// NB_SYNC_COND until NB_SYNC_PROCESSED >= my_seq, or returns false +// on timeout. 
+// +// All synchronization lives in C primitives so the barrier works across +// sub-interpreters (Python locks do not span interpreters). +// --------------------------------------------------------------------------- -static inline void boc_park_broadcast(BOCQueue *q) { - WakeAllConditionVariable(&q->park_cond); -} +/// @brief Monotonic counter incremented by every notice_sync caller. +/// @details Sized for ~292 years of continuous 1 GHz fetch_add traffic +/// before wrap; treated as effectively non-wrapping. If the wrap +/// precondition ever becomes plausible (e.g. a much faster mutator), +/// switch to @c atomic_uint_least64_t and update the wrap arithmetic +/// in @ref _core_notice_sync_wait. +static atomic_int_least64_t NB_SYNC_REQUESTED = 0; -static inline void boc_park_wait(BOCQueue *q) { - SleepConditionVariableSRW(&q->park_cond, &q->park_mutex, INFINITE, 0); -} +/// @brief Highest sequence number processed by the noticeboard thread. +static atomic_int_least64_t NB_SYNC_PROCESSED = 0; -#elif defined __APPLE__ // macOS — pthreads +/// @brief Mutex protecting NB_SYNC_COND. +static BOCMutex NB_SYNC_MUTEX; -static inline void boc_park_init(BOCQueue *q) { - pthread_mutex_init(&q->park_mutex, NULL); - pthread_cond_init(&q->park_cond, NULL); -} +/// @brief Condition variable signalled when NB_SYNC_PROCESSED advances. +static BOCCond NB_SYNC_COND; -static inline void boc_park_destroy(BOCQueue *q) { - pthread_cond_destroy(&q->park_cond); - pthread_mutex_destroy(&q->park_mutex); -} +// --------------------------------------------------------------------------- +// Terminator — C-level run-down counter. +// +// Process-global rundown counter that gates @c terminator_wait. Used by the +// Python @c wait()/@c stop() lifecycle to block until every in-flight +// behavior has retired. The counter is incremented from caller threads in +// @c whencall (before the schedule call) and decremented from worker +// threads after @c behavior_release_all completes. 
A one-shot "Pyrona +// seed" of 1 keeps the count positive between the runtime starting and +// @c stop() taking it down via @c terminator_seed_dec. +// +// Lifecycle: +// - @c terminator_reset arms a fresh runtime: count = 1 (the seed), +// seeded = 1, closed = 0. Returns the prior (count, seeded) so +// @c Behaviors.start can detect drift carried over from a previous +// run that died without reconciliation. +// - @c terminator_inc returns -1 once @c terminator_close has been +// called, so the @c whencall fast path can refuse new work without +// racing teardown. +// - @c terminator_seed_dec is the idempotent one-shot that drops the +// seed; subsequent calls are no-ops. +// - @c terminator_wait blocks on the condvar until count reaches 0. +// - @c terminator_close raises the closed bit so any straggler +// @c terminator_inc returns -1. +// +// State is process-global (file-scope statics, NOT per-interpreter) so +// every sub-interpreter sees the same counter, mutex, and condvar. +// --------------------------------------------------------------------------- -static inline void boc_park_lock(BOCQueue *q) { - pthread_mutex_lock(&q->park_mutex); -} +/// @brief Active behavior count + the Pyrona seed. +static atomic_int_least64_t TERMINATOR_COUNT = 0; -static inline void boc_park_unlock(BOCQueue *q) { - pthread_mutex_unlock(&q->park_mutex); -} +/// @brief Set to 1 by terminator_close() to refuse further increments. +static atomic_int_least64_t TERMINATOR_CLOSED = 0; -static inline void boc_park_signal(BOCQueue *q) { - pthread_cond_signal(&q->park_cond); -} +/// @brief One-shot guard for the Pyrona seed: 1 = seed still present. +static atomic_int_least64_t TERMINATOR_SEEDED = 0; -static inline void boc_park_broadcast(BOCQueue *q) { - pthread_cond_broadcast(&q->park_cond); -} +/// @brief Mutex protecting TERMINATOR_COND. 
+static BOCMutex TERMINATOR_MUTEX; -static inline void boc_park_wait(BOCQueue *q) { - pthread_cond_wait(&q->park_cond, &q->park_mutex); -} +/// @brief Condition variable signalled when TERMINATOR_COUNT reaches 0. +static BOCCond TERMINATOR_COND; -#else // Linux — C11 threads +// --------------------------------------------------------------------------- +// Platform condvar implementation +// --------------------------------------------------------------------------- static inline void boc_park_init(BOCQueue *q) { - mtx_init(&q->park_mutex, mtx_plain); + boc_mtx_init(&q->park_mutex); cnd_init(&q->park_cond); } @@ -498,7 +767,7 @@ static inline void boc_park_wait(BOCQueue *q) { cnd_wait(&q->park_cond, &q->park_mutex); } -#endif +// Noticeboard function implementations are below object_to_xidata /// @brief Creates a new BOCTag object from a Python Unicode string. /// @details The result object will not be dependent on the argument in any way @@ -691,7 +960,11 @@ struct timespec BOC_LAST_REF_TRACKING_REPORT; static void boc_ref_tracking_report(const char *prefix) { struct timespec ts; +#ifdef _WIN32 timespec_get(&ts, TIME_UTC); +#else + clock_gettime(CLOCK_REALTIME, &ts); +#endif if (ts.tv_sec - BOC_LAST_REF_TRACKING_REPORT.tv_sec > 1) { int_least64_t alive = atomic_load(&BOC_ACTIVE_COWNS); int_least64_t total = atomic_load(&BOC_TOTAL_COWNS); @@ -773,7 +1046,8 @@ static BOCRecycleQueue *BOCRecycleQueue_new(int_least64_t index) { queue->next = 0; intptr_t queue_ptr = (intptr_t)queue; - intptr_t old_head_ptr = atomic_exchange(&BOC_RECYCLE_QUEUE_HEAD, queue_ptr); + intptr_t old_head_ptr = + atomic_exchange_intptr(&BOC_RECYCLE_QUEUE_HEAD, queue_ptr); if (old_head_ptr == 0) { return queue; } @@ -781,8 +1055,8 @@ static BOCRecycleQueue *BOCRecycleQueue_new(int_least64_t index) { BOCRecycleQueue *old_head = (BOCRecycleQueue *)old_head_ptr; old_head->index = index; old_head->tail = node; - atomic_store(&old_head->head, node_ptr); - atomic_store(&old_head->next, queue_ptr); + 
atomic_store_intptr(&old_head->head, node_ptr); + atomic_store_intptr(&old_head->next, queue_ptr); old_head->xidata_to_cowns = PyDict_New(); if (old_head->xidata_to_cowns == NULL) { @@ -876,163 +1150,1235 @@ static PyObject *object_to_xidata(PyObject *value, XIDATA_T **xidata_ptr) { return NULL; } -/// @brief The threadsafe cown object. -/// @details This can be safely referenced and used from multiple processes. -typedef struct boc_cown { - int_least64_t id; - /// @brief The python object held in this cown. - /// @details This is only non-NULL when the cown is acquired. - PyObject *value; - /// @brief Whether the value is pickled when serialized - bool pickled; - /// @brief Whether the cown holds an exception object - bool exception; - /// @brief the threadsafe serialized cown contents - XIDATA_T *xidata; - /// @brief the module which last released this cown - BOCRecycleQueue *recycle_queue; - /// @brief The ID of the interpreter that currently has acquired this cown. - atomic_int_least64_t owner; - /// @brief The last behavior which needs to acquire this cown - atomic_intptr_t last; // (BOCBehavior *) - /// @brief Atomic reference count for the cown - atomic_int_least64_t rc; - /// @brief Atomic weak reference count for the cown - atomic_int_least64_t weak_rc; -} BOCCown; - -static inline int_least64_t cown_weak_decref(BOCCown *cown) { - int_least64_t weak_rc = atomic_fetch_add(&cown->weak_rc, -1) - 1; - PRINTDBG("cown_weak_decref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", - cown, cown->id, weak_rc); +// --------------------------------------------------------------------------- +// Noticeboard C functions +// --------------------------------------------------------------------------- - if (weak_rc == 0) { - // reference count is truly zero, we can free the memory - PyMem_RawFree(cown); - BOC_REF_TRACKING_REMOVE_COWN(); +/// @brief Reject a noticeboard mutation called from outside the noticeboard +/// thread. 
+/// @details Sets a Python @c RuntimeError if a noticeboard thread has been +/// registered (via @ref _core_set_noticeboard_thread) and the calling thread +/// is not it. Prior to runtime startup the check is permissive so that +/// @c Behaviors.stop and unit tests can drive the noticeboard from the +/// main thread before the noticeboard thread is up. The single-writer +/// invariant is what makes the Python-level read-modify-write in +/// @c noticeboard_update TOCTOU-free. +/// @param op_name Name of the operation, used in the error message +/// @return 0 on success, -1 on error (with exception set) +static int nb_check_noticeboard_thread(const char *op_name) { + uintptr_t owner = (uintptr_t)atomic_load_intptr(&NB_NOTICEBOARD_TID); + if (owner == 0) { + return 0; + } + uintptr_t current = (uintptr_t)PyThread_get_thread_ident(); + if (current != owner) { + PyErr_Format(PyExc_RuntimeError, + "%s must be called from the noticeboard thread", op_name); + return -1; } - - return weak_rc; + return 0; } -static inline void report_unhandled_exception(BOCCown *cown) { - fprintf(stderr, "Cown(%p) contains an unhandled exception: ", cown); - - if (cown->value != NULL) { - PyObject_Print(cown->value, stderr, 0); - fprintf(stderr, "\n"); - return; +/// @brief Take strong references to every CownCapsule in @p cowns +/// @details Allocates a fresh @c BOCCown** array (or returns NULL if +/// @p cowns is empty), iterates the sequence calling @c COWN_INCREF on +/// each entry's underlying BOCCown, and writes the resulting array and +/// count to @p out_array / @p out_count. On error, no INCREFs leak: any +/// already-taken pins are dropped before return. 
+/// @param cowns A Python sequence of CownCapsule objects (may be NULL or +/// None for "no pins") +/// @param out_array Out param for the allocated array +/// @param out_count Out param for the number of entries +/// @return 0 on success, -1 on error (with exception set) +/// +/// @details The caller is expected to pass a sequence of integer pointers +/// to BOCCown structs that have already been COWN_INCREFed by the writer +/// thread (typically via @ref _core_cown_pin_pointers). This function +/// **transfers** those refs into the noticeboard entry: it does not take +/// any additional ref. On error every transferred ref is released so the +/// caller can treat -1 as "ownership not taken, original refs already +/// released". +static int nb_pin_cowns(PyObject *cowns, BOCCown ***out_array, int *out_count) { + *out_array = NULL; + *out_count = 0; + + if (cowns == NULL || cowns == Py_None) { + return 0; + } + + PyObject *seq = + PySequence_Fast(cowns, "noticeboard pin list must be a sequence"); + if (seq == NULL) { + return -1; } - if (cown->xidata == NULL) { - fprintf(stderr, - "\n"); - return; + Py_ssize_t n = PySequence_Fast_GET_SIZE(seq); + if (n == 0) { + Py_DECREF(seq); + return 0; } - cown->value = xidata_to_object(cown->xidata, cown->pickled); + BOCCown **pins = (BOCCown **)PyMem_RawMalloc(sizeof(BOCCown *) * n); + if (pins == NULL) { + Py_DECREF(seq); + PyErr_NoMemory(); + return -1; + } - if (cown->value == NULL) { - PyErr_Clear(); - fprintf(stderr, "\n"); - return; + int taken = 0; + for (Py_ssize_t i = 0; i < n; i++) { + PyObject *item = PySequence_Fast_GET_ITEM(seq, i); + BOCCown *cown = (BOCCown *)PyLong_AsVoidPtr(item); + if (cown == NULL) { + // PyLong_AsVoidPtr returns NULL both on error and for integer 0. + // Reject both paths explicitly: a NULL pin would be dereferenced + // downstream (COWN_DECREF on NULL is UB), and an integer 0 is + // indistinguishable from a crafted attacker pin pointing at the + // zero page. 
+ if (!PyErr_Occurred()) { + PyErr_SetString(PyExc_ValueError, + "noticeboard pin list must not contain NULL / " + "integer 0 entries"); + } else { + PyErr_SetString(PyExc_TypeError, + "noticeboard pin list must contain only integer " + "BOCCown pointers (use _core.cown_pin_pointers())"); + } + goto fail; + } + pins[taken++] = cown; } - PyObject_Print(cown->value, stderr, 0); - fprintf(stderr, "\n"); - return; + Py_DECREF(seq); + *out_array = pins; + *out_count = taken; + return 0; + +fail: + // Release every transferred ref the writer pre-INCREFed for us. The + // ones we already stashed into `pins` plus the rest of the sequence + // we never reached. + for (int i = 0; i < taken; i++) { + COWN_DECREF(pins[i]); + } + for (Py_ssize_t i = (Py_ssize_t)taken + 1; i < n; i++) { + PyObject *item = PySequence_Fast_GET_ITEM(seq, i); + BOCCown *c = (BOCCown *)PyLong_AsVoidPtr(item); + if (c != NULL) { + COWN_DECREF(c); + } else { + PyErr_Clear(); + } + } + PyMem_RawFree(pins); + Py_DECREF(seq); + return -1; } -static void BOCRecycleQueue_enqueue(BOCRecycleQueue *queue, XIDATA_T *xidata); +/// @brief Drop the calling thread's snapshot cache and proxy +/// @details Both objects are decref-cleared and the per-behavior version +/// state is reset. Safe to call when nothing is cached. +static void nb_drop_local_cache(void) { + Py_CLEAR(NB_SNAPSHOT_PROXY); + Py_CLEAR(NB_SNAPSHOT_CACHE); + NB_SNAPSHOT_VERSION = -1; + NB_VERSION_CHECKED = false; +} + +/// @brief Write a key-value pair into the noticeboard under mutex +/// @details The value is serialized to XIData here (in the main interpreter), +/// so XIDATA_FREE is always safe to call from the same interpreter. 
The +/// optional third argument is a sequence of CownCapsule objects whose +/// underlying BOCCowns are referenced by the serialized bytes; the +/// noticeboard takes a strong reference on each so that they outlive +/// every reader's pickled view, regardless of whether the original +/// CownCapsule is dropped by user code. +/// @param self The module +/// @param args Tuple of (key: str, value: object[, cowns: sequence]) +/// @return Py_None on success, NULL on error +static PyObject *_core_noticeboard_write_direct(PyObject *self, + PyObject *args) { + BOC_STATE_SET(self); -/// @brief Atomic decref for the cown -/// @param cown the cown to decref -/// @return the new reference count -static inline int_least64_t cown_decref(BOCCown *cown) { - int_least64_t rc = atomic_fetch_add(&cown->rc, -1) - 1; - PRINTDBG("cown_decref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", cown, - cown->id, rc); - if (rc != 0) { - return rc; + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "noticeboard_write_direct must be called from the primary " + "interpreter"); + return NULL; } - PRINTDBG("cleaning cown\n"); + if (nb_check_noticeboard_thread("noticeboard_write_direct") < 0) { + return NULL; + } - if (cown->exception) { - report_unhandled_exception(cown); + const char *key; + Py_ssize_t key_len; + PyObject *value; + PyObject *cowns = Py_None; + + if (!PyArg_ParseTuple(args, "s#O|O", &key, &key_len, &value, &cowns)) { + return NULL; } - // we can clear the object and recycle the xidata - if (cown->value != NULL) { - assert(cown->owner == get_interpid()); - Py_CLEAR(cown->value); + if (key_len >= NB_KEY_SIZE) { + PyErr_SetString(PyExc_ValueError, + "noticeboard key too long (max 63 UTF-8 bytes)"); + return NULL; } - if (cown->xidata != NULL) { - BOCRecycleQueue_enqueue(cown->recycle_queue, cown->xidata); + if (memchr(key, '\0', key_len) != NULL) { + PyErr_SetString(PyExc_ValueError, + "noticeboard key must not contain NUL characters"); + return NULL; } - 
cown_weak_decref(cown); + // Pin the cowns BEFORE serializing so an error here does not leave us + // with a stored entry whose cowns can be freed under us. + BOCCown **new_pins = NULL; + int new_pin_count = 0; + if (nb_pin_cowns(cowns, &new_pins, &new_pin_count) < 0) { + return NULL; + } - return 0; -} + // Serialize the value to XIData in the main interpreter + XIDATA_T *xidata = NULL; + PyObject *pickled = object_to_xidata(value, &xidata); + if (pickled == NULL) { + if (xidata != NULL) { + XIDATA_FREE(xidata); + } + // Roll back the pins we just took. + for (int i = 0; i < new_pin_count; i++) { + COWN_DECREF(new_pins[i]); + } + PyMem_RawFree(new_pins); + return NULL; + } -#define COWN_DECREF(c) cown_decref(c) -#define COWN_WEAK_DECREF(c) cown_weak_decref(c) + bool is_pickled = (pickled == Py_True); + Py_DECREF(pickled); -/// @brief Atomic incref for the cown -/// @param cown the cown to incref -/// @return the new reference count -static inline int_least64_t cown_incref(BOCCown *cown) { - int_least64_t rc = atomic_fetch_add(&cown->rc, 1) + 1; - PRINTDBG("cown_incref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", cown, - cown->id, rc); - return rc; -} + mtx_lock(&NB.mutex); -static inline int_least64_t cown_weak_incref(BOCCown *cown) { - int_least64_t rc = atomic_fetch_add(&cown->weak_rc, 1) + 1; - PRINTDBG("cown_weak_incref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", - cown, cown->id, rc); - return rc; -} + // find existing entry or allocate new one + NoticeboardEntry *target = NULL; + for (int i = 0; i < NB.count; i++) { + if (strncmp(NB.entries[i].key, key, NB_KEY_SIZE) == 0) { + target = &NB.entries[i]; + break; + } + } -static inline bool cown_promote(BOCCown *cown) { - int_least64_t expected; - int_least64_t desired; - do { - expected = atomic_load(&cown->rc); - if (expected == 0) { - return false; + if (target == NULL) { + if (NB.count >= NB_MAX_ENTRIES) { + mtx_unlock(&NB.mutex); + XIDATA_FREE(xidata); + for (int i = 0; i < new_pin_count; i++) { + 
COWN_DECREF(new_pins[i]); + } + PyMem_RawFree(new_pins); + PyErr_SetString(PyExc_RuntimeError, "Noticeboard is full (max 64)"); + return NULL; } + target = &NB.entries[NB.count++]; + strncpy(target->key, key, NB_KEY_SIZE - 1); + target->key[NB_KEY_SIZE - 1] = '\0'; + target->value = NULL; + target->pinned_cowns = NULL; + target->pinned_count = 0; + } + + // Stash old value and old pins to free after releasing the mutex — + // XIDATA_FREE / COWN_DECREF may invoke Python __del__ which could + // re-enter the noticeboard. + XIDATA_T *old_value = target->value; + BOCCown **old_pins = target->pinned_cowns; + int old_pin_count = target->pinned_count; + + target->value = xidata; + target->pickled = is_pickled; + target->pinned_cowns = new_pins; + target->pinned_count = new_pin_count; + + // Bump the version under mutex so readers' acquire loads can lazily + // invalidate their thread-local snapshot caches without us touching + // their cache directly. + atomic_fetch_add(&NB_VERSION, 1); + + mtx_unlock(&NB.mutex); + + if (old_value != NULL) { + XIDATA_FREE(old_value); + } + if (old_pins != NULL) { + for (int i = 0; i < old_pin_count; i++) { + COWN_DECREF(old_pins[i]); + } + PyMem_RawFree(old_pins); + } - desired = expected + 1; - } while (!atomic_compare_exchange_strong(&cown->rc, &expected, desired)); + // Note: this thread's NB_SNAPSHOT_CACHE is intentionally NOT cleared. + // Within a behavior, a writer must not observe its own write — that is + // the no-polling invariant. The cache will be lazily revalidated at + // the next behavior boundary (see _core_noticeboard_cache_clear). 
- return true; + Py_RETURN_NONE; } -#define COWN_INCREF(c) cown_incref((c)) -#define COWN_WEAK_INCREF(c) cown_weak_incref((c)) -#define COWN_PROMOTE(c) cown_promote((c)) - -static inline void cown_set_value(BOCCown *cown, PyObject *value) { - if (value == NULL) { - Py_XDECREF(cown->value); - return; +/// @brief Return a cached read-only snapshot of the noticeboard +/// @details Three fast paths, in order: +/// 1. If @ref NB_VERSION_CHECKED is true, the cached proxy was already +/// validated for this behavior; return it without consulting +/// @ref NB_VERSION. This preserves the no-polling invariant: a +/// behavior cannot observe writes that happened mid-flight, even +/// its own. +/// 2. If the cached dict's @ref NB_SNAPSHOT_VERSION matches the +/// current @ref NB_VERSION, the cache is still fresh; mark it +/// checked and return the proxy. +/// 3. Otherwise, drop the cache and fall through to the rebuild. +/// The rebuild reads all entries under @ref Noticeboard::mutex, +/// captures the version while still holding the mutex (so a writer +/// cannot race past us), deserializes non-pickled values immediately, +/// and defers @c pickle.loads to after the mutex is released. The +/// returned object is a @c types.MappingProxyType wrapping the cached +/// dict; user code cannot mutate the cache through it. +/// @param self The module +/// @return A read-only mapping (MappingProxyType) of str → Python object +static PyObject *_core_noticeboard_snapshot(PyObject *self, + PyObject *Py_UNUSED(dummy)) { + BOC_STATE_SET(self); + + if (NB_SNAPSHOT_PROXY != NULL) { + if (NB_VERSION_CHECKED) { + // Within-behavior repeat call: same proxy, no atomic load. + Py_INCREF(NB_SNAPSHOT_PROXY); + return NB_SNAPSHOT_PROXY; + } + // First snapshot call this behavior: do exactly one version check. 
+ int_least64_t current = atomic_load(&NB_VERSION); + if (current == NB_SNAPSHOT_VERSION) { + NB_VERSION_CHECKED = true; + Py_INCREF(NB_SNAPSHOT_PROXY); + return NB_SNAPSHOT_PROXY; + } + nb_drop_local_cache(); } - Py_XSETREF(cown->value, Py_NewRef(value)); - cown->exception = PyExceptionInstance_Check(value); -} - -/// @brief Create a new BOCCown. -/// @param value The initial value. -/// @return A new BOCCown, or NULL on error. -static BOCCown *BOCCown_new(PyObject *value) { - BOCCown *cown = (BOCCown *)PyMem_RawMalloc(sizeof(BOCCown)); - if (cown == NULL) { - PyErr_NoMemory(); + PyObject *dict = PyDict_New(); + if (dict == NULL) { + return NULL; + } + + // Deferred entries: pickled values whose bytes were extracted under mutex + // but need unpickling outside the lock. + PyObject *deferred_keys[NB_MAX_ENTRIES]; + PyObject *deferred_bytes[NB_MAX_ENTRIES]; + int deferred_count = 0; + + // Keepalive pins: while we hold the mutex we take an extra COWN_INCREF + // on every pin reachable from a deferred (pickled) entry. The bytes we + // are about to unpickle outside the mutex contain raw BOCCown pointers + // whose validity depends on the entry's pin list. Without this extra + // ref, a concurrent writer could overwrite the entry the instant we + // drop the mutex, release the old pins, and free the BOCCowns before + // we touch them — UAF in _cown_capsule_from_pointer. Released after + // the deferred unpickling completes. Each deferred entry contributes + // a heap-allocated pin pointer array sized to its pin count. + BOCCown **keepalive_pins[NB_MAX_ENTRIES]; + int keepalive_counts[NB_MAX_ENTRIES]; + for (int i = 0; i < NB_MAX_ENTRIES; i++) { + keepalive_pins[i] = NULL; + keepalive_counts[i] = 0; + } + + mtx_lock(&NB.mutex); + + // Capture the noticeboard version while still holding the mutex so + // that no concurrent writer can bump it between snapshot completion + // and version capture. 
+ int_least64_t built_version = atomic_load(&NB_VERSION); + + for (int i = 0; i < NB.count; i++) { + NoticeboardEntry *entry = &NB.entries[i]; + if (entry->value == NULL) { + continue; + } + + // XIDATA_NEWOBJECT is lightweight (no Python code execution) + PyObject *raw = XIDATA_NEWOBJECT(entry->value); + if (raw == NULL) { + mtx_unlock(&NB.mutex); + goto fail_deferred; + } + + PyObject *key = PyUnicode_FromString(entry->key); + if (key == NULL) { + Py_DECREF(raw); + mtx_unlock(&NB.mutex); + goto fail_deferred; + } + + if (!entry->pickled) { + // Non-pickled: add directly to dict + if (PyDict_SetItem(dict, key, raw) < 0) { + Py_DECREF(key); + Py_DECREF(raw); + mtx_unlock(&NB.mutex); + goto fail_deferred; + } + Py_DECREF(key); + Py_DECREF(raw); + } else { + // Pickled: defer unpickling to outside the mutex. Take a fresh + // COWN_INCREF on every pin so the BOCCowns referenced by the bytes + // survive past mtx_unlock — see keepalive_pins comment above. + if (entry->pinned_count > 0) { + BOCCown **pins = (BOCCown **)PyMem_RawMalloc(sizeof(BOCCown *) * + entry->pinned_count); + if (pins == NULL) { + Py_DECREF(key); + Py_DECREF(raw); + mtx_unlock(&NB.mutex); + PyErr_NoMemory(); + goto fail_deferred; + } + for (int j = 0; j < entry->pinned_count; j++) { + pins[j] = entry->pinned_cowns[j]; + COWN_INCREF(pins[j]); + } + keepalive_pins[deferred_count] = pins; + keepalive_counts[deferred_count] = entry->pinned_count; + } + deferred_keys[deferred_count] = key; + deferred_bytes[deferred_count] = raw; + deferred_count++; + } + } + + mtx_unlock(&NB.mutex); + + // Unpickle deferred entries outside the mutex + for (int i = 0; i < deferred_count; i++) { + PyObject *value = _PyPickle_Loads(deferred_bytes[i]); + Py_DECREF(deferred_bytes[i]); + deferred_bytes[i] = NULL; + + if (value == NULL) { + Py_DECREF(deferred_keys[i]); + deferred_keys[i] = NULL; + // Clean up remaining deferred entries + for (int j = i + 1; j < deferred_count; j++) { + Py_DECREF(deferred_keys[j]); + 
Py_DECREF(deferred_bytes[j]); + } + // Release every keepalive pin (including the one for this entry). + for (int j = 0; j < deferred_count; j++) { + if (keepalive_pins[j] != NULL) { + for (int k = 0; k < keepalive_counts[j]; k++) { + COWN_DECREF(keepalive_pins[j][k]); + } + PyMem_RawFree(keepalive_pins[j]); + keepalive_pins[j] = NULL; + } + } + Py_DECREF(dict); + return NULL; + } + + if (PyDict_SetItem(dict, deferred_keys[i], value) < 0) { + Py_DECREF(deferred_keys[i]); + Py_DECREF(value); + for (int j = i + 1; j < deferred_count; j++) { + Py_DECREF(deferred_keys[j]); + Py_DECREF(deferred_bytes[j]); + } + for (int j = 0; j < deferred_count; j++) { + if (keepalive_pins[j] != NULL) { + for (int k = 0; k < keepalive_counts[j]; k++) { + COWN_DECREF(keepalive_pins[j][k]); + } + PyMem_RawFree(keepalive_pins[j]); + keepalive_pins[j] = NULL; + } + } + Py_DECREF(dict); + return NULL; + } + + Py_DECREF(deferred_keys[i]); + Py_DECREF(value); + + // Successful unpickle: the snapshot dict (and its CownCapsules) + // now hold their own refs on every BOCCown referenced by the bytes. + // Drop our keepalive pin for this entry. + if (keepalive_pins[i] != NULL) { + for (int k = 0; k < keepalive_counts[i]; k++) { + COWN_DECREF(keepalive_pins[i][k]); + } + PyMem_RawFree(keepalive_pins[i]); + keepalive_pins[i] = NULL; + } + } + + PyObject *proxy = PyDictProxy_New(dict); + if (proxy == NULL) { + Py_DECREF(dict); + return NULL; + } + + // The proxy holds a strong reference to dict; we keep our own as well so + // that the dict is reachable for direct mutation in the rebuild path + // and the proxy survives at least as long as the dict. 
+ NB_SNAPSHOT_CACHE = dict; + NB_SNAPSHOT_PROXY = proxy; + NB_SNAPSHOT_VERSION = built_version; + NB_VERSION_CHECKED = true; + Py_INCREF(proxy); + return proxy; + +fail_deferred: + for (int i = 0; i < deferred_count; i++) { + Py_DECREF(deferred_keys[i]); + Py_DECREF(deferred_bytes[i]); + if (keepalive_pins[i] != NULL) { + for (int k = 0; k < keepalive_counts[i]; k++) { + COWN_DECREF(keepalive_pins[i][k]); + } + PyMem_RawFree(keepalive_pins[i]); + keepalive_pins[i] = NULL; + } + } + Py_DECREF(dict); + return NULL; +} + +/// @brief Clear all noticeboard entries and free their XIData and pins +/// @details Safe to call XIDATA_FREE directly because all noticeboard XIData +/// is created by the main interpreter (in write_direct). Also drops every +/// entry's pinned cowns (COWN_DECREF) and clears the calling thread's +/// snapshot cache so that any cached proxy from before the clear is not +/// reused after a runtime restart. Other threads' caches will lazily +/// revalidate on their next snapshot call thanks to the @ref NB_VERSION +/// bump; their cached CownCapsules keep the underlying BOCCowns alive +/// until each cache is dropped. +/// @param self The module (unused) +/// @param args Unused +/// @return Py_None +static PyObject *_core_noticeboard_clear(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "noticeboard_clear must be called from the primary " + "interpreter"); + return NULL; + } + + // Collect entries to free after releasing the mutex — XIDATA_FREE and + // COWN_DECREF may invoke Python __del__ which could re-enter the + // noticeboard. 
+ XIDATA_T *to_free[NB_MAX_ENTRIES]; + BOCCown **to_unpin[NB_MAX_ENTRIES]; + int to_unpin_count[NB_MAX_ENTRIES]; + int to_free_count = 0; + int to_unpin_entries = 0; + + mtx_lock(&NB.mutex); + + for (int i = 0; i < NB.count; i++) { + if (NB.entries[i].value != NULL) { + to_free[to_free_count++] = NB.entries[i].value; + NB.entries[i].value = NULL; + } + if (NB.entries[i].pinned_cowns != NULL) { + to_unpin[to_unpin_entries] = NB.entries[i].pinned_cowns; + to_unpin_count[to_unpin_entries] = NB.entries[i].pinned_count; + to_unpin_entries++; + NB.entries[i].pinned_cowns = NULL; + NB.entries[i].pinned_count = 0; + } + } + NB.count = 0; + memset(NB.entries, 0, sizeof(NB.entries)); + + // Bump the version under mutex; see noticeboard_write_direct for + // rationale. + atomic_fetch_add(&NB_VERSION, 1); + + mtx_unlock(&NB.mutex); + + for (int i = 0; i < to_free_count; i++) { + XIDATA_FREE(to_free[i]); + } + for (int i = 0; i < to_unpin_entries; i++) { + for (int j = 0; j < to_unpin_count[i]; j++) { + COWN_DECREF(to_unpin[i][j]); + } + PyMem_RawFree(to_unpin[i]); + } + + // Drop this thread's cache so a subsequent runtime cycle does not + // reuse a stale proxy. Other threads will revalidate via NB_VERSION. + nb_drop_local_cache(); + + Py_RETURN_NONE; +} + +/// @brief Delete a single noticeboard entry by key +/// @details Acquires mutex, finds the entry, frees its XIData and pinned +/// cowns, shifts remaining entries down, and decrements count. No-op if +/// key not found. 
+/// @param self The module +/// @param args Tuple of (key: str) +/// @return Py_None on success, NULL on error +static PyObject *_core_noticeboard_delete(PyObject *self, PyObject *args) { + BOC_STATE_SET(self); + + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "noticeboard_delete must be called from the primary " + "interpreter"); + return NULL; + } + + if (nb_check_noticeboard_thread("noticeboard_delete") < 0) { + return NULL; + } + + const char *key; + Py_ssize_t key_len; + + if (!PyArg_ParseTuple(args, "s#", &key, &key_len)) { + return NULL; + } + + if (key_len >= NB_KEY_SIZE) { + PyErr_SetString(PyExc_ValueError, + "noticeboard key too long (max 63 UTF-8 bytes)"); + return NULL; + } + + if (memchr(key, '\0', key_len) != NULL) { + PyErr_SetString(PyExc_ValueError, + "noticeboard key must not contain NUL characters"); + return NULL; + } + + mtx_lock(&NB.mutex); + + int found = -1; + for (int i = 0; i < NB.count; i++) { + if (strncmp(NB.entries[i].key, key, NB_KEY_SIZE) == 0) { + found = i; + break; + } + } + + // Stash the entry's XIData and pins to free after releasing the mutex. + XIDATA_T *deleted_value = NULL; + BOCCown **deleted_pins = NULL; + int deleted_pin_count = 0; + + if (found >= 0) { + deleted_value = NB.entries[found].value; + deleted_pins = NB.entries[found].pinned_cowns; + deleted_pin_count = NB.entries[found].pinned_count; + + // shift remaining entries down + for (int i = found; i < NB.count - 1; i++) { + NB.entries[i] = NB.entries[i + 1]; + } + + // clear the last slot and decrement + memset(&NB.entries[NB.count - 1], 0, sizeof(NoticeboardEntry)); + NB.count--; + + // Bump the version under mutex; see noticeboard_write_direct. 
+ atomic_fetch_add(&NB_VERSION, 1); + } + + mtx_unlock(&NB.mutex); + + if (deleted_value != NULL) { + XIDATA_FREE(deleted_value); + } + if (deleted_pins != NULL) { + for (int i = 0; i < deleted_pin_count; i++) { + COWN_DECREF(deleted_pins[i]); + } + PyMem_RawFree(deleted_pins); + } + + // Note: this thread's NB_SNAPSHOT_CACHE is intentionally NOT cleared; + // the no-polling invariant applies equally to deletes. + + Py_RETURN_NONE; +} + +/// @brief Re-arm the per-behavior version check on the cached snapshot +/// @details Called by the worker loop at every behavior boundary. Does +/// NOT drop @ref NB_SNAPSHOT_PROXY: the cache may still be valid, and +/// the next call to @ref _core_noticeboard_snapshot will perform exactly +/// one atomic load against @ref NB_VERSION to find out. Within a +/// behavior, the cache is then returned unconditionally for any further +/// calls, preserving the no-polling invariant. +/// @param self The module (unused) +/// @param args Unused +/// @return Py_None +static PyObject *_core_noticeboard_cache_clear(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + + NB_VERSION_CHECKED = false; + + Py_RETURN_NONE; +} + +/// @brief Return the current noticeboard version counter +/// @details The counter is incremented under @ref Noticeboard::mutex on +/// every successful @c notice_write, @c notice_delete, or +/// @c noticeboard_clear. Read with sequentially-consistent semantics. +/// Two reads returning the same value mean no commit happened between +/// them; a strictly larger value means at least one commit happened. +/// Useful for detecting noticeboard changes without taking a full +/// snapshot. 
+/// @param self The module (unused) +/// @param args Unused +/// @return A Python int with the current noticeboard version +static PyObject *_core_noticeboard_version(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + return PyLong_FromLongLong((long long)atomic_load(&NB_VERSION)); +} + +/// @brief Register the calling thread as the noticeboard mutator thread +/// @details Must be called from the noticeboard thread before it processes +/// any noticeboard mutation messages. Subsequent calls to +/// @ref _core_noticeboard_write_direct or @ref _core_noticeboard_delete +/// from any other thread will raise @c RuntimeError. Pass with no +/// arguments to install the current thread; the registration is global +/// and persists until the runtime stops. +/// @param self The module (unused) +/// @param args Unused +/// @return Py_None +static PyObject *_core_set_noticeboard_thread(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "set_noticeboard_thread must be called from the primary " + "interpreter"); + return NULL; + } + uintptr_t tid = (uintptr_t)PyThread_get_thread_ident(); + // One-shot per runtime: refuse if the slot is already owned. + // clear_noticeboard_thread() resets NB_NOTICEBOARD_TID to 0 at stop(), + // so a fresh start() cycle is fine. This closes the hijack-the- + // mutator-slot hole identified by the security lens. + intptr_t expected = 0; + if (!atomic_compare_exchange_strong_intptr(&NB_NOTICEBOARD_TID, &expected, + (intptr_t)tid)) { + PyErr_SetString(PyExc_RuntimeError, + "set_noticeboard_thread: noticeboard mutator thread " + "is already registered"); + return NULL; + } + Py_RETURN_NONE; +} + +/// @brief Clear the registered noticeboard mutator thread +/// @details Restores the permissive (pre-startup) check. 
Called by the +/// Python @c Behaviors.stop path after the noticeboard thread has joined +/// so that subsequent main-thread calls (e.g. @c noticeboard_clear from +/// a runtime restart cycle) are not rejected. +/// @param self The module (unused) +/// @param args Unused +/// @return Py_None +static PyObject *_core_clear_noticeboard_thread(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "clear_noticeboard_thread must be called from the " + "primary interpreter"); + return NULL; + } + (void)atomic_exchange_intptr(&NB_NOTICEBOARD_TID, (intptr_t)0); + Py_RETURN_NONE; +} + +/// @brief Allocate a fresh notice_sync sequence number. +/// @details Atomically increments @ref NB_SYNC_REQUESTED and returns the +/// new value. The caller posts @c ("sync", N) on the @c boc_noticeboard +/// tag and then waits via @ref _core_notice_sync_wait until that sequence +/// has been processed. +/// @param self The module (unused) +/// @param args Unused +/// @return A Python int with the caller's seq. +static PyObject *_core_notice_sync_request(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + int_least64_t seq = atomic_fetch_add(&NB_SYNC_REQUESTED, 1) + 1; + return PyLong_FromLongLong((long long)seq); +} + +/// @brief Mark a notice_sync sequence as processed and wake waiters. +/// @details Called from the noticeboard-thread Python arm when it pops +/// a @c ("sync", N) sentinel off the queue. Stores @c max(processed, N) +/// into @ref NB_SYNC_PROCESSED (defensive against any reordering, though +/// the MPSC tag is FIFO) and broadcasts @ref NB_SYNC_COND. 
+/// @param self The module (unused) +/// @param args A tuple @c (N,) — the sequence number being completed +/// @return Py_None +static PyObject *_core_notice_sync_complete(PyObject *self, PyObject *args) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "notice_sync_complete must be called from the primary " + "interpreter"); + return NULL; + } + long long seq; + if (!PyArg_ParseTuple(args, "L", &seq)) { + return NULL; + } + + Py_BEGIN_ALLOW_THREADS mtx_lock(&NB_SYNC_MUTEX); + // Defense in depth: with a single noticeboard thread draining the + // FIFO boc_noticeboard tag, `seq` arrives strictly monotonically and + // a plain `atomic_store(seq)` would be correct. We keep the max-of + // pattern so that if a future change introduces a second mutator + // thread or any out-of-order delivery, NB_SYNC_PROCESSED can never + // regress and unblock waiters early. Both load and store happen under + // NB_SYNC_MUTEX (the only writer is here), so this is not a TOCTOU. + int_least64_t cur = atomic_load(&NB_SYNC_PROCESSED); + if ((int_least64_t)seq > cur) { + atomic_store(&NB_SYNC_PROCESSED, (int_least64_t)seq); + } + cnd_broadcast(&NB_SYNC_COND); + mtx_unlock(&NB_SYNC_MUTEX); + Py_END_ALLOW_THREADS + + Py_RETURN_NONE; +} + +/// @brief Block until @p my_seq has been processed by the noticeboard thread. +/// @details Loops on @ref NB_SYNC_COND under @ref NB_SYNC_MUTEX until +/// @ref NB_SYNC_PROCESSED is at least @p my_seq, or until @p timeout +/// seconds elapse. A negative or @c None timeout means wait forever. +/// Releases the GIL across the wait. +/// @param self The module (unused) +/// @param args A tuple @c (my_seq, timeout) — int and float-or-None +/// @return @c True on success, @c False on timeout. 
+static PyObject *_core_notice_sync_wait(PyObject *self, PyObject *args) { + BOC_STATE_SET(self); + long long my_seq; + PyObject *timeout_obj; + if (!PyArg_ParseTuple(args, "LO", &my_seq, &timeout_obj)) { + return NULL; + } + + bool do_timeout = false; + double end_time = 0.0; + if (timeout_obj != Py_None) { + double timeout = PyFloat_AsDouble(timeout_obj); + if (timeout == -1.0 && PyErr_Occurred()) { + return NULL; + } + if (timeout >= 0.0) { + do_timeout = true; + end_time = boc_now_s() + timeout; + } + } + + bool ok = true; + Py_BEGIN_ALLOW_THREADS mtx_lock(&NB_SYNC_MUTEX); + while (atomic_load(&NB_SYNC_PROCESSED) < (int_least64_t)my_seq) { + if (do_timeout) { + double now = boc_now_s(); + if (now >= end_time) { + ok = false; + break; + } + cnd_timedwait_s(&NB_SYNC_COND, &NB_SYNC_MUTEX, end_time - now); + } else { + cnd_wait(&NB_SYNC_COND, &NB_SYNC_MUTEX); + } + } + mtx_unlock(&NB_SYNC_MUTEX); + Py_END_ALLOW_THREADS + + if (ok) { + Py_RETURN_TRUE; + } + Py_RETURN_FALSE; +} + +// --------------------------------------------------------------------------- +// Terminator entry points. +// --------------------------------------------------------------------------- + +/// @brief Try to register a new behavior with the terminator. +/// @details Returns the post-increment count on success, or -1 if the +/// terminator is closed (runtime is shutting down). The double-check of +/// TERMINATOR_CLOSED around the fetch_add closes the close-vs-inc race: +/// if close() lands between our first check and our fetch_add, the +/// second check sees it and we undo, signalling on a 0-transition so a +/// concurrent terminator_wait() does not miss the wakeup. Uses the +/// portable plain-atomic forms (seq_cst) — see the polyfill block at +/// the top of this file; the terminator is not on a hot path so the +/// stronger ordering is free. +/// @param self The module (unused) +/// @param args Unused +/// @return Python int — new count on success, -1 if closed. 
+static PyObject *_core_terminator_inc(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (atomic_load(&TERMINATOR_CLOSED)) { + return PyLong_FromLongLong(-1); + } + int_least64_t newval = atomic_fetch_add(&TERMINATOR_COUNT, 1) + 1; + if (atomic_load(&TERMINATOR_CLOSED)) { + int_least64_t after = atomic_fetch_add(&TERMINATOR_COUNT, -1) - 1; + if (after == 0) { + mtx_lock(&TERMINATOR_MUTEX); + cnd_broadcast(&TERMINATOR_COND); + mtx_unlock(&TERMINATOR_MUTEX); + } + return PyLong_FromLongLong(-1); + } + return PyLong_FromLongLong((long long)newval); +} + +/// @brief Decrement the terminator. Wakes terminator_wait on 0-transition. +/// @param self The module (unused) +/// @param args Unused +/// @return Python int — the new count. +static PyObject *_core_terminator_dec(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + int_least64_t newval = atomic_fetch_add(&TERMINATOR_COUNT, -1) - 1; + if (newval == 0) { + mtx_lock(&TERMINATOR_MUTEX); + cnd_broadcast(&TERMINATOR_COND); + mtx_unlock(&TERMINATOR_MUTEX); + } + return PyLong_FromLongLong((long long)newval); +} + +/// @brief Set the closed bit. Future terminator_inc() calls return -1. +/// @param self The module (unused) +/// @param args Unused +/// @return Py_None +static PyObject *_core_terminator_close(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "terminator_close must be called from the primary " + "interpreter"); + return NULL; + } + atomic_store(&TERMINATOR_CLOSED, 1); + Py_RETURN_NONE; +} + +/// @brief Block until TERMINATOR_COUNT reaches 0. +/// @details A negative or @c None timeout means wait forever. +/// Releases the GIL across the wait. +/// @param self The module (unused) +/// @param args A tuple @c (timeout,) — float-or-None +/// @return @c True on success, @c False on timeout. 
+static PyObject *_core_terminator_wait(PyObject *self, PyObject *args) { + BOC_STATE_SET(self); + PyObject *timeout_obj; + if (!PyArg_ParseTuple(args, "O", &timeout_obj)) { + return NULL; + } + + bool do_timeout = false; + double end_time = 0.0; + if (timeout_obj != Py_None) { + double timeout = PyFloat_AsDouble(timeout_obj); + if (timeout == -1.0 && PyErr_Occurred()) { + return NULL; + } + if (timeout >= 0.0) { + do_timeout = true; + end_time = boc_now_s() + timeout; + } + } + + bool ok = true; + Py_BEGIN_ALLOW_THREADS mtx_lock(&TERMINATOR_MUTEX); + while (atomic_load(&TERMINATOR_COUNT) != 0) { + if (do_timeout) { + double now = boc_now_s(); + if (now >= end_time) { + ok = false; + break; + } + cnd_timedwait_s(&TERMINATOR_COND, &TERMINATOR_MUTEX, end_time - now); + } else { + cnd_wait(&TERMINATOR_COND, &TERMINATOR_MUTEX); + } + } + mtx_unlock(&TERMINATOR_MUTEX); + Py_END_ALLOW_THREADS + + if (ok) { + Py_RETURN_TRUE; + } + Py_RETURN_FALSE; +} + +/// @brief Idempotent one-shot decrement of the Pyrona seed. +/// @details Called by stop()/wait() to remove the seed that keeps the +/// terminator count above zero across momentary quiescence. Safe to call +/// any number of times — only the first call performs the decrement. +/// @param self The module (unused) +/// @param args Unused +/// @return Python bool — True if this call removed the seed, False if +/// the seed was already removed. 
+static PyObject *_core_terminator_seed_dec(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "terminator_seed_dec must be called from the primary " + "interpreter"); + return NULL; + } + int_least64_t prev = atomic_exchange(&TERMINATOR_SEEDED, 0); + if (prev == 1) { + int_least64_t newval = atomic_fetch_add(&TERMINATOR_COUNT, -1) - 1; + if (newval == 0) { + mtx_lock(&TERMINATOR_MUTEX); + cnd_broadcast(&TERMINATOR_COND); + mtx_unlock(&TERMINATOR_MUTEX); + } + Py_RETURN_TRUE; + } + Py_RETURN_FALSE; +} + +/// @brief Restore terminator state for a fresh runtime start. +/// @details Sets count=1 (the Pyrona seed), clears the closed bit, and +/// re-arms the seed one-shot. Called from Behaviors.start(). Returns +/// the prior @c (count, seeded) tuple so callers can detect drift left +/// over from a previous run that died without reaching its +/// reconciliation point (e.g. KeyboardInterrupt or stop() that raised +/// before the assertion). +/// @param self The module (unused) +/// @param args Unused +/// @return A 2-tuple @c (prior_count, prior_seeded). +static PyObject *_core_terminator_reset(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + if (BOC_STATE->index != 0) { + PyErr_SetString(PyExc_RuntimeError, + "terminator_reset must be called from the primary " + "interpreter"); + return NULL; + } + // Fence: raise the closed bit before we touch anything else so any + // stray thread still holding a reference to the previous runtime + // (e.g. a late whencall call) is refused by terminator_inc rather + // than slipping a new behavior past the reset boundary. We clear + // the bit again at the end, once the new COUNT/SEEDED values have + // been published, so a fresh start() sees closed=0. 
+ atomic_store(&TERMINATOR_CLOSED, 1); + mtx_lock(&TERMINATOR_MUTEX); + int_least64_t prior_count = atomic_load(&TERMINATOR_COUNT); + int_least64_t prior_seeded = atomic_load(&TERMINATOR_SEEDED); + atomic_store(&TERMINATOR_COUNT, 1); + atomic_store(&TERMINATOR_SEEDED, 1); + atomic_store(&TERMINATOR_CLOSED, 0); + cnd_broadcast(&TERMINATOR_COND); + mtx_unlock(&TERMINATOR_MUTEX); + return Py_BuildValue("(LL)", (long long)prior_count, (long long)prior_seeded); +} + +/// @brief Read the current TERMINATOR_SEEDED flag (for reconciliation). +/// @param self The module (unused) +/// @param args Unused +/// @return Python int — 0 or 1. +static PyObject *_core_terminator_seeded(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + return PyLong_FromLongLong((long long)atomic_load(&TERMINATOR_SEEDED)); +} + +/// @brief Read the current terminator count (for reconciliation tests). +/// @param self The module (unused) +/// @param args Unused +/// @return Python int — the current TERMINATOR_COUNT. +static PyObject *_core_terminator_count(PyObject *self, + PyObject *Py_UNUSED(args)) { + BOC_STATE_SET(self); + return PyLong_FromLongLong((long long)atomic_load(&TERMINATOR_COUNT)); +} + +/// @details This can be safely referenced and used from multiple processes. +typedef struct boc_cown { + int_least64_t id; + /// @brief The python object held in this cown. + /// @details This is only non-NULL when the cown is acquired. + PyObject *value; + /// @brief Whether the value is pickled when serialized + bool pickled; + /// @brief Whether the cown holds an exception object + bool exception; + /// @brief the threadsafe serialized cown contents + XIDATA_T *xidata; + /// @brief the module which last released this cown + BOCRecycleQueue *recycle_queue; + /// @brief The ID of the interpreter that currently has acquired this cown. 
+ atomic_int_least64_t owner; + /// @brief The last behavior which needs to acquire this cown + atomic_intptr_t last; // (BOCBehavior *) + /// @brief Atomic reference count for the cown + atomic_int_least64_t rc; + /// @brief Atomic weak reference count for the cown + atomic_int_least64_t weak_rc; +} BOCCown; + +static inline int_least64_t cown_weak_decref(BOCCown *cown) { + int_least64_t weak_rc = atomic_fetch_add(&cown->weak_rc, -1) - 1; + PRINTDBG("cown_weak_decref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", + cown, cown->id, weak_rc); + + if (weak_rc == 0) { + // reference count is truly zero, we can free the memory + PyMem_RawFree(cown); + BOC_REF_TRACKING_REMOVE_COWN(); + } + + return weak_rc; +} + +static inline void report_unhandled_exception(BOCCown *cown) { + fprintf(stderr, "Cown(%p) contains an unhandled exception: ", cown); + + if (cown->value != NULL) { + PyObject_Print(cown->value, stderr, 0); + fprintf(stderr, "\n"); + return; + } + + if (cown->xidata == NULL) { + fprintf(stderr, + "\n"); + return; + } + + cown->value = xidata_to_object(cown->xidata, cown->pickled); + + if (cown->value == NULL) { + PyErr_Clear(); + fprintf(stderr, "\n"); + return; + } + + PyObject_Print(cown->value, stderr, 0); + fprintf(stderr, "\n"); + return; +} + +static void BOCRecycleQueue_enqueue(BOCRecycleQueue *queue, XIDATA_T *xidata); + +/// @brief Atomic decref for the cown +/// @param cown the cown to decref +/// @return the new reference count +static int_least64_t cown_decref(BOCCown *cown) { + int_least64_t rc = atomic_fetch_add(&cown->rc, -1) - 1; + PRINTDBG("cown_decref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", cown, + cown->id, rc); + if (rc != 0) { + return rc; + } + + PRINTDBG("cleaning cown\n"); + + if (cown->exception) { + report_unhandled_exception(cown); + } + + // we can clear the object and recycle the xidata + if (cown->value != NULL) { + assert(cown->owner == get_interpid()); + Py_CLEAR(cown->value); + } + + if (cown->xidata != NULL) { + 
BOCRecycleQueue_enqueue(cown->recycle_queue, cown->xidata); + } + + cown_weak_decref(cown); + + return 0; +} + +#define COWN_WEAK_DECREF(c) cown_weak_decref(c) + +/// @brief Atomic incref for the cown +/// @param cown the cown to incref +/// @return the new reference count +static int_least64_t cown_incref(BOCCown *cown) { + int_least64_t rc = atomic_fetch_add(&cown->rc, 1) + 1; + PRINTDBG("cown_incref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", cown, + cown->id, rc); + return rc; +} + +static inline int_least64_t cown_weak_incref(BOCCown *cown) { + int_least64_t rc = atomic_fetch_add(&cown->weak_rc, 1) + 1; + PRINTDBG("cown_weak_incref(%p, cid=%" PRIdLEAST64 ") = %" PRIdLEAST64 "\n", + cown, cown->id, rc); + return rc; +} + +static inline bool cown_promote(BOCCown *cown) { + int_least64_t expected; + int_least64_t desired; + do { + expected = atomic_load(&cown->rc); + if (expected == 0) { + return false; + } + + desired = expected + 1; + } while (!atomic_compare_exchange_strong(&cown->rc, &expected, desired)); + + return true; +} + +#define COWN_WEAK_INCREF(c) cown_weak_incref((c)) +#define COWN_PROMOTE(c) cown_promote((c)) + +/// @brief Set the value of a cown, clearing the exception flag +/// @note Callers that store an exception must set cown->exception = true +/// after calling this function. +static inline void cown_set_value(BOCCown *cown, PyObject *value) { + if (value == NULL) { + Py_XDECREF(cown->value); + cown->value = NULL; + cown->exception = false; + return; + } + + Py_XSETREF(cown->value, Py_NewRef(value)); + cown->exception = false; +} + +/// @brief Create a new BOCCown. +/// @param value The initial value. +/// @return A new BOCCown, or NULL on error. 
+static BOCCown *BOCCown_new(PyObject *value) { + BOCCown *cown = (BOCCown *)PyMem_RawMalloc(sizeof(BOCCown)); + if (cown == NULL) { + PyErr_NoMemory(); return NULL; } @@ -1042,7 +2388,7 @@ static BOCCown *BOCCown_new(PyObject *value) { cown->xidata = NULL; cown->pickled = false; cown->exception = false; - atomic_store(&cown->last, 0); + atomic_store_intptr(&cown->last, 0); // each cown starts with both a strong and weak reference // the weak reference will only be decremented when the strong // reference count is zero. @@ -1069,7 +2415,7 @@ static XIDATA_T *BOCRecycleQueue_dequeue(BOCRecycleQueue *queue, bool wait_for_consistency) { BOCRecycleNode *tail = queue->tail; intptr_t tail_ptr = (intptr_t)queue->tail; - intptr_t next_ptr = atomic_load(&tail->next); + intptr_t next_ptr = atomic_load_intptr(&tail->next); if (next_ptr == 0) { // two possibilities: // 1. queue is empty @@ -1086,7 +2432,7 @@ static XIDATA_T *BOCRecycleQueue_dequeue(BOCRecycleQueue *queue, // the queue is inconsistent, so we spin/wait for step 3 to complete above while (next_ptr == 0) { - next_ptr = atomic_load(&tail->next); + next_ptr = atomic_load_intptr(&tail->next); } } @@ -1190,18 +2536,18 @@ static void BOCRecycleQueue_enqueue(BOCRecycleQueue *queue, XIDATA_T *xidata) { BOCRecycleNode *node = (BOCRecycleNode *)PyMem_RawMalloc(sizeof(BOCRecycleNode)); node->xidata = NULL; - atomic_store(&node->next, 0); + atomic_store_intptr(&node->next, 0); // step 1: swap the new node in as the new head intptr_t node_ptr = (intptr_t)node; - intptr_t old_head_ptr = atomic_exchange(&queue->head, node_ptr); + intptr_t old_head_ptr = atomic_exchange_intptr(&queue->head, node_ptr); BOCRecycleNode *old_head = (BOCRecycleNode *)old_head_ptr; // queue is now inconsistent // step 2: store the data in this node. This node is somewhere inside the // queue. 
old_head->xidata = xidata; // step 3: connect everything back together - atomic_store(&old_head->next, node_ptr); + atomic_store_intptr(&old_head->next, node_ptr); // queue is consistent } @@ -1217,8 +2563,8 @@ static void BOCRecycleQueue_empty(BOCRecycleQueue *queue, } if (wait_for_consistency) { - assert((intptr_t)queue->tail == atomic_load(&queue->head)); - assert(atomic_load(&queue->tail->next) == 0); + assert((intptr_t)queue->tail == atomic_load_intptr(&queue->head)); + assert(atomic_load_intptr(&queue->tail->next) == 0); } } @@ -1228,7 +2574,7 @@ static void BOCRecycleQueue_empty(BOCRecycleQueue *queue, /// @param queue The queue to free static void BOCRecycleQueue_free(BOCRecycleQueue *queue) { assert(queue->xidata_to_cowns == NULL); - if (queue->tail != NULL && atomic_load(&queue->tail->next) != 0) { + if (queue->tail != NULL && atomic_load_intptr(&queue->tail->next) != 0) { printf("BOC: recycle queue %" PRIdLEAST64 " not empty during finalize\n", queue->index); BOCRecycleQueue_empty(queue, true); @@ -1242,11 +2588,10 @@ static void BOCRecycleQueue_free(BOCRecycleQueue *queue) { /// @details This capsule allows the cown to be exposed to the Python code /// level. There can be any number of them, and the will perform atomic /// reference counts on the underlying cown. -typedef struct cown_capsule_object { - PyObject_HEAD - /// @brief the actual cown object wrapped by the capsule - BOCCown *cown; -} CownCapsuleObject; +/// @note The struct is forward-declared near the top of the file (next to +/// the noticeboard helpers) so @c nb_pin_cowns can extract @c BOCCown +/// pointers from a Python CownCapsule. This block carries the doc only; +/// keep the field set in sync with the forward declaration. 
/// @brief Deallocates the CownCapsule /// @note This will perform an atomic decref on the underlying cown @@ -1372,6 +2717,11 @@ static PyObject *CownCapsule_get_value(PyObject *op, void *Py_UNUSED(dummy)) { static int CownCapsule_set_value(PyObject *op, PyObject *value, void *Py_UNUSED(dummy)) { + if (value == NULL) { + PyErr_SetString(PyExc_TypeError, "cannot delete value attribute"); + return -1; + } + CownCapsuleObject *self = (CownCapsuleObject *)op; if (!cown_check_acquired(self->cown, true)) { @@ -1379,10 +2729,6 @@ static int CownCapsule_set_value(PyObject *op, PyObject *value, } cown_set_value(self->cown, value); - if (self->cown->value == NULL) { - return -1; - } - return 0; } @@ -1567,8 +2913,18 @@ static PyObject *CownCapsule_get_impl(PyObject *op, void *Py_UNUSED(dummy)) { } /// @brief Pickle support for CownCapsule -/// @details Pins the inner BOCCown via COWN_INCREF so it stays alive between -/// pickle and unpickle. The reconstructor inherits this pin (no extra INCREF). +/// @details Returns a (reconstructor, (pointer, pid)) tuple. Does NOT take a +/// COWN_INCREF on the inner BOCCown: the bytes produced by pickling are +/// dead data, not a reference. The caller is responsible for ensuring the +/// underlying BOCCown is kept alive between pickling and unpickling. For +/// transient pickles (send/receive on the message queue), the original +/// CownCapsule held by the sender provides that liveness; for long-lived +/// pickles (the noticeboard), the noticeboard layer pins the BOCCown +/// independently via @ref nb_collect_cowns at write time. +/// @note An earlier design did COWN_INCREF here as a "pin" and had the +/// reconstructor inherit it. That assumed a 1-pickle / 1-unpickle pairing +/// and was broken by the noticeboard, where one write is unpickled by +/// every reader on every worker. 
/// @param op The CownCapsule object /// @param Py_UNUSED (ignored) /// @return A tuple (reconstructor, (pointer, pid)) for pickle, or NULL on error @@ -1576,12 +2932,8 @@ static PyObject *CownCapsule_reduce(PyObject *op, PyObject *Py_UNUSED(dummy)) { CownCapsuleObject *self = (CownCapsuleObject *)op; BOCCown *cown = self->cown; - // pin the cown alive through the pickle/unpickle cycle - COWN_INCREF(cown); - PyObject *ptr = PyLong_FromVoidPtr(cown); if (ptr == NULL) { - COWN_DECREF(cown); return NULL; } @@ -1593,7 +2945,6 @@ static PyObject *CownCapsule_reduce(PyObject *op, PyObject *Py_UNUSED(dummy)) { PyObject *pid_obj = PyLong_FromLong(pid); if (pid_obj == NULL) { Py_DECREF(ptr); - COWN_DECREF(cown); return NULL; } @@ -1601,7 +2952,6 @@ static PyObject *CownCapsule_reduce(PyObject *op, PyObject *Py_UNUSED(dummy)) { if (module == NULL) { Py_DECREF(pid_obj); Py_DECREF(ptr); - COWN_DECREF(cown); return NULL; } @@ -1611,7 +2961,6 @@ static PyObject *CownCapsule_reduce(PyObject *op, PyObject *Py_UNUSED(dummy)) { if (reconstructor == NULL) { Py_DECREF(pid_obj); Py_DECREF(ptr); - COWN_DECREF(cown); return NULL; } @@ -1620,18 +2969,12 @@ static PyObject *CownCapsule_reduce(PyObject *op, PyObject *Py_UNUSED(dummy)) { Py_DECREF(pid_obj); if (args == NULL) { Py_DECREF(reconstructor); - COWN_DECREF(cown); return NULL; } PyObject *result = PyTuple_Pack(2, reconstructor, args); Py_DECREF(reconstructor); Py_DECREF(args); - if (result == NULL) { - COWN_DECREF(cown); - return NULL; - } - return result; } @@ -1646,9 +2989,53 @@ static PyMethodDef CownCapsule_methods[] = { {NULL} /* Sentinel */ }; +/// @brief Returns whether the cown holds an unhandled exception +/// @param op The CownCapsule object +/// @param Py_UNUSED ignored +/// @return True if the cown holds an exception, False otherwise +static PyObject *CownCapsule_get_exception(PyObject *op, + void *Py_UNUSED(dummy)) { + CownCapsuleObject *self = (CownCapsuleObject *)op; + + if (!cown_check_acquired(self->cown, true)) { + 
return NULL; + } + + return PyBool_FromLong(self->cown->exception); +} + +/// @brief Sets the exception flag on the cown +/// @param op The CownCapsule object +/// @param value A truthy/falsy Python object +/// @param Py_UNUSED ignored +/// @return 0 on success, -1 on error +static int CownCapsule_set_exception(PyObject *op, PyObject *value, + void *Py_UNUSED(dummy)) { + if (value == NULL) { + PyErr_SetString(PyExc_TypeError, "cannot delete exception attribute"); + return -1; + } + + CownCapsuleObject *self = (CownCapsuleObject *)op; + + if (!cown_check_acquired(self->cown, true)) { + return -1; + } + + int truthy = PyObject_IsTrue(value); + if (truthy < 0) { + return -1; + } + + self->cown->exception = (bool)truthy; + return 0; +} + static PyGetSetDef CownCapsule_getset[] = { {"value", (getter)CownCapsule_get_value, (setter)CownCapsule_set_value, NULL, NULL}, + {"exception", (getter)CownCapsule_get_exception, + (setter)CownCapsule_set_exception, NULL, NULL}, {"impl", (getter)CownCapsule_get_impl, NULL, NULL, NULL}, {NULL} /* Sentinel */ }; @@ -2049,7 +3436,7 @@ static BOCQueue *get_queue_for_tag(PyObject *tag) { return NULL; } - atomic_store(&qptr->tag, (intptr_t)qtag); + atomic_store_intptr(&qptr->tag, (intptr_t)qtag); TAG_INCREF(qtag); BOC_STATE->queue_tags[i] = qtag; TAG_INCREF(qtag); @@ -2062,10 +3449,10 @@ static BOCQueue *get_queue_for_tag(PyObject *tag) { continue; } - BOCTag *qtag = (BOCTag *)atomic_load(&qptr->tag); + BOCTag *qtag = (BOCTag *)atomic_load_intptr(&qptr->tag); while (qtag == NULL) { // waiting for another interpreter to allocate and assign - qtag = (BOCTag *)atomic_load(&qptr->tag); + qtag = (BOCTag *)atomic_load_intptr(&qptr->tag); } BOC_STATE->queue_tags[i] = qtag; @@ -2082,7 +3469,24 @@ static BOCQueue *get_queue_for_tag(PyObject *tag) { // not the right queue, keep looking } - // no queue for this tag + // No queue for this tag — dump observed slot state to stderr so that + // intermittent failures (e.g. 
memory-ordering races on weak-memory + // architectures) leave a forensic trail even in release builds. + fprintf(stderr, "[bocpy] get_queue_for_tag: no queue found for tag "); + PyObject_Print(tag, stderr, Py_PRINT_RAW); + fprintf(stderr, " (interpreter index=%" PRIdLEAST64 ")\n", BOC_STATE->index); + qptr = BOC_QUEUES; + for (size_t i = 0; i < BOC_QUEUE_COUNT; ++i, ++qptr) { + int_least64_t state = atomic_load(&qptr->state); + BOCTag *qtag = (BOCTag *)atomic_load_intptr(&qptr->tag); + BOCTag *cached = BOC_STATE->queue_tags[i]; + fprintf(stderr, + "[bocpy] slot %2zu: state=%" PRIdLEAST64 + " tag=%p tag_str=%s cached=%p cached_str=%s\n", + i, state, (void *)qtag, qtag != NULL ? qtag->str : "(null)", + (void *)cached, cached != NULL ? cached->str : "(null)"); + } + fflush(stderr); return NULL; } @@ -2101,12 +3505,12 @@ static BOCMessage *boc_message_new(PyObject *tag, PyObject *contents) { BOCQueue *qptr = get_queue_for_tag(tag); if (qptr == NULL) { PyMem_RawFree(message); - PyErr_SetString(PyExc_KeyError, - "No queue available for tag: tag capacity exceeded"); + PyErr_Format(PyExc_KeyError, + "No queue available for tag %R: tag capacity exceeded", tag); return NULL; } - BOCTag *qtag = (BOCTag *)atomic_load(&qptr->tag); + BOCTag *qtag = (BOCTag *)atomic_load_intptr(&qptr->tag); if (qtag == NULL) { // non-assigned tag message->tag = tag_from_PyUnicode(tag, qptr); @@ -2136,6 +3540,25 @@ static BOCMessage *boc_message_new(PyObject *tag, PyObject *contents) { } /// @brief Enqueues a message. +/// @details The @c boc_worker queue is a fixed-capacity ring +/// (@c BOC_CAPACITY = 16384 slots). Reaching that bound requires more +/// than 16k behaviors to be simultaneously runnable but not yet picked +/// up by any worker -- in practice, only a producer scheduling against +/// many disjoint cowns far faster than every worker can drain. 
MCS +/// chaining keeps behaviors that share a cown out of the queue until +/// their predecessor releases, so chains do not exhaust capacity. +/// +/// On overflow this returns -1 without setting a Python exception; the +/// caller (typically @c behavior_resolve_one) reports the error. Once +/// a behavior's MCS chains are linked the schedule cannot be undone: +/// the behavior may still execute later if a predecessor releases and +/// re-tries the resolve, otherwise its cowns leak until process exit. +/// A robust fix is a queue redesign (e.g. linked-list MPSC instead of +/// the fixed-capacity ring) rather than the half-step of producer-side +/// reservations -- the latter trades a never-observed failure for an +/// audit surface that silently shrinks queue capacity on any leaked +/// reservation. If the failure is ever observed in practice, redesign +/// the queue. /// @param module the _core module /// @param message the message to enqueue /// @return 1 if the message was enqueue, 0 otherwise @@ -2195,7 +3618,7 @@ static int_least64_t boc_dequeue(PyObject *tag, BOCMessage **message) { BOCQueue *qptr = get_queue_for_tag(tag); if (qptr == NULL) { - PyErr_SetString(PyExc_KeyError, "No message queue found for that tag"); + PyErr_Format(PyExc_KeyError, "No message queue found for tag: %R", tag); return -2; } @@ -2246,7 +3669,16 @@ static int_least64_t boc_dequeue(PyObject *tag, BOCMessage **message) { static double boc_now_s() { const double S_PER_NS = 1.0e-9; struct timespec ts; + // Prefer clock_gettime on POSIX: timespec_get requires macOS 10.15+ while + // Python's default macOS deployment target is older, producing an + // -Wunguarded-availability-new warning. clock_gettime has been available on + // macOS since 10.12. Windows UCRT provides timespec_get but not + // clock_gettime, so fall back there. 
+#ifdef _WIN32 timespec_get(&ts, TIME_UTC); +#else + clock_gettime(CLOCK_REALTIME, &ts); +#endif double time = (double)ts.tv_sec; time += ts.tv_nsec * S_PER_NS; return time; @@ -2354,7 +3786,7 @@ static PyObject *receive_single_tag(PyObject *tag, bool do_timeout, // Phase 2b: Untimed — park on condvar (indefinite wait) if (qptr == NULL) { - PyErr_SetString(PyExc_KeyError, "No message queue found for that tag"); + PyErr_Format(PyExc_KeyError, "No message queue found for tag: %R", tag); return NULL; } @@ -2658,6 +4090,34 @@ PyObject *_core_drain(PyObject *module, PyObject *args) { /// @brief Atomic counter for BOC behaviors atomic_int_least64_t BOC_BEHAVIOR_COUNT = 0; +// Forward declaration so BOCBehavior can hold an array of request pointers; +// the BOCRequest struct itself is defined further down (next to the request +// helpers). +struct boc_request; + +/// @brief Encapsulates a behavior's request for a cown. +/// @details Hoisted ahead of BOCBehavior so the latter can carry a sized +/// array of these. The actual helpers live further down with the rest of +/// the request lifecycle code. +typedef struct boc_request { + /// @brief The cown that has been requested + BOCCown *target; + /// @brief The ID of the next behavior + atomic_intptr_t next; + /// @brief Whether the request has been scheduled + atomic_int_least64_t scheduled; + /// @brief Atomic reference count. + /// @details Starts at 1 (the owner @c BOCBehavior's @c requests array). + /// A successor that observes this request as its predecessor during + /// @c request_start_enqueue_inner takes a second ref *immediately + /// before* publishing @c prev->next, so the predecessor cannot retire + /// during the spin on @c prev->scheduled that follows. The owner + /// releases its ref from @c behavior_release_all (or @c behavior_free, + /// defensively); the successor releases its ref after the spin + /// completes. The last drop frees the struct. See @c request_decref. 
+ atomic_int_least64_t rc; +} BOCRequest; + typedef struct behavior_s { /// @brief Resource count, set to len(args) + 1 atomic_int_least64_t count; @@ -2680,6 +4140,22 @@ typedef struct behavior_s { BOCCown **captures; /// @brief The number of captured variables Py_ssize_t captures_size; + /// @brief Owned, deduped, target-sorted request array. + /// @details Populated by BehaviorCapsule_create_requests; freed either by + /// behavior_release_all (the normal MCS-unlink path) or by behavior_free + /// (defensive fallback if the behavior is destroyed without dispatch). + struct boc_request **requests; + /// @brief Number of entries in @c requests (post-dedup, ≤ args_size + 1). + Py_ssize_t requests_size; + /// @brief Pre-built dispatch message for the BehaviorCapsule. + /// @details Allocated by behavior_prepare_start before the 2PL link loop, + /// claimed by the unique caller that observes @c count → 0 inside + /// behavior_resolve_one. Targets @c boc_worker directly with the bare + /// BehaviorCapsule as the payload. Visibility is carried by the acq-rel + /// fetch_sub on @c count — no separate atomic on this field is required. + /// Freed defensively by behavior_free if a behavior is destroyed without + /// dispatching. + struct boc_message *start_message; } BOCBehavior; /// @brief Capsule for holding a pointer to a behavior @@ -2687,6 +4163,14 @@ typedef struct behavior_capsule_object { PyObject_HEAD BOCBehavior *behavior; } BehaviorCapsuleObject; +#define BehaviorCapsule_CheckExact(op) \ + Py_IS_TYPE((op), BOC_STATE->behavior_capsule_type) + +// Forward declaration: defined alongside the request helpers further down. +// behavior_free uses it to clean up any unreleased request array if a +// behavior is destroyed without going through behavior_release_all. 
+static void request_decref(BOCRequest *request); + BOCBehavior *behavior_new() { BOCBehavior *behavior; behavior = (BOCBehavior *)PyMem_RawMalloc(sizeof(BOCBehavior)); @@ -2704,6 +4188,9 @@ BOCBehavior *behavior_new() { behavior->args = NULL; behavior->captures_size = 0; behavior->captures = NULL; + behavior->requests = NULL; + behavior->requests_size = 0; + behavior->start_message = NULL; BOC_REF_TRACKING_ADD_BEHAVIOR(); return behavior; @@ -2744,6 +4231,29 @@ void behavior_free(BOCBehavior *behavior) { PyMem_RawFree(behavior->captures); } + if (behavior->requests != NULL) { + // Defensive cleanup: if a behavior is destroyed without + // behavior_release_all having been called (e.g. a scheduling failure + // mid-2PL), drop the owner ref on each request. If a successor is + // still holding a concurrent ref (unlikely here since the behavior + // never linked), the free is deferred until that successor's decref. + for (Py_ssize_t i = 0; i < behavior->requests_size; ++i) { + if (behavior->requests[i] != NULL) { + request_decref(behavior->requests[i]); + } + } + PyMem_RawFree(behavior->requests); + } + + if (behavior->start_message != NULL) { + // Defensive cleanup: prepare_start succeeded but the message was + // never claimed (e.g. resolve_one was never called because + // schedule() failed mid-link). Free the unclaimed message — it + // never made it onto the queue, so this is just our private + // allocation. + boc_message_free(behavior->start_message); + } + if (behavior->thunk != NULL) { BOCTag_free(behavior->thunk); } @@ -2939,106 +4449,380 @@ static int BehaviorCapsule_init(PyObject *op, PyObject *args, return -1; } - PRINTDBG("BehaviorCapsule(%" PRIdLEAST64 ") adding captures...\n", - behavior->id); + PRINTDBG("BehaviorCapsule(%" PRIdLEAST64 ") adding captures...\n", + behavior->id); + + behavior->captures = add_vars(captures, &behavior->captures_size); + if (behavior->captures == NULL) { + return -1; + } + + // We add two additional counts. 
One for the result, and another so that + // the 2PL is finished before we start running the thunk. Without this, + // the calls to release at the end of the thunk could race with the calls to + // finish_enqueue in the 2PL. + behavior->count = (int_least64_t)(behavior->args_size + 2); + + return 0; +} + +/// @brief Resolves a single outstanding request for this behavior. +/// @details Called when a request is at the head of the queue for a particular +/// cown. If this is the last request, then the thunk is scheduled. The unique +/// caller that observes count -> 0 claims the pre-built start message stashed +/// by behavior_prepare_start and enqueues it. +/// Visibility of the start_message pointer is carried by the acq-rel +/// fetch_sub on count -- the only writer (prepare_start) ran before the link +/// loop began, and only one decrementer can transition to 0. This path +/// performs no allocation and therefore cannot fail past prepare. +/// +/// Returns @c int rather than @c PyObject* so the count > 0 path is +/// pure-atomic and can be invoked from inside a @c Py_BEGIN_ALLOW_THREADS +/// span (no @c Py_RETURN_NONE = no Py_None refcount touch). The only +/// Python-state operation remaining is @c PyErr_SetString on the +/// @c boc_enqueue-full error path; that path requires @c count == 0 which +/// is unreachable mid link-loop because @c BehaviorCapsule_init sizes +/// @c count to @c args_size + 2. Callers that hit the error path must hold +/// the GIL. +/// +/// If @c boc_enqueue overflows the @c boc_worker ring, this raises +/// @c RuntimeError("Message queue is full"); see @c boc_enqueue for the +/// queue-full failure mode and recovery analysis. 
+/// @param behavior the behavior whose count to decrement +/// @return 0 on success, -1 on error with a Python exception set (caller +/// must hold the GIL on the error path) +static int behavior_resolve_one(BOCBehavior *behavior) { + int_least64_t count = atomic_fetch_add(&behavior->count, -1) - 1; + if (count == 0) { + BOCMessage *message = behavior->start_message; + behavior->start_message = NULL; + if (message == NULL) { + // Defensive: prepare_start was never called. This should not happen + // on the production path; raise so the failure is loud. + PyErr_SetString(PyExc_RuntimeError, + "behavior_resolve_one: start message not prepared"); + return -1; + } + + if (boc_enqueue(message) < 0) { + boc_message_free(message); + PyErr_SetString(PyExc_RuntimeError, "Message queue is full"); + return -1; + } + } + + return 0; +} + +/// @brief Pre-allocate the dispatch message for the BehaviorCapsule. +/// @details Performs every fallible operation up front so the subsequent 2PL +/// link loop is infallible. On success, the +/// message is stashed on behavior->start_message and consumed by the unique +/// caller that drives behavior->count to 0 in behavior_resolve_one. On +/// failure, no state is published -- the caller (whencall) rolls back the +/// terminator. Dispatch goes directly to @c boc_worker carrying the +/// bare BehaviorCapsule (no @c ("start", ...) tuple, no central scheduler hop). +/// @param behavior The behavior to prepare +/// @return 0 on success, -1 on failure with a Python exception set +static int behavior_prepare_start(BOCBehavior *behavior) { + if (behavior->start_message != NULL) { + PyErr_SetString(PyExc_RuntimeError, "behavior_prepare_start called twice"); + return -1; + } + + // Wrap the BOCBehavior in a fresh BehaviorCapsule. The queue's XIData + // layer will keep this object alive until the message is consumed. 
+ PyTypeObject *type = BOC_STATE->behavior_capsule_type; + BehaviorCapsuleObject *capsule = + (BehaviorCapsuleObject *)type->tp_alloc(type, 0); + if (capsule == NULL) { + return -1; + } + capsule->behavior = behavior; + BEHAVIOR_INCREF(behavior); + + // Dispatch the BehaviorCapsule directly to a worker. Workers match + // ["boc_worker", behavior] and run it. The capsule is the message + // payload; the queue's XIData layer keeps it alive in flight. + PyObject *contents = (PyObject *)capsule; // borrow the new reference + PyObject *tag = PyUnicode_FromString("boc_worker"); + if (tag == NULL) { + Py_DECREF(capsule); + return -1; + } + + BOCMessage *message = boc_message_new(tag, contents); + Py_DECREF(capsule); + Py_DECREF(tag); + if (message == NULL) { + return -1; + } + + behavior->start_message = message; + return 0; +} + +static PyObject *request_wrap_borrowed(BOCRequest *request); +static BOCRequest *request_new_inner(BOCCown *cown); +static int request_release_inner(BOCRequest *request); +static int request_start_enqueue_inner(BOCRequest *request, + BOCBehavior *behavior); +static void request_finish_enqueue_inner(BOCRequest *request); + +/// @brief Comparator for qsort: order requests by target cown pointer. +/// @param a Pointer to a BOCRequest * +/// @param b Pointer to a BOCRequest * +/// @return Negative / zero / positive per the cown pointer ordering +static int request_cmp_target(const void *a, const void *b) { + BOCRequest *ra = *(BOCRequest *const *)a; + BOCRequest *rb = *(BOCRequest *const *)b; + if (ra->target < rb->target) { + return -1; + } + if (ra->target > rb->target) { + return 1; + } + return 0; +} + +/// @brief Build the deduped, target-sorted request array for this behavior. +/// @details Allocates @c behavior->requests (owned by the BOCBehavior; +/// freed by @c behavior_release_all on the normal path or @c behavior_free +/// defensively) and returns a Python list of non-owning PyCapsules pointing +/// into that array. 
Duplicate requests targeting the same cown are dropped +/// and compensated for via @c behavior_resolve_one — the count was sized +/// for the original args list and the dropped requests would never enter +/// the MCS queue. Sorting in C ensures the Python @c Behavior.schedule() +/// 2PL loop walks requests in deterministic cown order without a +/// Python-level sort. +/// @param op The BehaviorCapsule +/// @return A list of borrowed-pointer PyCapsules in MCS-enqueue order +static PyObject *BehaviorCapsule_create_requests(PyObject *op, + PyObject *Py_UNUSED(dummy)) { + BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; + BOCBehavior *behavior = self->behavior; - behavior->captures = add_vars(captures, &behavior->captures_size); - if (behavior->captures == NULL) { - return -1; + if (behavior->requests != NULL) { + PyErr_SetString(PyExc_RuntimeError, + "create_requests called twice on the same behavior"); + return NULL; } - // We add two additional counts. One for the result, and another so that - // the 2PL is finished before we start running the thunk. Without this, - // the calls to release at the end of the thunk could race with the calls to - // finish_enqueue in the 2PL. - behavior->count = (int_least64_t)(behavior->args_size + 2); + Py_ssize_t max_size = behavior->args_size + 1; + BOCRequest **requests = + (BOCRequest **)PyMem_RawCalloc((size_t)max_size, sizeof(BOCRequest *)); + if (requests == NULL) { + PyErr_NoMemory(); + return NULL; + } - return 0; -} + // Result cown always gets a request (it cannot collide with any args + // cown — args cowns are user-visible, the result cown is fresh). + BOCRequest *result_request = request_new_inner(behavior->result); + if (result_request == NULL) { + PyMem_RawFree(requests); + return NULL; + } + requests[0] = result_request; + Py_ssize_t count = 1; -/// @brief Resolves a single outstanding request for this behavior. -/// @details Called when a request is at the head of the queue for a particular -/// cown. 
If this is the last request, then the thunk is scheduled. -/// @param module the _core module -/// @param behavior the behavior capsule -/// @return None on success, NULL on error -static PyObject *behavior_resolve_one(BOCBehavior *behavior) { - int_least64_t count = atomic_fetch_add(&behavior->count, -1) - 1; - if (count == 0) { - // send a message to the scheduler that this behavior can start - PyObject *contents = Py_BuildValue("(si)", "start", behavior->id); - if (contents == NULL) { - return NULL; + BOCCown **ptr = behavior->args; + for (Py_ssize_t i = 0; i < behavior->args_size; ++i, ++ptr) { + BOCCown *cown = *ptr; + // Linear dedup against the existing entries. args_size is small in + // practice (bounded by the cown count of a single @when call), so + // O(n^2) here is fine. + bool seen = false; + for (Py_ssize_t j = 1; j < count; ++j) { + if (requests[j]->target == cown) { + seen = true; + break; + } } - PyObject *tag = PyUnicode_FromString("boc_behavior"); - if (tag == NULL) { - return NULL; + if (seen) { + // Compensate behavior->count for the duplicate that won't enter + // the MCS queue (and therefore won't call resolve_one itself). + if (behavior_resolve_one(behavior) < 0) { + for (Py_ssize_t k = 0; k < count; ++k) { + request_decref(requests[k]); + } + PyMem_RawFree(requests); + return NULL; + } + continue; } - BOCMessage *message = boc_message_new(tag, contents); - Py_DECREF(contents); - Py_DECREF(tag); - - if (message == NULL) { + BOCRequest *request = request_new_inner(cown); + if (request == NULL) { + for (Py_ssize_t k = 0; k < count; ++k) { + request_decref(requests[k]); + } + PyMem_RawFree(requests); return NULL; } + requests[count++] = request; + } - if (boc_enqueue(message) < 0) { - PyErr_SetString(PyExc_RuntimeError, "Message queue is full"); + // Sort by target so the 2PL enqueue order is deterministic. + qsort(requests, (size_t)count, sizeof(BOCRequest *), request_cmp_target); + + // Hand ownership of the array to the BOCBehavior. 
+ behavior->requests = requests; + behavior->requests_size = count; + + PyObject *list = PyList_New(count); + if (list == NULL) { + // Ownership has already been transferred — behavior_free (or + // behavior_release_all if the caller still tries to dispatch) will + // clean up. + return NULL; + } + + for (Py_ssize_t i = 0; i < count; ++i) { + PyObject *capsule = request_wrap_borrowed(requests[i]); + if (capsule == NULL) { + Py_DECREF(list); return NULL; } + PyList_SET_ITEM(list, i, capsule); } - Py_RETURN_NONE; -} - -static PyObject *BehaviorCapsule_bid(PyObject *op, PyObject *Py_UNUSED(dummy)) { - BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; - return PyLong_FromLongLong(self->behavior->id); + return list; } -static PyObject *BehaviorCapsule_thunk(PyObject *op, - PyObject *Py_UNUSED(dummy)) { - BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; - return tag_to_PyUnicode(self->behavior->thunk); -} +/// @brief Release every request the behavior owns and free the array. +/// @details Walks @c behavior->requests, calling @c request_release_inner +/// (MCS unlink + handoff to next behavior) on each, then frees the +/// per-request structs and the array itself. Invoked by the worker's +/// release arm in place of the per-request Python @c Request.release loop. 
+/// @param op The BehaviorCapsule whose requests should be released +/// @return Py_None on success, NULL on error +static PyObject *BehaviorCapsule_release_all(PyObject *op, + PyObject *Py_UNUSED(dummy)) { + BehaviorCapsuleObject *capsule = (BehaviorCapsuleObject *)op; + BOCBehavior *behavior = capsule->behavior; -static PyObject *request_new(BOCCown *cown); + if (behavior->requests == NULL) { + Py_RETURN_NONE; + } -static PyObject *BehaviorCapsule_create_requests(PyObject *op, - PyObject *Py_UNUSED(dummy)) { - BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; - BOCBehavior *behavior = self->behavior; + // Detach the array from the behavior up front so behavior_free's + // defensive cleanup will not double-free if anything below raises. + BOCRequest **requests = behavior->requests; + Py_ssize_t requests_size = behavior->requests_size; + behavior->requests = NULL; + behavior->requests_size = 0; - PyObject *list = PyList_New(self->behavior->args_size + 1); - if (list == NULL) { - return NULL; + for (Py_ssize_t i = 0; i < requests_size; ++i) { + if (request_release_inner(requests[i]) < 0) { + // Free the rest of the array even on error to limit the leak. + for (Py_ssize_t k = i; k < requests_size; ++k) { + request_decref(requests[k]); + } + PyMem_RawFree(requests); + return NULL; + } + request_decref(requests[i]); } - PyObject *capsule = request_new(self->behavior->result); - if (capsule == NULL) { + PyMem_RawFree(requests); + Py_RETURN_NONE; +} + +/// @brief Schedule a behavior: prepare-then-link, infallible past prepare. +/// @details Two-phase locking entry point that consolidates create_requests, +/// prepare_start, and the link/finish loops into one C call. +/// All allocations happen before the first +/// MCS link op, so failures cannot leave the cown queues in a partial +/// state. The Python @c Behavior.schedule() collapses to a single call to +/// this function. 
+/// @param op The BehaviorCapsule to schedule +/// @return Py_None on success, NULL on error +static PyObject *BehaviorCapsule_schedule(PyObject *op, + PyObject *Py_UNUSED(dummy)) { + BehaviorCapsuleObject *capsule = (BehaviorCapsuleObject *)op; + BOCBehavior *behavior = capsule->behavior; + + // Build the request array if it has not already been built (e.g. by an + // external caller having invoked create_requests first). create_requests + // is idempotent only via its own guard; here we just skip if populated. + if (behavior->requests == NULL) { + PyObject *list = BehaviorCapsule_create_requests(op, NULL); + if (list == NULL) { + return NULL; + } Py_DECREF(list); + } + + // Pre-allocate the start message. From this point onwards the link loop + // is infallible: no Python allocation, no callbacks. + if (behavior_prepare_start(behavior) < 0) { return NULL; } - PyList_SET_ITEM(list, 0, capsule); + BOCRequest **requests = behavior->requests; + Py_ssize_t n = behavior->requests_size; + + // Drop the GIL across the pure-atomic 2PL link/finish span. The + // inner ops (atomic_exchange on target->last, atomic_store on prev->next, + // BEHAVIOR_INCREF, the spin on prev->scheduled, behavior_resolve_one's + // count decrement) touch no Python state. behavior_resolve_one was made + // int-returning specifically so it has no Py_RETURN_NONE on the hot path. + // + // The only Python-state operation reachable from the inner code is the + // PyErr_SetString / boc_message_free pair on the count==0 + queue-full + // branch. count is sized args_size + 2 by BehaviorCapsule_init, and the + // link loop applies at most args_size decrements, so count >= 2 on every + // iteration -- the count==0 branch is unreachable here. The final + // behavior_resolve_one below runs UNDER the GIL and may legitimately + // hit that branch (queue full); it remains the only PyErr surface. 
+ bool ok = true; + Py_BEGIN_ALLOW_THREADS for (Py_ssize_t i = 0; i < n; ++i) { + // Phase 1: link this request into its cown's MCS queue. The only + // failure mode is the unreachable PyErr path documented above; if it + // somehow fires, surface it as a generic error after re-acquiring + // the GIL (we cannot raise here). + if (request_start_enqueue_inner(requests[i], behavior) < 0) { + ok = false; + break; + } + } + if (ok) { + // Phase 2: mark each request scheduled. Pure atomic stores; releases + // the spin in any successor that started linking concurrently. + for (Py_ssize_t i = 0; i < n; ++i) { + request_finish_enqueue_inner(requests[i]); + } + } + Py_END_ALLOW_THREADS - BOCCown **ptr = behavior->args; - for (Py_ssize_t i = 1; i <= self->behavior->args_size; ++i, ++ptr) { - capsule = request_new(*ptr); - if (capsule == NULL) { - Py_DECREF(list); - return NULL; + if (!ok) { + if (!PyErr_Occurred()) { + PyErr_SetString(PyExc_RuntimeError, + "behavior_schedule: link phase failed"); } + return NULL; + } - PyList_SET_ITEM(list, i, capsule); + // Final resolve_one to account for the +1 the constructor added so + // dispatch waits for the 2PL to complete (see BehaviorCapsule_init). + // Runs UNDER the GIL: it is the legitimate dispatcher of the start + // message and may set a Python exception on a queue-full failure. + if (behavior_resolve_one(behavior) < 0) { + return NULL; } - return list; + Py_RETURN_NONE; } -static PyObject *BehaviorCapsule_set_result(PyObject *op, PyObject *args) { +/// @brief Store an exception as the behavior's result +/// @details Sets the result value and marks the exception flag. Intended for +/// the worker exception handler. 
+/// @param op The BehaviorCapsule object +/// @param args The exception value +/// @return Py_None on success, NULL on error +static PyObject *BehaviorCapsule_set_exception(PyObject *op, PyObject *args) { PyObject *value = NULL; if (!PyArg_ParseTuple(args, "O", &value)) { @@ -3048,19 +4832,10 @@ static PyObject *BehaviorCapsule_set_result(PyObject *op, PyObject *args) { BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; BOCBehavior *behavior = self->behavior; cown_set_value(behavior->result, value); + behavior->result->exception = true; Py_RETURN_NONE; } -/// @brief Resolves a single outstanding request for this behavior. -/// @param module The _core module -/// @param args The behavior capsule -/// @return None on success, NULL on error -static PyObject *BehaviorCapsule_resolve_one(PyObject *op, - PyObject *Py_UNUSED(dummy)) { - BehaviorCapsuleObject *self = (BehaviorCapsuleObject *)op; - return behavior_resolve_one(self->behavior); -} - static int acquire_vars(BOCCown **vars, Py_ssize_t size) { BOCCown **ptr = vars; for (Py_ssize_t i = 0; i < size; ++i, ++ptr) { @@ -3249,7 +5024,10 @@ static PyObject *BehaviorCapsule_execute(PyObject *op, PyObject *args) { result = capsule; } + bool is_error = false; + if (result == NULL) { + is_error = true; result = PyErr_GetRaisedException(); if (result == NULL) { result = PyObject_CallFunction(PyExc_RuntimeError, "s", @@ -3258,17 +5036,18 @@ static PyObject *BehaviorCapsule_execute(PyObject *op, PyObject *args) { } cown_set_value(behavior->result, result); + if (is_error) { + behavior->result->exception = true; + } return behavior->result->value; } static PyMethodDef BehaviorCapsule_methods[] = { - {"bid", BehaviorCapsule_bid, METH_NOARGS, NULL}, - {"thunk", BehaviorCapsule_thunk, METH_NOARGS, NULL}, - {"create_requests", BehaviorCapsule_create_requests, METH_NOARGS, NULL}, - {"resolve_one", BehaviorCapsule_resolve_one, METH_NOARGS, NULL}, - {"set_result", BehaviorCapsule_set_result, METH_VARARGS, NULL}, + 
{"set_exception", BehaviorCapsule_set_exception, METH_VARARGS, NULL}, {"acquire", BehaviorCapsule_acquire, METH_NOARGS, NULL}, {"release", BehaviorCapsule_release, METH_NOARGS, NULL}, + {"release_all", BehaviorCapsule_release_all, METH_NOARGS, NULL}, + {"schedule", BehaviorCapsule_schedule, METH_NOARGS, NULL}, {"execute", BehaviorCapsule_execute, METH_VARARGS, NULL}, {NULL} /* Sentinel */ }; @@ -3288,9 +5067,6 @@ static PyType_Spec BehaviorCapsule_Spec = { .flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_IMMUTABLETYPE, .slots = BehaviorCapsule_slots}; -#define BehaviorCapsule_CheckExact(op) \ - Py_IS_TYPE((op), BOC_STATE->behavior_capsule_type) - static PyObject *_new_behavior_object(XIDATA_T *xidata) { BOCBehavior *behavior = (BOCBehavior *)xidata->data; @@ -3324,32 +5100,42 @@ static int _behavior_shared( return 0; } -/// @brief Encapsulates a behavior's request for a cown -typedef struct boc_request { - /// @brief The cown that has been requested - BOCCown *target; - /// @brief The ID of the next behavior - atomic_intptr_t next; - /// @brief Whether the request has been scheduled - atomic_int_least64_t scheduled; -} BOCRequest; - -/// @brief Frees the request capsule -/// @param capsule A capsule containing a request object -void request_free(PyObject *capsule) { - BOCRequest *request = - (BOCRequest *)PyCapsule_GetPointer(capsule, "boc_request"); - PRINTDBG("request_free(%p)\n", request); +/// @brief Free a BOCRequest's owned references and the struct itself. +/// @details Private to @c request_decref. Drops the cown ref taken in +/// @c request_new_inner and the behavior ref taken when a successor was +/// linked into @c next during @c request_start_enqueue. 
+/// @param request The request to free +static void request_free_inner(BOCRequest *request) { + PRINTDBG("request_free_inner(%p)\n", request); COWN_DECREF(request->target); - BOCBehavior *behavior = (BOCBehavior *)atomic_load(&request->next); + BOCBehavior *behavior = (BOCBehavior *)atomic_load_intptr(&request->next); if (behavior != NULL) { BEHAVIOR_DECREF(behavior); } - PyMem_RawFree(request); } -PyObject *request_new(BOCCown *cown) { +/// @brief Drop one reference to a BOCRequest; free on last drop. +/// @details Starts with the owner ref (@c rc = 1) from @c request_new_inner. +/// A successor acquires a second ref in @c request_start_enqueue_inner +/// before reading @c prev->scheduled, so the predecessor cannot be freed +/// under the successor's spin. See the @c rc field comment on BOCRequest. +/// @param request The request to decref +static void request_decref(BOCRequest *request) { + int_least64_t newval = atomic_fetch_add(&request->rc, -1) - 1; + assert(newval >= 0); + if (newval == 0) { + request_free_inner(request); + } +} + +/// @brief Allocate a new BOCRequest targeting @p cown. +/// @details Increments the cown's refcount; the request takes ownership of +/// that reference until @c request_free_inner releases it. Starts with +/// refcount 1 (the owner ref held by the behavior's @c requests array). 
+/// @param cown The cown the request targets +/// @return A new BOCRequest, or NULL on allocation failure +static BOCRequest *request_new_inner(BOCCown *cown) { BOCRequest *request = (BOCRequest *)PyMem_RawMalloc(sizeof(BOCRequest)); if (request == NULL) { PyErr_NoMemory(); @@ -3357,142 +5143,109 @@ PyObject *request_new(BOCCown *cown) { } request->target = cown; - PRINTDBG("request_new(%p)\n", request); + PRINTDBG("request_new_inner(%p)\n", request); COWN_INCREF(cown); request->next = 0; request->scheduled = 0; - PyObject *capsule = - PyCapsule_New((void *)request, "boc_request", request_free); - if (capsule == NULL) { - COWN_DECREF(cown); - return NULL; - } - - return capsule; -} - -/// @brief Creates a request for a cown. -/// @param module The _core module -/// @param args The CownCapsule object -/// @return A capsule containing the request -static PyObject *request_create(PyObject *module, PyObject *args) { - BOC_STATE_SET(module); - - PyObject *op; - - if (!PyArg_ParseTuple(args, "O", &op)) { - return NULL; - } - - BOCCown *cown = cown_unwrap(op); - if (cown == NULL) { - return NULL; - } - - return request_new(cown); -} - -/// @brief Unwraps a request from its capsule. -/// @param op The Capsule object -/// @return a reference to a request, or NULL if there was an error -static BOCRequest *request_unwrap(PyObject *op) { - if (!PyCapsule_CheckExact(op)) { - PyErr_SetString(PyExc_ValueError, "Expected a capsule"); - return NULL; - } - - return (BOCRequest *)PyCapsule_GetPointer(op, "boc_request"); -} - -// Release the cown to the next behavior. -// This is called when the associated behavior has completed, and thus can -// allow any waiting behavior to run. -// If there is no next behavior, then the cown's `last` pointer is set to null. - -/// @brief Release the cown to the next behavior. -/// @details This is called when the associated behavior has completed, and thus -/// can allow any waiting behavior to run. 
If there is no next behavior, then -/// the cown's `last` pointer is set to null. -/// @param module The _core module -/// @param args The request to release -/// @return None if successful, NULL otherwise -static PyObject *request_release(PyObject *module, PyObject *args) { - BOC_STATE_SET(module); - - PyObject *op; - - if (!PyArg_ParseTuple(args, "O", &op)) { - return NULL; - } - - BOCRequest *request = request_unwrap(op); - if (request == NULL) { - return NULL; - } - + atomic_store(&request->rc, 1); + return request; +} + +/// @brief Wrap an existing BOCRequest in a non-owning PyCapsule. +/// @details The capsule shares the BOCRequest pointer with the C array on +/// the owning BOCBehavior. The destructor is NULL — the C array is +/// responsible for freeing the request via @c behavior_release_all. +/// @param request The request to wrap +/// @return A new PyCapsule, or NULL on error +static PyObject *request_wrap_borrowed(BOCRequest *request) { + return PyCapsule_New((void *)request, "boc_request", NULL); +} + +/// @brief Release a single request, walking the MCS queue to hand off. +/// @details Called by @c behavior_release_all on every request in the +/// behavior's owned array. The request struct itself is NOT freed here — +/// the caller frees the array as a whole afterwards. +/// @param request The request to release +/// @return 0 on success, -1 on error (Python exception set) +static int request_release_inner(BOCRequest *request) { // This code is effectively a MCS-style queue lock release. - BOCBehavior *next = (BOCBehavior *)atomic_load(&request->next); + BOCBehavior *next = (BOCBehavior *)atomic_load_intptr(&request->next); if (next == NULL) { intptr_t expected_ptr = (intptr_t)request; - if (atomic_compare_exchange_strong(&request->target->last, &expected_ptr, - 0)) { - Py_RETURN_NONE; + if (atomic_compare_exchange_strong_intptr(&request->target->last, + &expected_ptr, 0)) { + return 0; } } - // Wait for the next pointer to be set. 
The target.last != this request - // so this should not take long. - - while (true) { - next = (BOCBehavior *)atomic_load(&request->next); + // Wait for the next pointer to be set by a successor's + // request_start_enqueue_inner. Release the GIL across the spin: the + // successor is advancing on another thread, may itself be under + // Py_BEGIN_ALLOW_THREADS (see BehaviorCapsule_schedule's link loop), + // and should not be blocked here. target->last has already been set + // past this request by the successor, so this spin terminates as + // soon as the successor's `atomic_store(&prev->next, behavior_ptr)` + // is visible. The spin is therefore bounded by another thread's + // atomic store; if it failed to terminate the runtime invariants + // would already be violated, so there is no useful interrupt to + // poll for here. + Py_BEGIN_ALLOW_THREADS while (true) { + next = (BOCBehavior *)atomic_load_intptr(&request->next); if (next) { break; } } + Py_END_ALLOW_THREADS - return behavior_resolve_one(next); -} - -/// @brief Enqueues this request on the cown -/// @param module The _core module -/// @param args The request to enqueue, and the associated behavior -/// @return None if successful, NULL otherwise -static PyObject *request_start_enqueue(PyObject *module, PyObject *args) { - BOC_STATE_SET(module); - - PyObject *op; - PyObject *behavior_op; - - if (!PyArg_ParseTuple(args, "OO", &op, &behavior_op)) { - return NULL; - } - - BOCRequest *request = request_unwrap(op); - if (request == NULL) { - return NULL; - } - - if (!BehaviorCapsule_CheckExact(behavior_op)) { - PyErr_SetString(PyExc_TypeError, "Expected a BehaviorCapsule object"); - return NULL; + if (behavior_resolve_one(next) < 0) { + return -1; } + return 0; +} - BehaviorCapsuleObject *behavior_capsule = - (BehaviorCapsuleObject *)behavior_op; - BOCBehavior *behavior = behavior_capsule->behavior; - +/// @brief Release the cown to the next behavior. 
+/// @details The public release entry point is @c behavior_release_all; the +/// @c request_release_inner helper above is what walks the MCS queue. + +/// @brief Enqueue body called by @c behavior_schedule. +/// @details Pure C, no Python allocation, no exception. The only failure +/// surface is propagated by behavior_resolve_one (which can fail if the +/// queue is full); we return its NULL/non-NULL via int. Callers that have +/// already pre-allocated the start message via behavior_prepare_start can +/// treat this as infallible from the link-loop perspective. +/// @param request The request to enqueue +/// @param behavior The behavior owning the request +/// @return 0 on success, -1 on error with a Python exception set +static int request_start_enqueue_inner(BOCRequest *request, + BOCBehavior *behavior) { intptr_t request_ptr = (intptr_t)request; - intptr_t prev_ptr = atomic_exchange(&request->target->last, request_ptr); + intptr_t prev_ptr = + atomic_exchange_intptr(&request->target->last, request_ptr); if (prev_ptr == 0) { // there is no prior request queued on the cown, so we can immediately // proceed - return behavior_resolve_one(behavior); + if (behavior_resolve_one(behavior) < 0) { + return -1; + } + return 0; } intptr_t behavior_ptr = (intptr_t)behavior; BOCRequest *prev = (BOCRequest *)prev_ptr; - assert(atomic_load(&prev->next) == 0); - atomic_store(&prev->next, behavior_ptr); + // Take a temporary ref on the predecessor request: we are about to + // spin on prev->scheduled below, and prev's owning behavior can run + // release_all concurrently once we have stored prev->next. Without + // this ref, the predecessor could be freed between our store of + // prev->next and our next load of prev->scheduled -- a UAF the + // distributed-release design must guard against because release runs + // on the worker thread, not on the same thread as the link loop. The + // matching decref happens after the spin completes. 
At the moment of + // the fetch_add, prev is still in + // the MCS queue for this cown (our exchange on target->last showed + // prev_ptr there), so prev cannot have been freed yet. + atomic_fetch_add(&prev->rc, 1); + assert(atomic_load_intptr(&prev->next) == 0); + atomic_store_intptr(&prev->next, behavior_ptr); PRINTDBG("request->next = bid=%" PRIdLEAST64 "\n", behavior->id); BEHAVIOR_INCREF(behavior); // wait for the previous request to be scheduled @@ -3501,48 +5254,23 @@ static PyObject *request_start_enqueue(PyObject *module, PyObject *args) { break; } } + // Drop the temporary ref; this may be the final ref if the + // predecessor's owner has already run release_all. + request_decref(prev); - Py_RETURN_NONE; + return 0; } -/// @brief Finalises the scheduling of the request. -/// @param module The _core module -/// @param args The request -/// @return None if successful, NULL otherwise -static PyObject *request_finish_enqueue(PyObject *module, PyObject *args) { - PyObject *op; - - if (!PyArg_ParseTuple(args, "O", &op)) { - return NULL; - } - - BOCRequest *request = request_unwrap(op); - if (request == NULL) { - return NULL; - } - +/// @brief Atomic-only finish of the second 2PL phase. +/// @details Releases the spin in @c request_start_enqueue_inner waiting on +/// the predecessor's scheduled flag. Pure atomic store, infallible. +/// @param request The request to mark scheduled +static void request_finish_enqueue_inner(BOCRequest *request) { atomic_exchange(&request->scheduled, true); - - Py_RETURN_NONE; -} - -static PyObject *request_target(PyObject *module, PyObject *args) { - PyObject *op; - - if (!PyArg_ParseTuple(args, "O", &op)) { - return NULL; - } - - BOCRequest *request = request_unwrap(op); - if (request == NULL) { - return NULL; - } - - return PyLong_FromVoidPtr((void *)request->target); } /// @brief Whether this module is the "primary" module, i.e. the one owned by -/// the scheduler. +/// the main interpreter that drives runtime lifecycle. 
/// @param module The module to check /// @param Py_UNUSED /// @return Whether this is the primary module @@ -3629,7 +5357,8 @@ static PyObject *_core_set_tags(PyObject *module, PyObject *args) { if (i >= tags_size) { // clear the tags on these unused queues - BOCTag *oldtag = (BOCTag *)atomic_exchange(&qptr->tag, (intptr_t)NULL); + BOCTag *oldtag = + (BOCTag *)atomic_exchange_intptr(&qptr->tag, (intptr_t)NULL); if (oldtag != NULL) { tag_disable(oldtag); TAG_DECREF(oldtag); @@ -3652,7 +5381,8 @@ static PyObject *_core_set_tags(PyObject *module, PyObject *args) { } // assign a new tag - BOCTag *oldtag = (BOCTag *)atomic_exchange(&qptr->tag, (intptr_t)qtag); + BOCTag *oldtag = + (BOCTag *)atomic_exchange_intptr(&qptr->tag, (intptr_t)qtag); TAG_INCREF(qtag); if (oldtag != NULL) { tag_disable(oldtag); @@ -3739,19 +5469,97 @@ static PyObject *_cown_capsule_from_pointer(PyObject *module, PyObject *args) { return NULL; } - // manually allocate without cown_capsule_wrap to avoid double INCREF; - // inherits the pin from __reduce__ + // Take a fresh strong reference for this capsule. Each unpickle is an + // independent live reference to the BOCCown; the dealloc path does the + // matching COWN_DECREF. The caller must guarantee the BOCCown is still + // alive at this point (see CownCapsule_reduce for the contract). PyTypeObject *type = BOC_STATE->cown_capsule_type; CownCapsuleObject *capsule = (CownCapsuleObject *)type->tp_alloc(type, 0); if (capsule == NULL) { - COWN_DECREF(cown); return NULL; } + COWN_INCREF(cown); capsule->cown = cown; return (PyObject *)capsule; } +/// @brief Pre-pin a list of CownCapsules and return their pointers as ints. +/// @details Used by the Python @c notice_write helper to keep every +/// BOCCown reachable from a noticeboard value alive across the message +/// queue's pickle/unpickle gap. 
The writer thread calls this **before** +/// sending the noticeboard_write message; the returned integer pointers +/// are sent as part of the message and consumed by +/// @ref nb_pin_cowns (which transfers ownership into the noticeboard +/// entry without an extra INCREF). Without this, every CownCapsule in +/// the value would be reduced to a bare pointer at pickle-time, the +/// writer behavior would return and drop its CownCapsule wrappers, the +/// underlying BOCCowns would be freed to recycle memory, and the +/// receiving worker would unpickle and INCREF dangling pointers. +/// @param module Unused +/// @param args Tuple of (pins: sequence of CownCapsule) +/// @return A new Python list of int (BOCCown* cast to integer) on success, +/// NULL on error. On error every INCREF taken so far is rolled back. +static PyObject *_core_cown_pin_pointers(PyObject *module, PyObject *args) { + BOC_STATE_SET(module); + + PyObject *seq_arg; + if (!PyArg_ParseTuple(args, "O", &seq_arg)) { + return NULL; + } + + PyObject *seq = + PySequence_Fast(seq_arg, "cown_pin_pointers requires a sequence"); + if (seq == NULL) { + return NULL; + } + + Py_ssize_t n = PySequence_Fast_GET_SIZE(seq); + PyObject *result = PyList_New(n); + if (result == NULL) { + Py_DECREF(seq); + return NULL; + } + + PyTypeObject *capsule_type = BOC_STATE->cown_capsule_type; + Py_ssize_t i = 0; + for (; i < n; i++) { + PyObject *item = PySequence_Fast_GET_ITEM(seq, i); + if (!PyObject_TypeCheck(item, capsule_type)) { + PyErr_SetString(PyExc_TypeError, + "cown_pin_pointers requires CownCapsule objects"); + goto fail; + } + BOCCown *cown = ((CownCapsuleObject *)item)->cown; + COWN_INCREF(cown); + PyObject *ptr = PyLong_FromVoidPtr(cown); + if (ptr == NULL) { + // Roll back the ref we just took before joining the cleanup loop. 
+ COWN_DECREF(cown); + goto fail; + } + PyList_SET_ITEM(result, i, ptr); // steals ref + } + + Py_DECREF(seq); + return result; + +fail: + // Drop INCREFs for entries we already pre-pinned (indices 0..i-1). + for (Py_ssize_t j = 0; j < i; j++) { + PyObject *ptr_obj = PyList_GET_ITEM(result, j); + BOCCown *c = (BOCCown *)PyLong_AsVoidPtr(ptr_obj); + if (c != NULL) { + COWN_DECREF(c); + } else { + PyErr_Clear(); + } + } + Py_DECREF(result); + Py_DECREF(seq); + return NULL; +} + static PyMethodDef _core_module_methods[] = { {"send", _core_send, METH_VARARGS, "send($module, tag, contents, /)\n--\n\nSends a message."}, @@ -3761,11 +5569,6 @@ static PyMethodDef _core_module_methods[] = { "Receives a message."}, {"drain", _core_drain, METH_VARARGS, "drain($module, tags, /)\n--\n\nDrains all messages for the given tags."}, - {"request_create", request_create, METH_VARARGS, NULL}, - {"request_release", request_release, METH_VARARGS, NULL}, - {"request_start_enqueue", request_start_enqueue, METH_VARARGS, NULL}, - {"request_finish_enqueue", request_finish_enqueue, METH_VARARGS, NULL}, - {"request_target", request_target, METH_VARARGS, NULL}, {"is_primary", _core_is_primary, METH_NOARGS, NULL}, {"index", _core_index, METH_NOARGS, NULL}, {"recycle", _core_recycle, METH_NOARGS, NULL}, @@ -3774,6 +5577,68 @@ static PyMethodDef _core_module_methods[] = { "set_tags($module, tags, /)\n--\n\nAssigns tags to message queues."}, {"_cown_capsule_from_pointer", _cown_capsule_from_pointer, METH_VARARGS, NULL}, + {"cown_pin_pointers", _core_cown_pin_pointers, METH_VARARGS, + "cown_pin_pointers($module, pins, /)\n--\n\n" + "INCREF each CownCapsule and return raw pointer ints (transfers refs)."}, + {"noticeboard_write_direct", _core_noticeboard_write_direct, METH_VARARGS, + "noticeboard_write_direct($module, key, value, /)" + "\n--\n\nWrites a key-value pair to the noticeboard."}, + {"noticeboard_snapshot", _core_noticeboard_snapshot, METH_NOARGS, + "noticeboard_snapshot($module, /)" + 
"\n--\n\nReturns a cached snapshot of the noticeboard as a dict."}, + {"noticeboard_clear", _core_noticeboard_clear, METH_NOARGS, + "noticeboard_clear($module, /)" + "\n--\n\nClears all noticeboard entries."}, + {"noticeboard_delete", _core_noticeboard_delete, METH_VARARGS, + "noticeboard_delete($module, key, /)" + "\n--\n\nDeletes a single noticeboard entry by key."}, + {"noticeboard_cache_clear", _core_noticeboard_cache_clear, METH_NOARGS, + "noticeboard_cache_clear($module, /)" + "\n--\n\nClears the thread-local snapshot cache."}, + {"noticeboard_version", _core_noticeboard_version, METH_NOARGS, + "noticeboard_version($module, /)" + "\n--\n\nReturns the global noticeboard version counter."}, + {"set_noticeboard_thread", _core_set_noticeboard_thread, METH_NOARGS, + "set_noticeboard_thread($module, /)" + "\n--\n\nRegisters the calling thread as the noticeboard mutator " + "thread."}, + {"clear_noticeboard_thread", _core_clear_noticeboard_thread, METH_NOARGS, + "clear_noticeboard_thread($module, /)" + "\n--\n\nClears the registered noticeboard mutator thread."}, + {"notice_sync_request", _core_notice_sync_request, METH_NOARGS, + "notice_sync_request($module, /)" + "\n--\n\nAllocates a fresh notice_sync sequence number."}, + {"notice_sync_complete", _core_notice_sync_complete, METH_VARARGS, + "notice_sync_complete($module, seq, /)" + "\n--\n\nMarks a notice_sync sequence as processed and wakes waiters."}, + {"notice_sync_wait", _core_notice_sync_wait, METH_VARARGS, + "notice_sync_wait($module, seq, timeout, /)" + "\n--\n\nBlocks until the given notice_sync sequence is processed."}, + {"terminator_inc", _core_terminator_inc, METH_NOARGS, + "terminator_inc($module, /)" + "\n--\n\nIncrement the terminator. Returns new count or -1 if closed."}, + {"terminator_dec", _core_terminator_dec, METH_NOARGS, + "terminator_dec($module, /)" + "\n--\n\nDecrement the terminator. 
Wakes terminator_wait on 0."}, + {"terminator_close", _core_terminator_close, METH_NOARGS, + "terminator_close($module, /)" + "\n--\n\nMark the terminator closed; future terminator_inc returns -1."}, + {"terminator_wait", _core_terminator_wait, METH_VARARGS, + "terminator_wait($module, timeout, /)" + "\n--\n\nBlock until the terminator count reaches 0 or timeout."}, + {"terminator_seed_dec", _core_terminator_seed_dec, METH_NOARGS, + "terminator_seed_dec($module, /)" + "\n--\n\nIdempotent one-shot decrement of the Pyrona seed."}, + {"terminator_reset", _core_terminator_reset, METH_NOARGS, + "terminator_reset($module, /)" + "\n--\n\nRestore terminator state for a fresh runtime start. " + "Returns the prior (count, seeded) for drift detection."}, + {"terminator_count", _core_terminator_count, METH_NOARGS, + "terminator_count($module, /)" + "\n--\n\nRead the current terminator count."}, + {"terminator_seeded", _core_terminator_seeded, METH_NOARGS, + "terminator_seeded($module, /)" + "\n--\n\nRead the current terminator SEEDED flag."}, {NULL} /* Sentinel */ }; @@ -3800,11 +5665,29 @@ static int _core_module_exec(PyObject *module) { queue_stub->head = 0; queue_stub->tail = NULL; queue_stub->next = 0; - atomic_store(&BOC_RECYCLE_QUEUE_HEAD, (intptr_t)queue_stub); + atomic_store_intptr(&BOC_RECYCLE_QUEUE_HEAD, (intptr_t)queue_stub); BOC_RECYCLE_QUEUE_TAIL = queue_stub; + // Initialize the noticeboard + memset(&NB, 0, sizeof(NB)); + boc_mtx_init(&NB.mutex); + + // Initialize the notice_sync barrier primitives. + boc_mtx_init(&NB_SYNC_MUTEX); + cnd_init(&NB_SYNC_COND); + + // Initialize the terminator primitives. + // The Pyrona seed (count=1, seeded=1) is set by terminator_reset() + // when the runtime starts; here we only initialize the kernel objects. 
+ boc_mtx_init(&TERMINATOR_MUTEX); + cnd_init(&TERMINATOR_COND); + #ifdef BOC_REF_TRACKING +#ifdef _WIN32 timespec_get(&BOC_LAST_REF_TRACKING_REPORT, TIME_UTC); +#else + clock_gettime(CLOCK_REALTIME, &BOC_LAST_REF_TRACKING_REPORT); +#endif #endif } @@ -3886,6 +5769,9 @@ static int _core_module_clear(PyObject *module) { Py_CLEAR(state->behavior_capsule_type); // this needs to be cleared here, as it was allocated on this interpreter. Py_CLEAR(state->recycle_queue->xidata_to_cowns); + // Clear the thread-local snapshot cache so the GC can collect any + // reference cycles anchored through the cached dict / proxy. + nb_drop_local_cache(); return 0; } @@ -3916,14 +5802,14 @@ void _core_module_free(void *module_ptr) { boc_park_destroy(qptr); if (atomic_load(&qptr->state) == BOC_QUEUE_ASSIGNED) { - BOCTag *qtag = (BOCTag *)atomic_load(&qptr->tag); + BOCTag *qtag = (BOCTag *)atomic_load_intptr(&qptr->tag); assert(qtag->queue == qptr); BOCTag_free(qtag); } } BOCRecycleQueue *queue = (BOCRecycleQueue *)BOC_RECYCLE_QUEUE_TAIL; - while (atomic_load(&queue->next) != 0) { + while (atomic_load_intptr(&queue->next) != 0) { BOCRecycleQueue *next = (BOCRecycleQueue *)queue->next; BOCRecycleQueue_free(queue); queue = next; @@ -3931,7 +5817,32 @@ void _core_module_free(void *module_ptr) { BOCRecycleQueue_free(queue); BOC_RECYCLE_QUEUE_TAIL = NULL; - atomic_store(&BOC_RECYCLE_QUEUE_HEAD, 0); + atomic_store_intptr(&BOC_RECYCLE_QUEUE_HEAD, 0); + + // Clear the thread-local snapshot cache before freeing entries + Py_CLEAR(NB_SNAPSHOT_CACHE); + + // Collect noticeboard entries to free after releasing the mutex. 
+ XIDATA_T *nb_to_free[NB_MAX_ENTRIES]; + int nb_to_free_count = 0; + + mtx_lock(&NB.mutex); + for (int i = 0; i < NB.count; i++) { + if (NB.entries[i].value != NULL) { + nb_to_free[nb_to_free_count++] = NB.entries[i].value; + NB.entries[i].value = NULL; + } + } + NB.count = 0; + mtx_unlock(&NB.mutex); + + for (int i = 0; i < nb_to_free_count; i++) { + XIDATA_FREE(nb_to_free[i]); + } + + // Destroy noticeboard mutex + mtx_destroy(&NB.mutex); + BOC_REF_TRACKING_REPORT(); } diff --git a/src/bocpy/behaviors.py b/src/bocpy/behaviors.py index 7504971..f28b4b9 100644 --- a/src/bocpy/behaviors.py +++ b/src/bocpy/behaviors.py @@ -1,4 +1,15 @@ -"""Runtime behaviors and helpers for bocpy's cown-based scheduler.""" +"""Runtime lifecycle and Python-side glue for bocpy's behavior runtime. + +This module owns the runtime singleton, the worker-pool launcher, the +noticeboard thread, and the Python `Cown` / `Behavior` / `@when` +facades. It does **not** contain a central scheduler thread: scheduling (2PL, +request linking, dispatch) runs in the caller's thread via +`_core.BehaviorCapsule.schedule`, and release runs in the worker thread that +just executed the behavior. The only centralized helper that survives +is the noticeboard thread, which serializes mutator messages so the +C-level read-modify-write stays consistent without forcing behaviors +to block on a mutex. +""" import inspect import logging @@ -8,7 +19,9 @@ import tempfile from textwrap import dedent import threading -from typing import Any, Generic, Mapping, Optional, TypeVar, Union +import time +from types import MappingProxyType +from typing import Any, Callable, Generic, Mapping, Optional, TypeVar, Union from . import _core, set_tags from .transpiler import BehaviorInfo, export_main, export_module_from_file @@ -29,6 +42,29 @@ T = TypeVar("T") +# Sentinel distinguishing "key absent" from "key is None" in noticeboard updates. 
+_ABSENT = object() + + +class _RemovedType: + """Sentinel returned by notice_update fn to delete the entry.""" + + _instance = None + + def __new__(cls): + if cls._instance is None: + cls._instance = super().__new__(cls) + return cls._instance + + def __repr__(self): + return "REMOVED" + + def __reduce__(self): + return (_RemovedType, ()) + + +REMOVED = _RemovedType() + class Cown(Generic[T]): """Lightweight wrapper around the underlying cown capsule.""" @@ -69,13 +105,23 @@ def release(self): """Releases the cown.""" self.impl.release() + @property + def exception(self) -> bool: + """Whether the held value is the result of an unhandled exception.""" + return self.impl.exception + + @exception.setter + def exception(self, value: bool): + """Set or clear the exception flag.""" + self.impl.exception = value + @property def acquired(self) -> bool: """Whether the cown is currently acquired.""" return self.impl.acquired() def __lt__(self, other: "Cown") -> bool: - """Order by the underying capsule for deterministic ordering.""" + """Order by the underlying capsule for deterministic ordering.""" if not isinstance(other, Cown): return NotImplemented @@ -101,95 +147,6 @@ def __repr__(self) -> str: return repr(self.impl) -class Request: - """Wrapper for requests produced by behaviors.""" - - def __init__(self, impl): - """Store the underlying request implementation.""" - self.impl = impl - - def release(self): - """Release the cown to the next behavior. - - This is called when the associated behavior has completed, and thus can - allow any waiting behavior to run. - - If there is no next behavior, then the cown's `last` pointer is set to null. - """ - _core.request_release(self.impl) - - def target(self) -> int: - """Returns the target cown of the request.""" - return _core.request_target(self.impl) - - def start_enqueue(self, behavior: "Behavior"): - """Start the first phase of the 2PL enqueue operation. - - This enqueues the request onto the cown. 
It will only return - once any previous behavior on this cown has finished enqueueing - on all its required cowns. This ensures that the 2PL is obeyed. - """ - _core.request_start_enqueue(self.impl, behavior.impl) - - def finish_enqueue(self): - """Finish the second phase of the 2PL enqueue operation. - - This will set the scheduled flag, so subsequent behaviors on this - cown can continue the 2PL enqueue. - """ - _core.request_finish_enqueue(self.impl) - - -class Behavior: - """Behavior that captures the content of a when body. - - It contains all the state required to run the body, and release the cowns - when the body has finished. - """ - - def __init__(self, impl: _core.BehaviorCapsule): - """Wrap the capsule and materialize request wrappers.""" - self.impl = impl - self.bid = impl.bid() - self.thunk = impl.thunk() - seen = set() - self.requests = [] - for req_impl in impl.create_requests(): - r = Request(req_impl) - if r.target() in seen: - # Duplicate cown: compensate behavior count since this request - # won't enter the MCS queue and thus won't call resolve_one. - self.impl.resolve_one() - else: - seen.add(r.target()) - self.requests.append(r) - self.requests.sort(key=lambda r: r.target()) - - def schedule(self): - """Schedule the behavior using two-phase locking over requests.""" - # Complete first phase of 2PL enqueuing on all cowns. - for r in self.requests: - r.start_enqueue(self) - - # Complete second phase of 2PL enqueuing on all cowns. - for r in self.requests: - r.finish_enqueue() - - # Resolve the additional request. [See comment in the Constructor] - # All the cowns may already be resolved, in which case, this will - # schedule the task. 
- self.impl.resolve_one() - - def start(self): - """Send the behavior to a worker to execute.""" - _core.send("boc_worker", self.impl) - - def release(self): - """Release all owned requests.""" - for r in self.requests: - r.release() - - WORKER_MAIN_END = "# END boc_export" @@ -197,7 +154,7 @@ class Behaviors: """Coordinator that starts workers and schedules behaviors.""" def __init__(self, num_workers: Optional[int], export_dir: Optional[str]): - """Creates a new Behaviors scheduler. + """Creates a new Behaviors runtime. :param num_workers: The number of worker interpreters to start. If None, defaults to the number of available cores minus one. @@ -216,7 +173,20 @@ def __init__(self, num_workers: Optional[int], export_dir: Optional[str]): self.behavior_lookup: Mapping[int, BehaviorInfo] = {} self.logger = logging.getLogger("behaviors") self.logger.debug("behaviors init") - self.scheduler = None + # The runtime has no central scheduler thread. Caller threads do 2PL + # inline (whencall -> behavior_schedule), workers release inline, + # and the C-level terminator is the only pending counter. + self.noticeboard = None + self._noticeboard_start_error: Optional[BaseException] = None + # Set to True by stop() once worker shutdown, noticeboard + # tear-down, and tempdir cleanup have all completed. The + # warned-stop / drain-error raise from stop() happens *after* + # this flips, so wait()/__exit__ can use the flag to + # distinguish "stop() raised but the runtime is dead -- clear + # the global handle" from "stop() raised mid-teardown and the + # runtime is still alive -- retain the handle so the caller + # can retry stop()". + self._teardown_complete = False self.final_cowns: tuple[Cown, ...] 
= () self.bid = 0 @@ -308,49 +278,130 @@ def stop_workers(self): self.teardown_workers() self.logger.debug("workers stopped") - def start_scheduler(self): - """Start the scheduler loop in a dedicated thread.""" - def scheduler(): - self.logger.debug("starting the scheduler") - behaviors: Mapping[int, Behavior] = {} - terminator = 1 - exception = None - self.logger.debug("all workers started, scheduling") - while terminator > 0: - match _core.receive("boc_behavior"): - case ["boc_behavior", "terminator_decrement"]: - terminator -= 1 - self.logger.debug(f"boc_behavior/terminator_decrement({terminator})") - - case ["boc_behavior", ("release", bid)]: - self.logger.debug(f"boc_behavior/release(bid={bid})") - behaviors[bid].release() - del behaviors[bid] - terminator -= 1 - - case ["boc_behavior", ("schedule", behavior_impl)]: - self.logger.debug("boc_behavior/schedule") - behavior = Behavior(behavior_impl) - terminator += 1 - self.logger.debug(f"boc_behavior/schedule(thunk={behavior.thunk})") - # prevent runtime exiting until this has run - behaviors[behavior.bid] = behavior - behavior.schedule() - behavior = None - behavior_impl = None - - case ["boc_behavior", ("start", bid)]: - self.logger.debug(f"boc_behavior/start(bid={bid})") - behaviors[bid].start() - - if exception: - raise exception - - self.scheduler = threading.Thread(target=scheduler) - self.scheduler.start() - - def start(self, module: Optional[tuple[str, str]]): - """Export the target module and spin up workers and scheduler.""" + def start_noticeboard(self): + """Start the dedicated noticeboard mutator thread. + + The noticeboard intentionally remains message-driven: writers + (``notice_write``/``notice_update``/``notice_delete``) are + fire-and-forget from the calling behavior, so behaviors never + block on the noticeboard mutex. This thread owns the C-level + single-writer slot and serves the ``boc_noticeboard`` queue. 
+ + Startup is synchronous: the thread signals readiness only after + ``set_noticeboard_thread()`` has successfully claimed the C-level + single-writer slot. If the claim fails (e.g. a prior ``stop()`` + left the slot pinned), the exception is captured and re-raised + on the calling thread so the runtime never enters a half-started + state where mutations would queue forever with no consumer. + """ + ready = threading.Event() + self._noticeboard_start_error = None + + def noticeboard(): + self.logger.debug("starting the noticeboard thread") + # Pin this thread as the only legitimate noticeboard mutator. + # The C layer rejects write_direct/delete from any other + # thread, eliminating the TOCTOU window in the Python-level + # read-modify-write performed by noticeboard_update. + try: + _core.set_noticeboard_thread() + except BaseException as ex: # noqa: B036 + # Captured here and re-raised on the starter thread by + # start_noticeboard so the runtime fails loudly instead + # of silently stranding the noticeboard mutator. + self._noticeboard_start_error = ex + ready.set() + return + ready.set() + while True: + match _core.receive("boc_noticeboard"): + case ["boc_noticeboard", "shutdown"]: + self.logger.debug("boc_noticeboard/shutdown") + return + + case ["boc_noticeboard", ("noticeboard_write", key, value, cowns)]: + try: + _core.noticeboard_write_direct(key, value, cowns) + except Exception as ex: + self.logger.warning(f"noticeboard_write({key!r}) failed: {ex}") + + case ["boc_noticeboard", ("noticeboard_update", key, fn, default)]: + try: + # Force a fresh snapshot for this read-modify-write: + # this thread is not a behavior, so the + # default no-polling semantics do not apply here and + # we want to see the latest committed state. 
+ _core.noticeboard_cache_clear() + snap = _core.noticeboard_snapshot() + current = snap.get(key, _ABSENT) + if current is _ABSENT: + current = default + new_value = fn(current) + if new_value is REMOVED: + _core.noticeboard_delete(key) + else: + # write_direct bumps NB_VERSION; other readers' + # caches will revalidate at their next behavior + # boundary. Re-pin any cowns reachable from + # the new value (the previous entry's pins are + # released by write_direct). We are on the + # noticeboard thread here so cown_pin_pointers + # is safe — its INCREFs will be transferred + # into the entry by write_direct. + pin_ptrs = _core.cown_pin_pointers( + _gather_pins(new_value)) + _core.noticeboard_write_direct( + key, new_value, pin_ptrs) + except Exception as ex: + self.logger.warning(f"noticeboard_update({key!r}) failed: {ex}") + finally: + # Re-arm the version check for any subsequent + # snapshot call from this thread. + _core.noticeboard_cache_clear() + + case ["boc_noticeboard", ("noticeboard_delete", key)]: + try: + _core.noticeboard_delete(key) + except Exception as ex: + self.logger.warning(f"noticeboard_delete({key!r}) failed: {ex}") + + case ["boc_noticeboard", ("sync", seq)]: + # Barrier sentinel posted by notice_sync(). Marking + # this sequence complete wakes any caller blocked + # in notice_sync_wait. Because the boc_noticeboard + # tag is FIFO per producer, every write/update/delete + # the caller posted before this sentinel has already + # been processed above by the time we get here. + _core.notice_sync_complete(seq) + + self.noticeboard = threading.Thread(target=noticeboard) + self.noticeboard.start() + # Block until the thread has either claimed the noticeboard slot + # or captured an error. Without this handshake a failed claim + # would be invisible: notice_write/update/delete would enqueue + # to boc_noticeboard with no consumer, notice_sync() would block + # forever, and stop() would observe a non-alive thread and + # discard the entire backlog. 
+ ready.wait() + if self._noticeboard_start_error is not None: + err = self._noticeboard_start_error + self._noticeboard_start_error = None + self.noticeboard.join() + raise RuntimeError( + "noticeboard thread failed to claim the C-level " + "single-writer slot" + ) from err + + def start(self, module: Optional[tuple[str, str]] = None): + """Export the target module and spin up workers and the noticeboard thread. + + :param module: Optional ``(module_name, source_path)`` tuple + identifying the user module to transpile and export. + ``None`` (the default) exports ``__main__`` instead, which + is the case auto-triggered by the first ``@when`` call in a + script. + :type module: Optional[tuple[str, str]] + """ path = os.path.join(os.path.dirname(__file__), "worker.py") with open(path) as file: @@ -371,7 +422,7 @@ def start(self, module: Optional[tuple[str, str]]): self.behavior_lookup = export.behaviors path = os.path.join(self.export_dir, f"{module_name}.py") - with open(path, "w") as file: + with open(path, "w", encoding="utf-8") as file: file.write(export.code) main_start = worker_script.find(WORKER_MAIN_END) @@ -389,21 +440,273 @@ def start(self, module: Optional[tuple[str, str]]): self.worker_script = worker_script[:main_start] + "\n".join(lines) + worker_script[main_start:] - set_tags(["boc_behavior", "boc_worker", "boc_cleanup"]) + set_tags(["boc_behavior", "boc_worker", "boc_cleanup", "boc_noticeboard"]) + # Bring up workers and the noticeboard thread first. We seed + # the C-level terminator only after both succeed so a failure + # in start_noticeboard (or anywhere between here and the + # terminator_reset below) leaves the terminator in its + # post-stop() quiescent state (count=0, seeded=0) and the + # next start() can proceed cleanly without a drift diagnostic + # firing. On a partial-startup failure we also tear the + # workers back down so the subsequent start() is not blocked + # by stale shutdown handshakes or dangling sub-interpreters. 
self.start_workers() - self.start_scheduler() + try: + self.start_noticeboard() + except BaseException: + # Close the terminator first so any sibling thread that + # somehow races a whencall during the abort window is + # refused at terminator_inc rather than slipping a real + # behavior into boc_worker between our shutdown sentinels. + # TERMINATOR_CLOSED is 0 on the very first start() of the + # process and 1 after any prior stop()/abort; either way, + # set it to 1 explicitly. terminator_close() is idempotent. + _core.terminator_close() + self._abort_workers() + raise + + # Arm the C-level terminator (count=1 seed, closed=0, seeded=1). + # reset() returns the prior (count, seeded) so we can detect a + # previous run that died without reaching its reconciliation + # point (KeyboardInterrupt, stop() that raised, etc.). We refuse + # to start on drift rather than silently clobbering whatever + # state was left behind -- the previous run is still leaking + # behaviors or cowns and starting fresh would mask the bug. + prior_count, prior_seeded = _core.terminator_reset() + if prior_count != 0 or prior_seeded != 0: + # We just armed the terminator (count=1, seeded=1, closed=0). + # Close it FIRST so any sibling thread that races a + # whencall during the abort window is refused before + # touching the half-shut-down pool. Then drop our own + # seed via terminator_seed_dec so the next start() sees + # (count=0, seeded=0) instead of re-firing the same + # drift diagnostic forever. Finally tear down workers + # and the noticeboard so the next start() can re-spawn + # without colliding with the orphans. + _core.terminator_close() + _core.terminator_seed_dec() + self._abort_noticeboard() + self._abort_workers() + raise RuntimeError( + "terminator drift carried over from a previous run " + f"(prior_count={prior_count}, prior_seeded={prior_seeded}). " + "This indicates a leaked whencall, a stop() that raised " + "before reconciliation, or an interrupted teardown. 
" + "Resolve the earlier failure before starting again." + ) + + def _abort_workers(self): + """Tear down the worker pool after a partial-startup failure. + + Sends the same ``("boc_worker", "shutdown")`` / cleanup + handshake as :py:meth:`stop_workers` but without the cown + round-up, which is unsafe before the runtime is fully alive. + Used only on the error path of :py:meth:`start`; on the normal + path :py:meth:`stop_workers` performs the equivalent work. + """ + self.logger.debug("aborting workers after failed startup") + for _ in range(self.num_workers): + _core.send("boc_worker", "shutdown") + for _ in range(self.num_workers): + try: + _, contents = _core.receive("boc_behavior") + assert contents == "shutdown" + except Exception as ex: + self.logger.exception(ex) + for _ in range(self.num_workers): + _core.send("boc_cleanup", True) + self.teardown_workers() + + def _abort_noticeboard(self): + """Tear down the noticeboard thread after a startup failure. + + Idempotent: if the thread never started or already exited (the + common case when ``start_noticeboard`` raised), this is a + no-op aside from clearing the C-level slot. + """ + if self.noticeboard is not None and self.noticeboard.is_alive(): + try: + _core.send("boc_noticeboard", "shutdown") + except Exception as ex: + self.logger.exception(ex) + self.noticeboard.join() + try: + _core.clear_noticeboard_thread() + except Exception as ex: + self.logger.exception(ex) def __enter__(self): """Enter context by starting the runtime.""" self.start() + return self def stop(self, timeout: Optional[float] = None): - """Stop scheduler and workers, removing any temp exports.""" - _core.send("boc_behavior", "terminator_decrement") - self.scheduler.join(timeout) + """Quiesce all behaviors and tear the runtime down. + + :param timeout: Upper bound on the **quiescence** and + **noticeboard-drain** phases (steps 1, 2, and 4 below). 
The + worker shutdown handshake (step 5), orphan-behavior drain, + and tempdir cleanup that follow run to completion regardless; + ``timeout`` does not bound total ``stop()`` runtime. ``None`` + means wait forever for quiescence. + :type timeout: Optional[float] + + With no central scheduler thread, ``stop()`` drives + the C terminator directly. The sequence is: + + 1. Drop the seed (idempotent) so quiescence is reachable. + 2. Block on ``terminator_wait`` until every in-flight + behavior has decremented (worker side) and no caller is + racing to schedule more. + 3. Close the terminator. Any later ``whencall`` raises + ``RuntimeError("runtime is shutting down")`` from the + ``terminator_inc()`` check rather than racing teardown. + 4. Tear down the noticeboard thread (it must have drained any + in-flight messages from the last behaviors before the + single-writer slot is released). + 5. Stop workers and clean up the export tempdir. + + After ``terminator_wait`` returns we assert ``terminator_count + == 0 and terminator_seeded == 0``; any non-zero value indicates + a bookkeeping bug (a missed decrement, or a scheduling-after- + wait that slipped past ``terminator_close``). + """ + # Take down the seed and wait for quiescence. Both + # are idempotent so a second stop() / wait() is a no-op. + # Compute one deadline up front so each stage gets the *remaining* + # budget rather than the original timeout. Without this, a + # caller-supplied timeout=T would let terminator_wait, the + # noticeboard drain, and stop_workers each consume up to T, + # turning the visible upper bound into 3*T. + if timeout is None: + deadline = None + else: + deadline = time.monotonic() + timeout + + def _remaining(): + if deadline is None: + return None + return max(0.0, deadline - time.monotonic()) + + _core.terminator_seed_dec() + _core.terminator_wait(_remaining()) + + # Post-wait reconciliation. 
If wait() timed out the count is + # still > 0 -- skip the assertion in that case so a partial + # teardown does not mask the underlying timeout. + c_count = _core.terminator_count() + c_seeded = _core.terminator_seeded() + quiesced = (c_count == 0 and c_seeded == 0) + # Close the terminator unconditionally before any further drain + # work. On the clean path this is the documented refusal point; + # on the warned path it MUST happen before _drain_orphan_behaviors + # so a late whencall caller cannot slip a fresh BehaviorCapsule + # into boc_worker between the drain's last receive() and the + # cleanup that follows. terminator_close() is idempotent. + _core.terminator_close() + if not quiesced: + self.logger.warning( + "stop(): terminator did not reach quiescence " + f"(count={c_count}, seeded={c_seeded}). " + "This typically means stop() was invoked with a timeout " + "that elapsed while behaviors were still in flight." + ) + + # Drain the noticeboard thread. + _core.send("boc_noticeboard", "shutdown") + self.noticeboard.join(_remaining()) + if self.noticeboard.is_alive(): + # join() timed out. Do not proceed to stop_workers / cleanup: + # the noticeboard thread still owns the single-writer slot + # and may be holding NB_MUTEX while processing an in-flight + # mutation. Tearing workers down under it would be racy. + raise RuntimeError( + "stop(): noticeboard thread did not shut down within " + f"timeout={timeout!r}. The runtime is left running so " + "the leak can be diagnosed; a later stop() call may " + "succeed once the in-flight mutation completes." + ) + # Shut workers down and reset noticeboard ownership. self.stop_workers() + # Defensive drain: if stop() entered the "terminator did not + # quiesce" branch above (or any late whencall slipped in + # between terminator_close and the worker shutdown messages), + # behaviors may still sit in boc_worker with their MCS links + # pinned. 
Release them inline so we do not leak cowns on a + # warned-only stop, and drop the terminator holds the whencall + # callers took. With a clean stop this is a no-op. + drain_errors = self._drain_orphan_behaviors() + _core.clear_noticeboard_thread() + _core.noticeboard_clear() + # Teardown is complete: workers are joined, the noticeboard + # thread has exited, and the C-level slot is released. The + # tempdir cleanup that follows is bookkeeping; if it raises + # the runtime is still gone and wait()/__exit__ should null + # the global BEHAVIORS handle so the next @when starts fresh + # rather than retrying stop() on a dead instance. + self._teardown_complete = True if os.path.exists(self.export_dir) and self.export_tmp: - shutil.rmtree(self.export_dir) + try: + shutil.rmtree(self.export_dir) + except Exception as ex: + # An orphan tempdir is annoying but not fatal: log and + # continue so the caller observes a normal stop(). + self.logger.exception(ex) + if drain_errors: + # Surface the first failure so the caller sees the leak at + # the failure site rather than later as a mysterious + # deadlock on the affected cowns. The remaining errors + # were logged inside _drain_orphan_behaviors. + raise RuntimeError( + "stop(): release_all failed for " + f"{len(drain_errors)} orphan behavior(s) during drain; " + "cowns may be leaked" + ) from drain_errors[0] + + def _drain_orphan_behaviors(self): + """Release any BehaviorCapsules left on ``boc_worker`` post-shutdown. + + Called after :py:meth:`stop_workers`. Each orphan has had its + cowns scheduled (MCS links established) but never acquired by + a worker. ``release_all`` walks the MCS queues, hands off to any + waiting successors, and frees the request array; ``terminator_dec`` + drops the hold the ``whencall`` caller took before + ``behavior_schedule``. 
The result Cown of each dropped behavior + is *not* mutated here: it has already been released (owner + ``NO_OWNER``, ``value`` is ``NULL``, ``xidata`` is set), and + writing into ``value`` would put it in a state ``cown_acquire`` + cannot recover from on a subsequent runtime restart. + + :returns: A list of exceptions captured from + ``release_all`` failures, or ``[]`` on a clean + drain. ``stop()`` re-raises if non-empty so a release-side + leak is visible at the failure site rather than later as a + mysterious deadlock on the affected cowns. + """ + errors = [] + while True: + msg = _core.receive("boc_worker", timeout=0) + if msg[0] == _core.TIMEOUT: + return errors + payload = msg[1] + if isinstance(payload, _core.BehaviorCapsule): + self.logger.warning( + "behavior dropped during stop(); the runtime was " + "torn down before this behavior could acquire its cowns" + ) + try: + payload.release_all() + except Exception as ex: + self.logger.exception(ex) + errors.append(ex) + try: + _core.terminator_dec() + except Exception as ex: + self.logger.exception(ex) + # Non-capsule payloads (e.g. a stray "shutdown") are silently + # ignored. Worker shutdowns balance 1:1 with workers, so a + # stray sentinel here would already indicate a bug elsewhere; + # the loop body just falls through to the next receive(). def __exit__(self, exc_type, exc_value, traceback): """Ensure stop is called on context exit.""" @@ -435,7 +738,20 @@ def whencall(thunk: str, args: list[Union[Cown, list[Cown]]], captures: list[Any behavior = _core.BehaviorCapsule(thunk, result.impl, cowns, captures) logging.debug(f"whencall:behavior=Behavior(thunk={thunk}, result={result}, args={args}, captures={captures})") - _core.send("boc_behavior", ("schedule", behavior)) + # Caller threads run the entire 2PL inline. Register with the + # C terminator first so a concurrent stop()/terminator_close() will + # refuse the schedule rather than racing teardown. 
Once the + # terminator hold is taken, behavior_schedule is infallible past + # prepare; any failure during the prepare phase rolls the hold back. + # The matching decrement happens on the worker thread once the + # behavior body runs. + if _core.terminator_inc() < 0: + raise RuntimeError("runtime is shutting down") + try: + behavior.schedule() + except BaseException: + _core.terminator_dec() + raise return result @@ -450,7 +766,10 @@ def get_caller_module(): def start(worker_count: Optional[int] = None, export_dir: Optional[str] = None, module: Optional[tuple[str, str]] = None): - """Start the behavior scheduler and worker pool. + """Start the behavior runtime: worker pool plus noticeboard thread. + + The runtime distributes scheduling (2PL link/release) across caller + and worker threads; there is no central scheduler thread. :param worker_count: The number of worker interpreters to start. If None, defaults to the number of available cores minus one. @@ -465,7 +784,7 @@ def start(worker_count: Optional[int] = None, """ global BEHAVIORS if BEHAVIORS is not None: - raise RuntimeError("Behavior scheduler already started") + raise RuntimeError("Behavior runtime already started") if worker_count is None: worker_count = WORKER_COUNT @@ -476,7 +795,17 @@ def start(worker_count: Optional[int] = None, if module is None: module = get_caller_module() BEHAVIORS = Behaviors(worker_count, export_dir) - BEHAVIORS.start(module) + try: + BEHAVIORS.start(module) + except BaseException: + # Failed startup must not leave a half-initialised Behaviors + # instance bound globally: the next @when would skip start() + # entirely and run against a runtime whose noticeboard thread + # never claimed the C-level slot (or whose workers never + # spawned). Reset the slot so the caller can retry once the + # underlying cause is cleared. 
+ BEHAVIORS = None + raise def when(*cowns): @@ -517,6 +846,12 @@ def when_factory(func): found = True break + if name in frame.f_globals: + val = frame.f_globals[name] + captures.append(val) + found = True + break + frame = frame.f_back if not found: @@ -535,5 +870,340 @@ def wait(timeout: Optional[float] = None): """Block until all behaviors complete, with optional timeout.""" global BEHAVIORS if BEHAVIORS: - BEHAVIORS.stop(timeout) + # Clear BEHAVIORS only if stop() drove the runtime all the + # way through teardown (workers joined, noticeboard exited, + # tempdir removed). On stop()'s noticeboard-join-timeout path + # the runtime is intentionally left running so the caller can + # diagnose the leak and retry; nulling the global handle + # there would strand the live workers / noticeboard thread + # with no Python-side reference. + try: + BEHAVIORS.stop(timeout) + except BaseException: + if BEHAVIORS._teardown_complete: + BEHAVIORS = None + raise BEHAVIORS = None + + +def _validate_noticeboard_key(key: str) -> None: + """Validate a noticeboard key, raising on invalid input. + + The C layer (noticeboard_write_direct) has its own checks, but we + validate here to fail fast on the caller's interpreter. + """ + if not isinstance(key, str): + raise TypeError("noticeboard key must be a str") + if "\x00" in key: + raise ValueError("noticeboard key must not contain NUL characters") + if len(key.encode("utf-8")) > 63: + raise ValueError("noticeboard key too long (max 63 UTF-8 bytes)") + + +def _require_noticeboard_ready(key: str, operation: str) -> None: + """Check that the runtime is running and the key is valid.""" + if _core.is_primary() and BEHAVIORS is None: + raise RuntimeError(f"cannot {operation} the noticeboard before the runtime is started") + _validate_noticeboard_key(key) + + +# Container types we recurse into when scanning a noticeboard value for +# CownCapsules to pin. Custom user objects are also descended via __dict__. 
+_NB_CONTAINER_TYPES = (list, tuple, set, frozenset) + + +def _collect_cown_capsules(obj: Any, out: list, seen: set) -> None: + """Recursively collect every CownCapsule reachable from *obj*. + + The result is appended to *out* (a list of CownCapsule instances). + The noticeboard uses this list to take an independent strong + reference on every BOCCown referenced by the serialized bytes, so + that the cowns outlive every reader's pickled view regardless of + whether the original Cown wrapper is dropped. + + *seen* is a set of object ids used to break reference cycles. + + Recurses into Cown wrappers (extracting ``impl``), built-in + containers (list/tuple/set/frozenset/dict), and any other object + that exposes a ``__dict__``. Strings and bytes are not descended + even though they are sequences. + """ + obj_id = id(obj) + if obj_id in seen: + return + if isinstance(obj, _core.CownCapsule): + out.append(obj) + seen.add(obj_id) + return + if isinstance(obj, Cown): + out.append(obj.impl) + seen.add(obj_id) + return + if isinstance(obj, (str, bytes, bytearray, int, float, bool, type(None))): + # Common leaf types: skip cheaply without recording in `seen`. + return + seen.add(obj_id) + if isinstance(obj, dict): + for k, v in obj.items(): + _collect_cown_capsules(k, out, seen) + _collect_cown_capsules(v, out, seen) + return + if isinstance(obj, _NB_CONTAINER_TYPES): + for item in obj: + _collect_cown_capsules(item, out, seen) + return + # Fall back to inspecting attributes for ordinary user classes. Built-in + # opaque objects (e.g. compiled regex patterns) have no __dict__ and are + # left alone. + d = getattr(obj, "__dict__", None) + if d is not None: + _collect_cown_capsules(d, out, seen) + # Walk __slots__ up the MRO: slot-only classes (e.g. @dataclass(slots=True)) + # have no __dict__ at all, so cowns stored in slot attributes would + # otherwise be silently missed and recycled out from under the + # noticeboard entry. 
+ cls = type(obj) + for klass in cls.__mro__: + slots = klass.__dict__.get("__slots__") + if not slots: + continue + if isinstance(slots, str): + slots = (slots,) + for name in slots: + # __dict__ and __weakref__ are reserved slot names that + # expose the mapping / weakref itself; skip them. + if name in ("__dict__", "__weakref__"): + continue + try: + attr = getattr(obj, name) + except AttributeError: + continue + _collect_cown_capsules(attr, out, seen) + + +def _gather_pins(value: Any) -> list: + """Return the list of CownCapsules to pin for *value*.""" + pins: list = [] + _collect_cown_capsules(value, pins, set()) + return pins + + +def notice_write(key: str, value: Any) -> None: + """Write a value to the noticeboard. + + The write is fire-and-forget: the value is serialized immediately and + handed to a dedicated noticeboard thread, which applies it under + mutex. ``notice_write`` returns as soon as the message is enqueued. + + **No ordering guarantee.** A subsequent behavior — even one that + chains directly off the writer through a shared cown — is *not* + guaranteed to observe this write. The noticeboard mutator runs on + its own thread and may not have processed the message by the time + the next behavior reads. Treat the noticeboard as eventually + consistent shared state, never as a synchronization channel between + behaviors. Use cowns or ``send``/``receive`` for that. + + The noticeboard supports up to 64 distinct keys. Writes beyond the + limit are not applied; the noticeboard thread catches the resulting + error and logs a warning. No exception propagates to the caller. + + :param key: The noticeboard key (max 63 UTF-8 bytes). + :type key: str + :param value: The value to store. + :type value: Any + """ + _require_noticeboard_ready(key, "write to") + # Gather every CownCapsule reachable from `value` so the noticeboard + # can take an independent strong reference on each. 
We pre-pin them + # here on the writer thread (cown_pin_pointers does COWN_INCREF on + # each and returns the raw pointers as ints). The pointers ride + # along in the message; the noticeboard thread transfers ownership + # into the noticeboard entry without an extra INCREF. This closes + # the window where the writer behavior could return and drop its + # pin list before the noticeboard thread dequeues the message — + # without pre-pinning the BOCCowns get freed to the recycle pool + # and the unpickle of the value's CownCapsules touches dangling + # pointers. + pin_ptrs = _core.cown_pin_pointers(_gather_pins(value)) + _core.send("boc_noticeboard", ("noticeboard_write", key, value, pin_ptrs)) + + +def notice_update(key: str, fn: Callable[[Any], Any], default: Any = None) -> None: + """Atomically update a noticeboard entry. + + Reads the current value for *key* (or *default* if absent), applies + *fn* to it, and writes the result back. The read-modify-write is + atomic because the single-threaded noticeboard mutator performs all + three steps without interleaving. + + Like :func:`notice_write`, the call is fire-and-forget and carries + **no ordering guarantee** with respect to other behaviors. The + update is processed on the noticeboard thread; subsequent behaviors + may or may not observe the result. + + Both *fn* and *default* must be picklable — they are serialized and + sent to the noticeboard thread via the message queue. Lambdas and + closures are **not** picklable; use ``functools.partial`` with a + module-level function or an ``operator`` function instead:: + + import operator + from functools import partial + notice_update("total", partial(operator.add, 5), default=0) + notice_update("best", partial(max, 42), default=float("-inf")) + + If *fn* raises, the key retains its previous value and a warning is + logged by the noticeboard thread. + + **Important:** *fn* runs synchronously on the single-threaded + noticeboard mutator. 
It must be fast, pure (no side effects), and + must not call any bocpy API (``notice_write``, ``send``, ``when``, + etc.). A blocking or expensive *fn* will stall every other + noticeboard mutation. + + If *fn* returns the ``REMOVED`` sentinel, the entry is deleted from + the noticeboard instead of being updated. + + .. warning:: + + *fn* and *default* are pickled and sent to the noticeboard thread + for execution. **Anyone with permission to call this function can + therefore cause arbitrary Python code to run on the noticeboard + thread**, which holds the privileged noticeboard-mutator role. + In the current threat model bocpy treats all code running in the + runtime (primary and sub-interpreters) as equally trusted, so + this is no worse than any other cross-interpreter message. If you + need to run untrusted behavior code, restrict what can reach + ``boc_noticeboard`` and audit callers of :func:`notice_update`. + + :param key: The noticeboard key (max 63 UTF-8 bytes). + :type key: str + :param fn: A picklable callable taking the current value, returning the new. + :type fn: Callable[[Any], Any] + :param default: Value used when *key* does not yet exist. + :type default: Any + """ + _require_noticeboard_ready(key, "update") + if not callable(fn): + raise TypeError("notice_update fn must be callable") + _core.send("boc_noticeboard", ("noticeboard_update", key, fn, default)) + + +def notice_delete(key: str) -> None: + """Delete a single noticeboard entry. + + The deletion is fire-and-forget: the request is sent to the + noticeboard thread, which removes the entry under mutex. If the + key does not exist, the operation is a no-op. Like + :func:`notice_write`, this carries **no ordering guarantee** with + respect to other behaviors. + + Alternatively, use ``notice_update`` with a function that returns + ``REMOVED`` to conditionally delete an entry based on its current + value. + + :param key: The noticeboard key to delete (max 63 UTF-8 bytes). 
+ :type key: str + """ + _require_noticeboard_ready(key, "delete from") + _core.send("boc_noticeboard", ("noticeboard_delete", key)) + + +def noticeboard() -> Mapping[str, Any]: + """Return a cached snapshot of the noticeboard. + + Must be called from within a ``@when`` behavior. The first call within a + behavior captures all entries under mutex and caches the data. + Subsequent calls in the same behavior return a view of the same + cached data. + + The returned mapping is read-only. + + Calling from outside a behavior (e.g. the main thread) will return a + snapshot that is never refreshed for that thread. + + :return: A read-only mapping of keys to their stored values. + :rtype: Mapping[str, Any] + """ + return MappingProxyType(_core.noticeboard_snapshot()) + + +def notice_read(key: str, default: Any = None) -> Any: + """Read a single key from the noticeboard. + + Must be called from within a ``@when`` behavior. Convenience wrapper + that takes a snapshot and returns one value. + + Calling from outside a behavior (e.g. the main thread) will return a + snapshot that is never refreshed for that thread. + + :param key: The noticeboard key to read. + :type key: str + :param default: Value returned when key is absent. + :type default: Any + :return: The stored value, or *default* if the key does not exist. + :rtype: Any + """ + _validate_noticeboard_key(key) + return _core.noticeboard_snapshot().get(key, default) + + +def noticeboard_version() -> int: + """Return the current noticeboard version counter. + + The counter is incremented every time the noticeboard is + successfully written, updated, or cleared. Two reads returning the + same value mean no commit happened between them; a strictly larger + value means at least one commit happened. + + The counter is global (across all threads and interpreters) and + monotonic. 
Useful as a *hint* for detecting noticeboard changes + without taking a full snapshot — for example, polling for any + change before deciding whether to refresh a derived view. + + .. note:: + + This is *not* a synchronization primitive. Because + :func:`notice_write`, :func:`notice_update`, and + :func:`notice_delete` are fire-and-forget, the version may not + have advanced yet when a behavior that depends on a write + observes the noticeboard. For strict read-your-writes ordering, + use :func:`notice_sync`. + + :return: The current noticeboard version. + :rtype: int + """ + return _core.noticeboard_version() + + +def notice_sync(timeout: Optional[float] = 30.0) -> int: + """Block until the caller's prior noticeboard mutations are committed. + + Because :func:`notice_write`, :func:`notice_update`, and + :func:`notice_delete` are fire-and-forget, a behavior that wants + read-your-writes ordering against a *subsequent* behavior must call + ``notice_sync()`` after its writes. The call posts a sentinel onto + the ``boc_noticeboard`` tag (which is FIFO per producer) and blocks + until the noticeboard thread has drained that sentinel. By the time + this returns, every write/update/delete posted from the calling + thread before the sentinel has been applied to the noticeboard. + + The barrier carries **no ordering guarantee** with respect to + writes posted from other threads or behaviors interleaved with the + caller's; it only flushes the caller's own queued mutations. + + :param timeout: Maximum seconds to wait. ``None`` waits forever. + Defaults to 30 seconds. + :type timeout: float or None + :raises TimeoutError: If the noticeboard thread does not drain the + caller's sentinel within *timeout* seconds. + :raises RuntimeError: If the runtime is not started. + :return: The :func:`noticeboard_version` after the flush. 
+ :rtype: int + """ + if _core.is_primary() and BEHAVIORS is None: + raise RuntimeError("cannot notice_sync before the runtime is started") + seq = _core.notice_sync_request() + _core.send("boc_noticeboard", ("sync", seq)) + if not _core.notice_sync_wait(seq, timeout): + raise TimeoutError(f"notice_sync({timeout}s) timed out waiting for seq={seq}") + return _core.noticeboard_version() diff --git a/src/bocpy/transpiler.py b/src/bocpy/transpiler.py index e79a7f0..a84b32c 100644 --- a/src/bocpy/transpiler.py +++ b/src/bocpy/transpiler.py @@ -77,16 +77,16 @@ def known_vars(self): def visit_Import(self, node: ast.Import): # noqa: N802 """Record imported names and keep the node.""" for name in node.names: - self.imports.add(name.name) + self.imports.add(name.asname if name.asname else name.name) return node def visit_ImportFrom(self, node: ast.ImportFrom): # noqa: N802 """Record imported names and ensure whencall is available.""" for name in node.names: - self.imports.add(name.name) + self.imports.add(name.asname if name.asname else name.name) - if node.module == "bocpy" and "whencall" not in node.names: + if node.module == "bocpy" and not any((a.asname or a.name) == "whencall" for a in node.names): node.names.append(ast.alias(name="whencall")) self.imports.add("whencall") @@ -207,7 +207,9 @@ def visit_FunctionDef(self, node: ast.FunctionDef): # noqa: N802 # no longer function properly. self.cap_finder.clear() self.cap_finder.visit(behavior_node) - captures = list(self.cap_finder.captured_vars) + # __file__ is rewritten to a string constant by visit_Name below, + # so it must not be added to the parameter list as a capture. 
+ captures = [c for c in self.cap_finder.captured_vars if c != "__file__"] # add the additional arguments to the function for name in captures: @@ -248,6 +250,16 @@ def visit_FunctionDef(self, node: ast.FunctionDef): # noqa: N802 ("behaviors", Mapping[int, BehaviorInfo])]) +# Module-level dunders (__name__, __doc__, __package__, __spec__, __loader__) +# are exposed via __builtins__, but inside a behavior they should refer to the +# *user* module's value, not the worker's exported module. Removing them from +# `known_vars` lets the capture mechanism pick them up from the call-site +# frame's globals at runtime. __file__ is handled separately via inlining in +# WhenTransformer.visit_Name. +MODULE_DUNDERS = {"__name__", "__doc__", "__package__", + "__spec__", "__loader__"} + + def export_module(tree: ast.Module, path: str = None) -> ExportResult: """Extract an AST as a BOC-enlightened module with generated behaviors. @@ -256,7 +268,7 @@ def export_module(tree: ast.Module, path: str = None) -> ExportResult: :return: An export result with code and metadata :rtype: ExportResult """ - builtins = set(globals()["__builtins__"].keys()) + builtins = set(globals()["__builtins__"].keys()) - MODULE_DUNDERS boc_export = BOCModuleTransformer() boc_export.visit(tree) diff --git a/src/bocpy/worker.py b/src/bocpy/worker.py index fb014ce..0b6185d 100644 --- a/src/bocpy/worker.py +++ b/src/bocpy/worker.py @@ -38,17 +38,67 @@ def load_boc_module(module_name, file_path): def run_behavior(behavior): - """Execute a single behavior and notify the scheduler.""" - bid = behavior.bid() - behavior.acquire() + """Execute a single behavior and release its requests inline.""" try: - behavior.execute(boc_export) - except Exception as ex: - logger.exception(ex) - behavior.set_result(ex) - - behavior.release() - send("boc_behavior", ("release", bid)) + try: + _core.noticeboard_cache_clear() + behavior.acquire() + except Exception as ex: + # acquire() / cache_clear() failed before the body ran. 
The + # MCS chain for this behavior is still linked (behavior_schedule + # established the links on the caller thread), so we must + # unwind it here or every successor blocks forever. Mark + # the result Cown with the exception so any caller awaiting + # it sees a diagnostic instead of a permanent None. + logger.exception(ex) + try: + behavior.set_exception(ex) + except Exception as inner: + logger.exception(inner) + # acquire() is sequential (result -> args -> captures) and + # bails on first failure, so on a partial-success raise some + # cowns are owned by this worker and some are not. release() + # is similarly tolerant (it short-circuits NO_OWNER cowns), + # so calling it here releases the ones we did acquire before + # release_all hands the request to a successor. Without this + # the successor's cown_acquire fails with "already acquired + # by " and every behavior on that cown strands. + try: + behavior.release() + except Exception as inner: + logger.exception(inner) + try: + behavior.release_all() + except Exception as inner: + logger.exception(inner) + return + + try: + behavior.execute(boc_export) + except Exception as ex: + logger.exception(ex) + behavior.set_exception(ex) + + try: + behavior.release() + except Exception as ex: + logger.exception(ex) + # Release the request array on the worker thread instead of + # round-tripping ("release", capsule) through the (now-gone) + # central scheduler thread. + try: + behavior.release_all() + except Exception as ex: + logger.exception(ex) + finally: + # Drop the terminator hold unconditionally. If anything above + # raised, failing to decrement here would leave wait() hung + # forever. Log and swallow so a single misbehaving worker step + # cannot strand the runtime. 
+ try: + _core.terminator_dec() + except Exception as ex: + logger.exception(ex) def do_work(): @@ -58,19 +108,36 @@ def do_work(): logger.debug("worker starting") send("boc_behavior", "started") while running: - match receive("boc_worker"): - case ["boc_worker", "shutdown"]: - logger.debug("boc_worker/shutdown") - running = False - - case ["boc_worker", behavior]: - run_behavior(behavior) - behavior = None + try: + match receive("boc_worker"): + case ["boc_worker", "shutdown"]: + logger.debug("boc_worker/shutdown") + running = False + + case ["boc_worker", behavior]: + run_behavior(behavior) + behavior = None + except Exception as ex: + # A failure inside run_behavior or receive must not + # break the loop -- if it did, this worker would exit + # without sending its "shutdown" reply and stop_workers + # would block forever waiting for it. + logger.exception(ex) logger.debug("worker stopped") - send("boc_behavior", "shutdown") except Exception as ex: logger.exception(ex) + finally: + # Always tell stop_workers we are leaving the loop, even on an + # unexpected exception, so it never hangs in receive("boc_behavior"). + try: + send("boc_behavior", "shutdown") + except Exception as ex: + logger.exception(ex) + try: + _core.noticeboard_cache_clear() + except Exception as ex: + logger.exception(ex) def cleanup(): diff --git a/test/test_boc.py b/test/test_boc.py index 69063fc..9cbbda9 100644 --- a/test/test_boc.py +++ b/test/test_boc.py @@ -1,14 +1,70 @@ """Behavior-oriented concurrency tests.""" import functools +import sys +import threading from typing import NamedTuple -from bocpy import Cown, receive, send, TIMEOUT, wait, when +from bocpy import Cown, drain, receive, send, start, TIMEOUT, wait, when +from bocpy._core import CownCapsule import pytest - RECEIVE_TIMEOUT = 10 +GLOBAL_FACTOR = 7 + + +def receive_asserts(count=1): + """Drain all expected assertion messages, then fail on first mismatch. 
+ + The "assert" queue is always drained before returning so that leftover + messages from a failing test do not leak into subsequent tests in CI. + """ + failed = None + timed_out = False + try: + for _ in range(count): + result = receive("assert", RECEIVE_TIMEOUT) + if result[0] == TIMEOUT: + timed_out = True + break + _, (actual, expected) = result + if failed is None and actual != expected: + failed = (actual, expected) + finally: + drain("assert") + + assert not timed_out, ( + "Timed out waiting for an 'assert' message from a behavior. " + "Check that every @when arg count matches the decorated " + "function's parameter count." + ) + if failed is not None: + actual, expected = failed + assert actual == expected, f"expected {expected!r}, got {actual!r}" + + +class Multiplier: + """Multiplies a cown's value by a module-level global inside a method.""" + + def multiply(self, x: Cown) -> Cown: + """Schedule a behavior that captures GLOBAL_FACTOR from module scope.""" + factor = GLOBAL_FACTOR # noqa: F841 — captured by @when below + + @when(x) + def do_multiply(x): + return x.value * factor # noqa: B023 + + return do_multiply + + def multiply_direct(self, x: Cown) -> Cown: + """Schedule a behavior that captures GLOBAL_FACTOR directly.""" + @when(x) + def do_multiply(x): + return x.value * GLOBAL_FACTOR + + return do_multiply + def simple(x: Cown) -> Cown: """Double a cown's value in a behavior.""" @@ -146,28 +202,6 @@ def teardown_class(cls): """Ensure runtime is drained after suite.""" wait() - def receive_asserts(self, count=1): - """Drain assertion messages and compare actual vs expected. - - Uses a timeout so that if a behavior never fires (e.g. due to a - parameter-count mismatch in @when) the test fails quickly instead - of hanging forever. - """ - failed = None - for _ in range(count): - result = receive("assert", RECEIVE_TIMEOUT) - assert result[0] != TIMEOUT, ( - "Timed out waiting for an 'assert' message from a behavior. 
" - "Check that every @when arg count matches the decorated " - "function's parameter count." - ) - _, (actual, expected) = result - if actual != expected: - failed = (actual, expected) - - if failed is not None: - assert failed[0] != failed[1] - def test_simple_dispatch(self): """Verify single when schedules and returns doubled value.""" x = Cown(1) @@ -178,22 +212,24 @@ def test_simple_dispatch(self): def _(y): send("assert", (y.value, 2)) - self.receive_asserts() + receive_asserts() def test_nested_dispatch(self): """Ensure nested behaviors see updated state.""" x = Cown(1) y = nested(x) + # Only assert the final state. The intermediate value of x is racy: + # the inner nested_triple is scheduled on x from inside nested_double + # and may run before or after a behavior the main thread enqueues on + # x, depending on worker timing. @when(x, y) def check_double(x, y): - send("assert", (x.value, 2)) - @when(x, y.value) - def check_triple(x, y): + def check_triple(x, _inner): send("assert", (x.value, 6)) - self.receive_asserts(2) + receive_asserts() def test_exception(self): """Exceptions propagate as values in behaviors.""" @@ -205,7 +241,7 @@ def _(y): send("assert", (isinstance(y.value, ZeroDivisionError), True)) y.value = None - self.receive_asserts() + receive_asserts() def test_two_cown_coordination(self): """Move value between two cowns with coordinated when.""" @@ -228,7 +264,7 @@ def _(x, y): check(x, 50) check(y, 50) - self.receive_asserts(4) + receive_asserts(4) def test_classes(self, num_philosophers=5, hunger=4): """Simulate dining philosophers and verify fork usage.""" @@ -247,7 +283,7 @@ def test_classes(self, num_philosophers=5, hunger=4): def _(f): send("assert", (f.value.uses, 2*f.value.hunger)) - self.receive_asserts(num_philosophers) + receive_asserts(num_philosophers) @pytest.mark.parametrize("n", [1, 10, 15]) def test_variable_termination(self, n: int): @@ -259,7 +295,7 @@ def test_variable_termination(self, n: int): def check(result): 
send("assert", (result.value, expected)) - self.receive_asserts() + receive_asserts() def test_cown_grouping(self): """Verify cown grouping returns correct sums.""" @@ -270,7 +306,7 @@ def check(results: list[Cown]): for r in results: send("assert", (r.value, expected)) - self.receive_asserts(len(results)) + receive_asserts(len(results)) def test_grouped_cown_mutation(self): """Write to cowns within a group and verify mutations stick.""" @@ -286,7 +322,7 @@ def verify(group: list[Cown[int]]): for i, c in enumerate(group): send("assert", (c.value, i * 2)) - self.receive_asserts(5) + receive_asserts(5) def test_group_and_single_mutation(self): """Mutate a group and a single cown in the same behavior.""" @@ -308,7 +344,7 @@ def check_zeroed(group: list[Cown[int]]): for c in group: send("assert", (c.value, 0)) - self.receive_asserts(4) + receive_asserts(4) def test_behavior_chain(self): """Chain three behaviors where each result feeds the next.""" @@ -330,7 +366,7 @@ def step3(s2): def check(s3): send("assert", (s3.value, 13)) - self.receive_asserts() + receive_asserts() def test_contention(self): """Many behaviors on the same cown serialize correctly.""" @@ -346,7 +382,7 @@ def _(c): def check(c): send("assert", (c.value, n)) - self.receive_asserts() + receive_asserts() def test_exception_type_error(self): """Verify TypeError inside a behavior is captured in the result cown.""" @@ -361,7 +397,7 @@ def check(b): send("assert", (isinstance(b.value, TypeError), True)) b.value = None - self.receive_asserts() + receive_asserts() def test_exception_key_error(self): """Verify KeyError inside a behavior is captured in the result cown.""" @@ -376,7 +412,7 @@ def check(b): send("assert", (isinstance(b.value, KeyError), True)) b.value = None - self.receive_asserts() + receive_asserts() def test_complex_object_repeated_mutation(self): """Multiple sequential behaviors mutate the same object in a cown.""" @@ -393,7 +429,7 @@ def _(a): def check(a): send("assert", 
(sorted(a.value.items), list(range(10)))) - self.receive_asserts() + receive_asserts() def test_duplicate_cown_same_twice(self): """Same cown passed twice to @when completes without deadlock.""" @@ -407,7 +443,7 @@ def add(a, b): def check(r): send("assert", (r.value, 10)) - self.receive_asserts() + receive_asserts() def test_duplicate_cown_same_thrice(self): """Same cown passed three times to @when completes without deadlock.""" @@ -421,7 +457,7 @@ def triple(a, b, d): def check(r): send("assert", (r.value, 9)) - self.receive_asserts() + receive_asserts() def test_duplicate_cown_non_adjacent(self): """Non-adjacent duplicate cowns in @when complete correctly.""" @@ -436,7 +472,7 @@ def mixed(x, y, z): def check(r): send("assert", (r.value, 40)) - self.receive_asserts() + receive_asserts() def test_duplicate_cown_in_group(self): """Duplicate cowns within a group complete without deadlock.""" @@ -450,7 +486,7 @@ def group_sum(group): def check(r): send("assert", (r.value, 14)) - self.receive_asserts() + receive_asserts() def test_duplicate_cown_mutation(self): """Mutating a cown passed twice reflects same underlying value.""" @@ -465,7 +501,7 @@ def mutate(a, b): def check(r): send("assert", (r.value, 42)) - self.receive_asserts() + receive_asserts() def test_cown_of_cown_direct(self): """CownCapsule as direct child of a Cown survives release/acquire.""" @@ -474,9 +510,9 @@ def test_cown_of_cown_direct(self): @when(outer) def read_outer(o): - send("assert", (type(o.value).__name__, "CownCapsule")) + send("assert", (type(o.value).__name__, "Cown")) - self.receive_asserts() + receive_asserts() def test_cown_of_cown_access_inner(self): """Inner cown's value is accessible after outer round-trip.""" @@ -487,7 +523,7 @@ def test_cown_of_cown_access_inner(self): def check_both(o, i): send("assert", (i.value, 99)) - self.receive_asserts() + receive_asserts() def test_cown_of_cown_in_container(self): """CownCapsule nested in a dict survives pickle round-trip.""" @@ -496,9 
+532,9 @@ def test_cown_of_cown_in_container(self): @when(outer) def check_container(o): - send("assert", (type(o.value["key"]).__name__, "CownCapsule")) + send("assert", (type(o.value["key"]).__name__, "Cown")) - self.receive_asserts() + receive_asserts() def test_cown_of_cown_schedule_inner(self): """Extract inner cown from outer and schedule a behavior on it.""" @@ -517,4 +553,311 @@ def schedule_on_inner(r): def read_inner(i): send("assert", (i.value, 10)) - self.receive_asserts() + receive_asserts() + + +class TestGlobalCapture: + """Tests for capturing module-level globals inside class methods.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_method_captures_global_via_local(self): + """A method assigns a global to a local; @when captures the local.""" + m = Multiplier() + x = Cown(5) + result = m.multiply(x) + + @when(result) + def _(r): + send("assert", (r.value, 5 * GLOBAL_FACTOR)) + + receive_asserts() + + def test_method_captures_global_directly(self): + """A method's @when captures a module-level global by name.""" + m = Multiplier() + x = Cown(3) + result = m.multiply_direct(x) + + @when(result) + def _(r): + send("assert", (r.value, 3 * GLOBAL_FACTOR)) + + receive_asserts() + + @pytest.mark.parametrize("value", [1, 10, 100]) + def test_method_captures_global_parametrized(self, value): + """Parametrized: global capture from a method works across inputs.""" + m = Multiplier() + x = Cown(value) + result = m.multiply_direct(x) + + @when(result) + def _(r): + send("assert", (r.value, value * GLOBAL_FACTOR)) # noqa: B023 + + receive_asserts() + + +class TestExceptionFlag: + """Tests for the Cown.exception flag distinguishing thrown vs returned.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_exception_flag_on_throw(self): + """Thrown exception sets .exception to True.""" + x = Cown(1) + + @when(x) + def bad(x): + x.value /= 
0 + + @when(bad) + def check(b): + send("assert", (b.exception, True)) + send("assert", (isinstance(b.value, ZeroDivisionError), True)) + b.value = None + + receive_asserts(2) + + def test_exception_flag_on_return(self): + """Returned Exception object has .exception False.""" + x = Cown(1) + + @when(x) + def returns_exc(x): + return ValueError("not an error") + + @when(returns_exc) + def check(r): + send("assert", (r.exception, False)) + send("assert", (isinstance(r.value, ValueError), True)) + + receive_asserts(2) + + def test_exception_flag_cleared_on_value_write(self): + """Writing .value clears the exception flag.""" + x = Cown(1) + + @when(x) + def bad(x): + x.value /= 0 + + @when(bad) + def check(b): + send("assert", (b.exception, True)) + b.value = "fixed" + send("assert", (b.exception, False)) + + receive_asserts(2) + + def test_exception_flag_manual_set_clear(self): + """Manual .exception set and clear works.""" + x = Cown(42) + + @when(x) + def check(x): + send("assert", (x.exception, False)) + x.exception = True + send("assert", (x.exception, True)) + x.exception = False + send("assert", (x.exception, False)) + + receive_asserts(3) + + def test_returned_exception_no_unhandled_report(self, capsys): + """Returned Exception doesn't trigger unhandled exception report.""" + x = Cown(1) + + @when(x) + def returns_exc(x): + return ValueError("just a value") + + @when(returns_exc) + def check(r): + send("assert", (r.exception, False)) + send("assert", (isinstance(r.value, ValueError), True)) + + receive_asserts(2) + wait() + captured = capsys.readouterr() + assert "unhandled exception" not in captured.err.lower() + + +class TestUnicodeSource: + """Source containing non-ASCII characters must round-trip through export. 
+ + Regression: the exported behavior module was previously written without + an explicit ``encoding`` argument, so on platforms whose locale encoding + is not UTF-8 (notably Windows / cp1252) any non-ASCII literal in the + source was written as a non-UTF-8 byte. Worker sub-interpreters then + failed to import the module with a SyntaxError on the offending byte, + causing the worker pool to fail to start. + """ + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_non_ascii_literal_in_behavior(self): + """A behavior containing a non-ASCII string literal runs correctly.""" + x = Cown(0) + + @when(x) + def _(x): + # "€" (U+20AC) is 3 bytes in UTF-8 and a single byte 0x80 in + # cp1252; if the export file is not written as UTF-8 the + # worker fails to import this module. + send("assert", ("€", "€")) + + receive_asserts() + + +class TestModuleDunderCapture: + """Module-level dunders inside a behavior must resolve to the user module. + + Regression: ``__name__``, ``__doc__``, ``__package__``, ``__spec__``, + and ``__loader__`` are exposed via ``__builtins__``. They were being + silently filtered out of the capture set, so inside a behavior they + resolved against the worker's exported module (e.g. ``__name__`` was + ``"__bocmain__"`` instead of the original module name). They must now + flow through the capture mechanism so the call-site value is used. 
+ """ + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_name_resolves_to_user_module(self): + """__name__ inside a behavior is the user module, not the worker's.""" + x = Cown(0) + expected = __name__ + + @when(x) + def _(x): + send("assert", (__name__, expected)) # noqa: B023 + + receive_asserts() + + def test_package_resolves_to_user_module(self): + """__package__ inside a behavior matches the user module's value.""" + x = Cown(0) + expected = __package__ + + @when(x) + def _(x): + send("assert", (__package__, expected)) # noqa: B023 + + receive_asserts() + + +# --------------------------------------------------------------------------- +# Cross-worker scheduling and cown-identity round-trip invariants. +# +# These two properties of the BOC runtime are not asserted directly by +# any of the @when / Cown / capture tests above: +# +# 1. With workers >= 2, behaviors really run on more than one worker +# thread. Without this, every "parallel" workload degenerates to +# single-threaded throughput. +# 2. A Cown round-tripped through XIData into a worker arrives back +# as a CownCapsule. This exercises the XIData round-trip path +# that the 2PL dedup machinery relies on. 
+# --------------------------------------------------------------------------- + + +class TestCrossWorker: + """Verify cross-worker scheduling and cown round-trip through XIData.""" + + @classmethod + def teardown_class(cls): + """Drain leftover tagged messages so subsequent tests start clean.""" + for tag in ("probe_tid", "probe_id"): + try: + drain(tag) + except Exception: + pass + + def test_two_workers_observe_distinct_thread_ids(self): + """At workers=2, >=2 distinct worker thread ids must appear.""" + if sys.version_info < (3, 12): + pytest.skip( + "per-interpreter GIL only available on Python 3.12+; on " + "shared-GIL interpreters a single worker can drain the " + "queue before the other wakes up, so this property does " + "not hold") + tid_samples = 16 + cells = [Cown(0) for _ in range(tid_samples)] + + start(worker_count=2) + try: + for c in cells: + @when(c) + def _tid(_c): + send("probe_tid", threading.get_ident()) + finally: + del cells + wait() + + thread_ids = set() + for _ in range(tid_samples): + msg = receive(["probe_tid"], RECEIVE_TIMEOUT) + assert msg is not None and msg[0] != TIMEOUT, ( + "thread-id probe timed out") + thread_ids.add(msg[1]) + + assert len(thread_ids) >= 2, ( + f"only {len(thread_ids)} distinct worker thread id observed " + f"across {tid_samples} samples on workers=2; cross-worker " + "scheduling appears broken") + + def test_cown_round_trips_through_xidata(self): + """A Cown sent from a worker arrives back as a CownCapsule. + + Cross-interpreter ``send`` does not preserve raw ``CownCapsule`` + pointer equality on the receive side — XIData may resurrect a + fresh wrapper. The 2PL identity invariant the runtime relies on + lives in the runtime's ``xidata_to_cowns`` dedup machinery at + acquire time, not in ``__eq__`` after a queue round-trip. This + test therefore asserts only that every slot's probe came back + with a ``CownCapsule`` payload (i.e. 
the cown survived XIData + in both directions); it does not assert wrapper identity. + """ + ring_size = 4 + ring = [Cown(0) for _ in range(ring_size)] + seen = {} + + start(worker_count=2) + try: + for idx, cell in enumerate(ring): + # The transpiler auto-captures `idx` and `cell` as free + # variables; do NOT use the `idx=idx` default-arg trick + # — it confuses the worker module export. + @when(cell) + def _probe(c): + send("probe_id", (idx, c)) # noqa: B023 + for _ in range(ring_size): + msg = receive(["probe_id"], RECEIVE_TIMEOUT) + assert msg is not None and msg[0] != TIMEOUT, ( + "identity probe timed out") + _, (probe_idx, probe_cown) = msg + seen[probe_idx] = probe_cown + finally: + del ring + wait() + + for idx in range(ring_size): + observed = seen.get(idx) + assert observed is not None, ( + f"identity probe missing for slot {idx}") + assert isinstance(observed, CownCapsule), ( + f"slot {idx} returned {type(observed).__name__}, " + "expected CownCapsule") diff --git a/test/test_noticeboard.py b/test/test_noticeboard.py new file mode 100644 index 0000000..456d334 --- /dev/null +++ b/test/test_noticeboard.py @@ -0,0 +1,1508 @@ +"""Tests for the noticeboard feature.""" + +from functools import partial + +from bocpy import (Cown, drain, notice_delete, notice_read, notice_sync, + notice_update, notice_write, noticeboard, + noticeboard_version, receive, + REMOVED, send, start, TIMEOUT, wait, when) +import bocpy._core as _core +import pytest + + +RECEIVE_TIMEOUT = 10 + + +def receive_asserts(count=1): + """Drain all expected assertion messages, then fail on first mismatch. + + The "assert" queue is always drained before returning so that leftover + messages from a failing test do not leak into subsequent tests in CI. 
+ """ + failed = None + timed_out = False + try: + for _ in range(count): + result = receive("assert", RECEIVE_TIMEOUT) + if result[0] == TIMEOUT: + timed_out = True + break + _, (actual, expected) = result + if failed is None and actual != expected: + failed = (actual, expected) + finally: + drain("assert") + + assert not timed_out, ( + "Timed out waiting for an 'assert' message from a behavior. " + "Check that every @when arg count matches the decorated " + "function's parameter count." + ) + if failed is not None: + actual, expected = failed + assert actual == expected, f"expected {expected!r}, got {actual!r}" + + +class TestNoticeboard: + """Tests for noticeboard write/read round-trip and snapshot isolation.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_write_then_read_roundtrip(self): + """Write a value in one behavior, read it in a subsequent one.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("greeting", "hello") + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (snap.get("greeting"), "hello")) + + receive_asserts() + + def test_write_overwrite(self): + """Overwriting a key replaces the previous value.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("counter", 10) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_write("counter", 20) + notice_sync() + + @when(x, step2) + def step3(x, _): + snap = noticeboard() + send("assert", (snap.get("counter"), 20)) + + receive_asserts() + + def test_snapshot_returns_mapping(self): + """Snapshot returns a read-only mapping even with no writes.""" + x = Cown(0) + + @when(x) + def _(x): + from collections.abc import Mapping + snap = noticeboard() + send("assert", (isinstance(snap, Mapping), True)) + + receive_asserts() + + def test_multiple_keys(self): + """Multiple keys can coexist in the noticeboard.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("a", 1) 
+ notice_write("b", 2) + notice_write("c", 3) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (snap.get("a"), 1)) + send("assert", (snap.get("b"), 2)) + send("assert", (snap.get("c"), 3)) + + receive_asserts(3) + + def test_frozen_snapshot(self): + """Snapshot is frozen: a write after snapshot doesn't change it.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("val", 100) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap1 = noticeboard() + notice_write("val", 200) + notice_sync() + snap2 = noticeboard() + # Both calls in the same behavior return the same cached snapshot + send("assert", (snap1.get("val"), 100)) + send("assert", (snap1.get("val"), snap2.get("val"))) + + receive_asserts(2) + + def test_snapshot_cache_cleared_between_behaviors(self): + """Each behavior gets a fresh snapshot, not the previous one's cache.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("seq", 1) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (snap.get("seq"), 1)) + notice_write("seq", 2) + notice_sync() + + @when(x, step2) + def step3(x, _): + snap = noticeboard() + send("assert", (snap.get("seq"), 2)) + + receive_asserts(2) + + def test_picklable_value(self): + """Complex (picklable) values round-trip through the noticeboard.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("data", [1, 2, 3]) + notice_sync() + + @when(x) + def step2(x): + snap = noticeboard() + send("assert", (snap.get("data"), [1, 2, 3])) + + receive_asserts() + + def test_set_value_forces_pickle_path(self): + """A set is not natively shareable and must take the pickle path.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("tags", {1, 2, 3}) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (snap.get("tags"), {1, 2, 3})) + + receive_asserts() + + def test_int_value(self): + """Integer values (native cross-interpreter) 
round-trip correctly.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("num", 42) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (snap.get("num"), 42)) + + receive_asserts() + + def test_none_value(self): + """None round-trips through the noticeboard.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("empty", None) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", ("empty" in snap, True)) + send("assert", (snap["empty"], None)) + + receive_asserts(2) + + def test_notice_read_existing_key(self): + """notice_read returns the value for an existing key.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("color", "blue") + notice_sync() + + @when(x, step1) + def step2(x, _): + send("assert", (notice_read("color"), "blue")) + + receive_asserts() + + def test_notice_read_missing_key_default(self): + """notice_read returns None for a missing key by default.""" + x = Cown(0) + + @when(x) + def _(x): + send("assert", (notice_read("nonexistent"), None)) + + receive_asserts() + + def test_notice_read_missing_key_custom_default(self): + """notice_read returns the custom default for a missing key.""" + x = Cown(0) + + @when(x) + def _(x): + send("assert", (notice_read("nonexistent", 42), 42)) + + receive_asserts() + + def test_notice_read_uses_cached_snapshot(self): + """Two notice_read calls in the same behavior use the same snapshot.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("tick", 1) + notice_sync() + + @when(x, step1) + def step2(x, _): + val1 = notice_read("tick") + notice_write("tick", 99) + notice_sync() + val2 = notice_read("tick") + # Both reads see the cached snapshot, not the new write + send("assert", (val1, val2)) + + receive_asserts() + + +class TestNoticeboardBoundary: + """Boundary tests for noticeboard key length and entry capacity.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after 
suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each boundary test.""" + _core.noticeboard_clear() + + def test_max_key_length_63_bytes(self): + """A key of exactly 63 UTF-8 bytes is accepted.""" + x = Cown(0) + long_key = "k" * 63 # exactly 63 bytes + + @when(x) + def step1(x): + notice_write(long_key, "ok") + notice_sync() + + @when(x, step1) + def step2(x, _): + val = notice_read(long_key) + send("assert", (val, "ok")) + + receive_asserts() + + def test_key_length_64_bytes_rejected(self): + """A key of 64 UTF-8 bytes is rejected with ValueError.""" + x = Cown(0) + too_long = "k" * 64 # 64 bytes, exceeds 63-byte limit + + @when(x) + def _(x): + try: + notice_write(too_long, "fail") + notice_sync() + send("assert", (False, True)) # should not reach here + except ValueError: + send("assert", (True, True)) + + receive_asserts() + + def test_64_entries_accepted(self): + """The noticeboard accepts up to 64 distinct keys.""" + x = Cown(0) + + @when(x) + def step1(x): + for i in range(64): + notice_write(f"slot{i}", i) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + send("assert", (len(snap) >= 64, True)) + send("assert", (snap.get("slot0"), 0)) + send("assert", (snap.get("slot63"), 63)) + + receive_asserts(3) + + def test_65th_entry_silently_dropped(self): + """The 65th distinct key is silently dropped by the noticeboard thread.""" + x = Cown(0) + + @when(x) + def step1(x): + for i in range(65): + notice_write(f"cap{i}", i) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + # Only 64 entries should be present; the 65th is dropped + cap_keys = [k for k in snap if k.startswith("cap")] + send("assert", (len(cap_keys), 64)) + # The first 64 keys (cap0..cap63) should be present + send("assert", (snap.get("cap0"), 0)) + send("assert", (snap.get("cap63"), 63)) + # The 65th key (cap64) should be missing + send("assert", ("cap64" not in snap, True)) + + receive_asserts(4) + + def 
test_write_non_string_key_rejected(self): + """Non-string key raises TypeError.""" + x = Cown(0) + + @when(x) + def _(x): + try: + notice_write(123, "value") + notice_sync() + send("assert", (False, True)) + except TypeError: + send("assert", (True, True)) + + receive_asserts() + + def test_key_with_nul_rejected(self): + """A key containing NUL is rejected with ValueError.""" + x = Cown(0) + + @when(x) + def _(x): + try: + notice_write("a\x00b", "value") + notice_sync() + send("assert", (False, True)) + except ValueError: + send("assert", (True, True)) + + receive_asserts() + + +class TestNoticeboardConcurrency: + """Stress tests for concurrent noticeboard writes from independent behaviors.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_concurrent_writes_from_independent_behaviors(self): + """Independent behaviors on separate cowns write unique keys concurrently.""" + cowns = [Cown(i) for i in range(8)] + for i in range(8): + + @when(cowns[i]) + def writer(c): + notice_write(f"cw_{c.value}", c.value * 10) + # Block this behavior until the write commits, so the + # reader (which acquires every cown below) is guaranteed + # to observe it. + notice_sync() + + # The reader requires every writer cown, so it cannot run until + # every writer behavior has returned — and notice_sync() above + # ensures each writer's mutation is committed before it returns. 
+ @when(cowns) + def reader(cowns): + snap = noticeboard() + count = sum(1 for k in snap if k.startswith("cw_")) + send("assert", (count, 8)) + send("assert", (snap.get("cw_0"), 0)) + send("assert", (snap.get("cw_7"), 70)) + + receive_asserts(3) + + +class TestNoticeboardUTF8: + """Tests for multi-byte UTF-8 key handling.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_multibyte_key_within_limit(self): + """A 3-byte character at byte position 60 fits within 63-byte limit.""" + x = Cown(0) + # "€" is 3 UTF-8 bytes; 60 ASCII + 3 = 63 bytes total + key_63 = "a" * 60 + "€" + + @when(x) + def step1(x): + notice_write(key_63, "ok") + notice_sync() + + @when(x, step1) + def step2(x, _): + val = notice_read(key_63) + send("assert", (val, "ok")) + + receive_asserts() + + def test_multibyte_key_exceeds_limit(self): + """A 3-byte character at byte position 61 exceeds the 63-byte limit.""" + x = Cown(0) + # 61 ASCII + 3 = 64 bytes total, exceeds limit + key_64 = "a" * 61 + "€" + + @when(x) + def _(x): + try: + notice_write(key_64, "fail") + notice_sync() + send("assert", (False, True)) + except ValueError: + send("assert", (True, True)) + + receive_asserts() + + +class TestNoticeboardRestart: + """Tests for noticeboard state across runtime restart.""" + + def test_noticeboard_empty_after_restart(self): + """After wait() + new behaviors, noticeboard starts fresh.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("before_restart", 42) + notice_sync() + + @when(x) + def step2(x): + snap = noticeboard() + send("assert", (snap.get("before_restart"), 42)) + + receive_asserts() + wait() + + # Start fresh — noticeboard should be cleared by stop() + y = Cown(0) + + @when(y) + def check(y): + snap = noticeboard() + send("assert", ("before_restart" not in snap, True)) + + receive_asserts() + wait() + + +# 
Module-level helpers for notice_update tests (must be picklable). + + +def _increment(x): + """Return x + 1.""" + return x + 1 + + +def _add_ten(x): + """Return x + 10.""" + return x + 10 + + +def _wrap_value(x): + """Return (x, 'seen') to verify what fn received.""" + return (x, "seen") + + +def _div_by_zero(x): + """Raise ZeroDivisionError.""" + return x / 0 + + +def _return_removed(x): + """Return the REMOVED sentinel.""" + return REMOVED + + +def _conditionally_remove(x): + """Return REMOVED if x > 100, else x + 1.""" + if x > 100: + return REMOVED + return x + 1 + + +class TestNoticeUpdate: + """Tests for notice_update atomic read-modify-write.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_basic_increment(self): + """Update an existing key with a module-level function.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("counter", 10) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_update("counter", _increment) + notice_sync() + + @when(x, step2) + def step3(x, _): + send("assert", (notice_read("counter"), 11)) + + receive_asserts() + + def test_default_on_absent_key(self): + """Update a missing key uses the default value.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_update("missing", _add_ten, default=0) + notice_sync() + + @when(x, step1) + def step2(x, _): + send("assert", (notice_read("missing"), 10)) + + receive_asserts() + + def test_none_sentinel(self): + """A key holding None is distinguished from an absent key.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("k", None) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_update("k", _wrap_value, default="WRONG") + notice_sync() + + @when(x, step2) + def step3(x, _): + val = notice_read("k") + # fn should have received None (the stored value), not "WRONG" + send("assert", 
(val, (None, "seen"))) + + receive_asserts() + + def test_concurrent_updates(self): + """Multiple independent behaviors updating the same key.""" + n = 8 + cowns = [Cown(i) for i in range(n)] + for i in range(n): + + @when(cowns[i]) + def writer(c): + notice_update("counter", _increment, default=0) + notice_sync() + + # Reader requires every writer cown -> runs only after every + # writer behavior returns -> after every notice_sync() commits. + @when(cowns) + def reader(_): + send("assert", (notice_read("counter"), n)) + + receive_asserts() + + def test_key_validation_type(self): + """Non-string key raises TypeError.""" + x = Cown(0) + + @when(x) + def _(x): + try: + notice_update(123, _increment) + notice_sync() + send("assert", (False, True)) + except TypeError: + send("assert", (True, True)) + + receive_asserts() + + def test_fn_not_callable(self): + """Non-callable fn raises TypeError.""" + x = Cown(0) + + @when(x) + def _(x): + try: + notice_update("key", "not_callable") + notice_sync() + send("assert", (False, True)) + except TypeError: + send("assert", (True, True)) + + receive_asserts() + + def test_fn_raises_keeps_previous_value(self): + """If fn raises, the key retains its previous value.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("safe", 42) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_update("safe", _div_by_zero) + notice_sync() + + @when(x, step2) + def step3(x, _): + send("assert", (notice_read("safe"), 42)) + + receive_asserts() + + def test_functools_partial(self): + """functools.partial with a builtin works as fn.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_update("best", partial(max, 42), default=0) + notice_sync() + + @when(x, step1) + def step2(x, _): + send("assert", (notice_read("best"), 42)) + + receive_asserts() + + +class TestNoticeboardReadOnly: + """Tests that the snapshot is read-only (MappingProxyType).""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after 
suite.""" + wait() + + def test_snapshot_mutation_rejected(self): + """Direct mutation of the snapshot raises TypeError.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("immut", 1) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + try: + snap["immut"] = 999 + send("assert", (False, True)) # should not reach here + except TypeError: + send("assert", (True, True)) + # Original value is unaffected + send("assert", (notice_read("immut"), 1)) + + receive_asserts(2) + + def test_snapshot_del_rejected(self): + """Deleting a key from the snapshot raises TypeError.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("del_test", 42) + notice_sync() + + @when(x, step1) + def step2(x, _): + snap = noticeboard() + try: + del snap["del_test"] + send("assert", (False, True)) + except TypeError: + send("assert", (True, True)) + + receive_asserts() + + +class TestNoticeboardPreRuntime: + """Tests for noticeboard calls before the runtime is started.""" + + @classmethod + def setup_class(cls): + """Ensure runtime is stopped before this class runs. + + These tests never start the runtime, so we do not need per-test + wait() calls; one shutdown at class entry is enough and avoids + hammering the worker lifecycle (which can intermittently trip + CPython 3.13 sub-interpreter teardown bugs). 
+ """ + wait() + + def test_notice_write_before_start(self): + """notice_write raises RuntimeError before the runtime is started.""" + with pytest.raises(RuntimeError, match="cannot write to the noticeboard"): + notice_write("key", "value") + notice_sync() + + def test_notice_update_before_start(self): + """notice_update raises RuntimeError before the runtime is started.""" + with pytest.raises(RuntimeError, match="cannot update the noticeboard"): + notice_update("key", _increment) + notice_sync() + + +class TestNoticeboardFireAndForget: + """Tests for fire-and-forget write semantics.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_write_persists_after_behavior_failure(self): + """A notice_write sent before a behavior raises is still applied.""" + x = Cown(0) + + @when(x) + def failing(x): + notice_write("survivor", 42) + notice_sync() + raise ValueError("intentional failure") + + @when(x, failing) + def check(x, _): + send("assert", (notice_read("survivor"), 42)) + + receive_asserts() + + +# Module-level helpers for notice_delete / REMOVED tests. + + +def _read_ring_first_value(_ignored): + """Return the value of ``ring[0]`` from the noticeboard. + + Module-level so the transpiler can serialize it for the worker. + """ + ring = noticeboard()["ring"] + return ring[0].value + + +def _read_ring_size(_ignored): + """Return the length of the noticeboard's ``ring`` entry.""" + return len(noticeboard()["ring"]) + + +class SlotHolder: + """Slot-only container used by the `__slots__` MRO regression test. + + Module-level so the transpiler can serialize it for the workers. + Has no ``__dict__``; every attribute lives in a slot. 
+ """ + + __slots__ = ("cown", "label") + + def __init__(self, cown, label): + """Store *cown* and *label* as the instance's only state.""" + self.cown = cown + self.label = label + + +class SlotSubclass(SlotHolder): + """Slot-only subclass: slots declared at a different MRO level.""" + + __slots__ = ("extra",) + + def __init__(self, cown, label, extra): + """Initialise the base fields plus a subclass-only slot.""" + super().__init__(cown, label) + self.extra = extra + + +class TestNoticeboardCownPinning: + """Regression tests: cowns stored on the noticeboard outlive the writer. + + These cover the bug where a ``Cown`` placed on the noticeboard was + only kept alive by the original wrapper's COWN_INCREF; once the + wrapper went out of scope, every worker that had unpickled a copy + would issue a matching DECREF on dealloc, sending the underlying + BOCCown's refcount negative. The fix takes an independent strong + reference inside the noticeboard entry. + """ + + @classmethod + def setup_class(cls): + """Start the runtime so the noticeboard thread is registered.""" + start() + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_ring_of_cowns_survives_writer_dropping_reference(self): + """A list of cowns on the noticeboard is usable after writer drops it.""" + # Build a small ring of cowns in a behavior, publish it to the + # noticeboard, then drop every local reference to the ring on the + # writer side. The noticeboard becomes the only thing keeping the + # cowns alive across worker reads. + x = Cown(0) + + @when(x) + def writer(x): + ring = [Cown(i * 10) for i in range(8)] + notice_write("ring", ring) + notice_sync() + # Local goes out of scope at function return — only the + # noticeboard's pin is left. 
+ + @when(x, writer) + def first_read(x, _): + ring = noticeboard()["ring"] + send("assert", (len(ring), 8)) + + @when(x, first_read) + def second_read(x, _): + ring = noticeboard()["ring"] + send("assert", (len(ring), 8)) + + @when(x, second_read) + def acquire_first(x, _): + ring = noticeboard()["ring"] + # Acquire the first cown for read; this dereferences the + # underlying BOCCown and would assert if it had been freed. + with ring[0] as v: + send("assert", (v, 0)) + + receive_asserts(count=3) + + def test_overwrite_releases_old_cown_pins(self): + """Overwriting a noticeboard entry releases the old entry's pins.""" + x = Cown(0) + + @when(x) + def first_write(x): + first = [Cown(i) for i in range(4)] + notice_write("ring", first) + notice_sync() + + @when(x, first_write) + def second_write(x, _): + second = [Cown(100 + i) for i in range(4)] + notice_write("ring", second) + notice_sync() + + @when(x, second_write) + def check(x, _): + ring = noticeboard()["ring"] + with ring[0] as v: + send("assert", (v, 100)) + + receive_asserts() + + def test_delete_releases_cown_pins(self): + """notice_delete drops the entry's pins; a fresh write reuses the slot.""" + x = Cown(0) + + @when(x) + def initial_write(x): + ring = [Cown(i) for i in range(3)] + notice_write("ring", ring) + notice_sync() + + @when(x, initial_write) + def remove_entry(x, _): + notice_delete("ring") + notice_sync() + + # The delete is non-blocking; verify in a subsequent behavior so + # the noticeboard thread has had a chance to process the message and the + # per-behavior snapshot cache is rebuilt. + @when(x, remove_entry) + def verify_gone(x, _): + send("assert", ("ring" in noticeboard(), False)) + + # After delete + new write the noticeboard reads the new entry. 
+ @when(x, verify_gone) + def write_new(x, _): + new_ring = [Cown(999)] + notice_write("ring", new_ring) + notice_sync() + + @when(x, write_new) + def check_new(x, _): + ring = noticeboard()["ring"] + with ring[0] as v: + send("assert", (v, 999)) + + receive_asserts(count=2) + + def test_slot_only_holder_cown_survives_writer(self): + """Cowns reachable through ``__slots__`` are pinned by the noticeboard. + + Regression: ``_collect_cown_capsules`` used to only descend + into ``obj.__dict__``. A slot-only class has no ``__dict__``, + so any cown stored in a slot attribute was silently dropped + from the pin list -- the BOCCown would be freed with pickled + bytes still referring to it, and the next reader would crash + on the dangling pointer. + """ + x = Cown(0) + + @when(x) + def writer(x): + holder = SlotHolder(Cown(12345), "first") + notice_write("slot_holder", holder) + notice_sync() + # Local goes out of scope at function return -- only the + # noticeboard's pin should keep the inner Cown alive. + + @when(x, writer) + def read_back(x, _): + holder = noticeboard()["slot_holder"] + send("assert", (holder.label, "first")) + with holder.cown as v: + send("assert", (v, 12345)) + + receive_asserts(count=2) + + def test_slot_subclass_cown_survives_writer(self): + """Cowns reachable through an MRO chain of ``__slots__`` are pinned. + + Extends the previous test to classes that declare slots at + different levels of the MRO, exercising the MRO walk rather + than only the leaf type's ``__slots__``. 
+ """ + x = Cown(0) + + @when(x) + def writer(x): + holder = SlotSubclass(Cown(7777), "sub", Cown(8888)) + notice_write("slot_sub", holder) + notice_sync() + + @when(x, writer) + def read_back(x, _): + holder = noticeboard()["slot_sub"] + send("assert", (holder.label, "sub")) + with holder.cown as v: + send("assert", (v, 7777)) + with holder.extra as v: + send("assert", (v, 8888)) + + receive_asserts(count=3) + + +class TestNoticeboardSnapshotImmutable: + """The cached snapshot is read-only; user code cannot corrupt it.""" + + @classmethod + def setup_class(cls): + start() + + @classmethod + def teardown_class(cls): + wait() + + def setup_method(self): + _core.noticeboard_clear() + + def test_snapshot_is_mappingproxy(self): + """noticeboard() returns a read-only mapping proxy.""" + x = Cown(0) + + @when(x) + def setup_then_check(x): + notice_write("k", "v") + notice_sync() + + @when(x, setup_then_check) + def check(x, _): + snap = noticeboard() + # Avoid importing MappingProxyType inside the behavior — the + # transpiler would capture the symbol and pickling the + # ``mappingproxy`` builtin class fails. Compare by type name + # instead. + send("assert", (type(snap).__name__, "mappingproxy")) + + receive_asserts() + + def test_snapshot_rejects_mutation(self): + """Attempting to mutate the snapshot raises TypeError.""" + x = Cown(0) + + @when(x) + def writer(x): + notice_write("k", "v") + notice_sync() + + @when(x, writer) + def check(x, _): + snap = noticeboard() + try: + snap["k"] = "new" # type: ignore[index] + send("assert", ("no-error", "TypeError")) + except TypeError: + send("assert", ("TypeError", "TypeError")) + + receive_asserts() + + +class TestNoticeboardThreadOnly: + """Direct mutation entry points reject calls from non-noticeboard threads.""" + + @classmethod + def setup_class(cls): + """Start the runtime so that NB_NOTICEBOARD_TID is registered.""" + # A trivial behavior is enough to spin up the runtime. 
After + # this point any direct C-level write/delete from the main + # thread must be rejected. + x = Cown(0) + + @when(x) + def _noop(x): + send("assert", (1, 1)) + + receive_asserts() + + @classmethod + def teardown_class(cls): + wait() + + def test_main_thread_write_direct_rejected(self): + """noticeboard_write_direct raises if called from the main thread.""" + with pytest.raises(RuntimeError, match="noticeboard thread"): + _core.noticeboard_write_direct("k", "v", []) + + def test_main_thread_delete_rejected(self): + """noticeboard_delete raises if called from the main thread.""" + with pytest.raises(RuntimeError, match="noticeboard thread"): + _core.noticeboard_delete("k") + + +class TestNoticeDelete: + """Tests for notice_delete and the REMOVED sentinel.""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def setup_method(self): + """Clear the noticeboard before each test.""" + _core.noticeboard_clear() + + def test_delete_existing_key(self): + """notice_delete removes an existing key from the noticeboard.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("doomed", 99) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_delete("doomed") + notice_sync() + + @when(x, step2) + def check(x, _): + snap = noticeboard() + send("assert", ("doomed" not in snap, True)) + + receive_asserts() + + def test_delete_absent_key_is_noop(self): + """notice_delete on a missing key is a silent no-op.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("keeper", "safe") + notice_delete("nonexistent") + notice_sync() + + @when(x, step1) + def check(x, _): + send("assert", (notice_read("keeper"), "safe")) + + receive_asserts() + + def test_update_fn_returns_removed(self): + """When fn returns REMOVED, the entry is deleted.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("target", 42) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_update("target", _return_removed) + 
notice_sync() + + @when(x, step2) + def check(x, _): + snap = noticeboard() + send("assert", ("target" not in snap, True)) + + receive_asserts() + + def test_update_conditional_remove(self): + """REMOVED only triggers when fn actually returns it.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("val", 50) + notice_sync() + + # 50 <= 100, so fn returns 51 + @when(x, step1) + def step2(x, _): + notice_update("val", _conditionally_remove) + notice_sync() + + @when(x, step2) + def check1(x, _): + send("assert", (notice_read("val"), 51)) + + receive_asserts() + + def test_update_conditional_remove_triggers(self): + """REMOVED triggers when value exceeds threshold.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("val", 200) + notice_sync() + + # 200 > 100, so fn returns REMOVED + @when(x, step1) + def step2(x, _): + notice_update("val", _conditionally_remove) + notice_sync() + + @when(x, step2) + def check(x, _): + snap = noticeboard() + send("assert", ("val" not in snap, True)) + + receive_asserts() + + def test_removed_then_update_uses_default(self): + """After deletion, notice_update uses the default value.""" + x = Cown(0) + + @when(x) + def step1(x): + notice_write("counter", 10) + notice_sync() + + @when(x, step1) + def step2(x, _): + notice_delete("counter") + notice_sync() + + @when(x, step2) + def step3(x, _): + notice_update("counter", _increment, default=0) + notice_sync() + + @when(x, step3) + def check(x, _): + send("assert", (notice_read("counter"), 1)) + + receive_asserts() + + def test_delete_frees_capacity(self): + """Deleting an entry frees a slot for a new entry.""" + x = Cown(0) + + @when(x) + def fill(x): + for i in range(64): + notice_write(f"k{i}", i) + notice_sync() + + @when(x, fill) + def delete_one(x, _): + notice_delete("k0") + notice_sync() + + @when(x, delete_one) + def add_new(x, _): + notice_write("new_key", "hello") + notice_sync() + + @when(x, add_new) + def check(x, _): + snap = noticeboard() + present = 
"new_key" in snap and "k0" not in snap + send("assert", (present, True)) + + receive_asserts() + + +class TestNoticeDeletePreRuntime: + """Tests that notice_delete validates before runtime start.""" + + @classmethod + def setup_class(cls): + """Ensure runtime is stopped before this class runs. + + See TestNoticeboardPreRuntime for rationale. + """ + wait() + + def test_notice_delete_before_start(self): + """notice_delete raises RuntimeError before the runtime is started.""" + with pytest.raises(RuntimeError, match="cannot delete from the noticeboard"): + notice_delete("key") + notice_sync() + + +class TestNoticeDeleteValidation: + """Tests for notice_delete input validation (runtime must be running).""" + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_notice_delete_non_string_key(self): + """notice_delete raises TypeError for non-string key.""" + x = Cown(0) # triggers runtime start + + @when(x) + def _(x): + pass + + with pytest.raises(TypeError, match="noticeboard key must be a str"): + notice_delete(123) + notice_sync() + + +class TestRemovedSentinel: + """Tests for the REMOVED sentinel object.""" + + def test_removed_is_not_none(self): + """REMOVED is distinct from None.""" + assert REMOVED is not None + + def test_removed_repr(self): + """REMOVED has a clear repr.""" + assert repr(REMOVED) == "REMOVED" + + def test_removed_identity(self): + """REMOVED is a singleton.""" + from bocpy import REMOVED as REMOVED2 + assert REMOVED is REMOVED2 + + def test_removed_is_picklable(self): + """REMOVED survives pickle round-trip as identity.""" + import pickle + restored = pickle.loads(pickle.dumps(REMOVED)) + assert restored is REMOVED + + +class TestNoticeboardVersioning: + """Tests for the version-counter-based snapshot cache. 
+ + These tests confirm that the version counter eliminates redundant + snapshot rebuilds without breaking the no-polling invariant + (see ``test_frozen_snapshot`` and friends in ``TestNoticeboard``). + """ + + @classmethod + def teardown_class(cls): + """Ensure runtime is drained after suite.""" + wait() + + def test_self_write_invisible_within_behavior(self): + """A behavior that writes the noticeboard does NOT see its own write. + + This is the no-polling invariant on the writer side: even after + the version bump, the writer's thread keeps returning the cached + dict for the rest of this behavior. + """ + x = Cown(0) + + @when(x) + def step1(x): + notice_write("self", "before") + notice_sync() + + @when(x, step1) + def step2(x, _): + before = notice_read("self") + notice_write("self", "after") + notice_sync() + after_same_behavior = notice_read("self") + send("assert", (before, "before")) + send("assert", (after_same_behavior, "before")) + + @when(x, step2) + def step3(x, _): + # New behavior — must see the committed write. + send("assert", (notice_read("self"), "after")) + + receive_asserts(3) + + def test_snapshot_reused_when_no_writes_intervene(self): + """Version is unchanged across read-only behaviors. + + With no writes in flight, the version counter must stay constant + no matter how many read-only behaviors run. + """ + from bocpy import noticeboard_version + + x = Cown(0) + + @when(x) + def seed(x): + notice_write("k", 1) + notice_sync() + + # Drain the seed behavior by chaining a subsequent read; this + # ensures the write has landed before we sample the version. + @when(x, seed) + def warm(x, _): + send("assert", (notice_read("k"), 1)) + + receive_asserts() + + # Now run N read-only behaviors and watch the version. 
+ before = noticeboard_version() + n = 20 + + for _ in range(n): + @when(x) + def reader(x): + send("assert", (notice_read("k"), 1)) + + receive_asserts(n) + + after = noticeboard_version() + assert after == before, ( + f"version moved from {before} to {after} across {n} " + f"read-only behaviors; no writes were issued") + + def test_writes_advance_version(self): + """Each notice_write strictly increases the version counter.""" + x = Cown(0) + + @when(x) + def seed(x): + notice_write("vk", 0) + notice_sync() + # Reading noticeboard_version() from the test thread would + # race the noticeboard mutator thread; the result-cown of + # this behavior carries the sample safely into `check`. + return noticeboard_version() + + n = 5 + for _ in range(n): + @when(x) + def writer(x): + notice_write("vk", 1) + notice_sync() + + @when(x) + def sample(x): + # Runs after every writer because all share `x`. Every + # writer's notice_sync() committed before its behavior + # released x, so the version we read here reflects all of + # them. + return noticeboard_version() + + @when(seed, sample) + def check(before, after): + # `before` and `after` are the result-cowns of the upstream + # behaviors; their values are the noticeboard_version() ints. 
+ send("assert", (after.value - before.value, n)) + + receive_asserts() + + def test_cross_behavior_visibility_preserved(self): + """Sanity: write in A is visible in B (no regression vs baseline).""" + x = Cown(0) + + @when(x) + def writer(x): + notice_write("xv", "from_A") + notice_sync() + + @when(x, writer) + def reader(x, _): + send("assert", (notice_read("xv"), "from_A")) + + receive_asserts() + + +class TestNoticeboardVersionAPI: + """Public-API surface tests for ``noticeboard_version``.""" + + def test_returns_int(self): + """The version is an int.""" + from bocpy import noticeboard_version + v = noticeboard_version() + assert isinstance(v, int) + assert v >= 0 + + def test_monotonic(self): + """The version never decreases between consecutive reads.""" + from bocpy import noticeboard_version + a = noticeboard_version() + b = noticeboard_version() + assert b >= a diff --git a/test/test_scheduling_stress.py b/test/test_scheduling_stress.py new file mode 100644 index 0000000..70632cd --- /dev/null +++ b/test/test_scheduling_stress.py @@ -0,0 +1,590 @@ +"""Scheduling stress tests for the BOC runtime. + +These tests exercise the distributed-scheduling hot path under load. +They use **only** BOC primitives — no OS threads — because mixing OS threads +with @when behaviors is brittle (workers run in sub-interpreters and the +test thread cannot directly observe per-cown state). + +Each test ships its results out via send/receive so the test thread can +synchronize with completion. 
+""" + +import os +from unittest import mock + +from bocpy import _core +from bocpy import Cown, drain, receive, send, TIMEOUT, wait, when +import bocpy.behaviors as _behaviors +import pytest + + +RECEIVE_TIMEOUT = 30 + + +# --------------------------------------------------------------------------- +# Helpers (module-level so workers can import them) +# --------------------------------------------------------------------------- + + +def _drain_done(): + """Drop any leftover 'done' messages between tests.""" + drain("done") + + +def _collect_done(expected: int, timeout: int = RECEIVE_TIMEOUT): + """Block until `expected` 'done' messages arrive; return their payloads. + + Fails the test with a clear message on timeout instead of hanging. + """ + payloads = [] + timed_out = False + try: + for _ in range(expected): + tag, payload = receive("done", timeout) + if tag == TIMEOUT: + timed_out = True + break + payloads.append(payload) + finally: + drain("done") + assert not timed_out, ( + f"Timed out waiting for 'done' messages: got {len(payloads)} of " + f"{expected}. A behavior likely failed to schedule or run." + ) + return payloads + + +class Counter: + """Plain integer counter wrapped in a Cown. + + No locking is needed: BOC guarantees exclusive access to a cown's value + inside a behavior, so a per-cown int is a sound oracle for fan-out tests. 
+ """ + + __slots__ = ("count",) + + def __init__(self): + """Initialize the counter at zero.""" + self.count = 0 + + +# --------------------------------------------------------------------------- +# Fan-out: N behaviors over M cowns, disjoint and overlapping +# --------------------------------------------------------------------------- + + +class TestSchedulingFanOut: + """N behaviors fan out across M cowns; each cown's count is an oracle.""" + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + @pytest.mark.parametrize("n,m", [(1000, 32), (200, 4), (500, 1)]) + def test_disjoint_fan_out(self, n: int, m: int): + """N behaviors target round-robin across M cowns; sum must equal N.""" + cowns = [Cown(Counter()) for _ in range(m)] + + for i in range(n): + target = cowns[i % m] + + @when(target) + def _(c): + c.value.count += 1 + + # Read each counter back through a behavior and report it. + for idx, c in enumerate(cowns): + @when(c) + def _(c): + send("done", (idx, c.value.count)) # noqa: B023 + + results = _collect_done(m) + + per_cown = {idx: count for idx, count in results} + assert sum(per_cown.values()) == n, per_cown + # Each cown should see exactly its round-robin share. + for idx in range(m): + expected_share = n // m + (1 if idx < n % m else 0) + assert per_cown[idx] == expected_share, (idx, per_cown) + + @pytest.mark.parametrize("n,m", [(500, 8), (1000, 16)]) + def test_overlapping_fan_out(self, n: int, m: int): + """Each behavior locks two adjacent cowns; both increment. + + Sum of all counters must equal 2 * N. 
+ """ + cowns = [Cown(Counter()) for _ in range(m)] + + for i in range(n): + a = cowns[i % m] + b = cowns[(i + 1) % m] + + @when(a, b) + def _(a, b): + a.value.count += 1 + b.value.count += 1 + + for idx, c in enumerate(cowns): + @when(c) + def _(c): + send("done", (idx, c.value.count)) # noqa: B023 + + results = _collect_done(m) + total = sum(count for _, count in results) + assert total == 2 * n, results + + +# --------------------------------------------------------------------------- +# Sustained load: long-running schedule that must drain via wait() +# --------------------------------------------------------------------------- + + +class TestSchedulingSustainedLoad: + """Schedule a large bounded workload and ensure it completes.""" + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_bounded_completion(self): + """Schedule many behaviors; each reports done; wait collects them all. + + This is the bounded-workload variant. The full ≥30 s sustained-load + run is gated by the BOCPY_STRESS_LONG environment variable so CI + stays fast; set it locally to exercise long runs. + """ + n = 2000 if not os.environ.get("BOCPY_STRESS_LONG") else 100_000 + cowns = [Cown(Counter()) for _ in range(8)] + + for i in range(n): + target = cowns[i % len(cowns)] + + @when(target) + def _(c): + c.value.count += 1 + send("done", 1) + + # Use a generous timeout proportional to n; wait fails noisily if a + # behavior is dropped. 
+ timeout = max(RECEIVE_TIMEOUT, n // 100) + payloads = _collect_done(n, timeout=timeout) + assert len(payloads) == n + + +# --------------------------------------------------------------------------- +# Dedup regression: @when(c, c) must run exactly once per scheduling +# --------------------------------------------------------------------------- + + +class TestSchedulingDedup: + """A repeated cown in @when must not double-acquire or double-run.""" + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_when_same_cown_twice_runs_once(self): + """@when(c, c) schedules exactly one behavior invocation.""" + c = Cown(Counter()) + + @when(c, c) + def _(a, b): + # a and b are separate Python wrappers but back the same cown, + # so they observe the same underlying value object. + a.value.count += 1 + send("done", a.value is b.value) + + payloads = _collect_done(1) + # Both parameters should expose the same underlying value. + assert payloads == [True] + + @when(c) + def _(c): + send("done", c.value.count) + + [count] = _collect_done(1) + assert count == 1, f"dedup failed: counter={count}" + + def test_when_repeated_cown_many_times(self): + """Scheduling N copies of @when(c, c) yields exactly N increments.""" + c = Cown(Counter()) + n = 100 + + for _ in range(n): + @when(c, c) + def _(a, b): + a.value.count += 1 + + @when(c) + def _(c): + send("done", c.value.count) + + [count] = _collect_done(1) + assert count == n, f"expected {n}, got {count}" + + +# --------------------------------------------------------------------------- +# Drain-with-recycle-flush: terminator + recycle invariant after wait() +# --------------------------------------------------------------------------- + + +class TestSchedulingDrainRecycleFlush: + """Verify the terminator and recycle queue invariants after ``wait()``. 
+ + After a normal drain via ``wait()``, the C-level terminator counter + must return to zero and a forced recycle-queue flush must be a no-op + (no double-frees, no live entries left). + + An earlier draft of this test also wanted a per-BOCBehavior refcount + assertion, + but that counter is only exposed under the compile-time + ``BOC_REF_TRACKING`` build flag. The terminator counter is a strict + superset for the leak-detection purpose: every behavior takes one + terminator hold via ``whencall`` and releases it on the worker thread + after ``behavior_release_all``, so a behavior that is leaked (or whose + release is dropped) keeps the count above zero. + """ + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_terminator_returns_to_zero_after_wait(self): + """Schedule N disjoint behaviors; wait(); count must be 0.""" + n = 256 + cowns = [Cown(Counter()) for _ in range(8)] + + for i in range(n): + target = cowns[i % len(cowns)] + + @when(target) + def _(c): + c.value.count += 1 + send("done", 1) + + payloads = _collect_done(n) + assert len(payloads) == n + # wait() drains and stops; terminator_count() should observe a + # quiesced runtime. A non-zero value indicates a leaked hold. + wait() + assert _core.terminator_count() == 0 + + def test_recycle_after_wait_is_idempotent(self): + """Forced recycle-queue flush after wait() must not crash or leak.""" + cowns = [Cown(Counter()) for _ in range(4)] + + for c in cowns: + @when(c) + def _(c): + c.value.count += 1 + send("done", 1) + + _collect_done(len(cowns)) + wait() + # Two flushes back-to-back: the second must be a no-op. 
+ _core.recycle() + _core.recycle() + assert _core.terminator_count() == 0 + + +# --------------------------------------------------------------------------- +# whencall rollback: a failed behavior_schedule must release the terminator +# --------------------------------------------------------------------------- + + +class TestWhencallRollback: + """Verify that a failed ``behavior_schedule`` releases its terminator hold. + + The ``whencall`` helper takes a terminator hold via ``terminator_inc`` + before it dispatches to ``behavior_schedule``. If the schedule call + raises (which is normally the unreachable post-prepare branch, but is + reachable defensively if a future C-level invariant is violated), the + Python ``try/except`` MUST drop the hold via ``terminator_dec`` so + ``wait()`` can complete. + """ + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def _baseline(self): + # Drive the runtime to a quiesced state with no outstanding holds. + wait() + # Trigger a fresh start without scheduling anything. start() + # leaves the terminator at (count=1, seeded=1) -- the seed + # contribution that wait()/stop() drops via terminator_seed_dec. + # We do not schedule a probe behavior here because the worker's + # release/decrement happens after the behavior body returns and + # there is no synchronisation point that proves the decrement + # has landed before the test thread snapshots the count. + from bocpy import start as _start_runtime + _start_runtime() + + def test_rollback_after_schedule_raises(self): + """A raising ``BehaviorCapsule.schedule`` must leave terminator_count at 0.""" + self._baseline() + + # After _baseline the runtime is alive (start() ran) but no + # behaviors are in flight. The terminator still carries the + # seed contribution (count == 1, seeded == 1) until stop(). + # whencall increments above the seed and a clean rollback must + # bring count back to exactly the pre-call value. 
+ before = _core.terminator_count() + + sentinel = RuntimeError("synthetic schedule failure") + fake_capsule = mock.MagicMock() + fake_capsule.schedule.side_effect = sentinel + with mock.patch.object( + _behaviors._core, "BehaviorCapsule", + return_value=fake_capsule, + ): + c = Cown(Counter()) + with pytest.raises(RuntimeError) as info: + @when(c) + def _(c): + c.value.count += 1 + assert info.value is sentinel + + # The mocked failure must not leave a dangling terminator hold: + # whencall caught the raise and called terminator_dec. + assert _core.terminator_count() == before + # And the runtime should still be usable for fresh behaviors. + c2 = Cown(Counter()) + + @when(c2) + def _(c): + c.value.count += 1 + send("done", 1) + + _collect_done(1) + wait() + assert _core.terminator_count() == 0 + + +# --------------------------------------------------------------------------- +# stop()-vs-schedule race: a closed terminator must reject new whencalls +# --------------------------------------------------------------------------- + + +class TestStopVsScheduleRace: + """Verify that ``stop()`` fences subsequent ``whencall`` attempts. + + ``stop()`` (called by ``wait()``) closes the terminator and any + subsequent ``terminator_inc`` MUST return -1 so ``whencall`` raises + ``RuntimeError("runtime is shutting down")`` rather than racing + teardown. The runtime must then be restartable on the next ``@when``. + """ + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_terminator_inc_refuses_after_close(self): + """``terminator_inc`` returns -1 once ``terminator_close`` has run.""" + # wait() quiesces the runtime and runs terminator_close internally, + # leaving (count=0, seeded=0, closed=1). A direct terminator_inc + # call from the test thread must therefore be refused. + wait() + rc = _core.terminator_inc() + assert rc < 0, f"terminator_inc returned {rc}, expected -1" + + # The runtime must still be restartable on the next @when. 
The + # Behaviors.start() path runs terminator_reset which raises drift + # only if our refused inc somehow took effect (it must not have). + c = Cown(Counter()) + + @when(c) + def _(c): + send("done", 1) + + _collect_done(1) + wait() + assert _core.terminator_count() == 0 + + def test_whencall_raises_after_close(self): + """``@when`` directly after a refused inc must surface RuntimeError. + + We monkey-patch ``terminator_inc`` to return -1 (the refusal + sentinel), since once a real ``terminator_close`` has fenced the + runtime the entire @when path is shut and there is no test hook + to reopen it without going through ``start()``. The patch + targets the same underlying C function via the Python module + binding the whencall helper actually consults. + """ + # First make sure the runtime is alive so @when does not try to + # restart it during the patched call. + c0 = Cown(Counter()) + + @when(c0) + def _(c): + send("done", 1) + + _collect_done(1) + + with mock.patch.object( + _behaviors._core, "terminator_inc", + return_value=-1, + ): + c = Cown(Counter()) + with pytest.raises(RuntimeError, match="shutting down"): + @when(c) + def _(c): + c.value.count += 1 + + # whencall short-circuited at terminator_inc; no hold leaked, + # no behavior_schedule was called. + wait() + assert _core.terminator_count() == 0 + + +# --------------------------------------------------------------------------- +# Worker error-path resilience: a failing behavior body must not strand +# wait() or take a worker out of rotation. +# --------------------------------------------------------------------------- + + +class _Boom(Exception): + """Sentinel exception raised by the worker-resilience tests.""" + + +def _raise_boom(c): + """Behavior body that always raises ``_Boom``. + + Module-level so the worker can import it via the transpiler export. + """ + raise _Boom("synthetic body failure") + + +class TestWorkerErrorPath: + """Verify worker resilience when behavior bodies raise. 
+ + A raising behavior body must: + + * have its terminator hold dropped (``wait()`` returns), + * leave the worker in the receive loop (next @when on the same cown + runs to completion), and + * propagate the exception via the result Cown's ``.exception``. + + These properties hold because ``run_behavior`` wraps the body in + its own ``try/except`` and ``do_work`` wraps each iteration in a + ``try/except`` so a failure cannot break the worker loop. + """ + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_raising_body_does_not_strand_wait(self): + """A single raising behavior must let ``wait()`` complete.""" + c = Cown(Counter()) + + @when(c) + def _(c): + _raise_boom(c) + + wait() + assert _core.terminator_count() == 0 + + def test_raising_body_sets_exception_on_result(self): + """The result Cown must carry the body's exception.""" + c = Cown(Counter()) + + @when(c) + def result(c): + _raise_boom(c) + + wait() + assert result.exception is True + assert isinstance(result.value, _Boom) + + def test_workers_survive_many_raising_behaviors(self): + """N raising behaviors must not take any worker out of rotation. + + Schedule far more raising behaviors than workers, then schedule + a follow-up batch of well-behaved behaviors that emit on + ``done``. If any worker had broken out of its loop, we would + miss messages and ``_collect_done`` would time out. 
+ """ + n_raising = 200 + n_followup = 50 + + raising_cowns = [Cown(Counter()) for _ in range(n_raising)] + for c in raising_cowns: + @when(c) + def _(c): + _raise_boom(c) + + followup_cowns = [Cown(Counter()) for _ in range(n_followup)] + for i, c in enumerate(followup_cowns): + @when(c) + def _(c): + send("done", i) # noqa: B023 + + payloads = _collect_done(n_followup) + assert sorted(payloads) == list(range(n_followup)) + wait() + assert _core.terminator_count() == 0 + + +# --------------------------------------------------------------------------- +# Noticeboard startup handshake: a failed set_noticeboard_thread() must be +# surfaced on the calling thread, not silently strand the runtime. +# --------------------------------------------------------------------------- + + +class TestNoticeboardStartupHandshake: + """Verify that a failed noticeboard claim surfaces on the starter thread. + + ``start_noticeboard`` waits until the thread either claims the + C-level single-writer slot or captures the failure exception. A + failed claim must propagate as ``RuntimeError`` rather than leave + the runtime in a half-started state where ``notice_*`` writes + enqueue forever with no consumer. + """ + + @classmethod + def teardown_class(cls): + wait() + _drain_done() + + def test_failed_claim_raises_on_start(self): + """``start()`` must raise if ``set_noticeboard_thread`` raises.""" + # Quiesce any prior runtime so the next @when triggers a fresh start. + wait() + + sentinel = RuntimeError("synthetic claim failure") + with mock.patch.object( + _behaviors._core, "set_noticeboard_thread", + side_effect=sentinel, + ): + c = Cown(Counter()) + with pytest.raises(RuntimeError, match="noticeboard thread"): + @when(c) + def _(c): + c.value.count += 1 + + # The failed start must reset the global runtime slot so the + # next @when triggers a fresh start() rather than reusing the + # half-initialised Behaviors instance whose noticeboard thread + # is already dead. 
+ assert _behaviors.BEHAVIORS is None + + # The runtime must be re-startable once the synthetic failure is + # withdrawn. A successful @when proves the next start_noticeboard + # claimed the slot cleanly. + c2 = Cown(Counter()) + + @when(c2) + def _(c): + send("done", 1) + + _collect_done(1) + wait() + assert _core.terminator_count() == 0 diff --git a/test/test_transpiler.py b/test/test_transpiler.py new file mode 100644 index 0000000..e34791d --- /dev/null +++ b/test/test_transpiler.py @@ -0,0 +1,553 @@ +"""Tests for the transpiler module.""" + +import ast +import os +import textwrap + +from bocpy.transpiler import BOCModuleTransformer, CapturedVariableFinder, export_module + + +# ── CapturedVariableFinder ────────────────────────────────────────────── + + +class TestCapturedParams: + """Function parameters must never appear as captured variables.""" + + @staticmethod + def _captures(source, known_vars=frozenset()): + tree = ast.parse(textwrap.dedent(source)) + finder = CapturedVariableFinder(set(known_vars)) + finder.visit(tree.body[0]) + return finder.captured_vars + + def test_positional_params_excluded(self): + assert self._captures("""\ + def f(a, b): + return a + b + """) == set() + + def test_vararg_excluded(self): + assert self._captures("""\ + def f(*args): + return args + """) == set() + + def test_kwarg_excluded(self): + assert self._captures("""\ + def f(**kwargs): + return kwargs + """) == set() + + def test_mixed_params_excluded(self): + assert self._captures("""\ + def f(a, *args, **kwargs): + return a, args, kwargs + """) == set() + + +class TestCapturedLocals: + """Assignments and nested function names are local, not captured.""" + + @staticmethod + def _captures(source, known_vars=frozenset()): + tree = ast.parse(textwrap.dedent(source)) + finder = CapturedVariableFinder(set(known_vars)) + finder.visit(tree.body[0]) + return finder.captured_vars + + def test_assignment_target_excluded(self): + assert self._captures("""\ + def f(): + x = 1 + 
return x + """) == set() + + def test_nested_function_name_excluded(self): + assert self._captures("""\ + def f(): + def helper(): + pass + return helper + """) == set() + + +class TestCapturedFreeVars: + """Free variables that are not params, locals, or known are captured.""" + + @staticmethod + def _captures(source, known_vars=frozenset()): + tree = ast.parse(textwrap.dedent(source)) + finder = CapturedVariableFinder(set(known_vars)) + finder.visit(tree.body[0]) + return finder.captured_vars + + def test_single_capture(self): + assert self._captures("""\ + def f(): + return outer + """) == {"outer"} + + def test_multiple_captures(self): + assert self._captures("""\ + def f(a): + return a + x + y + """) == {"x", "y"} + + def test_known_var_not_captured(self): + assert self._captures("""\ + def f(): + return known + """, known_vars={"known"}) == set() + + def test_mixed_locals_and_captures(self): + caps = self._captures("""\ + def f(a): + x = 1 + def h(): + pass + return a + x + h + captured + """) + assert caps == {"captured"} + + +class TestCapturedClear: + """The clear() method resets state so the finder can be reused.""" + + def test_clear_resets_between_visits(self): + finder = CapturedVariableFinder(set()) + + tree1 = ast.parse("def f():\n return a") + finder.visit(tree1.body[0]) + assert finder.captured_vars == {"a"} + + finder.clear() + + tree2 = ast.parse("def g():\n return b") + finder.visit(tree2.body[0]) + assert finder.captured_vars == {"b"} + assert "a" not in finder.captured_vars + + +# ── BOCModuleTransformer ──────────────────────────────────────────────── + + +class TestModuleTransformerImports: + """Import handling: recording names and whencall injection.""" + + @staticmethod + def _transform(source): + tree = ast.parse(textwrap.dedent(source)) + t = BOCModuleTransformer() + t.visit(tree) + return t, tree + + def test_import_recorded(self): + t, _ = self._transform("import os") + assert "os" in t.imports + + def test_from_import_recorded(self): + 
t, _ = self._transform("from sys import path") + assert "path" in t.imports + + def test_whencall_injected_when_missing(self): + t, tree = self._transform("from bocpy import when, Cown") + aliases = [a.name for a in tree.body[0].names] + assert "whencall" in aliases + assert "whencall" in t.imports + + def test_non_bocpy_import_not_modified(self): + _, tree = self._transform("from collections import OrderedDict") + aliases = [a.name for a in tree.body[0].names] + assert "whencall" not in aliases + + def test_whencall_not_duplicated_when_present(self): + _, tree = self._transform("from bocpy import when, whencall, Cown") + aliases = [a.name for a in tree.body[0].names] + assert aliases.count("whencall") == 1 + + def test_whencall_injected_when_aliased(self): + t, tree = self._transform("from bocpy import whencall as wc, Cown") + aliases = [(a.name, a.asname) for a in tree.body[0].names] + # Original aliased import kept, plus bare whencall injected + assert ("whencall", "wc") in aliases + assert ("whencall", None) in aliases + assert "wc" in t.imports + assert "whencall" in t.imports + + +class TestModuleTransformerDeclarations: + """Classes and functions are recorded; @when functions excluded.""" + + @staticmethod + def _transform(source): + tree = ast.parse(textwrap.dedent(source)) + t = BOCModuleTransformer() + t.visit(tree) + return t, tree + + def test_class_recorded(self): + t, _ = self._transform("""\ + class Foo: + pass + """) + assert "Foo" in t.classes + + def test_non_when_function_recorded(self): + t, _ = self._transform("""\ + def helper(): + pass + """) + assert "helper" in t.functions + + def test_when_function_not_recorded(self): + t, _ = self._transform("""\ + from bocpy import when, Cown + + @when(x) + def behavior(x): + pass + """) + assert "behavior" not in t.functions + + def test_known_vars_is_union(self): + t, _ = self._transform("""\ + import os + from sys import path + + class Foo: + pass + + def bar(): + pass + """) + assert t.known_vars() 
== {"os", "path", "Foo", "bar"} + + +class TestModuleTransformerFiltering: + """Only imports, classes, functions, and eligible assignments survive.""" + + @staticmethod + def _transform(source): + tree = ast.parse(textwrap.dedent(source)) + t = BOCModuleTransformer() + t.visit(tree) + return t, tree + + def test_constant_assignment_preserved(self): + _, tree = self._transform("x = 42") + assert len(tree.body) == 1 + + def test_uppercase_non_constant_preserved(self): + _, tree = self._transform("CONFIG = some_call()") + code = ast.unparse(tree) + assert "CONFIG" in code + + def test_lowercase_non_constant_filtered(self): + _, tree = self._transform("config = some_call()") + assert len(tree.body) == 0 + + def test_multi_target_non_constant_filtered(self): + _, tree = self._transform("a = b = some_call()") + assert len(tree.body) == 0 + + def test_multi_target_constant_preserved(self): + _, tree = self._transform("a = b = 42") + assert len(tree.body) == 1 + + def test_for_loop_filtered(self): + _, tree = self._transform("""\ + for i in range(10): + pass + """) + assert len(tree.body) == 0 + + def test_bare_expression_filtered(self): + _, tree = self._transform('print("hello")') + assert len(tree.body) == 0 + + +# ── export_module (full pipeline) ─────────────────────────────────────── + + +class TestExportBehaviorNaming: + """Behaviors are renamed to __behavior__N with sequential numbering.""" + + @staticmethod + def _export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_single_behavior_named_0(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def first(x): + return x.value + """) + names = [info.name for info in result.behaviors.values()] + assert names == ["__behavior__0"] + assert "def __behavior__0(" in result.code + + def test_two_behaviors_sequential(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x 
= Cown(1) + y = Cown(2) + + @when(x) + def first(x): + return x.value + + @when(y) + def second(y): + return y.value + """) + names = sorted(info.name for info in result.behaviors.values()) + assert names == ["__behavior__0", "__behavior__1"] + + +class TestExportCaptures: + """Captured variables are recorded and added as behavior parameters.""" + + @staticmethod + def _export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_capture_appended_as_arg(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + factor = 3 + + @when(x) + def scaled(x): + return x.value * factor + """) + info = list(result.behaviors.values())[0] + assert "factor" in info.captures + # factor must appear as a parameter in the generated behavior def + sig = result.code.split("def __behavior__0(")[1].split("):")[0] + assert "factor" in sig + + def test_no_captures_when_none_needed(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def identity(x): + return x.value + """) + info = list(result.behaviors.values())[0] + assert info.captures == [] + + +class TestExportDecoratorStripping: + """Generated behavior functions must not carry any decorators.""" + + @staticmethod + def _export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_no_decorator_on_behavior(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def f(x): + return x.value + """) + gen_tree = ast.parse(result.code) + for node in ast.walk(gen_tree): + if isinstance(node, ast.FunctionDef) and node.name.startswith("__behavior__"): + assert node.decorator_list == [], ( + f"{node.name} still has decorators" + ) + + +class TestExportFileRewrite: + """__file__ references inside behaviors are rewritten to the source path.""" + + @staticmethod + def 
_export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_file_replaced_with_absolute_path(self): + path = "/some/test/file.py" + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def f(x): + return __file__ + """, path=path) + # Walk the generated AST and confirm __file__ has been replaced + # with the absolute source path as a string constant. Substring + # matching against the unparsed source is platform-fragile because + # backslashes in Windows paths get escaped during unparse. + expected = os.path.abspath(path) + gen_tree = ast.parse(result.code) + constants = [ + n.value for n in ast.walk(gen_tree) + if isinstance(n, ast.Constant) and n.value == expected + ] + assert constants, ( + f"expected absolute path {expected!r} as a string constant in " + f"generated code:\n{result.code}" + ) + + def test_file_capture_does_not_become_parameter(self): + """__file__ must be inlined, not added to the behavior's args list. + + Regression: the rewriter previously added every captured free + variable (including __file__) as an extra positional parameter. + After visit() inlined __file__ to a string Constant the result was + an invalid signature like ``def __behavior__0(x, '/path'):``, + which only failed at worker import time. 
+ """ + path = "/some/test/file.py" + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def f(x): + return __file__ + """, path=path) + gen_tree = ast.parse(result.code) + behaviors = [ + n for n in ast.walk(gen_tree) + if isinstance(n, ast.FunctionDef) and n.name.startswith("__behavior__") + ] + assert behaviors, "no behavior function found in generated code" + for b in behaviors: + arg_names = [a.arg for a in b.args.args] + assert "__file__" not in arg_names, ( + f"{b.name} should not receive __file__ as a parameter; " + f"got args={arg_names}" + ) + assert arg_names == ["x"], ( + f"{b.name} expected args ['x'], got {arg_names}" + ) + + +class TestExportNestedWhen: + """Nested @when inside a behavior produces multiple behavior functions.""" + + @staticmethod + def _export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_nested_produces_two_behaviors(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def outer(x): + @when(x) + def inner(x): + return x.value + return inner + """) + assert len(result.behaviors) == 2 + + +class TestExportMetadata: + """ExportResult carries class, function, and behavior metadata.""" + + @staticmethod + def _export(source, path="/tmp/test.py"): + tree = ast.parse(textwrap.dedent(source)) + return export_module(tree, path) + + def test_classes_and_functions_reported(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + class MyClass: + pass + + def helper(): + pass + """) + assert "MyClass" in result.classes + assert "helper" in result.functions + + def test_behavior_keyed_by_line_number(self): + result = self._export("""\ + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def f(x): + return x.value + """) + assert len(result.behaviors) == 1 + line = next(iter(result.behaviors.keys())) + assert isinstance(line, 
int) + assert line > 0 + + +# ── Import alias tests ────────────────────────────────────────────────── + + +class TestImportAlias: + """Aliased imports must not appear as captured variables.""" + + def test_import_as_not_captured(self): + """``import X as Y`` — Y should be known, not captured.""" + source = textwrap.dedent("""\ + import collections as col + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def use_alias(x): + return col.OrderedDict() + """) + tree = ast.parse(source) + result = export_module(tree) + + for info in result.behaviors.values(): + assert "col" not in info.captures, ( + f"'col' should not be captured; captures = {info.captures}" + ) + + def test_from_import_as_not_captured(self): + """``from X import Y as Z`` — Z should be known, not captured.""" + source = textwrap.dedent("""\ + from collections import OrderedDict as OD + from bocpy import when, whencall, Cown + + x = Cown(1) + + @when(x) + def use_alias(x): + return OD() + """) + tree = ast.parse(source) + result = export_module(tree) + + for info in result.behaviors.values(): + assert "OD" not in info.captures, ( + f"'OD' should not be captured; captures = {info.captures}" + )