feat: add monitor/demonitor for unidirectional actor death observation #165
ElFantasma wants to merge 4 commits into main from …
Conversation
🤖 Kimi Code Review

Overall this is a solid implementation of Erlang-style monitors with proper compile-time safety.

**Critical Issues**

**1. Memory Leak in Monitor Table (Both Async and Thread)**

Files: `concurrency/src/tasks/actor.rs`, `concurrency/src/threads/actor.rs`

Monitor table entries are never cleaned up when the target actor exits; they are only removed on explicit `demonitor`.

Fix: the spawned watcher task should remove its entry after delivering (or skipping) the `Down` message:

```rust
// In tasks/actor.rs around line 270-282
rt::spawn(async move {
    let reason = target.wait_exit_async().await;
    let should_send = active.load(Ordering::Acquire);
    // Clean up the entry
    actor_ref.monitors.lock()
        .unwrap_or_else(|p| p.into_inner())
        .remove(&monitor_ref);
    if should_send {
        let _ = actor_ref.send(Down { monitor_ref, reason });
    }
});
```

**2. Race Condition Window (Documented but Worsened by Missing Cleanup)**

Files: `concurrency/src/tasks/actor.rs`, `concurrency/src/threads/actor.rs`

While the doc comment acknowledges "best-effort" cancellation, the current implementation widens the race window because the monitor entry is never removed after the watcher fires.

Recommendation: combine the cleanup with the atomic check.

**Minor Issues**

**3. Unused field**
🤖 Claude Code Review

PR #165 —
Greptile Summary

This PR introduces unidirectional actor monitoring (`monitor`/`demonitor`).
Confidence Score: 3/5

The core design is sound, but the `MonitorTable` grows without bound in any actor that monitors more than a handful of targets over its lifetime — a direct fit for the intended supervision use case. After a watcher delivers (or skips) a `Down` notification its entry is never removed from the `monitors` HashMap; only an explicit `demonitor` call removes entries. A supervisor monitoring repeatedly-restarting children will accumulate one dead entry per monitored lifetime, which compounds quickly in the exact pattern this feature is designed for. Both `concurrency/src/tasks/actor.rs` and `concurrency/src/threads/actor.rs` carry the leak. The fix is symmetric in both files: capture a clone of the `MonitorTable` inside the watcher closure and call `.remove(&monitor_ref)` after the delivery check.
| Filename | Overview |
|---|---|
| concurrency/src/monitor.rs | New file defining MonitorRef (global atomic u64 counter) and Down message. Clean, minimal, correct. |
| concurrency/src/tasks/actor.rs | Adds monitor/demonitor to tasks-mode Context. MonitorTable entries are leaked after a monitor fires — only demonitor removes them. |
| concurrency/src/threads/actor.rs | Adds monitor/demonitor to threads-mode Context. Same MonitorTable leak as tasks mode; additionally spawns one OS thread per monitor. |
| concurrency/src/lib.rs | Re-exports Down and MonitorRef from the new monitor module. |
| examples/exit_reason/src/main.rs | Adds Scenario 8 demonstrating monitor on two workers (clean stop + panic). Straightforward and correct. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant S as Supervisor (monitoring actor)
    participant C as Context
    participant MT as MonitorTable
    participant W as Watcher task/thread
    participant T as Target actor
    S->>C: "ctx.monitor(&target_handle)"
    C->>MT: "insert(MonitorRef, active=true)"
    C->>W: spawn(wait for target + deliver Down)
    C-->>S: MonitorRef
    alt target dies naturally
        T-->>W: exit reason available
        W->>MT: (entry NOT removed after firing)
        W->>C: "check active.load() == true"
        W->>S: "send Down { monitor_ref, reason }"
    else demonitor called first
        S->>C: ctx.demonitor(monitor_ref)
        C->>MT: remove(monitor_ref), active.store(false)
        T-->>W: exit reason available
        W->>C: "check active.load() == false"
        note over W: Down suppressed
    end
```
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
concurrency/src/tasks/actor.rs:278-291
**MonitorTable entries never removed after a monitor fires**
After the watcher task completes — whether it delivered `Down` or was skipped because `active` was false — the `MonitorRef → Arc<AtomicBool>` entry stays in the `monitors` HashMap forever. Only an explicit `demonitor` call removes entries. For a supervisor that monitors hundreds of short-lived children across restarts, this produces unbounded growth of the table. The fix is to capture an `Arc<Mutex<HashMap<…>>>` clone in the watcher and call `.remove(&monitor_ref)` after the delivery check, mirroring the cleanup that `demonitor` performs.
### Issue 2 of 3
concurrency/src/threads/actor.rs:240-252
**Same MonitorTable leak applies to threads mode**
The threads-mode watcher thread has the same issue: the `MonitorRef` entry is never removed from the `monitors` table after the watcher finishes (regardless of whether it delivered `Down`). The leak affects both execution modes identically.
### Issue 3 of 3
concurrency/src/threads/actor.rs:233-246
**One unbounded OS thread per monitor**
Each `monitor()` call in threads mode spawns a dedicated OS thread (via `rt::spawn`) that blocks indefinitely on `wait_exit_blocking()`. OS threads carry a fixed stack reservation (typically 2–8 MB on Linux). A supervisor monitoring 50 long-lived children simultaneously holds 50 blocked OS threads. The PR description calls this acceptable for "typical supervision patterns," but this constraint is not expressed in any API-level documentation or enforced at runtime. At minimum, the doc-comment on `monitor` in threads mode should note that each active monitor occupies one OS thread for the duration of the target's lifetime.
Reviews (1): Last reviewed commit: "feat: add monitor/demonitor for unidirec..."
```rust
        let reason = target.wait_exit_async().await;
        if active.load(Ordering::Acquire) {
            let _ = actor_ref.send(Down {
                monitor_ref,
                reason,
            });
        }
    });

    monitor_ref
}

/// Cancel a previously-set monitor.
///
```
MonitorTable entries never removed after a monitor fires
After the watcher task completes — whether it delivered Down or was skipped because active was false — the MonitorRef → Arc<AtomicBool> entry stays in the monitors HashMap forever. Only an explicit demonitor call removes entries. For a supervisor that monitors hundreds of short-lived children across restarts, this produces unbounded growth of the table. The fix is to capture an Arc<Mutex<HashMap<…>>> clone in the watcher and call .remove(&monitor_ref) after the delivery check, mirroring the cleanup that demonitor performs.
```rust
        let reason = target.wait_exit_blocking();
        if active.load(Ordering::Acquire) {
            let _ = actor_ref.send(Down {
                monitor_ref,
                reason,
            });
        }
    });

    monitor_ref
}

/// Cancel a previously-set monitor.
```
Same MonitorTable leak applies to threads mode
The threads-mode watcher thread has the same issue: the MonitorRef entry is never removed from the monitors table after the watcher finishes (regardless of whether it delivered Down). The leak affects both execution modes identically.
```rust
            .unwrap_or_else(|p| p.into_inner())
            .insert(monitor_ref, active.clone());

        let target = target.clone();
        let actor_ref = self.actor_ref();

        rt::spawn(move || {
            let reason = target.wait_exit_blocking();
            if active.load(Ordering::Acquire) {
                let _ = actor_ref.send(Down {
                    monitor_ref,
                    reason,
                });
            }
```
One unbounded OS thread per monitor
Each monitor() call in threads mode spawns a dedicated OS thread (via rt::spawn) that blocks indefinitely on wait_exit_blocking(). OS threads carry a fixed stack reservation (typically 2–8 MB on Linux). A supervisor monitoring 50 long-lived children simultaneously holds 50 blocked OS threads. The PR description calls this acceptable for "typical supervision patterns," but this constraint is not expressed in any API-level documentation or enforced at runtime. At minimum, the doc-comment on monitor in threads mode should note that each active monitor occupies one OS thread for the duration of the target's lifetime.
🤖 Codex Code Review

**Findings**

The new tests only cover target-exits-first. I'd add a regression where the watcher stops before the target, because that is where issue 1 shows up. I couldn't run the tests locally.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Bot Review Responses

Most findings already addressed in ee7ebc5; remaining items addressed in 4067f09.

**Already fixed (ee7ebc5)**

- Fixed. Watcher now removes its entry from the `monitors` table.
- Fixed. Added …

**Now fixed (4067f09)**

- Fixed. Added doc note on threads-mode `monitor`.
- Fixed. Removed the dead field.
- Fixed. Added …

**Discussed, deliberately not changing**

- After discussion, decided against fixing this. The cost is minimal (one tokio task or OS thread holding a few MB of stack).
- Acknowledged. Consistent with existing test patterns in this codebase. Could be improved with a polling helper but not blocking for this PR.
- Consistent with the rest of the codebase. The spawn/wait primitives differ between modes, making meaningful sharing awkward.
Summary

- `MonitorRef` and `Down` types in new `concurrency/src/monitor.rs`
- `Context::monitor(&child_handle) -> MonitorRef` spawns a watcher that delivers `Down { monitor_ref, reason }` to the monitoring actor when the target stops
- `Context::demonitor(monitor_ref)` cancels the monitor (flag-based: watcher checks before delivering)
- Monitoring a target that has already exited delivers `Down` immediately
- Each monitor gets a unique `MonitorRef` (global atomic counter)
- Extends the `exit_reason` example with Scenario 8 demonstrating an Observer monitoring two workers

Phase 3c of the supervision-trees roadmap. Building block alongside ChildHandle (#164) and ExitReason (#163) for the supervisor work to come.
Design notes

- `MonitorTable` (`Arc<Mutex<HashMap<MonitorRef, Arc<AtomicBool>>>>`) shared across `Context` and `ActorRef` clones
- The watcher waits on `ChildHandle::wait_exit_*`, then atomically checks the active flag before sending `Down`
- `demonitor` removes the entry from the table and clears the flag — the watcher still runs to completion but won't deliver

Why flag-based cancellation
Tokio tasks and OS threads cannot be cleanly cancelled from outside while waiting on a blocking operation (the threads-mode `wait_exit_blocking` blocks on a Condvar). Using a "deliver if active" flag keeps the implementation symmetric across modes and avoids the runtime-flavor pitfalls we hit in PR #164. The watcher resource is held until the target eventually dies, but this is acceptable for typical supervision patterns where children eventually exit.

Test plan
- `cargo run -p exit_reason` (Scenario 8 demonstrates monitor)