Releases: microsoft/bocpy
v0.5.0
Highlights
This release delivers a Verona-RT-style work-stealing scheduler, a global noticeboard (shared key-value store), removal of the central scheduler thread in favour of direct dispatch, and a major C source refactor into per-subsystem translation units with a portable atomics layer.
New Features
- Work-stealing scheduler — the single behavior queue is replaced with a distributed scheduler. Each worker owns an MPMC behavior queue, pops locally first, and steals from peers when idle. Idle workers park on per-worker condition variables and are signalled directly by producer/victim.
- Per-worker fairness tokens — a token node advances through each worker's queue so long-running behaviors cannot monopolise dispatch slots; also drives cooperative shutdown.
- Noticeboard — a shared key-value store (up to 64 keys) readable/writable without acquiring cowns. Writes are non-blocking; reads return a cached per-behavior snapshot. Includes notice_write, notice_read, notice_update, notice_delete, notice_sync, noticeboard_version, and the REMOVED sentinel.
- Distributed scheduler — two-phase locking, request linking, and dispatch run directly on the caller's thread in C; cown release runs on the executing worker. MCS-style intrusive linked list per cown for zero-bounce handoff.
- Cown.exception property — indicates whether the held value is from an unhandled exception.
- compat.h / compat.c portability layer — uniform BOCMutex, BOCCond, boc_atomic_*_explicit, monotonic-time, and sleep primitives across MSVC, pthreads, and C11 <threads.h>.
- xidata.h cross-interpreter shim — centralised _PyXIData_* / _PyCrossInterpreterData_* version ladders for CPython 3.12–3.15 (including free-threaded builds).
- fanout_benchmark example — fan-out/fan-in benchmark exercising scheduler throughput under heavy producer load.
- Prime factor example (examples/prime_factor.py) — parallel factorisation via Pollard's rho with noticeboard-coordinated early termination.
- Benchmark harness (examples/benchmark.py) — micro-benchmarks for scheduling throughput, message-queue latency, and noticeboard contention.
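The "pop locally first, steal from peers when idle" policy in the work-stealing bullet can be sketched in plain Python. This is a single-threaded toy model, not bocpy's C scheduler. The `Worker` class, its `next_task` method, and the FIFO order used for both local pops and steals are illustrative assumptions. Parking, fairness tokens, and atomics are left out.

```python
from collections import deque


class Worker:
    """Toy worker: owns a queue, pops locally first, steals when idle."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.peers = []  # other workers this one may steal from

    def push(self, task):
        self.queue.append(task)

    def next_task(self):
        # Fast path: take work from our own queue.
        if self.queue:
            return self.queue.popleft()
        # Slow path: steal one task from the first peer that has work.
        for victim in self.peers:
            if victim.queue:
                return victim.queue.popleft()
        # Nothing anywhere; a real scheduler would park the worker here.
        return None


a, b = Worker("a"), Worker("b")
a.peers, b.peers = [b], [a]
for n in range(3):
    a.push(f"task-{n}")

first = a.next_task()   # local pop by the owner
stolen = b.next_task()  # b has no work of its own, so it steals from a
```

In this run `a` takes `task-0` from its own queue and the idle worker `b` steals `task-1`, which is how load spreads without a central dispatcher.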
Bug Fixes
- Transpiler aliased imports — visit_Import / visit_ImportFrom now track alias names (import X as Y), preventing spurious "name not found" errors and duplicate whencall injection.
- Global variable capture — @when closure capture falls back to frame.f_globals when a name is not in any local scope, fixing NameError for module-level variables.
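Alias tracking of the kind named in the first fix can be illustrated with the standard `ast` module. The sketch below is not the bocpy transpiler: `ImportTracker` and its `bound` mapping are hypothetical names. It only shows how `visit_Import` and `visit_ImportFrom` can record the name an import actually binds rather than the name of the module it imports.

```python
import ast


class ImportTracker(ast.NodeVisitor):
    """Collects the names that import statements bind in a module."""

    def __init__(self):
        self.bound = {}  # bound name -> what it refers to

    def visit_Import(self, node):
        for alias in node.names:
            # "import a.b" binds "a"; "import a.b as c" binds "c".
            name = alias.asname or alias.name.split(".")[0]
            self.bound[name] = alias.name

    def visit_ImportFrom(self, node):
        module = node.module or ""  # empty for "from . import x"
        for alias in node.names:
            name = alias.asname or alias.name
            target = f"{module}.{alias.name}" if module else alias.name
            self.bound[name] = target


source = "import numpy as np\nfrom os import path as p\nimport json\n"
tracker = ImportTracker()
tracker.visit(ast.parse(source))
```

After the visit, `tracker.bound` maps `np`, `p`, and `json` to their sources, so a later pass that sees `np` in a function body can resolve it instead of reporting an unknown name.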
Improvements
- In-memory transpiled-module loading — workers exec the transpiled source from a string literal instead of writing to disk, eliminating filesystem round-trips and leftover .py files.
- Nested @when capture — the transpiler recurses into nested @when-decorated functions when computing outer captures, so child behaviors can close over the outer frame.
- C extension split — _core.c reduced from ~5,000 to ~3,500 lines by extracting sched.{c,h}, noticeboard.{c,h}, terminator.{c,h}, tags.{c,h}, cown.h, compat.{c,h}, and xidata.h.
- Direct dispatch on cown release — behavior_release_all hands resolved successors directly to workers via boc_sched_dispatch, removing one queue hop per handoff.
- Cooperative worker shutdown — boc_sched_worker_request_stop_all / boc_sched_unpause_all provide a clean stop/drain protocol.
- Matrix docstrings — all Matrix C methods now carry built-in docstrings.
- Examples package relocated — moved to top-level examples/ directory (still importable as bocpy.examples).
- Filtered PyPI README — setup.py strips <!-- pypi-skip-start --> regions before publishing.
- Documentation refresh — expanded coverage of noticeboard, distributed scheduler, and new APIs.
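Loading a module from a source string, as in the first improvement above, needs only the standard library. The sketch below shows the general technique under stated assumptions: `load_module_from_source` and the module name `toy_transpiled` are hypothetical, and how bocpy's workers actually wire this up is not shown in these notes.

```python
import sys
import types


def load_module_from_source(name, source):
    """Build a module object from a source string without touching disk."""
    module = types.ModuleType(name)
    # A synthetic filename keeps tracebacks readable.
    code = compile(source, f"<in-memory {name}>", "exec")
    exec(code, module.__dict__)
    sys.modules[name] = module  # optional: makes "import name" find it
    return module


transpiled = "def double(x):\n    return 2 * x\n"
mod = load_module_from_source("toy_transpiled", transpiled)
result = mod.double(21)
```

Because nothing is written to disk, there is no temporary .py file to clean up if a worker exits abnormally.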
Internal Test Modules (opt-in via BOCPY_BUILD_INTERNAL_TESTS=1)
- _internal_test_atomics — correctness tests for compat.h typed-atomics.
- _internal_test_bq — torture tests for the MPMC behavior queue.
- _internal_test_wsq — tests for work-stealing primitives (fast pop, slow pop, steal, park/unpark).
Test Suite
- test_noticeboard.py — snapshot semantics, notice_update atomicity, REMOVED, notice_sync, version monotonicity.
- test_scheduler_integration.py, test_scheduler_stats.py, test_scheduler_steal.py — end-to-end and per-primitive scheduler tests.
- test_compat_atomics.py — portable atomics smoke tests.
- test_stop_retry_composition.py — stop() / start() / wait() retry composition.
- test_scheduling_stress.py — expanded with fan-out, work-stealing, and shutdown stress scenarios.
- test_transpiler.py — AST extraction, capture rewriting, aliased imports, module export.
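The noticeboard properties named in the test list (snapshot semantics, atomic update, version monotonicity) can be illustrated with a toy model. Everything below is an assumption made for illustration: `ToyNoticeboard` and its methods are not bocpy's API or signatures, and the real store is described as non-blocking for writes, whereas this sketch simply uses a lock.

```python
import threading


class ToyNoticeboard:
    """Toy shared key-value store with a version counter (not bocpy's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self._version = 0

    def write(self, key, value):
        with self._lock:
            self._data[key] = value
            self._version += 1  # every write bumps the version

    def update(self, key, fn, default=None):
        # Read-modify-write under one lock acquisition, so it is atomic.
        with self._lock:
            self._data[key] = fn(self._data.get(key, default))
            self._version += 1

    def snapshot(self):
        # A copy plus the version it reflects; later writes do not change it.
        with self._lock:
            return self._version, dict(self._data)


board = ToyNoticeboard()
board.write("best", 10)
version, view = board.snapshot()       # what one behavior would cache
board.update("best", lambda v: v + 1)
later_version, later_view = board.snapshot()
```

The first snapshot still reads `{"best": 10}` after the update, while the later snapshot sees the new value and a strictly larger version.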
Full changelog: v0.3.1...v0.5.0
v0.3.1
CownCapsule serialization support for nested cowns.
Bug Fixes
- Removed the ownership check in _cown_shared that prevented a CownCapsule from being serialized to XIData when it was the value of another Cown. The check was unnecessary — _cown_shared only stores a pointer and ownership is enforced at acquire time.
Improvements
- Added CownCapsule.__reduce__ with COWN_INCREF pinning so that a CownCapsule embedded in a container (dict, list, etc.) can survive the pickle round-trip used by object_to_xidata. A module-level reconstructor (_cown_capsule_from_pointer) inherits the pin without a redundant COWN_INCREF, and validates the process ID on unpickle to guard against cross-process misuse.
v0.3.0
Improvements
- Added CownCapsule.disown() — abandons a cown's value without serializing it and resets ownership to NO_OWNER. Used during worker cleanup to safely discard orphan cowns before the owning interpreter is destroyed, preventing dangling Python object references.
- Rewrote receive to use a two-phase spin-then-park strategy for single-tag untimed receives. Phase 1 spins for BOC_SPIN_COUNT iterations; Phase 2 parks the thread on a per-queue condvar, eliminating busy-wait CPU burn. Timed receives and multi-tag receives use spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
- Added platform-abstracted condvar primitives (BOCParkMutex / BOCParkCond) with implementations for Windows (SRWLOCK / CONDITION_VARIABLE), macOS (pthreads), and Linux (C11 threads).
- Each BOCQueue now carries a waiters counter, park_mutex, and park_cond. Producers signal parked receivers after enqueue; drain and set_tags broadcast to wake all parked threads.
- Replaced the fixed thrd_sleep in send with a sched_yield / SwitchToThread, reducing send-side latency.
- Refactored the monolithic _core_receive into receive_single_tag and receive_multi_tag, each with its own backoff/parking logic.
- Moved the BOC_QUEUE_DISABLED check earlier in get_queue_for_tag so callers skip disabled queues instead of returning NULL after tag resolution.
- Added Windows-compatible atomic_load_explicit / atomic_fetch_add_explicit / atomic_fetch_sub_explicit macros using InterlockedExchangeAdd64.
- Declared Py_mod_gil = Py_MOD_GIL_NOT_USED in both _core and _math C extensions so that importing bocpy on a free-threaded Python build (3.13t+) does not re-enable the GIL.
- Replaced PyDict_GetItem (borrowed reference) with PyDict_GetItemRef (strong reference) in BOCRecycleQueue_recycle on Python 3.13+, improving forward-compatibility with free-threaded builds.
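The spin-then-park receive described in the list above can be modelled with `threading.Condition`. This is a Python sketch of the idea only, not the C implementation: `ParkingQueue`, `SPIN_COUNT`, and its value are assumptions, and the waiters counter plays the role the notes give to the per-queue `waiters` field.

```python
import threading
import time
from collections import deque

SPIN_COUNT = 200  # hypothetical tuning value


class ParkingQueue:
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._waiters = 0  # number of receivers currently parked

    def send(self, item):
        with self._cond:
            self._items.append(item)
            if self._waiters:
                self._cond.notify()  # wake one parked receiver

    def receive(self):
        # Phase 1: spin briefly, hoping a message arrives soon.
        for _ in range(SPIN_COUNT):
            try:
                return self._items.popleft()
            except IndexError:
                pass
        # Phase 2: park on the condition variable instead of burning CPU.
        with self._cond:
            self._waiters += 1
            try:
                while True:
                    try:
                        return self._items.popleft()
                    except IndexError:
                        self._cond.wait()
            finally:
                self._waiters -= 1


mailbox = ParkingQueue()
results = []
receiver = threading.Thread(target=lambda: results.append(mailbox.receive()))
receiver.start()
time.sleep(0.05)  # give the receiver time to finish spinning and park
mailbox.send("hello")
receiver.join(timeout=5)
```

The sender appends and notifies while holding the same lock the receiver holds when it checks the queue and calls `wait()`, which is what rules out a lost wake-up in this sketch.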
Bug Fixes
- Fixed a deadlock when the same cown is passed multiple times to @when (e.g. @when(c, c)). Duplicate requests for the same cown caused the MCS-queue-based two-phase locking to spin-wait on itself. Requests are now deduplicated by target cown in Behavior.__init__, with compensating resolve_one calls to maintain the behavior count invariant.
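Deduplicating requests by target, as the fix describes, amounts to an order-preserving filter keyed on object identity. The sketch below is a hypothetical helper, not the code in Behavior.__init__. It also returns the number of dropped duplicates, on the assumption that a pending count initialised from the original argument list would need that many compensating decrements.

```python
def dedupe_by_identity(targets):
    """Return targets with repeats removed, plus how many were dropped."""
    seen = set()
    unique = []
    for target in targets:
        key = id(target)  # identity, not equality: equal values may be distinct targets
        if key in seen:
            continue
        seen.add(key)
        unique.append(target)
    return unique, len(targets) - len(unique)


class Resource:
    pass


a, b = Resource(), Resource()
unique, dropped = dedupe_by_identity([a, b, a, a])
```

Here the non-adjacent and repeated uses of `a` collapse to a single request, leaving `[a, b]` and a dropped count of 2.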
Tests
- TestLostWakeStress: single-producer random delays, bursty producer, and repeated single-message wake to detect lost-wake races.
- TestMultiTagBackoff: multi-tag receive correctness — second-tag hit, delayed arrival, per-tag FIFO ordering, timeout, and interleaved producers.
- TestTimeoutAccuracy: lower-bound / upper-bound wall-clock checks and zero-timeout immediacy.
- Added tests for duplicate cowns in @when: same cown twice, thrice, non-adjacent duplicates, duplicates within a group, and mutation aliasing semantics.
CI
- Added a free-threaded CI job that tests against Python 3.13t and 3.14t on Linux, with explicit assertions that the GIL remains disabled after import.
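A check of the kind this CI job performs can be written with `sys._is_gil_enabled()`, which CPython provides from 3.13 onward. The helper name `gil_is_enabled` is an assumption, and the actual assertions used in the project's workflow are not shown in these notes.

```python
import sys


def gil_is_enabled():
    """Report whether the GIL is active in this interpreter."""
    probe = getattr(sys, "_is_gil_enabled", None)
    if probe is None:
        # Interpreters older than 3.13 have no such probe and always use the GIL.
        return True
    return probe()


status = gil_is_enabled()
```

On a free-threaded build, a test could import the extension module first and then assert that `gil_is_enabled()` returns `False`.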
Full Changelog: v0.2.2...v0.3.0
v0.2.2
Improvements
- Added an ASAN/UBSAN CI job that builds CPython 3.14.2 from source with AddressSanitizer and UndefinedBehaviorSanitizer, then runs the full test suite against instrumented builds of bocpy.
- Updated GitHub Actions to latest versions (actions/checkout@v6, actions/setup-python@v5).
Bug Fixes
- Fixed a false positive warning message for deallocation of xidata on the main interpreter after module shutdown.
- Changed the clear logic when recycling
v0.2.0
Bugfix release including some minor improvements.
Improvements
- Examples are now included in the package, with script entrypoints for each.
- The drain low-level API function is now exposed at the package level
- wait() will now acquire frame-local Cown objects before shutting down the workers
Dev Tools
- Added an internal cown and behavior reference tracking utility
Bug Fixes
- Fixed a reference counting bug with cown lists
- Fixed an issue where the boids example did not run on Windows due to a font setting.
v0.1.0 - Initial Release
Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>