A Python template to quickstart any project with a production-ready workflow, quality tooling, and AI-assisted development.
Features flow through 6 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:
- docs/features/backlog/<feature-name>/ — features waiting to be worked on
- docs/features/in-progress/<feature-name>/ — exactly one feature being built right now
- docs/features/completed/<feature-name>/ — accepted and shipped features
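As a minimal sketch of how the filesystem enforces the WIP limit (the function name and check below are illustrative, not part of the template):

```python
from pathlib import Path

def assert_wip_limit(features_root: Path) -> None:
    """Illustrative check: at most one feature folder may sit in in-progress/."""
    in_progress = sorted(
        p.name for p in (features_root / "in-progress").iterdir() if p.is_dir()
    )
    if len(in_progress) > 1:
        raise RuntimeError(f"WIP limit exceeded: {in_progress}")
```

Because moving a whole folder between backlog/, in-progress/, and completed/ is the only state transition, a check like this is trivial to run at session start.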
STEP 1: SCOPE (product-owner) → discovery + Gherkin stories + criteria
STEP 2: ARCH (developer) → design module structure, get PO approval
STEP 3: TEST FIRST (developer) → sync stubs, write failing tests
STEP 4: IMPLEMENT (developer) → Red-Green-Refactor, commit per green test
STEP 5: VERIFY (reviewer) → run all commands, review code
STEP 6: ACCEPT (product-owner) → demo, validate, move folder to completed/
PO picks the next feature from backlog. Developer never self-selects.
Verification is adversarial. The reviewer's job is to try to break the feature, not to confirm it works. The default hypothesis is "it might be broken despite green checks; prove otherwise."
- Product Owner (PO) — AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries.
- Stakeholder — Human. Answers PO's questions, provides domain knowledge, says "baseline" when discovery is complete.
- Developer — AI agent. Architecture, test bodies, implementation, git. Never edits `.feature` files. Escalates spec gaps to PO.
- Reviewer — AI agent. Adversarial verification. Reports spec gaps to PO.
- product-owner — defines scope (4 phases), picks features, accepts deliveries
- developer — architecture, tests, code, git, releases (Steps 2-4 + release)
- reviewer — runs commands and reviews code at Step 5, produces APPROVED/REJECTED report
- setup-project — one-time setup to initialize a new project from this template
| Skill | Used By | Step |
|---|---|---|
| session-workflow | all agents | every session |
| scope | product-owner | 1 |
| tdd | developer | 3 |
| implementation | developer | 4 |
| verify | reviewer | 5 |
| code-quality | developer | pre-handoff (redirects to verify) |
| pr-management | developer | 6 |
| git-release | developer | 6 |
| create-skill | developer | meta |
Session protocol: Every agent loads skill session-workflow at session start. Load additional skills as needed for the current step.
PO creates docs/features/discovery.md. Asks stakeholder 7 standard questions (Who/What/Why/When/Success/Failure/Out-of-scope). Silent pre-mortem generates follow-up questions. All questions presented at once. Autonomous baseline when all questions are answered. PO identifies feature list and creates backlog/<name>/discovery.md per feature.
PO derives targeted questions from feature entities: extract nouns/verbs from project discovery, populate the Entities table, then generate questions from gaps, ambiguities, and boundary conditions. Silent pre-mortem before the first interview round. Present all questions to the stakeholder at once; iterate with follow-up rounds (pre-mortem after each) until stakeholder says "baseline" to freeze discovery.
One `.feature` file per user story. The `Feature:` block contains only the user-story header — no `Example:` blocks yet. Commit: feat(stories): write user stories for <name>
Silent pre-mortem per story. Write Example: blocks with @id:<8-char-hex> tags. Each Example must be observably distinct; if a single .feature file spans multiple concerns, split into separate .feature files (a feature folder can contain multiple .feature files). Commit: feat(criteria): write acceptance criteria for <name>
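The 8-char hex IDs come from `uv run task gen-id`; a plausible sketch of such a generator (this implementation is an assumption, not taken from the template's task definition):

```python
import secrets

def gen_id() -> str:
    """Return a random 8-character lowercase hex ID, e.g. 'a3f2b1c4'."""
    return secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
```

Using `secrets` rather than `random` keeps collisions vanishingly unlikely even across many features.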
Before moving to Phase 3, check: does this feature span >2 distinct concerns OR have >8 candidate Examples? If yes, split into separate features in backlog/ before writing stories. Each feature should address a single cohesive concern.
Baseline is frozen: no .feature changes after criteria are written. Change = @deprecated tag + new Example.
docs/features/
discovery.md ← project-level (Status + Questions only)
backlog/<feature-name>/
discovery.md ← Status + Entities + Rules + Constraints + Questions
<story-slug>.feature ← one per user story (Gherkin)
in-progress/<feature-name>/ ← whole folder moves here at Step 2
completed/<feature-name>/ ← whole folder moves here at Step 6
tests/
features/<feature-name>/
<story-slug>_test.py ← one per .feature, stubs from gen-tests
unit/
<anything>_test.py ← developer-authored extras
Feature: Bounce physics
As a game engine
I want balls to bounce off walls
So that gameplay feels physical
@id:a3f2b1c4
Example: Ball bounces off top wall
Given a ball moving upward reaches y=0
When the physics engine processes the next frame
Then the ball velocity y-component becomes positive
@deprecated @id:b5c6d7e8
Example: Old behavior no longer needed
Given ...
When ...
Then ...

- `@id:<8-char-hex>` — generated with `uv run task gen-id`
- `@deprecated` — marks superseded criteria; `gen-tests` adds `@pytest.mark.deprecated` to the mapped test
- Use the `Example:` keyword (not `Scenario:`)
- Each Example must be observably distinct from every other
uv run task gen-tests              # sync all features
uv run task gen-tests -- --check   # dry run
uv run task gen-tests -- --orphans # list orphaned tests

- backlog / in-progress: full write (create stubs, update docstrings, rename functions)
- completed: only toggle `@pytest.mark.deprecated` (no docstring changes)
- Orphaned tests (no matching `@id`) get `@pytest.mark.skip(reason="orphan: ...")`
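A sync tool like `gen-tests` has to map `@id:` tags in `.feature` files to test functions. One hedged sketch of the tag-extraction step (the regex and function name here are assumptions, not the template's actual code):

```python
import re

ID_TAG = re.compile(r"@id:([0-9a-f]{8})\b")

def extract_ids(feature_text: str) -> list[str]:
    """Collect the 8-char hex IDs tagged on Example: blocks, in file order."""
    return ID_TAG.findall(feature_text)

feature = """\
@id:a3f2b1c4
Example: Ball bounces off top wall
@deprecated @id:b5c6d7e8
Example: Old behavior no longer needed
"""
print(extract_ids(feature))  # -> ['a3f2b1c4', 'b5c6d7e8']
```

Orphan detection is then a set difference: IDs present in test filenames/functions but absent from the extracted list.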
tests/features/<feature-name>/<story-slug>_test.py
def test_<feature_slug>_<8char_hex>() -> None:

@pytest.mark.unit
def test_bounce_physics_a3f2b1c4() -> None:
    """
    Given: A ball moving upward reaches y=0
    When: The physics engine processes the next frame
    Then: The ball velocity y-component becomes positive
    """
    # Given
    # When
    # Then
    raise NotImplementedError

- `@pytest.mark.unit` — isolated, one function/class, no external state
- `@pytest.mark.integration` — multiple components, external state
- `@pytest.mark.slow` — takes > 50ms; additionally applied alongside `unit` or `integration`
- `@pytest.mark.deprecated` — auto-skipped by conftest hook; added by `gen-tests`
Every test gets exactly one of unit or integration. Slow tests additionally get slow.
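The auto-skip behavior for `@pytest.mark.deprecated` can be built on pytest's standard collection hook. A minimal sketch of what the conftest hook might look like (the template's actual conftest may differ):

```python
# conftest.py (sketch)
import pytest

def pytest_collection_modifyitems(config, items):
    """Skip any test carrying the deprecated marker instead of running it."""
    skip_deprecated = pytest.mark.skip(reason="deprecated acceptance criterion")
    for item in items:
        if "deprecated" in item.keywords:
            item.add_marker(skip_deprecated)
```

This keeps deprecated criteria visible in the report (as skips) rather than silently deleted, which matches the frozen-baseline rule.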
# Install dependencies
uv sync --all-extras
# Run the application (for humans)
uv run task run
# Run the application with timeout (for agents — prevents hanging)
timeout 10s uv run task run
# Run tests (fast, no coverage)
uv run task test-fast
# Run full test suite with coverage
uv run task test
# Run slow tests only
uv run task test-slow
# Lint and format
uv run task lint
# Type checking
uv run task static-check
# Generate an 8-char hex ID
uv run task gen-id
# Sync test stubs from .feature files
uv run task gen-tests
# Serve documentation
uv run task doc-serve

- Principles (in priority order): YAGNI > KISS > DRY > SOLID > Object Calisthenics
- Linting: ruff, Google docstring convention, `noqa` forbidden
- Type checking: pyright, 0 errors required
- Coverage: 100% (measured against your actual package)
- Function length: ≤ 20 lines
- Class length: ≤ 50 lines
- Max nesting: 2 levels
- Instance variables: ≤ 2 per class
- Semantic alignment: tests must operate at the same abstraction level as the acceptance criteria they cover
- Integration tests: multi-component features require at least one `@pytest.mark.integration` test exercising the public entry point
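As an illustration of the marker discipline, a unit-marked test operates on one isolated function with no external state (the helper and names below are hypothetical, not from the template):

```python
import pytest

def flip_y(v: tuple[float, float]) -> tuple[float, float]:
    """Hypothetical helper: reflect a velocity vector off a horizontal wall."""
    vx, vy = v
    return (vx, -vy)

@pytest.mark.unit
def test_flip_y_reverses_vertical_component() -> None:
    # unit: one function under test, no external state touched
    assert flip_y((2.0, -3.0)) == (2.0, 3.0)
```

An integration test of the same feature would instead drive the public entry point (e.g. a frame-update call) and observe the combined effect across components.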
During Step 4 (Implementation), correctness priorities are:
- Design correctness — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
- One test green — the specific test under work passes, plus `test-fast` still passes
- Reviewer code-design check — reviewer verifies design + semantic alignment (no lint/pyright/coverage)
- Commit — only after reviewer APPROVED
- Quality tooling — `lint`, `static-check`, and the full `test` run with coverage execute only at developer handoff (before Step 5)
Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.
- Automated checks (lint, typecheck, coverage) verify syntax-level correctness — the code is well-formed.
- Human review (semantic alignment, code review, manual testing) verifies semantic-level correctness — the code does what the user needs.
- Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
- Reviewer defaults to REJECTED unless correctness is proven.
- PO adds `@deprecated` tag to the Example in the `.feature` file
- Run `uv run task gen-tests` — the script adds `@pytest.mark.deprecated` to the mapped test
- Deprecated tests auto-skip via conftest hook
- Feature is done when all non-deprecated tests pass
- No special folder — features move to `completed/` normally
Version format: v{major}.{minor}.{YYYYMMDD}
- Minor bump for new features; major bump for breaking changes
- Same-day second release: increment minor, keep same date
- Each release gets a unique adjective-animal name
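The version scheme above can be sketched as a one-line formatter (the helper name is illustrative):

```python
from datetime import date

def format_version(major: int, minor: int, release_date: date) -> str:
    """Render the v{major}.{minor}.{YYYYMMDD} scheme described above."""
    return f"v{major}.{minor}.{release_date:%Y%m%d}"

print(format_version(1, 3, date(2025, 3, 14)))  # -> v1.3.20250314
```

Note that a same-day second release bumps only the minor component, so the date segment alone is not unique.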
Use @developer /skill git-release for the full release process.
Every session: load skill session-workflow. Read TODO.md first, update it at the end.
TODO.md is a 15-line bookmark — not a project journal:
# Current Work
Feature: <name>
Step: <1-6> (<step name>)
Source: docs/features/in-progress/<name>/discovery.md
## Progress
- [x] `<@id:hex>`: <description> ← done
- [~] `<@id:hex>`: <description> ← in progress
- [ ] `<@id:hex>`: <description> ← next
- [-] `<@id:hex>`: <description> ← cancelled
## Next
<One actionable sentence>

To initialize a new project from this template:

@setup-project

The setup agent will ask for your project name, GitHub username, and author info, then configure all template placeholders.