A Python template to quickstart any project with a production-ready workflow, quality tooling, and AI-assisted development.
Features flow through 6 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:
- docs/features/backlog/<feature-name>/ — features waiting to be worked on
- docs/features/in-progress/<feature-name>/ — exactly one feature being built right now
- docs/features/completed/<feature-name>/ — accepted and shipped features
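As a minimal sketch of how the filesystem enforces the WIP limit (the function name and check below are illustrative, not part of the template):

```python
from pathlib import Path

def assert_wip_limit(features_root: Path) -> None:
    """Illustrative check: at most one feature folder may sit in in-progress/."""
    in_progress = sorted(
        p.name for p in (features_root / "in-progress").iterdir() if p.is_dir()
    )
    if len(in_progress) > 1:
        raise RuntimeError(f"WIP limit exceeded: {in_progress}")
```

Because moving a whole folder between backlog/, in-progress/, and completed/ is the only state transition, a check like this is trivial to run at session start.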
STEP 1: SCOPE (product-owner) → discovery + Gherkin stories + criteria
STEP 2: ARCH (developer) → design module structure, get PO approval
STEP 3: TEST FIRST (developer) → sync stubs, write failing tests
STEP 4: IMPLEMENT (developer) → Red-Green-Refactor, commit per green test
STEP 5: VERIFY (reviewer) → run all commands, review code
STEP 6: ACCEPT (product-owner) → demo, validate, move folder to completed/
PO picks the next feature from backlog. Developer never self-selects.
Verification is adversarial. The reviewer's job is to try to break the feature, not to confirm it works. The default hypothesis is "it might be broken despite green checks; prove otherwise."
- Product Owner (PO) — AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries.
- Stakeholder — Human. Answers PO's questions, provides domain knowledge, says "baseline" when discovery is complete.
- Developer — AI agent. Architecture, test bodies, implementation, git. Never edits `.feature` files. Escalates spec gaps to PO.
- Reviewer — AI agent. Adversarial verification. Reports spec gaps to PO.
- product-owner — defines scope (4 phases), picks features, accepts deliveries
- developer — architecture, tests, code, git, releases (Steps 2-4 + release)
- reviewer — runs commands and reviews code at Step 5, produces APPROVED/REJECTED report
- setup-project — one-time setup to initialize a new project from this template
| Skill | Used By | Step |
|---|---|---|
| session-workflow | all agents | every session |
| scope | product-owner | 1 |
| tdd | developer | 3 |
| implementation | developer | 4 |
| verify | reviewer | 5 |
| code-quality | developer | pre-handoff (redirects to verify) |
| pr-management | developer | 6 |
| git-release | developer | 6 |
| create-skill | developer | meta |
Session protocol: Every agent loads skill session-workflow at session start. Load additional skills as needed for the current step.
PO creates docs/features/discovery.md. Asks stakeholder 7 standard questions (Who/What/Why/When/Success/Failure/Out-of-scope). Silent pre-mortem generates follow-up questions. All questions presented at once. Autonomous baseline when all questions are answered. PO identifies feature list and creates backlog/<name>/discovery.md per feature.
PO derives targeted questions from feature entities: extract nouns/verbs from project discovery, populate the Entities table, then generate questions from gaps, ambiguities, and boundary conditions. Silent pre-mortem before the first interview round. Present all questions to the stakeholder at once; iterate with follow-up rounds (pre-mortem after each) until stakeholder says "baseline" to freeze discovery.
One `.feature` file per user story. The `Feature:` block contains only the user-story header — no `Example:` blocks yet. Commit: feat(stories): write user stories for <name>
Silent pre-mortem per story. Write Example: blocks with @id:<8-char-hex> tags. Each Example must be observably distinct; if a single .feature file spans multiple concerns, split into separate .feature files (a feature folder can contain multiple .feature files). Commit: feat(criteria): write acceptance criteria for <name>
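The 8-char hex IDs come from `uv run task gen-id`; a plausible sketch of such a generator (this implementation is an assumption, not taken from the template's task definition):

```python
import secrets

def gen_id() -> str:
    """Return a random 8-character lowercase hex ID, e.g. 'a3f2b1c4'."""
    return secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
```

Using `secrets` rather than `random` keeps collisions vanishingly unlikely even across many features.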
Before moving to Phase 3, check: does this feature span >2 distinct concerns OR have >8 candidate Examples? If yes, split into separate features in backlog/ before writing stories. Each feature should address a single cohesive concern.
Baseline is frozen: no .feature changes after criteria are written. Change = @deprecated tag + new Example.
docs/features/
discovery.md ← project-level (Status + Questions only)
backlog/<feature-name>/
discovery.md ← Status + Entities + Rules + Constraints + Questions
<story-slug>.feature ← one per user story (Gherkin)
in-progress/<feature-name>/ ← whole folder moves here at Step 2
completed/<feature-name>/ ← whole folder moves here at Step 6
tests/
features/<feature-name>/
<story-slug>_test.py ← one per .feature, stubs from gen-tests
unit/
<anything>_test.py ← developer-authored extras
Feature: Bounce physics
As a game engine
I want balls to bounce off walls
So that gameplay feels physical
@id:a3f2b1c4
Example: Ball bounces off top wall
Given a ball moving upward reaches y=0
When the physics engine processes the next frame
Then the ball velocity y-component becomes positive
@deprecated @id:b5c6d7e8
Example: Old behavior no longer needed
Given ...
When ...
Then ...

- `@id:<8-char-hex>` — generated with `uv run task gen-id`
- `@deprecated` — marks superseded criteria; `gen-tests` adds `@pytest.mark.deprecated` to the mapped test
- Use the `Example:` keyword (not `Scenario:`)
- Each Example must be observably distinct from every other
uv run task gen-tests              # sync all features
uv run task gen-tests -- --check   # dry run
uv run task gen-tests -- --orphans # list orphaned tests

- backlog / in-progress: full write (create stubs, update docstrings, rename functions)
- completed: only toggle `@pytest.mark.deprecated` (no docstring changes)
- Orphaned tests (no matching `@id`) get `@pytest.mark.skip(reason="orphan: ...")`
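A sync tool like `gen-tests` has to map `@id:` tags in `.feature` files to test functions. One hedged sketch of the tag-extraction step (the regex and function name here are assumptions, not the template's actual code):

```python
import re

ID_TAG = re.compile(r"@id:([0-9a-f]{8})\b")

def extract_ids(feature_text: str) -> list[str]:
    """Collect the 8-char hex IDs tagged on Example: blocks, in file order."""
    return ID_TAG.findall(feature_text)

feature = """\
@id:a3f2b1c4
Example: Ball bounces off top wall
@deprecated @id:b5c6d7e8
Example: Old behavior no longer needed
"""
print(extract_ids(feature))  # -> ['a3f2b1c4', 'b5c6d7e8']
```

Orphan detection is then a set difference: IDs present in test filenames/functions but absent from the extracted list.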
tests/features/<feature-name>/<story-slug>_test.py
def test_<feature_slug>_<8char_hex>() -> None:

@pytest.mark.unit
def test_bounce_physics_a3f2b1c4() -> None:
    """
    Given: A ball moving upward reaches y=0
    When: The physics engine processes the next frame
    Then: The ball velocity y-component becomes positive
    """
    # Given
    # When
    # Then
    raise NotImplementedError

- `@pytest.mark.unit` — isolated, one function/class, no external state
- `@pytest.mark.integration` — multiple components, external state
- `@pytest.mark.slow` — takes > 50ms; additionally applied alongside `unit` or `integration`
- `@pytest.mark.deprecated` — auto-skipped by conftest hook; added by `gen-tests`
Every test gets exactly one of unit or integration. Slow tests additionally get slow.
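The auto-skip behavior for `@pytest.mark.deprecated` can be built on pytest's standard collection hook. A minimal sketch of what the conftest hook might look like (the template's actual conftest may differ):

```python
# conftest.py (sketch)
import pytest

def pytest_collection_modifyitems(config, items):
    """Skip any test carrying the deprecated marker instead of running it."""
    skip_deprecated = pytest.mark.skip(reason="deprecated acceptance criterion")
    for item in items:
        if "deprecated" in item.keywords:
            item.add_marker(skip_deprecated)
```

This keeps deprecated criteria visible in the report (as skips) rather than silently deleted, which matches the frozen-baseline rule.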
# Install dependencies
uv sync --all-extras
# Run the application (for humans)
uv run task run
# Run the application with timeout (for agents — prevents hanging)
timeout 10s uv run task run
# Run tests (fast, no coverage)
uv run task test-fast
# Run full test suite with coverage
uv run task test
# Run slow tests only
uv run task test-slow
# Lint and format
uv run task lint
# Type checking
uv run task static-check
# Generate an 8-char hex ID
uv run task gen-id
# Sync test stubs from .feature files
uv run task gen-tests
# Serve documentation
uv run task doc-serve

- Principles (in priority order): YAGNI > KISS > DRY > SOLID > Object Calisthenics
- Linting: ruff, Google docstring convention, `noqa` forbidden
- Type checking: pyright, 0 errors required
- Coverage: 100% (measured against your actual package)
- Function length: ≤ 20 lines
- Class length: ≤ 50 lines
- Max nesting: 2 levels
- Instance variables: ≤ 2 per class
- Semantic alignment: tests must operate at the same abstraction level as the acceptance criteria they cover
- Integration tests: multi-component features require at least one `@pytest.mark.integration` test exercising the public entry point
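As an illustration of the marker discipline, a unit-marked test operates on one isolated function with no external state (the helper and names below are hypothetical, not from the template):

```python
import pytest

def flip_y(v: tuple[float, float]) -> tuple[float, float]:
    """Hypothetical helper: reflect a velocity vector off a horizontal wall."""
    vx, vy = v
    return (vx, -vy)

@pytest.mark.unit
def test_flip_y_reverses_vertical_component() -> None:
    # unit: one function under test, no external state touched
    assert flip_y((2.0, -3.0)) == (2.0, 3.0)
```

An integration test of the same feature would instead drive the public entry point (e.g. a frame-update call) and observe the combined effect across components.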
During Step 4 (Implementation), correctness priorities are:
- Design correctness — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
- One test green — the specific test under work passes, plus `test-fast` still passes
- Reviewer code-design check — reviewer verifies design + semantic alignment (no lint/pyright/coverage)
- Commit — only after reviewer APPROVED
- Quality tooling — `lint`, `static-check`, and the full `test` run with coverage execute only at developer handoff (before Step 5)
Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.
- Automated checks (lint, typecheck, coverage) verify syntax-level correctness — the code is well-formed.
- Human review (semantic alignment, code review, manual testing) verifies semantic-level correctness — the code does what the user needs.
- Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
- Reviewer defaults to REJECTED unless correctness is proven.
- PO adds `@deprecated` tag to the Example in the `.feature` file
- Run `uv run task gen-tests` — the script adds `@pytest.mark.deprecated` to the mapped test
- Deprecated tests auto-skip via conftest hook
- Feature is done when all non-deprecated tests pass
- No special folder — features move to `completed/` normally
Version format: v{major}.{minor}.{YYYYMMDD}
- Minor bump for new features; major bump for breaking changes
- Same-day second release: increment minor, keep same date
- Each release gets a unique adjective-animal name
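The version scheme above can be sketched as a one-line formatter (the helper name is illustrative):

```python
from datetime import date

def format_version(major: int, minor: int, release_date: date) -> str:
    """Render the v{major}.{minor}.{YYYYMMDD} scheme described above."""
    return f"v{major}.{minor}.{release_date:%Y%m%d}"

print(format_version(1, 3, date(2025, 3, 14)))  # -> v1.3.20250314
```

Note that a same-day second release bumps only the minor component, so the date segment alone is not unique.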
Use @developer /skill git-release for the full release process.
Every session: load skill session-workflow. Read TODO.md first, update it at the end.
TODO.md is a 15-line bookmark — not a project journal:
# Current Work
Feature: <name>
Step: <1-6> (<step name>)
Source: docs/features/in-progress/<name>/discovery.md
## Progress
- [x] `<@id:hex>`: <description> ← done
- [~] `<@id:hex>`: <description> ← in progress
- [ ] `<@id:hex>`: <description> ← next
- [-] `<@id:hex>`: <description> ← cancelled
## Next
<One actionable sentence>

To initialize a new project from this template:

@setup-project

The setup agent will ask for your project name, GitHub username, and author info, then configure all template placeholders.