Open-Paws · samtuckerdavis · Apr 25, 2026 · Apr 25, 2026
diff --git a/.claude/rules/accessibility.md b/.claude/rules/accessibility.md
diff --git a/.claude/rules/context-repo.md b/.claude/rules/context-repo.md
@@ -0,0 +1,111 @@
+# Context Repo — Org-Wide Read Safety
+
+Activate this rule when working on `github.com/Open-Paws/context` or proposing any change that may be merged into it. The context repo is the org's single source of truth, readable by every staff member and every AI agent across the Open Paws ecosystem. That reach is what makes it valuable — and what makes misclassification costly. Material that leaks into the context repo propagates to every future agent session across the org, with no clean retraction.
+
+## The Org-Wide Read Test
+
+Before proposing, writing, reviewing, or merging any content in the context repo, ask:
+
+> Imagine this change merged. Every staff member including brand-new contractors, every intern in the May cohort, every AI agent in every repo, every persona QA agent, every scheduled Claude Code task — all of them can now read this. Is that okay?
+
+If not clearly okay, the content does not belong in the context repo. Redirect to a private location. There is no "technically private but in the context repo" category.
+
+## What Must Not Go In
+
+Non-exhaustive. Reject at plan review and redirect:
+
+- Individual personal information — salaries, compensation, health, neurodivergence disclosures, recovery status, family, relationships
+- HR / performance matters — performance feedback, interpersonal conflicts, hiring/firing discussions, contract negotiations with specific people
+- Active sensitive funder dynamics — criticism from specific funders, donor red flags, active grant negotiation positions, internal funder assessments
+- Legal matters in progress — contract disputes, IP issues, regulatory inquiries, anything a lawyer is or should be involved in
+- Unannounced partnerships or programs — anything whose premature leak would cause harm. Once announced, it can move in
+- Security-sensitive operational details — threat models, unpatched vulnerabilities, defense-in-depth specifics that would help an attacker
+- Credentials, secrets, API keys — never, regardless of perceived convenience
+- Personal context about individuals beyond what they've publicly chosen to share — history, politics, family situation, country-of-origin dynamics
+- Active campaign intelligence that could tip off opposition — specific corporate targets mid-campaign, undercover operation plans, timing-sensitive material
+- Sam's personal notes, journal, or private decision-making — belongs in the personal workspace repo
+- Anything treated as confidential in its originating context — Slack DMs, CryptPad docs, private channels — even if innocuous out of context
+
+Rule of thumb: gossip, personnel, legal, or "don't forward this email" — not context-repo material.
+
+## What Belongs In
+
+- Org identity, mission, public frame
+- Settled decisions, after they're settled and shareable
+- Current priorities every staff member should be aligned on
+- Program structures, playbooks, frameworks, methodology
+- Technical conventions and architecture principles that apply across repos
+- Published or ready-to-publish work
+- Glossaries, onboarding material, routing tables
+- Links out to where more detail lives
+- Decision *outcomes* and *rationale*, without embedding the sensitive discussion that produced them
+
+Test: a fresh contractor or new agent session reading cold should (a) get valuable context and (b) see nothing that would make a staff member uncomfortable. Both must be yes.
+
+## Where Sensitive Material Actually Lives
+
+| Material | Correct location |
+|---|---|
+| Personal notes, journal, draft strategic thinking | Sam's personal workspace repo |
+| Individual grant tracker, funder contact details, internal funder assessments | `private/grants/` in personal repo, or locked CryptPad |
+| HR / people matters | Outside version control — direct comms, locked docs, legal counsel where appropriate |
+| Active campaign intelligence | Per-campaign private repos or CryptPad with explicit access list |
+| Credentials | Password manager / secrets manager |
+| Early-stage partnership discussions | DM or locked doc until announced; summary can move in post-announcement |
+| Staff-specific feedback | 1:1 docs outside the context repo |
+
+If no private home exists for the rejected material, file a separate issue to establish one — in the personal workspace or ops repo, not the context repo.
+
+## Pipeline Additions For Context-Repo Changes
+
+**STAGE 2 (Triage) — classify every issue with a `sensitivity:` label:**
+
+- `sensitivity:public-ok` — already public or trivially shareable
+- `sensitivity:staff-ok` — fine for all staff + all agents (the bar for this repo)
+- `sensitivity:private` — belongs elsewhere; redirect, do not advance
+
+Issues labeled `sensitivity:private` never enter the plan wave.
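+
+A minimal Bash sketch of this gate, for anyone who wants to check it mechanically before an issue is admitted to the plan wave. It is not an existing pipeline script: `sensitivity_gate` is a hypothetical name, and it assumes labels arrive one per line on stdin, which is what `gh issue view <number> --json labels --jq '.labels[].name'` prints. Besides rejecting `sensitivity:private`, it also rejects issues with no `sensitivity:` label at all, since triage is supposed to classify every issue.
+
+```bash
+# Hypothetical gate: may this issue enter the plan wave?
+# Reads labels one per line on stdin.
+# Returns 0 = may advance; 1 = redirect, or classify first.
+sensitivity_gate() {
+  local label found=""
+  while IFS= read -r label; do
+    case "$label" in
+      sensitivity:private)
+        # Private always wins, even if another sensitivity label is present.
+        echo "private: redirect to a private location, do not advance" >&2
+        return 1 ;;
+      sensitivity:public-ok|sensitivity:staff-ok)
+        found="$label" ;;
+    esac
+  done
+  if [ -z "$found" ]; then
+    echo "unclassified: add a sensitivity: label at triage" >&2
+    return 1
+  fi
+  return 0
+}
+
+# Example, with 123 as a placeholder issue number (requires the GitHub CLI):
+# gh issue view 123 --json labels --jq '.labels[].name' | sensitivity_gate
+```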
+
+**STAGE 4 (Plan Review) — run the org-wide read test explicitly.** If not clearly okay, reject with guidance on what to strip or redirect.
+
+**STAGE 13 (Adversarial) — add a 7th check:**
+
+7. **Confidentiality leak** — does this merge expose any individual's personal information, any sensitive relationship dynamic, any active negotiation, any unannounced plan, or any material that originated in a confidential context? If yes, `major+` severity, back to fix loop.
+
+Adversarial patterns to flag (content that "seems fine" but leaks in context):
+
+- Closed decision citing a specific funder's objection as the reason (strip identity, keep principle)
+- Priority justified by "we lost trust with X partner" (strip dynamic, keep strategic implication)
+- Program doc listing specific individuals as "struggling" or "not meeting expectations"
+- Playbook referencing active campaign targets by name before launch
+
+## Default Direction Is Out, Not In
+
+When uncertain, keep it out. Material can move from private into the context repo later, but it cannot move back: once merged, downstream agents have already consumed it.
+
+If material genuinely belongs but currently leaks something, **rewrite at a higher level of abstraction** — keep the principle, strip the specifics.
+
+- Good: "Decision: prefer multi-year unrestricted funding over single-year restricted"
+- Bad: "Decision: avoid Funder X because they demanded reporting we found unreasonable"
+
+Same principle, different safety profile.
+
+## Decision Tree
+
+```
+Is every fact in this change something I'd say out loud in an all-staff meeting
+with interns, contractors, and partner org reps present?
+│
+├── YES → proceed through normal pipeline
+│
+└── NO → which category?
+    │
+    ├── Personal / HR / relationship → personal repo or external tool
+    ├── Active sensitive operation → locked CryptPad with access list
+    ├── Legal / contractual → outside version control, with counsel
+    ├── Credentials → secrets manager
+    ├── Sensitive but abstractable → rewrite at higher level, retry pipeline
+    └── Unclear → ask Sam before filing the issue
+```
+
+Misclassification is costly in both directions. Too restrictive and the repo becomes useless. Too loose and it leaks. When genuinely unsure, ping rather than guess.
diff --git a/.claude/rules/cost-optimization.md b/.claude/rules/cost-optimization.md
@@ -4,7 +4,13 @@ Advocacy organizations operate on nonprofit budgets. Every dollar spent on AI co
 
 ## Model Routing — Right Model for Each Task
 
-Route tasks to the cheapest model capable of handling them well. Use cheaper, faster models for: test generation, boilerplate code, formatting assistance, simple refactoring, and documentation. Use mid-tier models for: debugging, multi-file changes, code review, and integration work. Reserve frontier models for: hard architectural problems, complex debugging, novel design challenges, and security-critical code review. Aider achieves comparable benchmark scores at 3x fewer tokens than some alternatives — consider token-efficient tools for routine workflows.
+Route tasks to the cheapest model capable of handling them well.
+
+- **Cheap tier — Claude Haiku 4.5 (`claude-haiku-4-5-20251001`)**: test generation, boilerplate code, formatting assistance, simple refactoring, mechanical edits, documentation, glue code, log parsing, summarization of structured output. Default for desloppify-driven mechanical work. Default for first-pass scout / triage when the observation is well-structured.
+- **Mid tier — Claude Sonnet 4.6 (`claude-sonnet-4-6`)**: debugging, multi-file changes, code review, integration work, plan authoring against a clear spec, test review, persona-QA narrative writing.
+- **Frontier — Claude Opus 4.7 (`claude-opus-4-7`, default on this stack)**: hard architectural problems, complex debugging, novel design challenges, security-critical code review, adversarial audit, the strategic / Chat-Gary surface.
+
+Aider achieves comparable benchmark scores using roughly a third of the tokens of some alternatives, so consider token-efficient tools for routine workflows. The single biggest cost win in this stack is **routing cheap things to Haiku rather than reaching for Opus by default**.
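+
+The tier list above can be pinned in one place so scripts never hard-code a model ID. A minimal Bash sketch, assuming a hypothetical helper named `model_for_tier`: the tier names and model IDs are copied from the list above, and an unknown tier fails loudly instead of quietly falling back to Opus, the expensive default this section warns against.
+
+```bash
+# Hypothetical helper: map a routing tier to the model ID listed above.
+model_for_tier() {
+  case "$1" in
+    cheap)    echo "claude-haiku-4-5-20251001" ;;
+    mid)      echo "claude-sonnet-4-6" ;;
+    frontier) echo "claude-opus-4-7" ;;
+    *)
+      # No silent fallback to the most expensive model.
+      echo "unknown tier: $1 (expected cheap, mid, or frontier)" >&2
+      return 1 ;;
+  esac
+}
+```
+
+A caller can pass the result to Claude Code's `--model` flag, for example `claude --model "$(model_for_tier cheap)"`.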
 
 ## Token Budget Discipline
 

diff --git a/.claude/rules/desloppify.md b/.claude/rules/desloppify.md
@@ -1,23 +1,64 @@
 # Code Quality — desloppify
 
-Run desloppify to systematically identify and fix code quality issues. Install and configure before scanning (requires Python 3.11+):
+Run desloppify to systematically identify and fix code quality issues. Install from the **Open Paws fork** (Python 3.11+):
 
 ```bash
-pip install --upgrade "desloppify[full]"
+# Install from the Open Paws fork; NEVER from PyPI / upstream
+pip install "desloppify[full] @ git+https://github.com/Open-Paws/desloppify.git"
 desloppify update-skill claude
 ```
 
-Add `.desloppify/` to `.gitignore` — it contains local state that should not be committed. Before scanning, exclude directories that should not be analyzed (vendor, build output, generated code, worktrees) with `desloppify exclude <path>`. Share questionable candidates with the project owner before excluding.
+**Canonical install command.** This file is the single source of truth for how to install desloppify. Every other file in this stack (`skills/desloppify-playbook/SKILL.md`, `agents/desloppifier.md`, `$OP_CONTEXT_REPO/.claude/rules/desloppify.md`, every `Open-Paws/*/.claude/rules/desloppify.md`) links here rather than restating the command, because duplicated install commands are what caused the multi-repo drift that existed before 2026-04-25. If you're editing an install command anywhere other than this file, stop.
+
+**OP fork only — never upstream.** The git install above pulls from `github.com/Open-Paws/desloppify`, which carries the movement conventions (no-speciesist-language rules, type-safety patterns, gateway response shape discipline, compassionate language enforcement, persona-QA browser testing) that upstream desloppify lacks. `pip install desloppify` from PyPI pulls upstream and is a hard-rule violation per `~/.claude/rules/pipeline-nevers.md`.
+
+Add `.desloppify/` to `.gitignore` — it contains local state that should not be committed. Before scanning, exclude generated / vendor / build dirs:
 
 ```bash
+# Exclude generated directories, then scan
+desloppify exclude node_modules dist
 desloppify scan --path .
-desloppify next
 ```
 
 `--path` is the directory to scan (use `.` for the whole project, or a subdirectory like `src/`). Your goal is to get the strict score as high as possible. The scoring resists gaming — the only way to improve it is to actually make the code better.
 
-**The loop:** run `next`. It is the execution queue from the living plan, not the whole backlog. It tells you what to fix now, which file, and the resolve command to run when done. Fix it, resolve it, run `next` again. This is your main job. Use `desloppify backlog` only when you need to inspect broader open work not currently driving execution.
+## The fix loop
+
+Run `next` → fix → resolve → repeat:
+
+```bash
+desloppify next          # get the top-priority item; shows which file and the resolve command
+# fix the code
+desloppify plan resolve  # mark it done (use the exact resolve command that next printed)
+desloppify next          # get the next item
+```
+
+`next` is the execution queue from the living plan, not the whole backlog. It tells you what to fix now, which file, and the resolve command to run when done. Use `desloppify backlog` only when you need to inspect broader open work not currently driving execution.
 
 Do not be lazy. Large refactors and small detailed fixes — do both with equal energy. No task is too big or too small. Fix things properly, not minimally.
 
 Use `plan` / `plan queue` to reorder priorities or cluster related issues. Rescan periodically. The scan output includes agent instructions — follow them, do not substitute your own analysis.
+
+## Persona-QA workflow (UI repos with persona-driven testing)
+
+```bash
+desloppify persona-qa --prepare --url https://example.com   # generate agent instructions
+# agent runs browser testing and captures findings in JSON
+desloppify persona-qa --import findings.json                 # merge into state
+desloppify persona-qa --status                               # per-persona summary
+desloppify next                                              # persona QA items now appear in the queue
+```
+
+## Baseline Capture Process
+
+**At plan time (STAGE 3):** Capture the desloppify baseline against the branch point:
+
+```bash
+desloppify status --json > .desloppify/baseline.json
+```
+
+Post the baseline JSON as a comment on the GitHub issue for durable storage, since `.desloppify/` is gitignored.
+
+**Recovery if missing:** STAGE 9 uses `git merge-base HEAD main` to find the branch point and recaptures the baseline against it.
+
+**Score-cannot-regress gate.** STAGE 9 blocks merge if the strict score drops below baseline. Merging despite a drop requires the `override:allow-score-drop` label (human-only; agents cannot apply it).
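+
+The gate's decision logic can be sketched in a few lines of Bash. This assumes the baseline and current strict scores have already been pulled out as plain numbers; which field of `desloppify status --json` holds the strict score is not specified in this file, so the sketch takes both scores as arguments. `score_gate` is a hypothetical name, and it reads PR labels one per line on stdin, as `gh pr view <number> --json labels --jq '.labels[].name'` prints them. It only checks that the override label is present; keeping that label human-only is outside its scope.
+
+```bash
+# Hypothetical merge gate: block if the strict score fell below baseline,
+# unless the override:allow-score-drop label is present.
+# Usage: score_gate BASELINE_SCORE CURRENT_SCORE < labels.txt
+score_gate() {
+  local baseline="$1" current="$2" label
+  # awk compares decimal scores, which Bash's integer tests cannot.
+  if awk -v b="$baseline" -v c="$current" 'BEGIN { exit !((c + 0) >= (b + 0)) }'; then
+    return 0   # no regression
+  fi
+  while IFS= read -r label; do
+    if [ "$label" = "override:allow-score-drop" ]; then
+      echo "score dropped ($baseline -> $current); override label present" >&2
+      return 0
+    fi
+  done
+  echo "blocked: strict score $current is below baseline $baseline" >&2
+  return 1
+}
+```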