This document describes the current relationship between the two repositories.
- OpenAI-compatible SDK calls
- native tool calling
- the ReAct execution loop
- tool registration and execution
- flat per-agent JSONL traces
- the base system prompt for lower-layer execution behavior
- Challenger / Solver / Judge role definitions
- role-specific prompt addenda
- research-project preparation
- workflow orchestration
- solver/judge improvement loops
- workflow-level trace summaries
- higher-level tests for multi-agent coordination
MarkScientist must not reimplement the lower-layer agent_base stack from ResearchHarness.
Instead:
- import
ResearchHarnessthrough a git submodule - subclass the ResearchHarness agent base
- pass role-specific prompt blocks into the lower-layer agent
- use ResearchHarness traces directly, or wrap them with a higher-level workflow record
MarkScientistno longer maintains its own model-backend abstraction layer- all agent execution flows go through ResearchHarness
- Challenger, Solver, and Judge all subclass the same lower-layer ResearchHarness base agent
- Challenger prepares the private
task/package, includingtask/task_info.jsonand hiddentask/target_study/checklist.json, and the harness derives the publicINSTRUCTIONS.mdtask contract from that private task. - Solver is responsible for producing
report/report.md - Judge reviews the report against the prepared checklist and returns JSON scoring feedback
- workflow trace files now wrap per-agent ResearchHarness traces instead of redefining the full flat-trace format