topcoder-platform · jmgasper · Apr 14, 2026 · Mar 30, 2026 · Mar 31, 2026 · Mar 31, 2026
diff --git a/.factory/init.sh b/.factory/init.sh
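@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+# Bootstrap worker dependencies for the repo root and data-migration/.
+set -euo pipefail
+
+# Load nvm so the per-directory Node versions can be selected below.
+source "$HOME/.config/nvm/nvm.sh"
+
+# Repo root: `nvm use` with no argument reads the version from .nvmrc.
+nvm use >/dev/null
+if [ ! -d node_modules ]; then
+  pnpm install
+fi
+
+# data-migration/ is pinned to Node 18.19.0; the subshell keeps the version
+# switch and the directory change from leaking into the rest of the script.
+if [ -d data-migration ]; then
+  (
+    cd data-migration
+    nvm use 18.19.0 >/dev/null
+    if [ ! -d node_modules ]; then
+      pnpm install
+    fi
+  )
+fi
+
+# Live validation reads its settings from this file, so warn early when it is absent.
+if [ ! -f .env.importer.local ]; then
+  echo "warning: .env.importer.local is missing; live validation will remain blocked" >&2
+fi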
diff --git a/.factory/library/architecture.md b/.factory/library/architecture.md
@@ -0,0 +1,201 @@
+# Architecture
+
+How the historical marathon-match importer works at a high level.
+
+**What belongs here:** major components, branch behavior, data flow, invariants, and cross-service ownership.
+**What does NOT belong here:** step-by-step implementation tasks or validator commands.
+
+---
+
+## System Boundary
+
+The mission adds a reusable importer inside `challenge-api-v6/data-migration/` that reads legacy Informix JSON exports and reconciles them into the v6 challenge/resource/review stack.
+
+### Read surfaces
+
+- `/mnt/Informix` JSON exports (read-only)
+- existing v6 challenge data through the challenge DB / challenge-api schema
+- existing v6 resource data through the Resource API
+- existing v6 submission and review-summation data through the review DB / review-api schema
+
+### Write surfaces
+
+- Challenge and ChallengePhase records in the challenge DB
+- submitter Resource records through the Resource API
+- Submission and ReviewSummation records in the review DB
+
+## Import Pipeline
+
+### 1. Selection and planning
+
+The importer accepts an explicit round filter and builds a per-round plan. Each selected round is classified as one of:
+
+- `create` — no matching v6 marathon challenge exists
+- `reuse/backfill-only` — a v6 marathon challenge already exists and only linked records may be added
+- `skip` / `unresolved` — the round cannot be safely applied without more input
+
+Planning must surface traceability, counts, and entity-level deltas before any write occurs.
+
+`--existing-state-file` is supplemental only. It may enrich counts for reporting, but it is not authoritative reuse evidence and must never override direct challenge-state discovery.
+
+### Existing-challenge match rule
+
+Safe reuse is authoritative, not fuzzy:
+
+1. first look for exactly one existing challenge with `challenge.legacyId == round.id`
+2. if there is not exactly one such match, treat any name-based or heuristic candidates as planning diagnostics only
+3. before reusing a matched challenge, verify it is a safe historical MM target: Marathon Match type, Data Science track, and no conflicting duplicate standard phase rows
+4. if the round still is not matched unambiguously, or if the matched challenge fails those shape checks, emit `unresolved` and require an explicit override rather than auto-reusing a challenge
+
+This keeps backfill-only behavior deterministic and avoids silent challenge-level rewrites.
+
+If authoritative challenge-state discovery is unavailable, planning must fail closed as `unresolved` instead of silently falling back to create-path planning.
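
A minimal sketch of the classification described above, written as a pure function so it can be exercised without a database. The decision labels and the shape checks come from the rules; every type, field, and function name is hypothetical. It also assumes that heuristic candidates without an exact `legacyId` match block an automatic `create`, which is one reading of rule 4 rather than something the text states outright.

```typescript
// Illustrative types only: the field and function names are hypothetical.
type PlanDecision = "create" | "reuse/backfill-only" | "unresolved";

interface ChallengeCandidate {
  id: string;
  typeName: string;
  trackName: string;
  phaseNames: string[];
}

interface RoundPlan {
  decision: PlanDecision;
  challengeId?: string;
  reason: string;
}

const STANDARD_PHASES = ["Registration", "Submission", "Review"];

// True when no standard phase name occurs more than once on the challenge.
function hasNoDuplicateStandardPhases(phaseNames: string[]): boolean {
  return STANDARD_PHASES.every(
    (phase) => phaseNames.filter((candidate) => candidate === phase).length <= 1
  );
}

// exactMatches: challenges where challenge.legacyId == round.id, or null when
// authoritative challenge-state discovery was unavailable.
// heuristicCount: name-based candidates, used as planning diagnostics only.
function classifyRound(
  exactMatches: ChallengeCandidate[] | null,
  heuristicCount: number
): RoundPlan {
  if (exactMatches === null) {
    // Fail closed instead of falling back to create-path planning.
    return { decision: "unresolved", reason: "challenge-state discovery unavailable" };
  }
  if (exactMatches.length === 0) {
    // Assumption: heuristic candidates block an automatic create.
    return heuristicCount === 0
      ? { decision: "create", reason: "no matching v6 challenge" }
      : { decision: "unresolved", reason: "heuristic candidates need an explicit override" };
  }
  const [match] = exactMatches;
  if (match === undefined || exactMatches.length > 1) {
    return { decision: "unresolved", reason: "legacyId match is not unique" };
  }
  const safeTarget =
    match.typeName === "Marathon Match" &&
    match.trackName === "Data Science" &&
    hasNoDuplicateStandardPhases(match.phaseNames);
  if (!safeTarget) {
    return { decision: "unresolved", reason: "matched challenge failed shape checks" };
  }
  return { decision: "reuse/backfill-only", challengeId: match.id, reason: "unique legacyId match" };
}
```

Keeping the classifier free of I/O means dry-run and apply can share the same decision path, so a plan cannot report a decision that apply would compute differently.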
+
+### 2. Challenge reconciliation
+
+For each selected round:
+
+- if no v6 challenge exists, create one completed `Marathon Match` challenge on the `Data Science` track
+- if a v6 challenge is matched unambiguously and passes the reuse preconditions above, keep the same challenge id and preserve challenge-level fields
+
+Created challenges must use `challenge.legacyId = round.id`. Reused challenges are not challenge-level rewrite candidates; they must already be matched unambiguously by the rule above or remain `unresolved`.
+
+### 3. Phase materialization
+
+Canonical MM history in v6 is represented by exactly three standard phases:
+
+- `Registration`
+- `Submission`
+- `Review`
+
+For newly created historical challenges, these phases must exist and be closed. For reused challenges, already-present standard phase rows are preserved as-is and only absent standard phase rows may be added.
+
+### Timeline derivation rule
+
+When creating a historical challenge:
+
+- choose the canonical Marathon Match/Data Science timeline mapping used by the target environment by resolving exactly one valid template candidate; if zero or multiple candidates remain, stop with `unresolved`
+- derive `Registration` from the min/max eligible `round_registration.timestamp`
+- derive `Submission` from the earliest available legacy submission-open signal for the round, falling back to the earliest non-example submit timestamp when needed, and end it at the latest non-example submit timestamp
+- synthesize `Review` as a coherent closed interval starting at or after the imported submission end; if no explicit review timestamps exist, collapse it to a closed interval at the end of submission rather than inventing a separate open window
+
+If required timestamps are missing, or so contradictory that a coherent closed timeline cannot be produced, the round should remain `unresolved` instead of being half-created.
+
+Planning must perform this same canonical MM/Data Science timeline-mapping resolution before returning `decision=create`; dry-run must not promise creates that apply would later reject.
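
The derivation rule can be sketched as a pure function over already-extracted timestamps. This is an illustration under stated assumptions, not the importer's code: timestamps are epoch milliseconds, all names are hypothetical, a round with no eligible registrations or no non-example submissions is treated as `unresolved`, and explicit review timestamps are clamped so that `Review` never starts before the submission end.

```typescript
// Hypothetical input shape; timestamps are assumed to be epoch milliseconds.
interface TimelineInputs {
  registrationTimestamps: number[]; // eligible round_registration.timestamp values
  submissionOpenSignal: number | null; // earliest legacy submission-open signal, if any
  nonExampleSubmitTimestamps: number[];
  reviewTimestamps: number[]; // explicit review timestamps, possibly empty
}

interface ClosedInterval {
  start: number;
  end: number;
}

interface DerivedTimeline {
  registration: ClosedInterval;
  submission: ClosedInterval;
  review: ClosedInterval;
}

function deriveTimeline(input: TimelineInputs): DerivedTimeline | "unresolved" {
  const reg = input.registrationTimestamps;
  const subs = input.nonExampleSubmitTimestamps;
  const rev = input.reviewTimestamps;

  // Assumption: without registrations or submissions no coherent timeline exists.
  if (reg.length === 0 || subs.length === 0) {
    return "unresolved";
  }

  const registration = { start: Math.min(...reg), end: Math.max(...reg) };

  // Prefer the submission-open signal, else the earliest non-example submit.
  const submissionStart = input.submissionOpenSignal ?? Math.min(...subs);
  const submission = { start: submissionStart, end: Math.max(...subs) };
  if (submission.start > submission.end) {
    return "unresolved"; // contradictory source data
  }

  // Review starts at or after the submission end. With no explicit review
  // timestamps it collapses to a closed interval at the submission end.
  const reviewStart =
    rev.length > 0 ? Math.max(submission.end, Math.min(...rev)) : submission.end;
  const reviewEnd = rev.length > 0 ? Math.max(reviewStart, Math.max(...rev)) : reviewStart;

  return { registration, submission, review: { start: reviewStart, end: reviewEnd } };
}
```

Returning `"unresolved"` as a value, rather than throwing, lets planning and apply call the same function and report the same outcome.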
+
+### 4. Participant materialization
+
+Submitter resources come from legacy registrations, not just from members with submissions. The importer must create or reuse exactly one submitter-role resource per eligible registrant that resolves in the target environment.
+
+**Eligible registrant rule:** every distinct `round_registration.coder_id` for the selected round where `eligible == '1'`.
+
+**Identity normalization rule:** resolve each legacy `coder_id` once through the same normalized member lookup and reuse that normalized identity for Resource API writes, imported submissions, and imported review records so the same member cannot surface with conflicting cross-service identities.
+
+**Stable resource dedup key:** `(challengeId, memberId, roleId=submitter)`.
+
+### Missing-member skip policy
+
+If the target dev environment does not contain a legacy member, classify that member as `missing-member` for the current run and:
+
+- skip resource creation for that member
+- skip that member's non-example submissions
+- skip that member's final and provisional review materialization
+- continue importing other members for the round
+- write a deterministic skipped-file artifact for later manual processing
+
+The skipped artifact should be stable enough for rerun comparison and manual recovery, including at least the legacy round id, member id, skip reason, and affected surfaces.
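
One way to picture the eligible-registrant rule, the single identity lookup, and the skip policy together is the sketch below. It assumes the registration rows are already filtered to the selected round, and `resolveMember` is a hypothetical stand-in for the normalized member lookup that returns `null` when the target environment lacks the member. The artifact field names and the key separator are illustrative.

```typescript
// Rows are assumed to be pre-filtered to the selected round.
interface RegistrationRow {
  coder_id: string;
  eligible: string;
}

interface SkippedMember {
  legacyRoundId: string;
  coderId: string;
  reason: "missing-member";
  affectedSurfaces: string[];
}

interface ParticipantPlan {
  memberIdByCoder: Map<string, string>;
  skipped: SkippedMember[];
}

function planParticipants(
  roundId: string,
  rows: RegistrationRow[],
  resolveMember: (coderId: string) => string | null
): ParticipantPlan {
  // Eligible registrant rule: distinct coder_id values where eligible == '1'.
  const eligible = new Set<string>();
  for (const row of rows) {
    if (row.eligible === "1") {
      eligible.add(row.coder_id);
    }
  }

  const memberIdByCoder = new Map<string, string>();
  const skipped: SkippedMember[] = [];

  // Sorting keeps the skipped artifact stable across reruns.
  for (const coderId of Array.from(eligible).sort()) {
    const memberId = resolveMember(coderId); // resolved once per coder_id
    if (memberId === null) {
      skipped.push({
        legacyRoundId: roundId,
        coderId,
        reason: "missing-member",
        affectedSurfaces: ["resource", "submission", "review-summation"],
      });
    } else {
      memberIdByCoder.set(coderId, memberId);
    }
  }
  return { memberIdByCoder, skipped };
}

// Stable resource dedup key: (challengeId, memberId, roleId=submitter).
function resourceKey(challengeId: string, memberId: string, submitterRoleId: string): string {
  return [challengeId, memberId, submitterRoleId].join("|");
}
```

Later stages would read member ids from `memberIdByCoder` instead of resolving again, which is what keeps the resource, submission, and review records on one identity.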
+
+### Approved completed-challenge resource workflow
+
+If the Resource API refuses submitter creation on a completed historical challenge, the user has approved a temporary status-transition workflow solely for submitter-resource backfill:
+
+- capture the original challenge status first
+- transition only as much as needed to satisfy the Resource API write constraint
+- create the missing submitter resources through the Resource API
+- restore the challenge to its original completed state before the importer finishes
+
+This workflow is a narrow exception for historical resource backfill only; it does not authorize general challenge-level rewrites.
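
The capture, transition, create, restore sequence maps naturally onto `try`/`finally`. The sketch below assumes a hypothetical `BackfillClients` interface and a caller-supplied `writableStatus`; the real Challenge and Resource API calls, and the status value the Resource API accepts, are not specified in this document.

```typescript
// Hypothetical client surface; the real API calls will differ.
interface BackfillClients {
  getStatus(challengeId: string): Promise<string>;
  setStatus(challengeId: string, status: string): Promise<void>;
  createSubmitter(challengeId: string, memberId: string): Promise<void>;
}

async function backfillSubmitters(
  clients: BackfillClients,
  challengeId: string,
  memberIds: string[],
  writableStatus: string // assumption: a status in which resource writes succeed
): Promise<void> {
  // Capture the original status before touching anything.
  const originalStatus = await clients.getStatus(challengeId);
  try {
    await clients.setStatus(challengeId, writableStatus);
    for (const memberId of memberIds) {
      await clients.createSubmitter(challengeId, memberId);
    }
  } finally {
    // Runs on success and on failure, so the challenge is not left reopened.
    await clients.setStatus(challengeId, originalStatus);
  }
}
```

If the process dies before `finally` runs, the restore does not happen, so a rerun still has to read the current status and converge on the completed state.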
+
+### 5. Submission materialization
+
+Only non-example legacy submissions are imported. The importer must preserve the full non-example history for members that resolve in the target environment, and explicitly skip/report missing-member rows instead of creating partial participant footprints.
+
+**Stable submission identity invariant:** imported `Submission.legacySubmissionId` must be a deterministic composite derived from legacy submission identity so round-wide and rerun validation can compare exact sets. The contract assumes `legacySubmissionId` is the stable external identity for imported submissions.
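
The document does not say which legacy fields make up the composite, so the sketch below is only an illustration of the invariant. The three identity fields, the `mm` prefix, and the `:` separator are all assumptions; what matters is that the same legacy row always yields the same string and that sets of ids can be compared exactly.

```typescript
// Assumed identity fields; the real composite may use different columns.
interface LegacySubmissionIdentity {
  roundId: string;
  coderId: string;
  submissionNumber: number;
}

// Deterministic: depends only on the legacy identity, never on import time.
function legacySubmissionIdOf(s: LegacySubmissionIdentity): string {
  return ["mm", s.roundId, s.coderId, String(s.submissionNumber)].join(":");
}

// Exact set comparison for round-wide and rerun validation.
function diffIdentitySets(
  expected: string[],
  actual: string[]
): { missing: string[]; unexpected: string[] } {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  return {
    missing: expected.filter((id) => !actualSet.has(id)),
    unexpected: actual.filter((id) => !expectedSet.has(id)),
  };
}
```

A rerun that converges should produce empty `missing` and `unexpected` lists when the imported ids are compared with the ids derived from the legacy export.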
+
+### 6. Score materialization
+
+Two score streams are imported:
+
+- **provisional history** — one provisional review summation per imported non-example submission, using `long_submission.submission_points`
+- **final result** — one final review summation per imported member, attached to that member's latest imported non-example submission
+
+Final-score derivation uses legacy final-result fields with the agreed precedence:
+
+1. `long_comp_result.system_point_total`
+2. `long_comp_result.point_total`
+3. the ranking score from legacy state data used for final ordering
+
+If a legacy finalist has no imported non-example submission to attach to, the importer must skip that final score explicitly rather than create an orphan final review summation. Missing-member skips should be reported distinctly from other skip reasons.
+
+**Stable review-summation dedup keys:**
+
+- provisional: exactly one provisional review summation per imported submission (`submissionId + provisional`)
+- final: exactly one final review summation on the member's latest imported non-example submission (`submissionId + final`)
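
The final-score precedence and the orphan rule can be sketched as below. It assumes legacy values have already been parsed to numbers with `null` marking a missing field, and that a finalist with no score in any of the three sources is skipped; the second point is an assumption, since the text only covers the missing-submission case. All type, function, and skip-reason names are hypothetical.

```typescript
// Assumption: values are parsed numbers, with null marking a missing field.
interface LegacyFinalResult {
  system_point_total: number | null;
  point_total: number | null;
  rankingScore: number | null; // ranking score from legacy state data
}

interface ImportedSubmission {
  submissionId: string;
  memberId: string;
  submittedAt: number; // epoch milliseconds
}

type FinalPlan =
  | { kind: "final"; submissionId: string; score: number }
  | { kind: "skipped"; reason: string };

// Applies the agreed precedence; a value of 0 still counts as present.
function finalScoreOf(row: LegacyFinalResult): number | null {
  return row.system_point_total ?? row.point_total ?? row.rankingScore;
}

// Latest imported non-example submission for the member, or null.
function latestSubmissionOf(
  memberId: string,
  submissions: ImportedSubmission[]
): ImportedSubmission | null {
  let latest: ImportedSubmission | null = null;
  for (const s of submissions) {
    if (s.memberId === memberId && (latest === null || s.submittedAt > latest.submittedAt)) {
      latest = s;
    }
  }
  return latest;
}

function planFinalSummation(
  memberId: string,
  row: LegacyFinalResult,
  submissions: ImportedSubmission[]
): FinalPlan {
  const score = finalScoreOf(row);
  if (score === null) {
    return { kind: "skipped", reason: "no-final-score" }; // assumed skip reason
  }
  const latest = latestSubmissionOf(memberId, submissions);
  if (latest === null) {
    // Never create an orphan final review summation.
    return { kind: "skipped", reason: "no-imported-submission" };
  }
  return { kind: "final", submissionId: latest.submissionId, score };
}
```

Using `??` rather than `||` matters here: a legitimate score of `0` in `system_point_total` must not fall through to `point_total`.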
+
+## Reuse / Backfill Rules
+
+These are core safety invariants:
+
+- existing v6 marathon challenges are source of truth for challenge-level fields
+- backfill may add missing linked records only
+- already-present standard phase rows on reused challenges are preserved
+- reruns must not duplicate challenges, phases, resources, submissions, or review summations
+- example submissions and example review summations are never imported
+
+## Apply / Resume Behavior
+
+Cross-service writes are not a single distributed transaction. The importer therefore must be round-scoped and restart-safe:
+
+- plan a round before applying it
+- read before write on every owned surface
+- treat rerun reconciliation as the recovery path after partial failure
+- never assume a round is absent just because a previous apply stopped mid-flight
+
+The observable result of rerunning a partially imported round should be reconciliation to the same steady state, not duplication or destructive rewrite.
+
+If a temporary status-transition workflow is used during participant backfill, reruns must still converge to the same final completed state.
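
Read-before-write reduces to one small operation per surface: compute the desired records, read the keys that already exist, and write only the difference. The helper below is a hypothetical sketch of that step; `keyOf` would return one of the stable dedup keys defined earlier.

```typescript
// Returns the desired items whose key is not already present, dropping
// duplicates inside the desired list as well.
function selectMissing<T>(
  desired: T[],
  existingKeys: string[],
  keyOf: (item: T) => string
): T[] {
  const seen = new Set(existingKeys);
  const toCreate: T[] = [];
  for (const item of desired) {
    const key = keyOf(item);
    if (!seen.has(key)) {
      seen.add(key);
      toCreate.push(item);
    }
  }
  return toCreate;
}
```

After a partial failure, a rerun reads the keys written so far and `selectMissing` returns only the remainder, so the second pass converges instead of duplicating.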
+
+## Data Ownership Invariants
+
+### Challenge DB
+
+Owns:
+
+- challenge identity and completion state
+- phase rows and challenge timeline shape
+
+### Resource API
+
+Owns:
+
+- submitter resource creation/reuse
+- externally visible `(memberId, roleId)` participant footprint
+
+### Review DB / Review API
+
+Owns:
+
+- imported submissions
+- provisional review summations per submission
+- final review summations attached to the latest imported non-example submission per member
+
+## Validation-Oriented Invariants
+
+The validation contract relies on these high-level invariants being preserved:
+
+- round `10815` is the primary missing-historical create-path fixture
+- a score-rich Marathon Match fixture is selected during score-feature work for final-ranking validation
+- round `14272` is the second selected round for multi-round blast-radius checks
+- imported submission identity is externally testable via `legacySubmissionId`
+- reused-round verification depends on comparing both identity sets and externally visible field snapshots
+- for member-owned surfaces, validation reconciles `imported subset + skipped missing-member subset = legacy total`
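
The last invariant is a set identity, so it can be checked on ids instead of bare counts. A minimal sketch, assuming each list holds distinct legacy identifiers for one member-owned surface; the report shape and function name are hypothetical.

```typescript
interface ReconciliationReport {
  ok: boolean;
  unaccounted: string[]; // legacy ids neither imported nor skipped
  doubleCounted: string[]; // ids that appear as both imported and skipped
  unexpected: string[]; // imported or skipped ids absent from the legacy set
}

function reconcileMemberSurface(
  legacyIds: string[],
  importedIds: string[],
  skippedIds: string[]
): ReconciliationReport {
  const legacy = new Set(legacyIds);
  const imported = new Set(importedIds);
  const skipped = new Set(skippedIds);

  const unaccounted = legacyIds.filter((id) => !imported.has(id) && !skipped.has(id));
  const doubleCounted = importedIds.filter((id) => skipped.has(id));
  const unexpected = importedIds.concat(skippedIds).filter((id) => !legacy.has(id));

  return {
    ok: unaccounted.length === 0 && doubleCounted.length === 0 && unexpected.length === 0,
    unaccounted,
    doubleCounted,
    unexpected,
  };
}
```

Comparing ids catches the case where the counts happen to add up while one legacy row was dropped and an unrelated row was imported.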
diff --git a/.factory/library/environment.md b/.factory/library/environment.md
@@ -0,0 +1,65 @@
+# Environment
+
+Environment variables, external dependencies, and setup notes.
+
+**What belongs here:** required env vars, external API URLs, credentials/setup expectations, Node/runtime requirements, read-only source locations.
+**What does NOT belong here:** service start/stop commands or ports to manage locally (use `.factory/services.yaml`).
+
+---
+
+## Required Environment
+
+The importer must load `challenge-api-v6/.env.importer.local` for local/dev execution.
+
+Required values:
+
+- `DATABASE_URL` — challenge DB used by `challenge-api-v6`
+- `MEMBER_DB_URL` — member lookup DB connection string for target-member resolution during missing-member planning/validation; defaults to `DATABASE_URL` only when that DB can also resolve member data
+- `MEMBER_DB_SCHEMA` — schema used for member lookup tables (default behavior is code-defined; validators should set it explicitly when member data is not reachable through the challenge schema)
+- `REVIEW_DB_URL` — review DB used for submissions and review summations
+- `RESOURCES_API_URL` — base URL for Resource API writes and reads
+- `AUTH0_URL`
+- `AUTH0_AUDIENCE`
+- `AUTH0_CLIENT_ID`
+- `AUTH0_CLIENT_SECRET`
+
+Optional / useful values:
+
+- `DATA_DIRECTORY=/mnt/Informix`
+- importer-scoped attribution values such as `CREATED_BY` / `UPDATED_BY`
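
A preflight check along these lines could report configuration gaps before any planning starts. The key names come from the lists above; the function name, the return shape, and the split between hard-required keys and keys with a documented default are assumptions of this sketch.

```typescript
// Keys with no documented default.
const REQUIRED_KEYS = [
  "DATABASE_URL",
  "REVIEW_DB_URL",
  "RESOURCES_API_URL",
  "AUTH0_URL",
  "AUTH0_AUDIENCE",
  "AUTH0_CLIENT_ID",
  "AUTH0_CLIENT_SECRET",
];

// Keys the text describes as having default behavior when unset.
const DEFAULTED_KEYS = ["MEMBER_DB_URL", "MEMBER_DB_SCHEMA"];

function isBlank(value: string | undefined): boolean {
  return value === undefined || value.trim() === "";
}

function checkImporterEnv(env: Record<string, string | undefined>): {
  missing: string[];
  relyingOnDefaults: string[];
} {
  return {
    missing: REQUIRED_KEYS.filter((key) => isBlank(env[key])),
    relyingOnDefaults: DEFAULTED_KEYS.filter((key) => isBlank(env[key])),
  };
}
```

In a Node process the caller would pass `process.env` after loading `.env.importer.local`. A non-empty `missing` list is a reason to stop, while `relyingOnDefaults` is worth logging so validators know member lookup is falling back.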
+
+## Canonical API Endpoints For Validation
+
+- Challenge API base URL: `https://api.topcoder-dev.com/v6/challenges`
+- Resource API base URL: read from `RESOURCES_API_URL` in `.env.importer.local`
+
+Workers and validators should use these canonical endpoints rather than guessing at localhost URLs when validating against the populated dev environment.
+
+## Runtime Boundaries
+
+- `/mnt/Informix` is a read-only legacy data source.
+- Existing v6 marathon matches are backfill-only at the challenge level.
+- Do not commit secrets from `.env.importer.local`.
+- The validation target is the existing dev environment referenced by the env file; workers should not assume they are allowed to start replacement local services.
+
+## Node / Tooling Versions
+
+- Repo root (`challenge-api-v6`): Node `22.19.0`
+- `challenge-api-v6/data-migration`: Node `18.19.0`
+- `pnpm` is installed and available (`10.32.1` during planning)
+
+Workers switching between repo root and `data-migration/` must switch Node versions in the same shell command that invokes Node or pnpm.
+
+## Existing Local Processes Observed During Planning
+
+These are informational boundaries for worker safety:
+
+- port `3100` already has a running process; do not kill or repurpose it unless the user later explicitly asks
+- local postgres is already listening on `54329`; only use it if the env file points there
+
+## Source Data Notes
+
+- Marathon matches come from legacy `round` rows with `round_type_id='13'`.
+- Primary join path: `round -> long_component_state -> long_submission -> long_comp_result`.
+- `round_registration_*.json` is the source of submitter resources.
+- `user_*.json` resolves `coder_id` identities.