Merged
39 commits
5c02556
Track MM follow-up patch-mode requirements
jmgasper Apr 9, 2026
ee7db41
Guard targeted rerun challenge-id overrides
jmgasper Apr 9, 2026
5122aae
Backfill MM descriptions from legacy problem text
jmgasper Apr 9, 2026
d1c4bfd
Backfill targeted-rerun submission archives and URLs
jmgasper Apr 9, 2026
e4d4642
Merge branch 'develop' of github.com:topcoder-platform/challenge-api-…
jmgasper Apr 14, 2026
409d0b8
Track component-text fallback guidance
jmgasper Apr 14, 2026
b383a4d
Add component_text markdown fallback for MM descriptions
jmgasper Apr 14, 2026
fb72fa0
Add followup-content-archives scrutiny synthesis
jmgasper Apr 14, 2026
179714c
Make targeted-rerun description patch idempotent
jmgasper Apr 14, 2026
1506567
Record followup-content-archives scrutiny rerun
jmgasper Apr 14, 2026
945b6c1
Record follow-up user-testing blocker
jmgasper Apr 14, 2026
2bca9b0
Record followup-content-archives user-testing rerun
jmgasper Apr 15, 2026
661c08d
Record followup-content-archives no-source user-testing rerun
jmgasper Apr 15, 2026
f3b0a88
Harden provisional-score import for malformed legacy rows
jmgasper Apr 15, 2026
1d898ce
Add misc-importer-hardening scrutiny synthesis
jmgasper Apr 15, 2026
85737c9
Record misc-importer-hardening user-testing synthesis
jmgasper Apr 15, 2026
d1d5afa
Record misc-importer-hardening user-testing rerun
jmgasper Apr 15, 2026
2ba779c
Document historical MM rerun steps
jmgasper Apr 15, 2026
a745e88
Clarify historical MM rerun commands
jmgasper Apr 15, 2026
ba2a75f
Record misc-readme-docs scrutiny synthesis
jmgasper Apr 15, 2026
5970cae
Archive misc-readme-docs scrutiny rerun
jmgasper Apr 15, 2026
2eeb2bf
Record misc-readme-docs user-testing synthesis
jmgasper Apr 15, 2026
04e1930
Update for section 3 work
jmgasper Apr 15, 2026
aa882a7
Fix targeted re-run import of MMs
jmgasper Apr 16, 2026
ce53f01
Formatting updates
jmgasper Apr 16, 2026
1a49309
Fix up submission details
jmgasper Apr 16, 2026
e51028f
Fix update for submissions
jmgasper Apr 16, 2026
5c7e08c
Fix for handling rating indicator flag
jmgasper Apr 17, 2026
81367e7
Fixes for submission handling when importing historical MMs
jmgasper Apr 17, 2026
84c5c41
Updates to how we pull scores, and allow for re-runs to update scores
jmgasper Apr 17, 2026
53b73c5
Ignore BA checks on the Topgear BA
jmgasper Apr 19, 2026
642c429
Add winners to finished challenge import
jmgasper Apr 20, 2026
a54db6e
Additional score finding tweak
jmgasper Apr 20, 2026
7a5b179
Changes for winner reconcilliation
jmgasper Apr 20, 2026
7dc731b
UPdates for re-run winners / scores
jmgasper Apr 20, 2026
f11cad5
Historical score fix for MM1 and winning submission score
jmgasper Apr 20, 2026
a5b421a
Provisional vs. system test
jmgasper Apr 20, 2026
ca04106
Additional scoring fixes
jmgasper Apr 20, 2026
53579dd
Better handling of provisional flag
jmgasper Apr 21, 2026
40 changes: 40 additions & 0 deletions .factory/library/architecture.md
@@ -17,12 +17,14 @@ The mission adds a reusable importer inside `challenge-api-v6/data-migration/` t
- existing v6 challenge data through the challenge DB / challenge-api schema
- existing v6 resource data through the Resource API
- existing v6 submission and review-summation data through the review DB / review-api schema
- local env configuration from `.env.importer.local`, including `SUBMISSION_ARCHIVE_DIR`

### Write surfaces

- Challenge and ChallengePhase records in the challenge DB
- submitter Resource records through the Resource API
- Submission and ReviewSummation records in the review DB
- local submission archive zip files under `SUBMISSION_ARCHIVE_DIR`

## Import Pipeline

@@ -60,6 +62,17 @@ For each selected round:

Created challenges must use `challenge.legacyId = round.id`. Reused challenges are not challenge-level rewrite candidates; they must already be matched unambiguously by the rule above or remain `unresolved`.

### Challenge description sourcing

Challenge description content comes from the legacy `round -> round_component -> component -> problem` mapping:

- when a selected round maps to a legacy `problem` row with non-empty `problem_text`, persist that raw HTML as the v6 challenge description
- when `problem_text` is absent or unusable but the mapped `component.component_text` exists, convert that XML into best-effort Markdown and persist the converted Markdown as the v6 challenge description
- the XML fallback should keep user-facing content and avoid storing raw XML wrappers or wholesale hidden/internal test cases; if test cases are rendered, keep public/example-style cases only
- when neither a usable `problem_text` nor a usable `component_text` conversion is available, retain the existing placeholder/fallback description behavior
- on standard reuse/backfill runs, preserve existing challenge-level fields other than the approved follow-up description patch
- on targeted rerun patch mode, description overwrite is allowed only when the caller provides an explicit existing challenge-id override
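
The source precedence above can be sketched as a small selection function. This is an illustrative Python sketch, not the importer's actual code: `choose_description`, `is_usable`, the returned source labels, and the caller-supplied `convert_xml_to_markdown` callable are all hypothetical names, and "unusable" is assumed to mean missing or whitespace-only.

```python
from typing import Callable, Optional, Tuple


def is_usable(text: Optional[str]) -> bool:
    """Assumption: None, empty, and whitespace-only values count as unusable."""
    return text is not None and text.strip() != ""


def choose_description(
    problem_text: Optional[str],
    component_text: Optional[str],
    convert_xml_to_markdown: Callable[[str], Optional[str]],
    placeholder: str,
) -> Tuple[str, str]:
    """Return (source_label, description) following the documented precedence."""
    # 1. Raw HTML from problem.problem_text wins whenever it is usable.
    if is_usable(problem_text):
        return ("problem_text", problem_text)

    # 2. Otherwise try the best-effort Markdown conversion of component_text.
    if is_usable(component_text):
        try:
            markdown = convert_xml_to_markdown(component_text)
        except ValueError:
            # Hypothetical contract: the converter raises ValueError on bad XML.
            markdown = None
        if is_usable(markdown):
            return ("component_text", markdown)

    # 3. Neither source is usable: keep the placeholder/fallback behavior.
    return ("placeholder", placeholder)
```

Passing the converter in as a parameter keeps the precedence rule testable on its own, separately from whatever XML handling the real fallback uses.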

### 3. Phase materialization

Canonical MM history in v6 is represented by exactly three standard phases:
@@ -122,6 +135,16 @@ Only non-example legacy submissions are imported. The importer must preserve the

**Stable submission identity invariant:** imported `Submission.legacySubmissionId` must be a deterministic composite derived from legacy submission identity so round-wide and rerun validation can compare exact sets. The contract assumes `legacySubmissionId` is the stable external identity for imported submissions.

### Submission archive backfill

Imported/reused submissions also participate in a deterministic archive backfill flow:

- load legacy submission text from the same submission identity used for `legacySubmissionId`, preferring the main long-submission text field and only falling back to secondary legacy text fields when needed
- build a deterministic archive filename from stable submission identity so reruns converge on the same local file and the same `submission.url`
- write a zip file containing a single text file with the recovered legacy submission text under `SUBMISSION_ARCHIVE_DIR`
- set `submission.url` to the delayed-upload target format `https://s3.amazonaws.com/topcoder-submissions/<archive-file-name>`
- on reruns, treat archive generation plus URL update as reconciliation work: recreate/refresh only as needed without duplicating submission rows
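
A minimal sketch of the archive flow above, in illustrative Python. The filename scheme (a SHA-256 prefix of the legacy submission identity), the inner file name `submission.txt`, and all function names are assumptions made for this example; only the URL prefix and the "one zip with a single text file" shape come from the rules above.

```python
import hashlib
import io
import zipfile
from pathlib import Path

S3_URL_PREFIX = "https://s3.amazonaws.com/topcoder-submissions/"


def archive_file_name(legacy_submission_id: str) -> str:
    """Hypothetical naming scheme: a stable hash of the submission identity."""
    digest = hashlib.sha256(legacy_submission_id.encode("utf-8")).hexdigest()[:16]
    return f"legacy-submission-{digest}.zip"


def build_archive_bytes(text: str, inner_name: str = "submission.txt") -> bytes:
    """Build a zip holding one text file, with a fixed timestamp so that
    repeated builds of the same text produce identical bytes."""
    buffer = io.BytesIO()
    info = zipfile.ZipInfo(inner_name, date_time=(1980, 1, 1, 0, 0, 0))
    info.compress_type = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(info, text)
    return buffer.getvalue()


def backfill_archive(archive_dir: Path, legacy_submission_id: str, text: str) -> str:
    """Write (or refresh) the archive and return the derived submission.url."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    name = archive_file_name(legacy_submission_id)
    path = archive_dir / name
    data = build_archive_bytes(text)
    # Reconciliation: rewrite only when the file is missing or differs.
    if not path.exists() or path.read_bytes() != data:
        path.write_bytes(data)
    return S3_URL_PREFIX + name
```

Because both the filename and the zip bytes depend only on the submission identity and text, a rerun lands on the same file and the same URL instead of producing a second archive.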

### 6. Score materialization

Two score streams are imported:
@@ -148,6 +171,7 @@ These are core safety invariants:

- existing v6 marathon challenges are source of truth for challenge-level fields
- backfill may add missing linked records only
- the approved follow-up patch mode may additionally overwrite challenge `description` and submission archive/url data, but nothing else
- already-present standard phase rows on reused challenges are preserved
- reruns must not duplicate challenges, phases, resources, submissions, or review summations
- example submissions and example review summations are never imported
@@ -165,6 +189,12 @@ The observable result of rerunning a partially imported round should be reconcil

If a temporary status-transition workflow is used during participant backfill, reruns must still converge to the same final completed state.

Targeted rerun patch mode is deliberately narrow and explicit:

- it requires an explicit existing challenge-id override
- it may patch only the challenge description plus submission archive/url data for the selected round
- it must not recreate submissions or mutate resource/review/phase state outside the approved patch surfaces
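
One way to enforce the narrow patch surface is an up-front guard, sketched here in illustrative Python. The field labels in `ALLOWED_PATCH_FIELDS`, the `PatchModeError` type, and `validate_patch_request` are hypothetical names chosen for this example, not identifiers from the importer.

```python
from typing import FrozenSet, Iterable, Optional

# Assumed labels for the approved patch surfaces described above.
ALLOWED_PATCH_FIELDS = frozenset(
    {"challenge.description", "submission.archive", "submission.url"}
)


class PatchModeError(ValueError):
    """Raised when a targeted rerun request falls outside the approved surface."""


def validate_patch_request(
    challenge_id_override: Optional[str], fields: Iterable[str]
) -> FrozenSet[str]:
    """Reject the request before any write happens; return the accepted fields."""
    if challenge_id_override is None or not challenge_id_override.strip():
        raise PatchModeError(
            "targeted rerun patch mode requires an explicit existing challenge-id override"
        )
    requested = frozenset(fields)
    disallowed = requested - ALLOWED_PATCH_FIELDS
    if disallowed:
        raise PatchModeError(f"fields outside the patch surface: {sorted(disallowed)}")
    return requested
```

Failing closed before any write means a missing override or an unexpected field never reaches the challenge or review databases.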

## Data Ownership Invariants

### Challenge DB
@@ -188,6 +218,14 @@ Owns:
- imported submissions
- provisional review summations per submission
- final review summations attached to the latest imported non-example submission per member
- the `submission.url` field pointing at the deterministic archive path

### Local filesystem (`SUBMISSION_ARCHIVE_DIR`)

Owns:

- generated zip archives for legacy submission text
- deterministic archive filenames used to derive `submission.url`

## Validation-Oriented Invariants

@@ -197,5 +235,7 @@ The validation contract relies on these high-level invariants being preserved:
- a score-rich Marathon Match fixture is selected during score-feature work for final-ranking validation
- round `14272` is the second selected round for multi-round blast-radius checks
- imported submission identity is externally testable via `legacySubmissionId`
- imported description sourcing is externally testable via raw HTML challenge description reads
- imported archive backfill is externally testable via `submission.url` plus local zip inspection
- reused-round verification depends on comparing both identity sets and externally visible field snapshots
- for member-owned surfaces, validation now reconciles `imported subset + skipped missing-member subset = legacy total`
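
The reconciliation invariant can be expressed as a one-line check. The sketch below is illustrative Python with a hypothetical `reconciles` helper; it generalizes the rule to any number of named skip categories, and the sample figures are the final-candidate counts recorded for fixture rounds `10815` and `17948` in `legacy-data.md`.

```python
from typing import Dict


def reconciles(legacy_total: int, imported: int, skipped: Dict[str, int]) -> bool:
    """True when the imported subset plus every skipped subset equals the legacy total."""
    return imported + sum(skipped.values()) == legacy_total


# Round 10815: 283 final candidates = 266 importable + 2 + 15 skips.
round_10815_ok = reconciles(
    283,
    266,
    {"missing-member": 2, "finalist-without-attachable-submission": 15},
)

# Round 17948: 81 final candidates = 45 importable + 36 + 0 skips.
round_17948_ok = reconciles(
    81,
    45,
    {"missing-member": 36, "finalist-without-attachable-submission": 0},
)
```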
7 changes: 7 additions & 0 deletions .factory/library/environment.md
@@ -18,6 +18,7 @@ Required values:
- `MEMBER_DB_SCHEMA` — schema used for member lookup tables (default behavior is code-defined; validators should set it explicitly when member data is not reachable through the challenge schema)
- `REVIEW_DB_URL` — review DB used for submissions and review summations
- `RESOURCES_API_URL` — base URL for Resource API writes and reads
- `SUBMISSION_ARCHIVE_DIR` — local directory where submission archive zip files are created during submission URL backfill / targeted reruns
- `AUTH0_URL`
- `AUTH0_AUDIENCE`
- `AUTH0_CLIENT_ID`
@@ -28,6 +29,8 @@ Optional / useful values:
- `DATA_DIRECTORY=/mnt/Informix`
- importer-scoped attribution values such as `CREATED_BY` / `UPDATED_BY`

`SUBMISSION_ARCHIVE_DIR` must point at a writable local folder. Generated archives are local-only for this mission; workers must not upload them or assume the S3 path in `submission.url` is live.
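
A preflight check for this requirement might look like the following illustrative Python sketch. `resolve_archive_dir` is a hypothetical helper, and the choice to reject a missing directory rather than create it is an assumption of this example.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_archive_dir(env: Mapping[str, str]) -> Path:
    """Return the archive directory, or raise ValueError if it is unusable."""
    raw = env.get("SUBMISSION_ARCHIVE_DIR", "").strip()
    if not raw:
        raise ValueError("SUBMISSION_ARCHIVE_DIR is required")
    path = Path(raw)
    if not path.is_dir():
        raise ValueError(f"SUBMISSION_ARCHIVE_DIR is not a directory: {path}")
    # Creating files in a directory needs both write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        raise ValueError(f"SUBMISSION_ARCHIVE_DIR is not writable: {path}")
    return path
```

In normal use the mapping would be `os.environ`; taking it as a parameter keeps the check easy to exercise against a temporary directory.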

## Canonical API Endpoints For Validation

- Challenge API base URL: `https://api.topcoder-dev.com/v6/challenges`
@@ -39,6 +42,7 @@ Workers and validators should use these canonical endpoints rather than probing

- `/mnt/Informix` is a read-only legacy data source.
- Existing v6 marathon matches are backfill-only at the challenge level.
- Follow-up targeted rerun mode may overwrite only challenge descriptions plus submission archive/url data, and only when explicitly invoked with an existing challenge-id override.
- Do not commit secrets from `.env.importer.local`.
- The validation target is the existing dev environment referenced by the env file; workers should not assume they are allowed to start replacement local services.

@@ -61,5 +65,8 @@ These are informational boundaries for worker safety:

- Marathon matches come from legacy `round` rows with `round_type_id='13'`.
- Primary join path: `round -> long_component_state -> long_submission -> long_comp_result`.
- Challenge description backfill uses the legacy `round -> round_component -> component -> problem` mapping.
- Description-source precedence is: raw `problem.problem_text` HTML first, then best-effort Markdown converted from `component.component_text` XML, then placeholder/preserve behavior only when neither source is usable.
- `round_registration_*.json` is the source of submitter resources.
- Submission archive content comes from legacy submission text fields associated with the imported non-example submissions.
- `user_*.json` resolves `coder_id` identities.
19 changes: 19 additions & 0 deletions .factory/library/legacy-data.md
@@ -10,7 +10,10 @@ Legacy source facts that workers should reuse instead of rediscovering.
## Primary Files

- `round_1.json`
- `round_component_1.json`
- `round_registration_*.json`
- `component_1.json`
- `problem_1.json`
- `long_component_state_1.json`
- `long_submission_*.json`
- `long_comp_result_*.json`
@@ -27,6 +30,10 @@ Use this legacy relationship when deriving participant/submission/final-score da

- `round -> long_component_state -> long_submission -> long_comp_result`

Use this legacy relationship when deriving challenge descriptions:

- `round -> round_component -> component -> problem`

## Resource Source

- submitter resources come from `round_registration_*.json`
@@ -38,6 +45,16 @@ Use this legacy relationship when deriving participant/submission/final-score da
- import full **non-example** history only
- example submissions are excluded from imported submissions and imported score history
- imported `Submission.legacySubmissionId` must be deterministic and stable across reruns
- submission archive text should prefer the primary legacy submission body field from `long_submission`; only fall back to secondary legacy submission text when that preferred body is absent
- generated archive filenames and derived `submission.url` values must remain deterministic across reruns

## Description Rules

- when the mapped `problem.problem_text` is non-empty, use that raw HTML as the challenge description
- when `problem_text` is empty/unusable but `component.component_text` exists, convert that XML into best-effort Markdown for the challenge description
- do not store raw XML wrappers in the description, and do not dump hidden/internal test cases wholesale; if test cases are rendered from XML fallback, keep public/example-style content only
- when neither source is usable or the round does not map cleanly, fall back to the importer's placeholder/preserve behavior
- component-level description lookup must stay round-scoped through `round_component`; `component_id` values can be reused across multiple rounds
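
The round-scoped lookup can be sketched as a walk from the round outward, so a `component_id` shared by several rounds is never used as the starting point. This is illustrative Python over already-loaded rows: the `round_id` key on `round_component` rows, the rule that "maps cleanly" means exactly one component per round, and the function name are assumptions of this example.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def description_sources_for_round(
    round_id: int,
    round_components: List[Row],
    components: List[Row],
    problems: List[Row],
) -> Optional[Dict[str, Optional[object]]]:
    """Follow round -> round_component -> component -> problem for one round."""
    component_ids = {
        rc["component_id"] for rc in round_components if rc["round_id"] == round_id
    }
    # Assumption: zero or multiple components means the round does not map cleanly.
    if len(component_ids) != 1:
        return None
    component_id = next(iter(component_ids))
    component = next(
        (c for c in components if c["component_id"] == component_id), None
    )
    if component is None:
        return None
    problem = next(
        (p for p in problems if p["problem_id"] == component.get("problem_id")), None
    )
    return {
        "problem_text": problem.get("problem_text") if problem else None,
        "component_text": component.get("component_text"),
    }
```

A `None` result corresponds to the "does not map cleanly" case, where the placeholder/preserve behavior applies.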

### Named participant fixture

@@ -65,7 +82,9 @@ Use this legacy relationship when deriving participant/submission/final-score da
## Fixture Rounds

- `10815`: `836` eligible registrations, `1445` non-example submissions, `2424` example submissions, `267` submitters with non-example history, and fallback-heavy final-score behavior; in the current target-member snapshot this round plans `283` final candidates split into `266` importable finals, `2` missing-member final skips, and `15` explicit `finalist-without-attachable-submission` skips. Treat this as the selected unattachable-finalists fixture for score validation.
- `10758`: Marathon Match round with `round_component.component_id=6775`, `problem_id=7542`, empty `problem.problem_text`, and populated `component.component_text` for `RobotRouting`. Use this as the primary create-path fixture for XML-to-Markdown description fallback validation. The component XML is large and includes many hidden/internal test cases, so conversion rules must stay user-facing.
- `17948`: selected score-rich Marathon Match fixture for final-score validation. Current planning/apply evidence for this round yields `81` legacy final candidates with `45` importable finals, `36` `missing-member` final skips, and `0` explicit `finalist-without-attachable-submission` skips. Imported finals on this fixture are `system_point_total`-backed and preserve legacy placement order when sorted by aggregate score descending after excluding missing-member finalists.
- `10015`: already-imported Marathon Match fixture observed with a placeholder v6 description despite having legacy problem text available; use this fixture for description overwrite and targeted rerun validation when it remains available in the shared dev environment.
- `13897`: remains a useful large MM backfill fixture, but it is **not** the selected score-rich placement fixture because it currently includes `33` explicit `finalist-without-attachable-submission` skips.
- `14272`: second selected-round filter fixture; current validation guidance treats it as an unresolved/non-Marathon-Match round rather than an importable Marathon Match target
- `10089` and `10722` remain non-Marathon in current planning and should not be used as Marathon Match score fixtures.