Skip to content

Auto-retry live-API tests on transient upstream errors#277

Merged
thodson-usgs merged 8 commits into
DOI-USGS:mainfrom
thodson-usgs:fix/live-test-retries
May 15, 2026
Merged

Auto-retry live-API tests on transient upstream errors#277
thodson-usgs merged 8 commits into
DOI-USGS:mainfrom
thodson-usgs:fix/live-test-retries

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 15, 2026

Summary

After #273 merged, the next CI run on main failed with a transient HTTP 502: Bad Gateway from the live USGS Water Data API surfacing through test_get_field_measurements (run). The 502 was real upstream behavior — pre-#273, _walk_pages would have silently swallowed it and the test would have passed on truncated data; post-#273, _raise_for_non_200 correctly propagates it.

That's the intended behavior for users, but it means every existing test that hits the live API is now flaky against any transient blip the USGS API has. There are ~25 such tests in waterdata_test.py. This PR makes the CI test suite tolerant of those transients — no library code changes, no new tests, only test infrastructure.

Fix

tests/waterdata_test.py:

  • Adds a module-level pytestmark = pytest.mark.flaky(...) declaring the retry policy explicitly at the file scope where live tests live. Opt-in is per-file, not auto-applied by a global collection hook.
  • Configures pytest-rerunfailures to retry up to twice on a 5-second backoff, but only when the failure trace matches a narrow set of transient-upstream patterns: RuntimeError: 429: / 5\d\d: (from _raise_for_non_200), ConnectionError, and Timeout shapes from the requests stack.
  • Library bugs that raise other exception types fail on the first attempt — the only_rerun set deliberately omits generic RuntimeError.
  • The marker is attached to every test in the module, but because the only_rerun regex matches only traces from real network round-trips, mocked tests (those using requests_mock or mock.patch) are no-ops for the rerun by construction.

pyproject.toml: adds pytest-rerunfailures to the test optional-deps set.

Also bumps actions/checkout@v4 → @v6 and actions/setup-python@v5 → @v6 across all three workflows ahead of the GitHub Actions Node 20 deprecation (warning surfaced on the same CI run).

Design note

An earlier revision of this PR placed the retry logic in tests/conftest.py with a pytest_collection_modifyitems hook that auto-applied the marker based on a _PUBLIC_GETTERS source-inspection heuristic. That approach was rejected during review in favor of explicit module-level opt-in (commit 047130f): the test author who imports the live API declares the marker at the top of the file. Trade-off is intentional — small amount of "marker on a mocked test" in exchange for not maintaining implicit auto-marking infrastructure.

What this PR is NOT

  • No library behavior changes — only test infrastructure.
  • No new tests — the existing tests are the thing being kept green; adding tests for the marker itself would be infrastructure-on-infrastructure.
  • No NEWS entry — CI plumbing isn't user-facing.

Test plan

  • pip install -e '.[test]' resolves cleanly with the new dep
  • pytest --collect-only shows the marker attached to 61 parametrize-expanded items in waterdata_test.py and zero items outside it
  • All offline tests in waterdata_test.py pass unchanged
  • CI on this branch validates end-to-end against real CI infrastructure

🤖 Generated with Claude Code

thodson-usgs and others added 3 commits May 15, 2026 09:57
DOI-USGS#273 surfaced a transient HTTP 502 from upstream on its merge-to-main
CI run that pre-PR-273 would have been silently swallowed by
_walk_pages and turned into an empty DataFrame. The status-code-aware
behavior is correct for users, but it makes every test that hits the
live USGS Water Data API susceptible to flaking on a transient blip.

This PR adds:

- pytest-rerunfailures to the `test` optional dependency set.
- tests/conftest.py that (a) registers a `live` pytest marker; (b)
  auto-applies it to every test that does not take `requests_mock` as
  a fixture (the existing mock-driven convention in this repo); and
  (c) configures live-marked tests to retry up to twice on a 5-second
  backoff — but ONLY when the failure trace matches one of a narrow
  set of transient-upstream patterns: `429:` / `5xx:` prefixes from
  `_raise_for_non_200`, `ConnectionError` shapes, and timeout strings
  from the requests/urllib3 stack.

Library bugs raising other exception types are not retried — the
`only_rerun` pattern set deliberately matches on the failure-text
shape produced by `_raise_for_non_200` rather than on generic
`RuntimeError`, so a real bug fails on the first attempt.

Verified: pytest collection identifies 59 of 61 tests in
waterdata_test.py as live (the two `*_mock_*` tests correctly stay
unmarked). Offline tests (`_get_args`, `_check_*`, `_normalize_str_*`,
`_construct_api_requests_*`) pass unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify-review consensus across three reviewers (reuse / quality /
efficiency) on the conftest.py from the previous commit:

- Drop the `pytest.mark.live` marker double-application: no `-m live`
  selector is wired anywhere in CI; only `pytest.mark.flaky` carries
  behavior. The `live` marker plus its `pytest_configure` registration
  was dead metadata.
- Tighten `_TRANSIENT_PATTERNS`: drop `r"timed out"` (subsumed by the
  `Timeout` arm) and `r"Bad Gateway|Service Unavailable|Gateway Timeout"`
  (subsumed by the `RuntimeError:\s*5\d\d:` prefix from
  `_raise_for_non_200`). Three load-bearing patterns left.
- Drop defensive `getattr(item, "fixturenames", ())` — pytest items
  reliably expose `fixturenames`; the default-tuple fallback is dead
  defense.
- Shorten the module docstring: keep the WHY paragraph (PR DOI-USGS#273 →
  surfaced errors → CI flake risk) and drop the WHAT bullets that
  narrate the code below.
- Add a one-line justification for the 5-second backoff (USGS
  upstream typically recovers from brief 5xx within that window).
- Tighten the NEWS.md entry: drop conftest implementation detail
  (registers a marker, applies via collection hook, …) and keep the
  user-relevant claim.

Skipped from the review (reviewer can decide):
- Inverting the auto-mark heuristic to opt-in `@pytest.mark.live`
  would force decorating ~35 test functions; the current
  "everything not using requests_mock is live" is zero-cost on
  success and self-extending for new tests.
- Centralizing the transient-error regex set in `dataretrieval/
  waterdata/utils.py` would couple library code to test
  infrastructure for one pattern.

Separately, bump GitHub Actions to Node-24-compatible versions
(@v6 across all three workflows) ahead of the deprecation
warning surfaced on the previous CI run: actions/checkout@v4 → v6,
actions/setup-python@v5 → v6. Node 20 is deprecated and will be
removed from runners on 2026-09-16; the default flips on
2026-06-02. NEWS.md entry updated to mention the bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR exists only to keep existing tests green against transient
upstream blips; no user-facing behavior changes. NEWS entries are for
library behavior changes, not CI plumbing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce CI flakiness from transient upstream failures in live USGS Water Data API tests by automatically re-running only those tests when failures match narrow “transient” error signatures. It also updates GitHub Actions workflow action versions.

Changes:

  • Add tests/conftest.py to mark non-requests_mock tests as flaky and auto-rerun them (with delay) only on transient upstream/network error patterns.
  • Add pytest-rerunfailures to the test optional dependency set.
  • Bump actions/checkout and actions/setup-python versions across workflows.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/conftest.py Adds collection hook to auto-mark “live” tests for conditional reruns on transient errors.
pyproject.toml Adds pytest-rerunfailures to test extras.
.github/workflows/sphinx-docs.yml Updates checkout/setup-python action versions.
.github/workflows/python-publish.yml Updates checkout/setup-python action versions (but see review comment re: YAML validity).
.github/workflows/python-package.yml Updates checkout/setup-python action versions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/python-publish.yml
thodson-usgs and others added 4 commits May 15, 2026 10:12
The "auto-mark every test that doesn't take requests_mock" heuristic
was overkill — out of 244 collected tests it marked 200, including
every pure unit test of internal helpers (rdb parsing, _to_str, CQL
filter splitting, NWIS_Metadata setters, validation logic, etc).
Marker over-application is cheap on the success path but conveys
the wrong intent and is one more thing to keep in sync.

Refined heuristic: a test is live iff
  (a) it does not take requests_mock as a fixture, AND
  (b) its function-body source references one of the package's public
      user-facing getters (the _PUBLIC_GETTERS set — all `get_*` /
      `what_*` entry points exposed by waterdata, wqp, samples, nadp,
      streamstats, nldi, nwis, and ngwmn modules).

The set is maintained explicitly because there's no programmatic way
for the conftest to know "what counts as a user-facing entry point";
adding a new public getter means adding it here. A regex with `\b`
word boundaries avoids matching internal helpers that happen to
share a prefix (e.g. `_get_args` won't match `get_args`).

Dry-run against the current test inventory:
  - Old heuristic marked: 200 / 244
  - New heuristic marked:  89 / 244
  - Now-unmarked unit tests: 111

Spot-checked the 111 de-marked tests — all are genuinely offline:
rdb_test.py parsers, utils_test.py to_str/BaseMetadata/datetime
helpers, waterdata_filters_test.py CQL-filter logic, nldi_test.py
validators, nwis_test.py metadata setters, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/ngwmn_test.py is local-only work not yet in the repo. The ngwmn
module itself is committed, but no upstream tests reference it through
the conftest's selection path, so listing get_data/get_levels/get_sites
in _PUBLIC_GETTERS adds maintenance surface for no upstream benefit.
Add these back if/when ngwmn tests land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR DOI-USGS#273 fixed pagination loops in dataretrieval.waterdata.utils
specifically; other modules (wqp, samples, nadp, streamstats, nldi,
nwis) reach the network through different paths and don't share that
regression. Narrow _PUBLIC_GETTERS to waterdata's 16 user-facing
getters and add a tests/waterdata file-path scope check, which also
prevents name-collisions with legacy nwis getters (get_ratings,
get_daily) from spuriously marking nwis tests.

After this change, the retry marker is applied to 55 tests across
tests/waterdata_test.py, tests/waterdata_nearest_test.py,
tests/waterdata_filters_test.py, and tests/waterdata_ratings_test.py,
with zero matches outside that directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous conftest hook auto-applied pytest.mark.flaky to tests
by inspecting function source for references to a central
_PUBLIC_GETTERS frozenset. This was implicit and required keeping
that set in sync with the public API. Switch to explicit opt-in: drop
conftest.py and declare the marker at the top of waterdata_test.py
via pytestmark. The other waterdata_*_test.py files are entirely
mocked or pure unit tests and need no retry.

The retry config (reruns count, delay, transient-error trace
patterns) is co-located with the tests it protects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread tests/waterdata_test.py Outdated
Make explicit that the marker is attached to every test in the
module, but the only_rerun patterns are tied to real-network failure
traces, so mocked tests are no-ops for the rerun by construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread tests/waterdata_test.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@thodson-usgs thodson-usgs marked this pull request as ready for review May 15, 2026 18:49
@thodson-usgs thodson-usgs merged commit c3bc2b8 into DOI-USGS:main May 15, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants