Auto-retry live-API tests on transient upstream errors#277
Merged
Conversation
DOI-USGS#273 surfaced a transient HTTP 502 from upstream on its merge-to-main CI run that pre-PR-273 would have been silently swallowed by _walk_pages and turned into an empty DataFrame. The status-code-aware behavior is correct for users, but it makes every test that hits the live USGS Water Data API susceptible to flaking on a transient blip. This PR adds: - pytest-rerunfailures to the `test` optional dependency set. - tests/conftest.py that (a) registers a `live` pytest marker; (b) auto-applies it to every test that does not take `requests_mock` as a fixture (the existing mock-driven convention in this repo); and (c) configures live-marked tests to retry up to twice on a 5-second backoff — but ONLY when the failure trace matches one of a narrow set of transient-upstream patterns: `429:` / `5xx:` prefixes from `_raise_for_non_200`, `ConnectionError` shapes, and timeout strings from the requests/urllib3 stack. Library bugs raising other exception types are not retried — the `only_rerun` pattern set deliberately matches on the failure-text shape produced by `_raise_for_non_200` rather than on generic `RuntimeError`, so a real bug fails on the first attempt. Verified: pytest collection identifies 59 of 61 tests in waterdata_test.py as live (the two `*_mock_*` tests correctly stay unmarked). Offline tests (`_get_args`, `_check_*`, `_normalize_str_*`, `_construct_api_requests_*`) pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify-review consensus across three reviewers (reuse / quality / efficiency) on the conftest.py from the previous commit: - Drop the `pytest.mark.live` marker double-application: no `-m live` selector is wired anywhere in CI; only `pytest.mark.flaky` carries behavior. The `live` marker plus its `pytest_configure` registration was dead metadata. - Tighten `_TRANSIENT_PATTERNS`: drop `r"timed out"` (subsumed by the `Timeout` arm) and `r"Bad Gateway|Service Unavailable|Gateway Timeout"` (subsumed by the `RuntimeError:\s*5\d\d:` prefix from `_raise_for_non_200`). Three load-bearing patterns left. - Drop defensive `getattr(item, "fixturenames", ())` — pytest items reliably expose `fixturenames`; the default-tuple fallback is dead defense. - Shorten the module docstring: keep the WHY paragraph (PR DOI-USGS#273 → surfaced errors → CI flake risk) and drop the WHAT bullets that narrate the code below. - Add a one-line justification for the 5-second backoff (USGS upstream typically recovers from brief 5xx within that window). - Tighten the NEWS.md entry: drop conftest implementation detail (registers a marker, applies via collection hook, …) and keep the user-relevant claim. Skipped from the review (reviewer can decide): - Inverting the auto-mark heuristic to opt-in `@pytest.mark.live` would force decorating ~35 test functions; the current "everything not using requests_mock is live" is zero-cost on success and self-extending for new tests. - Centralizing the transient-error regex set in `dataretrieval/ waterdata/utils.py` would couple library code to test infrastructure for one pattern. Separately, bump GitHub Actions to Node-24-compatible versions (@v6 across all three workflows) ahead of the deprecation warning surfaced on the previous CI run: actions/checkout@v4 → v6, actions/setup-python@v5 → v6. Node 20 is deprecated and will be removed from runners on 2026-09-16; the default flips on 2026-06-02. NEWS.md entry updated to mention the bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR exists only to keep existing tests green against transient upstream blips; no user-facing behavior changes. NEWS entries are for library behavior changes, not CI plumbing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce CI flakiness from transient upstream failures in live USGS Water Data API tests by automatically re-running only those tests when failures match narrow “transient” error signatures. It also updates GitHub Actions workflow action versions.
Changes:
- Add
tests/conftest.pyto mark non-requests_mocktests asflakyand auto-rerun them (with delay) only on transient upstream/network error patterns. - Add
pytest-rerunfailuresto thetestoptional dependency set. - Bump
actions/checkoutandactions/setup-pythonversions across workflows.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/conftest.py | Adds collection hook to auto-mark “live” tests for conditional reruns on transient errors. |
| pyproject.toml | Adds pytest-rerunfailures to test extras. |
| .github/workflows/sphinx-docs.yml | Updates checkout/setup-python action versions. |
| .github/workflows/python-publish.yml | Updates checkout/setup-python action versions (but see review comment re: YAML validity). |
| .github/workflows/python-package.yml | Updates checkout/setup-python action versions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The "auto-mark every test that doesn't take requests_mock" heuristic
was overkill — out of 244 collected tests it marked 200, including
every pure unit test of internal helpers (rdb parsing, _to_str, CQL
filter splitting, NWIS_Metadata setters, validation logic, etc).
Marker over-application is cheap on the success path but conveys
the wrong intent and is one more thing to keep in sync.
Refined heuristic: a test is live iff
(a) it does not take requests_mock as a fixture, AND
(b) its function-body source references one of the package's public
user-facing getters (the _PUBLIC_GETTERS set — all `get_*` /
`what_*` entry points exposed by waterdata, wqp, samples, nadp,
streamstats, nldi, nwis, and ngwmn modules).
The set is maintained explicitly because there's no programmatic way
for the conftest to know "what counts as a user-facing entry point";
adding a new public getter means adding it here. A regex with `\b`
word boundaries avoids matching internal helpers that happen to
share a prefix (e.g. `_get_args` won't match `get_args`).
Dry-run against the current test inventory:
- Old heuristic marked: 200 / 244
- New heuristic marked: 89 / 244
- Now-unmarked unit tests: 111
Spot-checked the 111 de-marked tests — all are genuinely offline:
rdb_test.py parsers, utils_test.py to_str/BaseMetadata/datetime
helpers, waterdata_filters_test.py CQL-filter logic, nldi_test.py
validators, nwis_test.py metadata setters, etc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/ngwmn_test.py is local-only work not yet in the repo. The ngwmn module itself is committed, but no upstream tests reference it through the conftest's selection path, so listing get_data/get_levels/get_sites in _PUBLIC_GETTERS adds maintenance surface for no upstream benefit. Add these back if/when ngwmn tests land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR DOI-USGS#273 fixed pagination loops in dataretrieval.waterdata.utils specifically; other modules (wqp, samples, nadp, streamstats, nldi, nwis) reach the network through different paths and don't share that regression. Narrow _PUBLIC_GETTERS to waterdata's 16 user-facing getters and add a tests/waterdata file-path scope check, which also prevents name-collisions with legacy nwis getters (get_ratings, get_daily) from spuriously marking nwis tests. After this change, the retry marker is applied to 55 tests across tests/waterdata_test.py, tests/waterdata_nearest_test.py, tests/waterdata_filters_test.py, and tests/waterdata_ratings_test.py, with zero matches outside that directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous conftest hook auto-applied pytest.mark.flaky to tests by inspecting function source for references to a central _PUBLIC_GETTERS frozenset. This was implicit and required keeping that set in sync with the public API. Switch to explicit opt-in: drop conftest.py and declare the marker at the top of waterdata_test.py via pytestmark. The other waterdata_*_test.py files are entirely mocked or pure unit tests and need no retry. The retry config (reruns count, delay, transient-error trace patterns) is co-located with the tests it protects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make explicit that the marker is attached to every test in the module, but the only_rerun patterns are tied to real-network failure traces, so mocked tests are no-ops for the rerun by construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
After #273 merged, the next CI run on
mainfailed with a transientHTTP 502: Bad Gatewayfrom the live USGS Water Data API surfacing throughtest_get_field_measurements(run). The 502 was real upstream behavior — pre-#273,_walk_pageswould have silently swallowed it and the test would have passed on truncated data; post-#273,_raise_for_non_200correctly propagates it.That's the intended behavior for users, but it means every existing test that hits the live API is now flaky against any transient blip the USGS API has. There are ~25 such tests in
waterdata_test.py. This PR makes the CI test suite tolerant of those transients — no library code changes, no new tests, only test infrastructure.Fix
tests/waterdata_test.py:pytestmark = pytest.mark.flaky(...)declaring the retry policy explicitly at the file scope where live tests live. Opt-in is per-file, not auto-applied by a global collection hook.pytest-rerunfailuresto retry up to twice on a 5-second backoff, but only when the failure trace matches a narrow set of transient-upstream patterns:RuntimeError: 429:/5\d\d:(from_raise_for_non_200),ConnectionError, andTimeoutshapes from therequestsstack.only_rerunset deliberately omits genericRuntimeError.only_rerunregex matches only traces from real network round-trips, mocked tests (those usingrequests_mockormock.patch) are no-ops for the rerun by construction.pyproject.toml: addspytest-rerunfailuresto thetestoptional-deps set.Also bumps
actions/checkout@v4 → @v6andactions/setup-python@v5 → @v6across all three workflows ahead of the GitHub Actions Node 20 deprecation (warning surfaced on the same CI run).Design note
An earlier revision of this PR placed the retry logic in
tests/conftest.pywith apytest_collection_modifyitemshook that auto-applied the marker based on a_PUBLIC_GETTERSsource-inspection heuristic. That approach was rejected during review in favor of explicit module-level opt-in (commit 047130f): the test author who imports the live API declares the marker at the top of the file. Trade-off is intentional — small amount of "marker on a mocked test" in exchange for not maintaining implicit auto-marking infrastructure.What this PR is NOT
Test plan
pip install -e '.[test]'resolves cleanly with the new deppytest --collect-onlyshows the marker attached to 61 parametrize-expanded items inwaterdata_test.pyand zero items outside itwaterdata_test.pypass unchanged🤖 Generated with Claude Code