Draft
23 commits
cd70929
feat(waterdata): Add multi-value GET-parameter chunker for OGC API
thodson-usgs May 17, 2026
46335b6
fix(waterdata): Reject smuggled lists for scalar-contract chunker inputs
thodson-usgs May 18, 2026
9bc342b
refactor(waterdata): Unify list and filter chunkers into one joint pl…
thodson-usgs May 18, 2026
10858e9
refactor(waterdata): Share a single URL-byte sizing primitive across …
thodson-usgs May 18, 2026
4e82722
refactor(waterdata): Tighten the joint chunker
thodson-usgs May 18, 2026
ee550be
refactor(waterdata): Polish — extract _resolve_max_chunks, tidy iter_…
thodson-usgs May 18, 2026
f1588ae
docs(waterdata): Frame _NEVER_CHUNK as exceptions to a default-chunk …
thodson-usgs May 18, 2026
493e4eb
test(waterdata): Add offline stress test for the joint chunker
thodson-usgs May 18, 2026
1348304
perf(test): Cut stress test wall-clock 55% — capture URL bytes inline…
thodson-usgs May 18, 2026
f16555d
refactor(waterdata): Address PR #283 review — relocate chunker helper…
thodson-usgs May 18, 2026
eeba277
docs(tests): Drop stale "two-decorator design" references in test prose
thodson-usgs May 18, 2026
01e579e
refactor(waterdata): Replace static max_chunks/safety_floor with dyna…
thodson-usgs May 19, 2026
592c207
test(waterdata): Split chunker tests into tests/waterdata_chunking_te…
thodson-usgs May 19, 2026
f615db8
test(waterdata): Drop tests/stress_chunker.py — invariants now covere…
thodson-usgs May 19, 2026
c475452
refactor(waterdata): /simplify pass — typed RateLimited exception, dr…
thodson-usgs May 19, 2026
24fd158
refactor(waterdata): Extract ChunkPlan + _ChunkExecution; unify passt…
thodson-usgs May 19, 2026
5d931fa
refactor(waterdata): /simplify pass on ChunkPlan — skip work on the p…
thodson-usgs May 19, 2026
7850186
refactor(waterdata): Hone OO shape — ChunkPlan.__init__, _ChunkExecut…
thodson-usgs May 19, 2026
bf8ac69
feat(waterdata): Add QuotaExhausted.resume() — pick up after a mid-ca…
thodson-usgs May 19, 2026
79ba407
test(waterdata): Add resume-equivalence integration test for the chunker
thodson-usgs May 19, 2026
1bc80d1
refactor(waterdata): Unify list-dim and filter chunking under a singl…
thodson-usgs May 19, 2026
b7206d8
docs(waterdata): Sync chunker docs with the uniform-axis design; add …
thodson-usgs May 19, 2026
8c4dd19
docs(waterdata): Polish pass — numpy-style docstrings, unified termin…
thodson-usgs May 21, 2026
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,3 +1,5 @@
**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit. A common chained-query pattern (pull a long site list from `get_monitoring_locations`, then feed it into `get_daily`) previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. Every multi-value list parameter, as well as the cql-text `filter` (split on its top-level `OR`s), is modeled as a chunkable axis; the planner greedily halves the biggest chunk across all axes until each sub-request URL fits. After the first sub-request the chunker reads `x-ratelimit-remaining`; if the rest of the plan won't fit the window, it raises `RequestExceedsQuota` reporting the deficit. Mid-call transient failures (429 or 5xx) surface as a `ChunkInterrupted` subclass (`QuotaExhausted` for 429, `ServiceInterrupted` for 5xx) carrying the partial result and a `.resume()` method that continues only the still-pending sub-requests once the underlying condition clears. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N axes. Note one metadata-behavior change for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the *last* page's / sub-request's headers (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock time across pages instead of the first page's elapsed time.
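
The greedy-halving idea in the 05/17/2026 entry can be illustrated with a minimal sketch. This is not the code from this PR: `url_bytes`, `plan_chunks`, the example base URL, and the 8000-byte budget are all hypothetical names and values chosen for illustration. The sketch also assumes that sub-requests are the cross product of the per-axis chunks and that "biggest" means the largest URL-encoded length.

```python
from itertools import product
from urllib.parse import quote_plus, urlencode


def url_bytes(base_url, params):
    """Byte length of the GET URL for one sub-request (values comma-joined)."""
    query = urlencode({name: ",".join(values) for name, values in params.items()})
    return len(f"{base_url}?{query}".encode("utf-8"))


def chunk_cost(chunk):
    """URL-encoded length of one comma-joined chunk of values."""
    return len(quote_plus(",".join(chunk)))


def plan_chunks(base_url, axes, max_bytes):
    """Hypothetical planner: halve the biggest chunk until every URL fits."""
    # Start with a single chunk per axis that holds all of its values.
    chunks = {name: [list(values)] for name, values in axes.items()}
    while True:
        # The longest possible URL pairs the costliest chunk of every axis.
        worst = {name: max(parts, key=chunk_cost) for name, parts in chunks.items()}
        if url_bytes(base_url, worst) <= max_bytes:
            break
        # Pick the costliest chunk, on any axis, that can still be split.
        candidates = [
            (chunk_cost(chunk), name, index)
            for name, parts in chunks.items()
            for index, chunk in enumerate(parts)
            if len(chunk) > 1
        ]
        if not candidates:
            raise ValueError("request cannot be made to fit under max_bytes")
        _, name, index = max(candidates)
        chunk = chunks[name].pop(index)
        mid = len(chunk) // 2
        chunks[name][index:index] = [chunk[:mid], chunk[mid:]]
    # One sub-request per combination of chunks across the axes.
    names = list(chunks)
    return [dict(zip(names, combo)) for combo in product(*(chunks[n] for n in names))]


# Illustrative inputs only: a made-up endpoint and synthetic site IDs.
base_url = "https://example.invalid/collections/daily/items"
axes = {
    "monitoring_location_id": [f"USGS-{i:08d}" for i in range(500)],
    "parameter_code": ["00060", "00065"],
}
plan = plan_chunks(base_url, axes, max_bytes=8000)
```

Because the URL length is additive across parameters, checking only the worst-case combination is enough to guarantee that every sub-request in the plan fits.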

**05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
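
For callers affected by the 05/16/2026 behavior change, one possible pattern is a small retry wrapper that inspects the chained `__cause__`. The sketch below is an assumption-laden illustration, not part of the package: `call_with_retry` is a hypothetical helper, and `flaky_fetch` is a stand-in that mimics a getter raising `RuntimeError` chained from an upstream error.

```python
import time


def call_with_retry(fetch, attempts=3, wait_seconds=0.0):
    """Hypothetical helper: retry `fetch` when it raises RuntimeError."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except RuntimeError as err:
            last_error = err
            # The upstream failure is available as the chained cause.
            print(f"attempt {attempt} failed: {err} (cause: {err.__cause__!r})")
            if attempt < attempts:
                time.sleep(wait_seconds)
    raise last_error


# Stand-in for a paginated getter: fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        try:
            raise ConnectionError("HTTP 429")
        except ConnectionError as upstream:
            raise RuntimeError("pagination failed; wait and retry") from upstream
    return ["row-1", "row-2"]


rows = call_with_retry(flaky_fetch)
```

In real use, a non-zero `wait_seconds` (or a narrower query on the retry) would match the recovery actions the error message suggests.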

**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
15 changes: 15 additions & 0 deletions dataretrieval/waterdata/api.py
@@ -230,6 +230,21 @@ def get_daily(
... parameter_code="00060",
... last_modified="P7D",
... )

>>> # Chain queries: pull all stream sites in a state, then their
>>> # daily discharge for the last week. The site list can be hundreds
>>> # of values long, so the request is transparently chunked across
>>> # multiple sub-requests to keep each URL under the server's byte
>>> # limit. The combined output looks like that of a single query.
>>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
... state_name="Ohio",
... site_type="Stream",
... )
>>> df, md = dataretrieval.waterdata.get_daily(
... monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
... parameter_code="00060",
... time="P7D",
... )
"""
service = "daily"
output_id = "daily_id"