fix: correct pii_screening config-key typo + documentation-audit findings by jqnatividad · Pull Request #324 · dathere/datapusher-plus

jqnatividad · 2026-05-20T03:08:25Z

Summary

A documentation audit of datapusher-plus (~70 claims checked across README.md, CONFIG.md, docs/) surfaced one functional bug and several doc-vs-code inconsistencies. This PR fixes them in two focused commits.

Commit 1 — `fix:` the `pii_screening` config-key typo (functional bug)

config.py read PII_SCREENING from ckanext.datastore_plus.pii_screening — a namespace typo (datastore_plus instead of datapusher_plus). The documented key ckanext.datapusher_plus.pii_screening (correctly declared in config_declaration.yaml, documented in the README) never populated PII_SCREENING — it silently stayed at its False fallback, so PII screening could not be enabled as documented.

Fixed the key string in config.py.
Added tests/test_pii_screening_config_key.py (3 tests): documented key now drives PII_SCREENING, the old typo'd key no longer does, and an AST-based drift guard pins the config.py key string to the config_declaration.yaml declaration.

Commit 2 — `docs:` correct documentation-audit findings

config.py preview_rows fallback "1000" → "0" to agree with config_declaration.yaml (default 0). Under CKAN 2.10+ declarative config the declared default already wins, so 0 was the effective default all along — no runtime behavior change, just removes a stale never-reached fallback.
README — removed the stale describeGPT_api_key example line (no such key; v3.0 moved AI credentials to qsv's config / OPENAI_API_KEY).
README — formats example: dropped xlsxb (typo for xlsb) and xlsm to match config.py's actual FORMATS default. (Note: format_converter.py can convert .xlsm/.xlsb, but the FORMATS gate omits them — adding them is a separate feature decision.)
README — added an "illustrative example, not defaults" note above both ckan.ini config blocks, which silently contradicted actual defaults for several keys (preview_rows, chunk_size, dedup, auto_index_threshold, ignore_file_hash, auto_alias_unique).
CONFIG.md — added the two missing resource_form.html DRUF template-override entries (DP+ ships 5, the doc listed 3).

Deliberately out of scope

Value-by-value reconciliation of the README example config blocks. Some non-default values (e.g. ignore_file_hash = true, dedup = false) may be deliberate install-guide recommendations rather than stale drift — left to a maintainer pass. The added note resolves the immediate "silently contradicts defaults" honesty problem.
Adding xlsm/xlsb to the FORMATS default — a feature decision, not a doc fix.

Test plan

tests/test_pii_screening_config_key.py — 3 new tests, all passing.
Full Python unit suite: 274 → 277 passing.
preview_rows change verified behavior-preserving (existing tests monkeypatch PREVIEW_ROWS explicitly; declared default already 0).

🤖 Generated with Claude Code

A documentation audit found that config.py read PII_SCREENING from ``ckanext.datastore_plus.pii_screening`` — a typo: ``datastore_plus`` instead of ``datapusher_plus``. Every other DP+ setting, the ``config_declaration.yaml`` declaration, and the README all use the ``datapusher_plus`` namespace. Practical effect: operators enabling PII screening via the documented key ``ckanext.datapusher_plus.pii_screening`` had no effect — ``PII_SCREENING`` silently stayed at its ``False`` fallback, so the PII-screening feature could not be turned on as documented. Only someone who happened to set the (undocumented, typo'd) key ``ckanext.datastore_plus.pii_screening`` would have enabled it. Fix: read the documented ``ckanext.datapusher_plus.pii_screening`` key. Adds tests/test_pii_screening_config_key.py with 3 regression tests: the documented key now drives PII_SCREENING, the old typo'd key no longer does, and an AST-based drift guard that pins the config.py key string to the config_declaration.yaml declaration so a reintroduced namespace typo fails CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to the documentation audit. Addresses the doc-side findings (the pii_screening code bug is fixed separately): * **config.py — preview_rows fallback**: aligned the inline ``tk.config.get`` fallback from ``"1000"`` to ``"0"`` so it agrees with ``config_declaration.yaml`` (default ``0``). Under CKAN 2.10+ declarative config the declared default already wins, so ``0`` was the effective runtime default all along — the ``"1000"`` fallback was stale, never-reached, and contradicted the declaration. No runtime behavior change. * **README.md — stale describeGPT_api_key**: removed the ``ckanext.datapusher_plus.describeGPT_api_key`` example line. No such config key exists; v3.0 moved AI-suggestion credentials out of ckan.ini into qsv's own config (``describegpt_config_path``) / the ``OPENAI_API_KEY`` env var. The README's own v3.0 config reference table already omitted it. * **README.md — formats example**: dropped ``xlsxb`` (a typo — the extension is ``xlsb``) and ``xlsm`` from both ``formats`` example lines so they match ``config.py``'s actual FORMATS default. Note: ``format_converter.py`` *can* convert ``.xlsm`` / ``.xlsb`` (they're in ``SPREADSHEET_EXTENSIONS``), but the ``FORMATS`` gate doesn't list them, so DP+ won't pick those files up today — adding them to FORMATS is a separate feature decision, not a doc fix. * **README.md — illustrative-values note**: both ckan.ini example config blocks silently contradicted the actual defaults for several keys (preview_rows, chunk_size, dedup, auto_index_threshold, ignore_file_hash, auto_alias_unique). Added a note above each block clarifying the values are illustrative examples, not defaults, and pointing to config_declaration.yaml as authoritative. The individual stale values were left as-is because some non-defaults may be deliberate install-guide recommendations — a value-by-value reconciliation is left to a maintainer pass. * **CONFIG.md — DRUF template overrides**: added the two ``resource_form.html`` override entries (plain + scheming) that were missing from the list — DP+ ships 5 DRUF override templates, the doc listed only 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Fixes a mis-namespaced CKAN config key that prevented pii_screening from being enabled as documented, and updates documentation/config fallbacks found during a documentation audit.

Changes:

Correct PII_SCREENING to read from ckanext.datapusher_plus.pii_screening and align preview_rows fallback with declarative config defaults.
Add regression tests to ensure the documented PII key works and to guard against declaration-vs-code drift.
Update README/CONFIG documentation examples and template-override listings to match current behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`tests/test_pii_screening_config_key.py`	Adds regression + drift-guard coverage for the PII config-key fix.
`ckanext/datapusher_plus/config.py`	Fixes PII config key typo; aligns `PREVIEW_ROWS` fallback with declared default.
`README.md`	Clarifies example-vs-default configs; removes stale/invalid config examples; fixes formats list.
`CONFIG.md`	Documents the missing DRUF template overrides.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot flagged that ``_reload_config`` uses ``importlib.reload`` on ``ckanext.datapusher_plus.config``, which mutates the module object in place. The recomputed constants (from a monkeypatched ``tk.config``) persist in ``sys.modules`` after the test — once ``monkeypatch`` restores ``tk.config`` it does NOT re-run ``config.py`` — so a non-default config snapshot could leak into later tests and make the suite order-dependent. Fix: added a ``_reset_config_module`` autouse fixture that reloads ``config`` on teardown. As an autouse fixture it is set up before the test's ``monkeypatch`` fixture, so its finalizer runs *after* ``monkeypatch`` has restored ``tk.config`` — the teardown reload therefore recomputes the module constants from the pristine test-environment config, leaving no order dependence. Also corrected the ``_reload_config`` docstring, which had cited ``test_file_hash_algorithm`` / the #61 tests as a reload precedent — both actually read ``tk.config`` live and do NOT reload. Verified: the config-adjacent suites (``test_file_hash_algorithm``, ``test_issue_61_download_always_whitelist``) run green alongside this file, and the full suite stays at 277 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…325) A documentation audit found CLAUDE.md materially stale — it still described the v2.0 in-process pipeline. 13 of 38 verified claims were false. Corrections: * Version v2.0.0 → 3.0.0a0 (per pyproject.toml). * Architecture section rewritten for v3.0: the v2 ``DataProcessingPipeline`` loop and the named ``pipeline.py`` / ``jobs_legacy.py`` files do not exist. Orchestration is ``jobs/prefect_flow.py`` with per-stage ``@task`` functions and the ``datapusher_plus_flow`` entry ``@flow``. Documented the full v3.0 jobs-module layout (prefect_flow, runtime_context, subflows, events, caching, blocks, artifacts, quarantine, file_persistence) and the previously-omitted ``ai_suggestions`` stage. * Key Modules: dropped the non-existent ``jobs_legacy.py``; added ``prefect_client.py``, ``dictionary_stash.py``, ``utils.py``. * Formula type ``suggest_formula`` → ``suggestion_formula`` (the actual scheming-YAML key — verified in jinja2_helpers.py / formula.py). * qsv ``v4.0.0+`` → ``v20.1.0+`` (MINIMUM_QSV_VERSION; two BREAKING bumps since 4.0.0). * ``preview_rows`` default 1000 → 0 (matches config_declaration.yaml and config.py after PR #324). * External Dependencies: RQ → Prefect 3.7+ (the v3.0 runtime; requirements.txt pins prefect, not rq). * CLI commands: added ``prefect-deploy`` and ``migrate-from-rq``. * Build & Test: ``pytest tests/test_unit.py`` (no such file) replaced with a real example + the unit/integration split. Verified-correct claims (plugin.py interface list, formula namespaces, models, job_exceptions, logic actions, Python versions) were left intact. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jqnatividad and others added 2 commits May 19, 2026 23:05

jqnatividad requested a review from Copilot May 20, 2026 03:21

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread tests/test_pii_screening_config_key.py

jqnatividad merged commit 5bac296 into main May 20, 2026
1 check passed

jqnatividad mentioned this pull request May 20, 2026

docs: bring CLAUDE.md up to date with the v3.0 Prefect architecture #325

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct pii_screening config-key typo + documentation-audit findings#324

fix: correct pii_screening config-key typo + documentation-audit findings#324
jqnatividad merged 3 commits into
mainfrom
fix-docs-audit-config-keys

jqnatividad commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jqnatividad commented May 20, 2026

Summary

Commit 1 — fix: the pii_screening config-key typo (functional bug)

Commit 2 — docs: correct documentation-audit findings

Deliberately out of scope

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Commit 1 — `fix:` the `pii_screening` config-key typo (functional bug)

Commit 2 — `docs:` correct documentation-audit findings