fix: correct pii_screening config-key typo + documentation-audit findings#324
Merged
Conversation
A documentation audit found that config.py read PII_SCREENING from ``ckanext.datastore_plus.pii_screening`` — a typo: ``datastore_plus`` instead of ``datapusher_plus``. Every other DP+ setting, the ``config_declaration.yaml`` declaration, and the README all use the ``datapusher_plus`` namespace. Practical effect: operators enabling PII screening via the documented key ``ckanext.datapusher_plus.pii_screening`` had no effect — ``PII_SCREENING`` silently stayed at its ``False`` fallback, so the PII-screening feature could not be turned on as documented. Only someone who happened to set the (undocumented, typo'd) key ``ckanext.datastore_plus.pii_screening`` would have enabled it. Fix: read the documented ``ckanext.datapusher_plus.pii_screening`` key. Adds tests/test_pii_screening_config_key.py with 3 regression tests: the documented key now drives PII_SCREENING, the old typo'd key no longer does, and an AST-based drift guard that pins the config.py key string to the config_declaration.yaml declaration so a reintroduced namespace typo fails CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the documentation audit. Addresses the doc-side findings (the pii_screening code bug is fixed separately): * **config.py — preview_rows fallback**: aligned the inline ``tk.config.get`` fallback from ``"1000"`` to ``"0"`` so it agrees with ``config_declaration.yaml`` (default ``0``). Under CKAN 2.10+ declarative config the declared default already wins, so ``0`` was the effective runtime default all along — the ``"1000"`` fallback was stale, never-reached, and contradicted the declaration. No runtime behavior change. * **README.md — stale describeGPT_api_key**: removed the ``ckanext.datapusher_plus.describeGPT_api_key`` example line. No such config key exists; v3.0 moved AI-suggestion credentials out of ckan.ini into qsv's own config (``describegpt_config_path``) / the ``OPENAI_API_KEY`` env var. The README's own v3.0 config reference table already omitted it. * **README.md — formats example**: dropped ``xlsxb`` (a typo — the extension is ``xlsb``) and ``xlsm`` from both ``formats`` example lines so they match ``config.py``'s actual FORMATS default. Note: ``format_converter.py`` *can* convert ``.xlsm`` / ``.xlsb`` (they're in ``SPREADSHEET_EXTENSIONS``), but the ``FORMATS`` gate doesn't list them, so DP+ won't pick those files up today — adding them to FORMATS is a separate feature decision, not a doc fix. * **README.md — illustrative-values note**: both ckan.ini example config blocks silently contradicted the actual defaults for several keys (preview_rows, chunk_size, dedup, auto_index_threshold, ignore_file_hash, auto_alias_unique). Added a note above each block clarifying the values are illustrative examples, not defaults, and pointing to config_declaration.yaml as authoritative. The individual stale values were left as-is because some non-defaults may be deliberate install-guide recommendations — a value-by-value reconciliation is left to a maintainer pass. * **CONFIG.md — DRUF template overrides**: added the two ``resource_form.html`` override entries (plain + scheming) that were missing from the list — DP+ ships 5 DRUF override templates, the doc listed only 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a mis-namespaced CKAN config key that prevented pii_screening from being enabled as documented, and updates documentation/config fallbacks found during a documentation audit.
Changes:
- Correct
PII_SCREENINGto read fromckanext.datapusher_plus.pii_screeningand alignpreview_rowsfallback with declarative config defaults. - Add regression tests to ensure the documented PII key works and to guard against declaration-vs-code drift.
- Update README/CONFIG documentation examples and template-override listings to match current behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
tests/test_pii_screening_config_key.py |
Adds regression + drift-guard coverage for the PII config-key fix. |
ckanext/datapusher_plus/config.py |
Fixes PII config key typo; aligns PREVIEW_ROWS fallback with declared default. |
README.md |
Clarifies example-vs-default configs; removes stale/invalid config examples; fixes formats list. |
CONFIG.md |
Documents the missing DRUF template overrides. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Copilot flagged that ``_reload_config`` uses ``importlib.reload`` on ``ckanext.datapusher_plus.config``, which mutates the module object in place. The recomputed constants (from a monkeypatched ``tk.config``) persist in ``sys.modules`` after the test — once ``monkeypatch`` restores ``tk.config`` it does NOT re-run ``config.py`` — so a non-default config snapshot could leak into later tests and make the suite order-dependent. Fix: added a ``_reset_config_module`` autouse fixture that reloads ``config`` on teardown. As an autouse fixture it is set up before the test's ``monkeypatch`` fixture, so its finalizer runs *after* ``monkeypatch`` has restored ``tk.config`` — the teardown reload therefore recomputes the module constants from the pristine test-environment config, leaving no order dependence. Also corrected the ``_reload_config`` docstring, which had cited ``test_file_hash_algorithm`` / the #61 tests as a reload precedent — both actually read ``tk.config`` live and do NOT reload. Verified: the config-adjacent suites (``test_file_hash_algorithm``, ``test_issue_61_download_always_whitelist``) run green alongside this file, and the full suite stays at 277 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
jqnatividad
added a commit
that referenced
this pull request
May 20, 2026
…325) A documentation audit found CLAUDE.md materially stale — it still described the v2.0 in-process pipeline. 13 of 38 verified claims were false. Corrections: * Version v2.0.0 → 3.0.0a0 (per pyproject.toml). * Architecture section rewritten for v3.0: the v2 ``DataProcessingPipeline`` loop and the named ``pipeline.py`` / ``jobs_legacy.py`` files do not exist. Orchestration is ``jobs/prefect_flow.py`` with per-stage ``@task`` functions and the ``datapusher_plus_flow`` entry ``@flow``. Documented the full v3.0 jobs-module layout (prefect_flow, runtime_context, subflows, events, caching, blocks, artifacts, quarantine, file_persistence) and the previously-omitted ``ai_suggestions`` stage. * Key Modules: dropped the non-existent ``jobs_legacy.py``; added ``prefect_client.py``, ``dictionary_stash.py``, ``utils.py``. * Formula type ``suggest_formula`` → ``suggestion_formula`` (the actual scheming-YAML key — verified in jinja2_helpers.py / formula.py). * qsv ``v4.0.0+`` → ``v20.1.0+`` (MINIMUM_QSV_VERSION; two BREAKING bumps since 4.0.0). * ``preview_rows`` default 1000 → 0 (matches config_declaration.yaml and config.py after PR #324). * External Dependencies: RQ → Prefect 3.7+ (the v3.0 runtime; requirements.txt pins prefect, not rq). * CLI commands: added ``prefect-deploy`` and ``migrate-from-rq``. * Build & Test: ``pytest tests/test_unit.py`` (no such file) replaced with a real example + the unit/integration split. Verified-correct claims (plugin.py interface list, formula namespaces, models, job_exceptions, logic actions, Python versions) were left intact. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A documentation audit of
datapusher-plus(~70 claims checked across README.md, CONFIG.md, docs/) surfaced one functional bug and several doc-vs-code inconsistencies. This PR fixes them in two focused commits.Commit 1 —
fix:thepii_screeningconfig-key typo (functional bug)config.pyreadPII_SCREENINGfromckanext.datastore_plus.pii_screening— a namespace typo (datastore_plusinstead ofdatapusher_plus). The documented keyckanext.datapusher_plus.pii_screening(correctly declared inconfig_declaration.yaml, documented in the README) never populatedPII_SCREENING— it silently stayed at itsFalsefallback, so PII screening could not be enabled as documented.config.py.tests/test_pii_screening_config_key.py(3 tests): documented key now drivesPII_SCREENING, the old typo'd key no longer does, and an AST-based drift guard pins theconfig.pykey string to theconfig_declaration.yamldeclaration.Commit 2 —
docs:correct documentation-audit findingsconfig.pypreview_rowsfallback"1000"→"0"to agree withconfig_declaration.yaml(default0). Under CKAN 2.10+ declarative config the declared default already wins, so0was the effective default all along — no runtime behavior change, just removes a stale never-reached fallback.describeGPT_api_keyexample line (no such key; v3.0 moved AI credentials to qsv's config /OPENAI_API_KEY).formatsexample: droppedxlsxb(typo forxlsb) andxlsmto matchconfig.py's actualFORMATSdefault. (Note:format_converter.pycan convert.xlsm/.xlsb, but theFORMATSgate omits them — adding them is a separate feature decision.)preview_rows,chunk_size,dedup,auto_index_threshold,ignore_file_hash,auto_alias_unique).resource_form.htmlDRUF template-override entries (DP+ ships 5, the doc listed 3).Deliberately out of scope
ignore_file_hash = true,dedup = false) may be deliberate install-guide recommendations rather than stale drift — left to a maintainer pass. The added note resolves the immediate "silently contradicts defaults" honesty problem.xlsm/xlsbto theFORMATSdefault — a feature decision, not a doc fix.Test plan
tests/test_pii_screening_config_key.py— 3 new tests, all passing.preview_rowschange verified behavior-preserving (existing tests monkeypatchPREVIEW_ROWSexplicitly; declared default already0).🤖 Generated with Claude Code