Skip to content

fix: correct pii_screening config-key typo + documentation-audit findings#324

Merged
jqnatividad merged 3 commits into
mainfrom
fix-docs-audit-config-keys
May 20, 2026
Merged

fix: correct pii_screening config-key typo + documentation-audit findings#324
jqnatividad merged 3 commits into
mainfrom
fix-docs-audit-config-keys

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

A documentation audit of datapusher-plus (~70 claims checked across README.md, CONFIG.md, docs/) surfaced one functional bug and several doc-vs-code inconsistencies. This PR fixes them in two focused commits.

Commit 1 — fix: the pii_screening config-key typo (functional bug)

config.py read PII_SCREENING from ckanext.datastore_plus.pii_screening — a namespace typo (datastore_plus instead of datapusher_plus). The documented key ckanext.datapusher_plus.pii_screening (correctly declared in config_declaration.yaml, documented in the README) never populated PII_SCREENING — it silently stayed at its False fallback, so PII screening could not be enabled as documented.

  • Fixed the key string in config.py.
  • Added tests/test_pii_screening_config_key.py (3 tests): documented key now drives PII_SCREENING, the old typo'd key no longer does, and an AST-based drift guard pins the config.py key string to the config_declaration.yaml declaration.

Commit 2 — docs: correct documentation-audit findings

  • config.py preview_rows fallback "1000""0" to agree with config_declaration.yaml (default 0). Under CKAN 2.10+ declarative config the declared default already wins, so 0 was the effective default all along — no runtime behavior change, just removes a stale never-reached fallback.
  • README — removed the stale describeGPT_api_key example line (no such key; v3.0 moved AI credentials to qsv's config / OPENAI_API_KEY).
  • READMEformats example: dropped xlsxb (typo for xlsb) and xlsm to match config.py's actual FORMATS default. (Note: format_converter.py can convert .xlsm/.xlsb, but the FORMATS gate omits them — adding them is a separate feature decision.)
  • README — added an "illustrative example, not defaults" note above both ckan.ini config blocks, which silently contradicted actual defaults for several keys (preview_rows, chunk_size, dedup, auto_index_threshold, ignore_file_hash, auto_alias_unique).
  • CONFIG.md — added the two missing resource_form.html DRUF template-override entries (DP+ ships 5, the doc listed 3).

Deliberately out of scope

  • Value-by-value reconciliation of the README example config blocks. Some non-default values (e.g. ignore_file_hash = true, dedup = false) may be deliberate install-guide recommendations rather than stale drift — left to a maintainer pass. The added note resolves the immediate "silently contradicts defaults" honesty problem.
  • Adding xlsm/xlsb to the FORMATS default — a feature decision, not a doc fix.

Test plan

  • tests/test_pii_screening_config_key.py — 3 new tests, all passing.
  • Full Python unit suite: 274 → 277 passing.
  • preview_rows change verified behavior-preserving (existing tests monkeypatch PREVIEW_ROWS explicitly; declared default already 0).

🤖 Generated with Claude Code

jqnatividad and others added 2 commits May 19, 2026 23:05
A documentation audit found that config.py read PII_SCREENING from
``ckanext.datastore_plus.pii_screening`` — a typo: ``datastore_plus``
instead of ``datapusher_plus``. Every other DP+ setting, the
``config_declaration.yaml`` declaration, and the README all use the
``datapusher_plus`` namespace.

Practical effect: operators enabling PII screening via the
documented key ``ckanext.datapusher_plus.pii_screening`` had no
effect — ``PII_SCREENING`` silently stayed at its ``False`` fallback,
so the PII-screening feature could not be turned on as documented.
Only someone who happened to set the (undocumented, typo'd) key
``ckanext.datastore_plus.pii_screening`` would have enabled it.

Fix: read the documented ``ckanext.datapusher_plus.pii_screening``
key. Adds tests/test_pii_screening_config_key.py with 3 regression
tests: the documented key now drives PII_SCREENING, the old typo'd
key no longer does, and an AST-based drift guard that pins the
config.py key string to the config_declaration.yaml declaration so
a reintroduced namespace typo fails CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the documentation audit. Addresses the doc-side
findings (the pii_screening code bug is fixed separately):

* **config.py — preview_rows fallback**: aligned the inline
  ``tk.config.get`` fallback from ``"1000"`` to ``"0"`` so it
  agrees with ``config_declaration.yaml`` (default ``0``). Under
  CKAN 2.10+ declarative config the declared default already wins,
  so ``0`` was the effective runtime default all along — the
  ``"1000"`` fallback was stale, never-reached, and contradicted
  the declaration. No runtime behavior change.

* **README.md — stale describeGPT_api_key**: removed the
  ``ckanext.datapusher_plus.describeGPT_api_key`` example line.
  No such config key exists; v3.0 moved AI-suggestion credentials
  out of ckan.ini into qsv's own config
  (``describegpt_config_path``) / the ``OPENAI_API_KEY`` env var.
  The README's own v3.0 config reference table already omitted it.

* **README.md — formats example**: dropped ``xlsxb`` (a typo —
  the extension is ``xlsb``) and ``xlsm`` from both ``formats``
  example lines so they match ``config.py``'s actual FORMATS
  default. Note: ``format_converter.py`` *can* convert ``.xlsm`` /
  ``.xlsb`` (they're in ``SPREADSHEET_EXTENSIONS``), but the
  ``FORMATS`` gate doesn't list them, so DP+ won't pick those
  files up today — adding them to FORMATS is a separate feature
  decision, not a doc fix.

* **README.md — illustrative-values note**: both ckan.ini example
  config blocks silently contradicted the actual defaults for
  several keys (preview_rows, chunk_size, dedup,
  auto_index_threshold, ignore_file_hash, auto_alias_unique).
  Added a note above each block clarifying the values are
  illustrative examples, not defaults, and pointing to
  config_declaration.yaml as authoritative. The individual stale
  values were left as-is because some non-defaults may be
  deliberate install-guide recommendations — a value-by-value
  reconciliation is left to a maintainer pass.

* **CONFIG.md — DRUF template overrides**: added the two
  ``resource_form.html`` override entries (plain + scheming) that
  were missing from the list — DP+ ships 5 DRUF override
  templates, the doc listed only 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad requested a review from Copilot May 20, 2026 03:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a mis-namespaced CKAN config key that prevented pii_screening from being enabled as documented, and updates documentation/config fallbacks found during a documentation audit.

Changes:

  • Correct PII_SCREENING to read from ckanext.datapusher_plus.pii_screening and align preview_rows fallback with declarative config defaults.
  • Add regression tests to ensure the documented PII key works and to guard against declaration-vs-code drift.
  • Update README/CONFIG documentation examples and template-override listings to match current behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/test_pii_screening_config_key.py Adds regression + drift-guard coverage for the PII config-key fix.
ckanext/datapusher_plus/config.py Fixes PII config key typo; aligns PREVIEW_ROWS fallback with declared default.
README.md Clarifies example-vs-default configs; removes stale/invalid config examples; fixes formats list.
CONFIG.md Documents the missing DRUF template overrides.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_pii_screening_config_key.py
Copilot flagged that ``_reload_config`` uses ``importlib.reload`` on
``ckanext.datapusher_plus.config``, which mutates the module object
in place. The recomputed constants (from a monkeypatched
``tk.config``) persist in ``sys.modules`` after the test — once
``monkeypatch`` restores ``tk.config`` it does NOT re-run
``config.py`` — so a non-default config snapshot could leak into
later tests and make the suite order-dependent.

Fix: added a ``_reset_config_module`` autouse fixture that reloads
``config`` on teardown. As an autouse fixture it is set up before
the test's ``monkeypatch`` fixture, so its finalizer runs *after*
``monkeypatch`` has restored ``tk.config`` — the teardown reload
therefore recomputes the module constants from the pristine
test-environment config, leaving no order dependence.

Also corrected the ``_reload_config`` docstring, which had cited
``test_file_hash_algorithm`` / the #61 tests as a reload precedent
— both actually read ``tk.config`` live and do NOT reload.

Verified: the config-adjacent suites (``test_file_hash_algorithm``,
``test_issue_61_download_always_whitelist``) run green alongside
this file, and the full suite stays at 277 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 5bac296 into main May 20, 2026
1 check passed
jqnatividad added a commit that referenced this pull request May 20, 2026
…325)

A documentation audit found CLAUDE.md materially stale — it still
described the v2.0 in-process pipeline. 13 of 38 verified claims were
false. Corrections:

* Version v2.0.0 → 3.0.0a0 (per pyproject.toml).
* Architecture section rewritten for v3.0: the v2
  ``DataProcessingPipeline`` loop and the named ``pipeline.py`` /
  ``jobs_legacy.py`` files do not exist. Orchestration is
  ``jobs/prefect_flow.py`` with per-stage ``@task`` functions and the
  ``datapusher_plus_flow`` entry ``@flow``. Documented the full v3.0
  jobs-module layout (prefect_flow, runtime_context, subflows, events,
  caching, blocks, artifacts, quarantine, file_persistence) and the
  previously-omitted ``ai_suggestions`` stage.
* Key Modules: dropped the non-existent ``jobs_legacy.py``; added
  ``prefect_client.py``, ``dictionary_stash.py``, ``utils.py``.
* Formula type ``suggest_formula`` → ``suggestion_formula`` (the
  actual scheming-YAML key — verified in jinja2_helpers.py /
  formula.py).
* qsv ``v4.0.0+`` → ``v20.1.0+`` (MINIMUM_QSV_VERSION; two BREAKING
  bumps since 4.0.0).
* ``preview_rows`` default 1000 → 0 (matches config_declaration.yaml
  and config.py after PR #324).
* External Dependencies: RQ → Prefect 3.7+ (the v3.0 runtime;
  requirements.txt pins prefect, not rq).
* CLI commands: added ``prefect-deploy`` and ``migrate-from-rq``.
* Build & Test: ``pytest tests/test_unit.py`` (no such file) replaced
  with a real example + the unit/integration split.

Verified-correct claims (plugin.py interface list, formula
namespaces, models, job_exceptions, logic actions, Python versions)
were left intact.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants