Commit db98d56

Add scan-level column masking to prevent CTE/subquery bypass
Apply column masks at the TableScan level via transform_up instead of only at the top-level Projection. This prevents SubqueryAlias and CTE nodes from changing the DFSchema qualifier, which would cause top-level masking to miss columns. Masks run before row filters so filters evaluate against raw (unmasked) data. Add integration tests for multi-table JOINs with scoped column deny, CTE mask bypass prevention, subquery mask enforcement, and combined mask+deny+filter scenarios. Update docs to reflect scan-level enforcement.
1 parent 58aa5b8 commit db98d56

6 files changed

Lines changed: 967 additions & 16 deletions

docs/permission-security-tests.md

Lines changed: 78 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -66,9 +66,9 @@ WITH data AS (SELECT * FROM orders) SELECT * FROM data
6666

6767
**Vector**: User writes `SELECT ssn || '' FROM customers` to bypass masking of `ssn`.
6868

69-
**Defense**: Column masking works by replacing `col("ssn")` in the `Projection` node. If `ssn || ''` is used, the `ssn` reference still passes through the `Projection` as a sub-expression. The proxy replaces the `ssn` column reference inside the expression with the masked value.
69+
**Defense**: Column masking is enforced at the `TableScan` level: `apply_column_mask_at_scan` injects a `Projection` above each scan that replaces the masked column with the mask expression. For a direct `SELECT ssn`, the mask is applied before any downstream node sees the raw value. The same holds for `SELECT ssn || '' FROM customers`: the `ssn` reference inside the compound expression resolves to the already-masked output of the scan-level `Projection`, so the concatenation operates on masked data.
7070

71-
**Note**: This is a known limitation for P0 — column masking replaces direct `col(name)` references in the projection. Compound expressions that reference the column are not masked. This is a P1 enhancement.
71+
**Note**: This is a known limitation for P0 — scan-level masking replaces the column at the source, but compound expressions that reference the column in the user's `SELECT` list operate on the masked value, not the original. The result is masked (not raw), but the transformation may produce unexpected output (e.g., `***-**-6789` concatenated with empty string). This is a P1 enhancement.
7272

7373
**Test**: Document the limitation. Verify `SELECT ssn FROM customers` returns masked value. Verify `SELECT ssn || '' FROM customers` is treated as a limitation/known gap.
7474

@@ -324,3 +324,79 @@ There is no ambiguous per-definition `action` field. `compute_user_visibility()`
324324

325325
**Test**: `deny_policy_row_filter_rejected` — error message must not contain the policy name. `tc_audit_02_denied_audit_status` — audit status is `"error"`, `error_message` does not contain the policy name. `tc_audit_04_status_filter` — `status=error` filter matches these entries.
326326

327+
---
328+
329+
### 27. Column deny scoping in multi-table JOINs
330+
331+
**Vector**: Three tables (`a`, `b`, `c`) share a column name (`name`). Denying `name` on `a` and `c` might accidentally also strip `b.name` if the deny logic uses unqualified matching.
332+
333+
**Defense**: Column deny is enforced at two levels: (1) visibility-level via `compute_user_visibility` / `build_user_context` — denied columns are removed from the per-user `SessionContext` schema at connect time, scoped per-table; (2) defense-in-depth via `apply_projection_qualified` — the top-level Projection uses DFSchema qualifiers to scope deny patterns to their source table.
334+
335+
**Test**: `tc_join_02_multi_table_join_shared_name` — JOIN 3 tables all with `name`. Deny `name` on `a` and `c`. `SELECT *` returns exactly one `name` column (from `b`), plus `id` from all three tables and `a_val`, `b_val`, `c_val`.
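The difference between unqualified and qualifier-scoped deny matching can be illustrated with a small standalone model. This is a minimal sketch, not the proxy's code: `QualifiedColumn`, `col`, `join_schema`, `strip_unqualified`, and `strip_qualified` are hypothetical names, and the schema is a plain vector rather than a DataFusion `DFSchema`.

```rust
use std::collections::HashSet;

/// A column in a join's output schema, identified by (table qualifier, column name).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct QualifiedColumn {
    pub table: String,
    pub name: String,
}

/// Convenience constructor (hypothetical helper, not part of the proxy).
pub fn col(table: &str, name: &str) -> QualifiedColumn {
    QualifiedColumn { table: table.to_string(), name: name.to_string() }
}

/// Output schema of the three-table join described in the test above.
pub fn join_schema() -> Vec<QualifiedColumn> {
    vec![
        col("a", "id"), col("a", "name"), col("a", "a_val"),
        col("b", "id"), col("b", "name"), col("b", "b_val"),
        col("c", "id"), col("c", "name"), col("c", "c_val"),
    ]
}

/// Unqualified matching (the failure mode): drops every column whose bare
/// name is denied on *any* table, so `b.name` is lost as well.
pub fn strip_unqualified(
    schema: &[QualifiedColumn],
    denied: &HashSet<QualifiedColumn>,
) -> Vec<QualifiedColumn> {
    let denied_names: HashSet<&str> = denied.iter().map(|d| d.name.as_str()).collect();
    schema
        .iter()
        .filter(|c| !denied_names.contains(c.name.as_str()))
        .cloned()
        .collect()
}

/// Qualified matching: drops only exact (table, column) pairs.
pub fn strip_qualified(
    schema: &[QualifiedColumn],
    denied: &HashSet<QualifiedColumn>,
) -> Vec<QualifiedColumn> {
    schema.iter().filter(|c| !denied.contains(*c)).cloned().collect()
}
```

With `name` denied on `a` and `c`, `strip_qualified` keeps seven of the nine columns, including exactly one `name` (from `b`), while `strip_unqualified` removes all three `name` columns.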
336+
337+
---
338+
339+
### 28. Table alias does not bypass column deny or column mask
340+
341+
**Vector**: User aliases a table (`SELECT * FROM customers AS c`) hoping the alias bypasses column-level policies. If the policy rewriter only checks the real table name, and the planner resolves columns under the alias qualifier, denied or masked columns might leak.
342+
343+
**Defense**: Column deny is enforced at visibility level — denied columns are removed from the schema before query planning, so they never appear in `SELECT *` regardless of alias. Column mask is enforced at the `TableScan` level via `apply_column_mask_at_scan` (injected `Projection` above each scan), which operates on the real `TableScan` table name before any alias is applied.
344+
345+
**Test**:
346+
- `tc_join_03a_alias_column_deny` — deny `email` on `customers`. `SELECT * FROM customers AS c` returns only `id, name`. `SELECT c.email FROM customers AS c` errors (column not found).
347+
- `tc_join_03b_alias_column_mask` — mask `email` on `customers`. `SELECT c.email FROM customers AS c` returns the masked value `***@example.com`, not the raw email.
348+
349+
---
350+
351+
### 29. row_filter alone does not grant visibility in policy_required mode
352+
353+
**Vector**: In `policy_required` mode, a `row_filter` policy is assigned to a table but no `column_allow` policy. If `row_filter` silently grants table visibility, the user can see the table in `information_schema` and query it, bypassing the zero-trust model.
354+
355+
**Defense**: `compute_user_visibility` only adds tables to `visible_tables` when a `column_allow` policy exists. `row_filter` and `column_mask` do not grant table access. Without a `column_allow` policy, the table is excluded from the per-user `SessionContext`, making it invisible in both `information_schema` queries and direct table references.
356+
357+
**Test**: `tc_zt_04_sidebar_sync_row_filter_only` — `policy_required` datasource with only a `row_filter` on `users`. `SELECT ... FROM information_schema.tables` returns 0 rows for the schema. Direct `SELECT * FROM users` errors (table not found). Catalog admin API still shows the table (admin view is unfiltered).
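The visibility rule for `policy_required` mode can be sketched as a set computation. This is an illustrative model under simplifying assumptions, not the real `compute_user_visibility`: the `Policy` struct, the `PolicyType` enum, and `visible_tables` are hypothetical, and the sketch ignores column lists and `table_deny` handling.

```rust
use std::collections::BTreeSet;

/// The five policy types named in the docs, modeled as a plain enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyType {
    RowFilter,
    ColumnMask,
    ColumnAllow,
    ColumnDeny,
    TableDeny,
}

/// A policy assignment reduced to its type and target table (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Policy {
    pub policy_type: PolicyType,
    pub table: String,
}

/// In `policy_required` mode only `column_allow` grants table access.
/// Every other policy type is ignored when deciding visibility.
pub fn visible_tables(policies: &[Policy]) -> BTreeSet<String> {
    policies
        .iter()
        .filter(|p| p.policy_type == PolicyType::ColumnAllow)
        .map(|p| p.table.clone())
        .collect()
}
```

A datasource whose only policy is a `row_filter` on `users` yields an empty set here, which corresponds to the table being absent from the per-user schema.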
358+
359+
---
360+
361+
### 30. CTE wrapping does not bypass column deny, column mask, or column allow
362+
363+
**Vector**: User wraps a table in a CTE (`WITH t AS (SELECT * FROM users) SELECT ssn FROM t`) hoping that the CTE alias changes the column qualifier, causing deny/mask/allow patterns to miss.
364+
365+
**Defense**: Column deny is enforced at visibility level — denied columns are excluded from the `SELECT *` inside the CTE, so they never appear in the CTE output schema. Column mask is enforced at `TableScan` level via `apply_column_mask_at_scan`, which injects a mask `Projection` above the scan before the CTE node is constructed. Column allow (in `policy_required` mode) restricts the schema to allowed columns only, so non-allowed columns are absent from the CTE output.
366+
367+
**Bug found**: Column mask was previously only applied at the top-level `Projection` via `apply_projection_qualified`. CTE nodes (`SubqueryAlias`) change the DFSchema qualifier from the real table name to the CTE alias, causing the top-level mask matching to miss. Raw values leaked through CTEs.
368+
369+
**Fix**: Added an `apply_column_mask_at_scan` method to `PolicyEffects` that applies column masks at the `TableScan` level via `transform_up`, before CTE/subquery nodes can change the qualifier. It uses `alias_qualified` to preserve the table qualifier on the masked column. Masks are cleared from `column_masks` after scan-level application to prevent double-masking.
370+
371+
**Test**:
372+
- `tc_plan_01a_cte_column_deny` — deny `ssn`. CTE `SELECT *` excludes `ssn`. Explicit `SELECT ssn FROM t` errors.
373+
- `tc_plan_01b_cte_column_mask` — mask `ssn`. CTE `SELECT ssn FROM t` returns masked value `***-**-6789`.
374+
- `tc_plan_01c_cte_column_allow` — allow only `id, name`. CTE `SELECT ssn FROM t` errors (not in allow list).
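Why top-level matching misses a mask behind a CTE, and why wrapping the scan does not, can be shown with a toy plan tree. This is a sketch under stated assumptions: the `Plan` enum and the functions below are hypothetical stand-ins for DataFusion's `LogicalPlan`, and the "qualifier" is reduced to a single string per node.

```rust
/// A drastically simplified logical plan (hypothetical, not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    TableScan { table: String },
    SubqueryAlias { alias: String, input: Box<Plan> },
    /// Mask projection injected directly above a scan; keeps the scan's qualifier.
    MaskProjection { table: String, column: String, input: Box<Plan> },
}

/// Qualifier that the top of the plan exposes for its columns.
pub fn output_qualifier(plan: &Plan) -> &str {
    match plan {
        Plan::TableScan { table } => table.as_str(),
        Plan::SubqueryAlias { alias, .. } => alias.as_str(),
        Plan::MaskProjection { table, .. } => table.as_str(),
    }
}

/// Top-level-only matching: the mask applies only if the exposed qualifier
/// still equals the table named in the policy.
pub fn top_level_mask_matches(plan: &Plan, masked_table: &str) -> bool {
    output_qualifier(plan) == masked_table
}

/// Scan-level masking: rewrite bottom-up and wrap each matching `TableScan`.
pub fn mask_at_scan(plan: Plan, masked_table: &str, column: &str) -> Plan {
    match plan {
        Plan::TableScan { table } => {
            if table == masked_table {
                Plan::MaskProjection {
                    table: table.clone(),
                    column: column.to_string(),
                    input: Box::new(Plan::TableScan { table }),
                }
            } else {
                Plan::TableScan { table }
            }
        }
        Plan::SubqueryAlias { alias, input } => Plan::SubqueryAlias {
            alias,
            input: Box::new(mask_at_scan(*input, masked_table, column)),
        },
        // Already masked: leave untouched so the mask is not applied twice.
        already @ Plan::MaskProjection { .. } => already,
    }
}

/// True if a mask projection sits somewhere in the plan.
pub fn contains_mask(plan: &Plan) -> bool {
    match plan {
        Plan::TableScan { .. } => false,
        Plan::SubqueryAlias { input, .. } => contains_mask(input),
        Plan::MaskProjection { .. } => true,
    }
}
```

For `WITH t AS (SELECT * FROM users) ...` the exposed qualifier is `t`, so `top_level_mask_matches(&plan, "users")` is false, whereas `mask_at_scan` still places the mask beneath the alias.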
375+
376+
---
377+
378+
### 31. Subquery-in-FROM wrapping does not bypass column deny, column mask, or column allow
379+
380+
**Vector**: User wraps a table in a subquery (`SELECT sub.ssn FROM (SELECT * FROM users) AS sub`) hoping that the `SubqueryAlias` changes the qualifier from `users` to `sub`, causing deny/mask/allow patterns to miss at the top-level Projection.
381+
382+
**Defense**: Same as CTE (vector 30). Column deny works at visibility level. Column mask works at `TableScan` level via `apply_column_mask_at_scan`. Column allow restricts the schema before the subquery is constructed.
383+
384+
**Bug found**: Same as CTE — column mask was bypassed by subquery aliasing. Fixed by scan-level mask enforcement.
385+
386+
**Test**:
387+
- `tc_plan_02a_subquery_column_deny` — deny `ssn`. Subquery `SELECT *` excludes `ssn`. Explicit `SELECT sub.ssn` errors.
388+
- `tc_plan_02b_subquery_column_mask` — mask `ssn`. Subquery `SELECT sub.ssn` returns masked value `***-**-6789`.
389+
- `tc_plan_02c_subquery_column_allow` — allow only `id, name`. Subquery `SELECT sub.ssn` errors (not in allow list).
390+
391+
---
392+
393+
### 32. Row filter + column mask on the same column
394+
395+
**Vector**: A row filter and column mask target the same column (e.g. `ssn`). If masks are applied before filters in the plan tree, row filters evaluate against masked values instead of raw data, causing incorrect filtering. Example: filter `ssn != '000-00-0000'` passes on masked value `'***-**-0000'`, leaking a row that should be excluded.
396+
397+
**Bug found**: `apply_row_filters` ran before `apply_column_mask_at_scan`. Both use `transform_up` on `TableScan`, producing `Filter(row_filter) → Projection(mask) → TableScan`. Data flows bottom-up: scan → mask → filter, so the filter saw masked values.
398+
399+
**Defense**: Swap the call order so masks are applied first. With `apply_column_mask_at_scan` running before `apply_row_filters`, `transform_up` places the `Filter` between `TableScan` and the mask `Projection`: `Projection(mask) → Filter(row_filter) → TableScan`. Data flows: scan → filter (raw data) → mask. Row filters always evaluate against unmasked values.
400+
401+
**Test**: `row_filter_and_column_mask_same_column` — filter excludes `ssn = '000-00-0000'`, mask replaces ssn with `'***-**-XXXX'`. Verifies 2 rows returned (not 3) and values are masked.
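The effect of call order on node placement can be reproduced with a toy bottom-up rewriter. This is a minimal sketch, assuming a simplified `Plan` enum and hypothetical `mask_pass` / `filter_pass` functions that stand in for the two scan-level passes. The `transform_up` below is a hand-written helper, not DataFusion's `TreeNode::transform_up`.

```rust
/// Simplified plan nodes (hypothetical, not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    TableScan(String),
    Filter(Box<Plan>),
    MaskProjection(Box<Plan>),
}

/// Bottom-up rewrite: children are rewritten first, then `f` sees the node.
/// A node returned by `f` is not visited again.
pub fn transform_up(plan: Plan, f: &dyn Fn(Plan) -> Plan) -> Plan {
    let rebuilt = match plan {
        Plan::TableScan(t) => Plan::TableScan(t),
        Plan::Filter(input) => Plan::Filter(Box::new(transform_up(*input, f))),
        Plan::MaskProjection(input) => {
            Plan::MaskProjection(Box::new(transform_up(*input, f)))
        }
    };
    f(rebuilt)
}

/// Wraps every `TableScan` in a mask projection.
pub fn mask_pass(plan: Plan) -> Plan {
    transform_up(plan, &|node: Plan| match node {
        scan @ Plan::TableScan(_) => Plan::MaskProjection(Box::new(scan)),
        other => other,
    })
}

/// Wraps every `TableScan` in a row filter.
pub fn filter_pass(plan: Plan) -> Plan {
    transform_up(plan, &|node: Plan| match node {
        scan @ Plan::TableScan(_) => Plan::Filter(Box::new(scan)),
        other => other,
    })
}

/// Renders the plan top-down; data flows in the opposite direction.
pub fn render(plan: &Plan) -> String {
    match plan {
        Plan::TableScan(t) => format!("TableScan({})", t),
        Plan::Filter(input) => format!("Filter -> {}", render(input)),
        Plan::MaskProjection(input) => format!("MaskProjection -> {}", render(input)),
    }
}
```

Because each pass wraps the scan itself, the pass that runs second ends up closest to the scan. Running `mask_pass` and then `filter_pass` therefore puts the filter directly above the scan, where it reads raw values. The reverse order puts the filter above the mask.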
402+

docs/permission-system.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,8 @@ Replaces a column's value with a masked expression. The `definition` field must
8787

8888
When multiple `column_mask` policies target the same column, the one with the **lowest priority number** (highest precedence) wins.
8989

90+
When a `row_filter` and `column_mask` target the same column, the row filter always evaluates against the **raw** (unmasked) value. Masking is applied after filtering, so filter predicates are never affected by mask expressions.
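At the data level, the rule amounts to filtering first and masking second. The following sketch assumes a hypothetical `Row` type, a last-four-digits mask, and a filter that excludes `000-00-0000`. None of these names come from the proxy, and real policies are SQL expressions rather than Rust functions.

```rust
/// One record of the example table (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u32,
    pub ssn: String,
}

/// Hypothetical mask: keep only the last four characters.
pub fn mask_ssn(ssn: &str) -> String {
    let last4: String = ssn
        .chars()
        .rev()
        .take(4)
        .collect::<Vec<char>>()
        .into_iter()
        .rev()
        .collect();
    format!("***-**-{}", last4)
}

/// Hypothetical row filter: exclude the placeholder SSN.
pub fn row_filter(row: &Row) -> bool {
    row.ssn != "000-00-0000"
}

/// Documented behavior: the filter sees raw values, then the mask is applied.
pub fn filter_then_mask(rows: &[Row]) -> Vec<Row> {
    rows.iter()
        .filter(|r| row_filter(r))
        .map(|r| Row { id: r.id, ssn: mask_ssn(&r.ssn) })
        .collect()
}

/// Wrong order, for contrast: the filter sees masked values and never matches.
pub fn mask_then_filter(rows: &[Row]) -> Vec<Row> {
    rows.iter()
        .map(|r| Row { id: r.id, ssn: mask_ssn(&r.ssn) })
        .filter(|r| row_filter(r))
        .collect()
}
```

With three input rows, one of which carries `000-00-0000`, `filter_then_mask` returns two masked rows, while `mask_then_filter` returns all three because `***-**-0000` no longer equals the filtered literal.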
91+
9092
### column_allow
9193

9294
Acts as a **column allowlist**: only the listed columns are visible in schema metadata and query results. All other columns are hidden. This is the only policy type that makes a table accessible in `policy_required` mode.

docs/roadmap.md

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,7 @@
44

55
### Remaining Integration Test Cases
66

7-
The following TC-* scenarios are defined but not yet covered by integration tests (`proxy/tests/policy_enforcement.rs`). Implemented cases have already been removed from this list.
8-
9-
- **TC-JOIN-02 (Multi-Table JOIN)**: JOIN 3 tables (`a`, `b`, `c`) all with `id`. **Allow** `id` on `b` only. Verify the result set contains exactly one `id` column (from `b`).
10-
- **TC-JOIN-03 (Aliasing)**: `SELECT c.email FROM customers AS c`. **Deny** `email` on `customers`. Verify `c.email` is stripped correctly (rewriter resolves alias `c` to `customers`).
11-
- **TC-ZT-04 (Sidebar Sync)**: Engine `compute_user_visibility` with `row_filter` only. Verify table exists in sidebar but has 0 columns (requires engine-level test or catalog API assertion).
12-
- **TC-PLAN-01 (CTE Leak)**: `WITH t AS (SELECT * FROM users) SELECT ssn FROM t`. **Deny** `ssn`. Verify `ssn` is stripped inside the CTE definition (at the scan level).
13-
- **TC-PLAN-02 (Subquery in FROM)**: `SELECT sub.ssn FROM (SELECT * FROM users) AS sub`. **Deny** `ssn`. Verify inner `SELECT *` is rewritten to exclude `ssn`.
7+
All TC-* scenarios are now covered by integration tests in `proxy/tests/policy_enforcement.rs`. The list below is empty — new scenarios should be added here before implementation.
148

159
### Configurable Policies
1610

proxy/CLAUDE.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -64,9 +64,9 @@ Policy CRUD handlers also call `state.proxy_handler.rebuild_contexts_for_datasou
6464
`PolicyHook` replaces the old hardcoded `RLSHook`. It loads policies from the DB, caches per `(datasource_id, username)` for 60 seconds, and applies five policy types:
6565

6666
- **row_filter** — `Filter(expr)` node injected directly above the matching `TableScan` via `transform_up`. Template variables (`{user.tenant}`, `{user.username}`, `{user.id}`) are substituted as `Expr::Literal` after parsing — never interpolated as raw SQL. Multiple `row_filter` policies are AND-combined (intersection, not union).
67-
- **column_mask** — replaces the column `Expr` in the top-level `Projection` with an aliased mask expression. Parsed synchronously via `sql_ast_to_df_expr(..., Some(ctx))` — sqlparser converts the mask template to a DataFusion `Expr` using the session's `FunctionRegistry` for built-in function lookup (RIGHT, LEFT, UPPER, LOWER, CONCAT, COALESCE, etc.). No standalone SQL plan is created.
67+
- **column_mask** — mask `Projection` injected above each matching `TableScan` via `apply_column_mask_at_scan` (`transform_up`). Replaces the masked column with the mask expression, aliased with `alias_qualified` to preserve the table qualifier. Parsed synchronously via `sql_ast_to_df_expr(..., Some(ctx))` — sqlparser converts the mask template to a DataFusion `Expr` using the session's `FunctionRegistry` for built-in function lookup (RIGHT, LEFT, UPPER, LOWER, CONCAT, COALESCE, etc.). Scan-level enforcement prevents CTE/subquery alias bypass. Masks are cleared from `column_masks` after scan-level application to prevent double-masking.
6868
- **column_allow** — specifies which columns a user may see for matching tables. In `policy_required` mode, a `column_allow` policy is the only type that grants table access; without one, the table receives `Filter(lit(false))`.
69-
- **column_deny** — strips listed columns from the top-level `Projection`. Does NOT short-circuit the query. If all selected columns are stripped, returns SQLSTATE `42501` (insufficient_privilege).
69+
- **column_deny** — enforced at two levels: (1) visibility-level via `compute_user_visibility` / `build_user_context` — denied columns removed from per-user schema at connect time; (2) defense-in-depth via top-level `Projection` in `apply_projection_qualified`. Does NOT short-circuit the query. If all selected columns are stripped, returns SQLSTATE `42501` (insufficient_privilege).
7070
- **table_deny** — denied tables are removed from the catalog at connection time (404-not-403 principle). Queries fail with "table not found" rather than "access denied" to avoid leaking metadata about the existence of denied tables. Audit status is "error", not "denied".
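The two `row_filter` properties in the list above, literal substitution and AND-combination, can be sketched on a toy expression tree. This is an illustration only: the `Expr` enum here is a hypothetical miniature, not DataFusion's `Expr`, and `substitute` and `and_all` are made-up helper names.

```rust
/// Miniature expression tree (hypothetical; DataFusion's `Expr` is far richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    /// Unresolved template variable, e.g. `user.tenant`.
    Var(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Replaces a template variable with a literal node *after* parsing.
/// The value is never spliced into SQL text, so it cannot change the tree shape.
pub fn substitute(expr: Expr, var: &str, value: &str) -> Expr {
    match expr {
        Expr::Var(name) if name == var => Expr::Literal(value.to_string()),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(substitute(*l, var, value)),
            Box::new(substitute(*r, var, value)),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(substitute(*l, var, value)),
            Box::new(substitute(*r, var, value)),
        ),
        other => other,
    }
}

/// AND-combines all filters; returns `None` when there are no filters.
pub fn and_all(filters: Vec<Expr>) -> Option<Expr> {
    filters
        .into_iter()
        .reduce(|acc, next| Expr::And(Box::new(acc), Box::new(next)))
}
```

A hostile tenant value such as `acme' OR '1'='1` ends up as a single `Literal` node in this model, so it is compared as data and cannot introduce an `OR` branch.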
7171

7272
**Policy type encodes effect**: `column_deny` and `table_deny` are deny types (`policy_type.is_deny() == true`); the others are permit types. There is no separate `effect` field.
@@ -75,6 +75,10 @@ Policy CRUD handlers also call `state.proxy_handler.rebuild_contexts_for_datasou
7575

7676
**Cache invalidation**: call `policy_hook.invalidate_datasource(&name)` after any policy or datasource mutation. Call `policy_hook.invalidate_user(&user_id)` after user tenant/deactivation changes. Also call `proxy_handler.rebuild_contexts_for_datasource(&name)` after policy mutations so active connections immediately see the updated schema (column visibility changes without reconnect).
7777

78+
**Enforcement order in `apply_policies`**: (1) `apply_column_mask_at_scan` — mask Projection above TableScan, (2) `apply_row_filters` — Filter below mask Projection but above TableScan, (3) `apply_projection_qualified` — top-level Projection for allow/deny. Masks must run before filters so that `transform_up` places the Filter between TableScan and the mask Projection. This ensures row filters evaluate against raw (unmasked) data. Swapping this order is a security bug — see vector 32 in `docs/permission-security-tests.md`.
79+
80+
**Column-level policies must be enforced at scan level**: All column-level policies (deny, mask, and any future types) MUST be enforced at the `TableScan` level (visibility-level for deny, `transform_up` Projection for mask) to prevent CTE/subquery alias bypass. `SubqueryAlias` and CTE nodes change the DFSchema qualifier from the real table name to the alias, causing top-level-only matching to miss. Top-level `apply_projection_qualified` is defense-in-depth only.
81+
7882
**Audit logging**: after each query, `PolicyHook` spawns a `tokio::spawn` task to insert a `query_audit_log` row asynchronously. The row captures `original_query`, `rewritten_query`, `policies_applied` (JSON with name+version snapshot), `client_ip`, and `client_info` (application_name from pgwire startup params).
7983

8084
## Testing
