Add scan-level column masking to prevent CTE/subquery bypass
Apply column masks at the TableScan level via transform_up instead of
only at the top-level Projection. This prevents SubqueryAlias and CTE
nodes from changing the DFSchema qualifier, which would cause top-level
masking to miss columns. The mask pass runs before the row-filter pass, so
each Filter lands beneath the mask Projection and evaluates against raw
(unmasked) data.
Add integration tests for multi-table JOINs with scoped column deny,
CTE mask bypass prevention, subquery mask enforcement, and combined
mask+deny+filter scenarios. Update docs to reflect scan-level enforcement.
docs/permission-security-tests.md: 78 additions & 2 deletions
@@ -66,9 +66,9 @@ WITH data AS (SELECT * FROM orders) SELECT * FROM data

 **Vector**: User writes `SELECT ssn || '' FROM customers` to bypass masking of `ssn`.

-**Defense**: Column masking works by replacing `col("ssn")` in the `Projection` node. If `ssn || ''` is used, the `ssn` reference still passes through the `Projection` as a sub-expression. The proxy replaces the `ssn` column reference inside the expression with the masked value.
+**Defense**: Column masking is enforced at the `TableScan` level: `apply_column_mask_at_scan` injects a `Projection` above each scan that replaces the masked column with the mask expression. For a direct `SELECT ssn`, the mask is applied before any downstream node sees the raw value. If the user instead writes `SELECT ssn || '' FROM customers`, the `ssn` reference inside the compound expression resolves to the already-masked output of the scan-level `Projection`, so the concatenation operates on masked data.

-**Note**: This is a known limitation for P0: column masking replaces direct `col(name)` references in the projection. Compound expressions that reference the column are not masked. This is a P1 enhancement.
+**Note**: This is a known limitation for P0: scan-level masking replaces the column at the source, so compound expressions in the user's `SELECT` list operate on the masked value, not the original. The result is masked (not raw), but the output may look unexpected (e.g., `***-**-6789` concatenated with an empty string). Improving this is a P1 enhancement.

 **Test**: Document the limitation. Verify `SELECT ssn FROM customers` returns masked value. Verify `SELECT ssn || '' FROM customers` is treated as a limitation/known gap.

@@ -324,3 +324,79 @@ There is no ambiguous per-definition `action` field. `compute_user_visibility()`

 **Test**: `deny_policy_row_filter_rejected`: error message must not contain the policy name. `tc_audit_02_denied_audit_status`: audit status is `"error"`, `error_message` does not contain the policy name. `tc_audit_04_status_filter`: `status=error` filter matches these entries.

+---
+
+### 27. Column deny scoping in multi-table JOINs
+
+**Vector**: Three tables (`a`, `b`, `c`) share a column name (`name`). Denying `name` on `a` and `c` might accidentally also strip `b.name` if the deny logic uses unqualified matching.
+
+**Defense**: Column deny is enforced at two levels: (1) visibility-level via `compute_user_visibility` / `build_user_context`, where denied columns are removed from the per-user `SessionContext` schema at connect time, scoped per-table; (2) defense-in-depth via `apply_projection_qualified`, where the top-level Projection uses DFSchema qualifiers to scope deny patterns to their source table.
+
+**Test**: `tc_join_02_multi_table_join_shared_name`: JOIN 3 tables all with `name`. Deny `name` on `a` and `c`. `SELECT *` returns exactly one `name` column (from `b`), plus `id` from all three tables and `a_val`, `b_val`, `c_val`.
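The difference between qualified and unqualified deny matching in vector 27 can be sketched in a few lines. This is a toy model under stated assumptions, not the proxy's code: `QualifiedColumn`, `strip_denied_qualified`, `strip_denied_unqualified`, `example_schema`, and `example_denies` are hypothetical names, and the schema is a plain list of `(table, column)` pairs rather than a DataFusion `DFSchema`.

```rust
/// A column in a joined schema, as (table qualifier, column name).
/// Hypothetical stand-in for a qualified DFSchema field.
type QualifiedColumn = (&'static str, &'static str);

/// Qualified matching: a column is dropped only when its own
/// (table, name) pair appears in the deny list.
fn strip_denied_qualified(
    schema: &[QualifiedColumn],
    denies: &[QualifiedColumn],
) -> Vec<QualifiedColumn> {
    schema
        .iter()
        .copied()
        .filter(|col| !denies.contains(col))
        .collect()
}

/// Unqualified matching (the failure mode described in the vector):
/// the table part of each deny entry is ignored, so a deny on
/// `a.name` also removes `b.name`.
fn strip_denied_unqualified(
    schema: &[QualifiedColumn],
    denies: &[QualifiedColumn],
) -> Vec<QualifiedColumn> {
    schema
        .iter()
        .copied()
        .filter(|(_, name)| !denies.iter().any(|(_, denied)| denied == name))
        .collect()
}

/// The three-table JOIN schema from the test description.
fn example_schema() -> Vec<QualifiedColumn> {
    vec![
        ("a", "id"), ("a", "name"), ("a", "a_val"),
        ("b", "id"), ("b", "name"), ("b", "b_val"),
        ("c", "id"), ("c", "name"), ("c", "c_val"),
    ]
}

/// Deny `name` on `a` and `c` only.
fn example_denies() -> Vec<QualifiedColumn> {
    vec![("a", "name"), ("c", "name")]
}
```

With these inputs, the qualified variant keeps `b.name` and the six `id`/`*_val` columns, while the unqualified variant drops all three `name` columns.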
+
+---
+
+### 28. Table alias does not bypass column deny or column mask
+
+**Vector**: User aliases a table (`SELECT * FROM customers AS c`) hoping the alias bypasses column-level policies. If the policy rewriter only checks the real table name, and the planner resolves columns under the alias qualifier, denied or masked columns might leak.
+
+**Defense**: Column deny is enforced at visibility level: denied columns are removed from the schema before query planning, so they never appear in `SELECT *` regardless of alias. Column mask is enforced at the `TableScan` level via `apply_column_mask_at_scan` (injected `Projection` above each scan), which operates on the real `TableScan` table name before any alias is applied.
+
+**Test**:
+- `tc_join_03a_alias_column_deny`: deny `email` on `customers`. `SELECT * FROM customers AS c` returns only `id, name`. `SELECT c.email FROM customers AS c` errors (column not found).
+- `tc_join_03b_alias_column_mask`: mask `email` on `customers`. `SELECT c.email FROM customers AS c` returns the masked value `***@example.com`, not the raw email.
+
+---
+
+### 29. row_filter alone does not grant visibility in policy_required mode
+
+**Vector**: In `policy_required` mode, a `row_filter` policy is assigned to a table but no `column_allow` policy. If `row_filter` silently grants table visibility, the user can see the table in `information_schema` and query it, bypassing the zero-trust model.
+
+**Defense**: `compute_user_visibility` only adds tables to `visible_tables` when a `column_allow` policy exists. `row_filter` and `column_mask` do not grant table access. Without a `column_allow` policy, the table is excluded from the per-user `SessionContext`, making it invisible in both `information_schema` queries and direct table references.
+
+**Test**: `tc_zt_04_sidebar_sync_row_filter_only`: `policy_required` datasource with only a `row_filter` on `users`. `SELECT ... FROM information_schema.tables` returns 0 rows for the schema. Direct `SELECT * FROM users` errors (table not found). Catalog admin API still shows the table (admin view is unfiltered).
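The visibility rule in vector 29 reduces to a single filter over the assigned policies. The sketch below is a simplified illustration, assuming a hypothetical `Policy` struct and a `visible_tables` helper; the real `compute_user_visibility` signature is not shown in this diff, and the deny policy types are left out to keep the example focused.

```rust
use std::collections::BTreeSet;

/// Permit-style policy types mentioned in the defense text.
/// (Deny types are omitted from this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PolicyType {
    RowFilter,
    ColumnMask,
    ColumnAllow,
}

/// Hypothetical, minimal policy record: a type and the table it targets.
struct Policy {
    policy_type: PolicyType,
    table: &'static str,
}

/// In `policy_required` mode, only `column_allow` makes a table visible.
/// `row_filter` and `column_mask` policies never add a table to the set.
fn visible_tables(policies: &[Policy]) -> BTreeSet<&'static str> {
    policies
        .iter()
        .filter(|p| p.policy_type == PolicyType::ColumnAllow)
        .map(|p| p.table)
        .collect()
}
```

A user whose only policy on `users` is a `row_filter` gets an empty set, which matches the "table not found" behaviour the test expects.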
+
+---
+
+### 30. CTE wrapping does not bypass column deny, column mask, or column allow
+
+**Vector**: User wraps a table in a CTE (`WITH t AS (SELECT * FROM users) SELECT ssn FROM t`) hoping that the CTE alias changes the column qualifier, causing deny/mask/allow patterns to miss.
+
+**Defense**: Column deny is enforced at visibility level: denied columns are excluded from the `SELECT *` inside the CTE, so they never appear in the CTE output schema. Column mask is enforced at `TableScan` level via `apply_column_mask_at_scan`, which injects a mask `Projection` above the scan, beneath the CTE's `SubqueryAlias` node. Column allow (in `policy_required` mode) restricts the schema to allowed columns only, so non-allowed columns are absent from the CTE output.
+
+**Bug found**: Column mask was previously only applied at the top-level `Projection` via `apply_projection_qualified`. CTE nodes (`SubqueryAlias`) change the DFSchema qualifier from the real table name to the CTE alias, so the top-level mask matching missed the column and raw values leaked through CTEs.
+
+**Fix**: Added the `apply_column_mask_at_scan` method in `PolicyEffects`, which applies column masks at the `TableScan` level via `transform_up`, before CTE/subquery nodes can change the qualifier. It uses `alias_qualified` to preserve the table qualifier on the masked column. Masks are cleared from `column_masks` after scan-level application to prevent double-masking.
+- `tc_plan_01b_cte_column_mask`: mask `ssn`. CTE `SELECT ssn FROM t` returns masked value `***-**-6789`.
+- `tc_plan_01c_cte_column_allow`: allow only `id, name`. CTE `SELECT ssn FROM t` errors (not in allow list).
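The qualifier problem behind the CTE bypass can be reproduced with a small model. This is a sketch under stated assumptions rather than the proxy's implementation: `Column`, `scan_users`, `subquery_alias`, `apply_mask`, `mask_ssn`, and `ssn_of` are hypothetical, a "plan" is just one row of qualified columns, and the mask (keep the last four characters) is chosen only to match the `***-**-6789` value in the tests.

```rust
/// One output column of a plan node: qualifier, name, and a single value.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    qualifier: String,
    name: String,
    value: String,
}

/// Stand-in for a `TableScan` of `users` producing one row.
fn scan_users() -> Vec<Column> {
    vec![
        Column { qualifier: "users".into(), name: "id".into(), value: "1".into() },
        Column { qualifier: "users".into(), name: "ssn".into(), value: "123-45-6789".into() },
    ]
}

/// Stand-in for `SubqueryAlias`: every column is re-qualified with the alias.
fn subquery_alias(columns: Vec<Column>, alias: &str) -> Vec<Column> {
    columns
        .into_iter()
        .map(|c| Column { qualifier: alias.to_string(), ..c })
        .collect()
}

/// Illustrative mask expression: hide everything except the last four characters.
fn mask_ssn(raw: &str) -> String {
    let chars: Vec<char> = raw.chars().collect();
    let tail: String = chars[chars.len().saturating_sub(4)..].iter().collect();
    format!("***-**-{}", tail)
}

/// Mask `table.column`, matching on the qualifier the way a
/// qualifier-based policy rewriter would.
fn apply_mask(columns: Vec<Column>, table: &str, column: &str) -> Vec<Column> {
    columns
        .into_iter()
        .map(|c| {
            if c.qualifier == table && c.name == column {
                Column { value: mask_ssn(&c.value), ..c }
            } else {
                c
            }
        })
        .collect()
}

/// Convenience accessor for the `ssn` value in a row.
fn ssn_of(columns: &[Column]) -> &str {
    columns
        .iter()
        .find(|c| c.name == "ssn")
        .map(|c| c.value.as_str())
        .unwrap_or("")
}
```

Masking after the alias (`apply_mask(subquery_alias(scan_users(), "t"), "users", "ssn")`) finds no column qualified `users` and leaves the raw value in place. Masking directly above the scan and aliasing afterwards (`subquery_alias(apply_mask(scan_users(), "users", "ssn"), "t")`) yields the masked value, which is the behaviour the scan-level fix relies on.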
+
+---
+
+### 31. Subquery-in-FROM wrapping does not bypass column deny, column mask, or column allow
+
+**Vector**: User wraps a table in a subquery (`SELECT sub.ssn FROM (SELECT * FROM users) AS sub`) hoping that the `SubqueryAlias` changes the qualifier from `users` to `sub`, causing deny/mask/allow patterns to miss at the top-level Projection.
+
+**Defense**: Same as CTE (vector 30). Column deny works at visibility level. Column mask works at `TableScan` level via `apply_column_mask_at_scan`. Column allow restricts the schema before the subquery is constructed.
+
+**Bug found**: Same as CTE: column mask was bypassed by subquery aliasing. Fixed by scan-level mask enforcement.
+- `tc_plan_02c_subquery_column_allow`: allow only `id, name`. Subquery `SELECT sub.ssn` errors (not in allow list).
+
+---
+
+### 32. Row filter + column mask on the same column
+
+**Vector**: A row filter and column mask target the same column (e.g. `ssn`). If the mask `Projection` sits beneath the `Filter` in the plan tree, data is masked before it is filtered, so row filters evaluate against masked values instead of raw data and filter incorrectly. Example: filter `ssn != '000-00-0000'` passes on masked value `'***-**-0000'`, leaking a row that should be excluded.
+
+**Bug found**: `apply_row_filters` ran before `apply_column_mask_at_scan`. Both use `transform_up` on `TableScan`, producing (top to bottom) `Filter(row_filter) → Projection(mask) → TableScan`. Data flows bottom-up: scan → mask → filter, so the filter saw masked values.
+
+**Defense**: Swap the call order so the mask rewrite runs first. With `apply_column_mask_at_scan` running before `apply_row_filters`, `transform_up` places the `Filter` between `TableScan` and the mask `Projection`: `Projection(mask) → Filter(row_filter) → TableScan`. Data flows: scan → filter (raw data) → mask. Row filters always evaluate against unmasked values.
+
+**Test**: `row_filter_and_column_mask_same_column`: filter excludes `ssn = '000-00-0000'`, mask replaces ssn with `'***-**-XXXX'`. Verifies 2 rows returned (not 3) and values are masked.
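Why the call order decides node placement is easier to see on a toy plan tree. The following is a minimal sketch, assuming a hypothetical three-variant `Plan` enum and a hand-written bottom-up rewrite; it mirrors the shape of DataFusion's `transform_up` but does not use the DataFusion API, and `add_mask` / `add_filter` are stand-ins for `apply_column_mask_at_scan` / `apply_row_filters`.

```rust
/// Toy logical plan with just enough structure to show placement.
/// This is not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan(String),
    Filter(Box<Plan>),
    MaskProjection(Box<Plan>),
}

/// Bottom-up rewrite: children are rewritten first, then `f` is applied
/// once to the rebuilt node. The node returned by `f` is not revisited.
fn transform_up(plan: Plan, f: &dyn Fn(Plan) -> Plan) -> Plan {
    let rebuilt = match plan {
        Plan::TableScan(t) => Plan::TableScan(t),
        Plan::Filter(child) => Plan::Filter(Box::new(transform_up(*child, f))),
        Plan::MaskProjection(child) => {
            Plan::MaskProjection(Box::new(transform_up(*child, f)))
        }
    };
    f(rebuilt)
}

/// Stand-in for the mask pass: wrap every scan in a mask projection.
fn add_mask(plan: Plan) -> Plan {
    transform_up(plan, &|node: Plan| match node {
        Plan::TableScan(t) => Plan::MaskProjection(Box::new(Plan::TableScan(t))),
        other => other,
    })
}

/// Stand-in for the row-filter pass: wrap every scan in a filter.
fn add_filter(plan: Plan) -> Plan {
    transform_up(plan, &|node: Plan| match node {
        Plan::TableScan(t) => Plan::Filter(Box::new(Plan::TableScan(t))),
        other => other,
    })
}

/// Render the tree top-down, parent before child.
fn render(plan: &Plan) -> String {
    match plan {
        Plan::TableScan(t) => format!("TableScan({})", t),
        Plan::Filter(child) => format!("Filter -> {}", render(child)),
        Plan::MaskProjection(child) => format!("MaskProjection -> {}", render(child)),
    }
}
```

Because each pass wraps the scan itself, whichever pass runs last ends up closest to the scan. Running the mask pass first and the filter pass second gives `MaskProjection -> Filter -> TableScan(users)`, so the filter reads raw rows; the reverse order gives `Filter -> MaskProjection -> TableScan(users)`, which is the bug described above.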
docs/permission-system.md: 2 additions & 0 deletions
@@ -87,6 +87,8 @@ Replaces a column's value with a masked expression. The `definition` field must

 When multiple `column_mask` policies target the same column, the one with the **lowest priority number** (highest precedence) wins.
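The precedence rule for competing masks can be illustrated as a minimum over priority numbers. This is a hedged sketch: `MaskPolicy` and `winning_mask` are hypothetical names, the mask expressions are placeholder strings, and how the real system breaks ties between equal priorities is not stated in this document (here the first match wins, which is simply what `Iterator::min_by_key` does).

```rust
/// Hypothetical, minimal view of a `column_mask` policy.
struct MaskPolicy {
    /// Lower number means higher precedence.
    priority: u32,
    column: &'static str,
    /// Placeholder for the mask expression text.
    expr: &'static str,
}

/// Pick the mask that applies to `column`: the matching policy with the
/// lowest priority number, or `None` if no policy targets the column.
fn winning_mask<'a>(policies: &'a [MaskPolicy], column: &str) -> Option<&'a MaskPolicy> {
    policies
        .iter()
        .filter(|p| p.column == column)
        .min_by_key(|p| p.priority)
}
```

For example, given masks on `ssn` with priorities 20 and 10, the priority-10 policy is selected; a column with no mask policy yields `None` and is returned unmasked.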

+When a `row_filter` and `column_mask` target the same column, the row filter always evaluates against the **raw** (unmasked) value. Masking is applied after filtering, so filter predicates are never affected by mask expressions.
+
 ### column_allow

 Acts as a **column allowlist**: only the listed columns are visible in schema metadata and query results. All other columns are hidden. This is the only policy type that makes a table accessible in `policy_required` mode.
docs/roadmap.md: 1 addition & 7 deletions
@@ -4,13 +4,7 @@

 ### Remaining Integration Test Cases

-The following TC-* scenarios are defined but not yet covered by integration tests (`proxy/tests/policy_enforcement.rs`). Implemented cases have already been removed from this list.
-
-- **TC-JOIN-02 (Multi-Table JOIN)**: JOIN 3 tables (`a`, `b`, `c`) all with `id`. **Allow** `id` on `b` only. Verify the result set contains exactly one `id` column (from `b`).
-- **TC-JOIN-03 (Aliasing)**: `SELECT c.email FROM customers AS c`. **Deny** `email` on `customers`. Verify `c.email` is stripped correctly (rewriter resolves alias `c` to `customers`).
-- **TC-ZT-04 (Sidebar Sync)**: Engine `compute_user_visibility` with `row_filter` only. Verify table exists in sidebar but has 0 columns (requires engine-level test or catalog API assertion).
-- **TC-PLAN-01 (CTE Leak)**: `WITH t AS (SELECT * FROM users) SELECT ssn FROM t`. **Deny** `ssn`. Verify `ssn` is stripped inside the CTE definition (at the scan level).
-- **TC-PLAN-02 (Subquery in FROM)**: `SELECT sub.ssn FROM (SELECT * FROM users) AS sub`. **Deny** `ssn`. Verify inner `SELECT *` is rewritten to exclude `ssn`.
+All TC-* scenarios are now covered by integration tests in `proxy/tests/policy_enforcement.rs`. The list below is empty; add new scenarios here before implementing them.
proxy/CLAUDE.md: 6 additions & 2 deletions
@@ -64,9 +64,9 @@ Policy CRUD handlers also call `state.proxy_handler.rebuild_contexts_for_datasou
 `PolicyHook` replaces the old hardcoded `RLSHook`. It loads policies from the DB, caches per `(datasource_id, username)` for 60 seconds, and applies five policy types:

 - **row_filter**: `Filter(expr)` node injected directly above the matching `TableScan` via `transform_up`. Template variables (`{user.tenant}`, `{user.username}`, `{user.id}`) are substituted as `Expr::Literal` after parsing, never interpolated as raw SQL. Multiple `row_filter` policies are AND-combined (intersection, not union).
-- **column_mask**: replaces the column `Expr` in the top-level `Projection` with an aliased mask expression. Parsed synchronously via `sql_ast_to_df_expr(..., Some(ctx))`: sqlparser converts the mask template to a DataFusion `Expr` using the session's `FunctionRegistry` for built-in function lookup (RIGHT, LEFT, UPPER, LOWER, CONCAT, COALESCE, etc.). No standalone SQL plan is created.
+- **column_mask**: mask `Projection` injected above each matching `TableScan` via `apply_column_mask_at_scan` (`transform_up`). Replaces the masked column with the mask expression, aliased with `alias_qualified` to preserve the table qualifier. Parsed synchronously via `sql_ast_to_df_expr(..., Some(ctx))`: sqlparser converts the mask template to a DataFusion `Expr` using the session's `FunctionRegistry` for built-in function lookup (RIGHT, LEFT, UPPER, LOWER, CONCAT, COALESCE, etc.). Scan-level enforcement prevents CTE/subquery alias bypass. Masks are cleared from `column_masks` after scan-level application to prevent double-masking.
 - **column_allow**: specifies which columns a user may see for matching tables. In `policy_required` mode, a `column_allow` policy is the only type that grants table access; without one, the table receives `Filter(lit(false))`.
-- **column_deny**: strips listed columns from the top-level `Projection`. Does NOT short-circuit the query. If all selected columns are stripped, returns SQLSTATE `42501` (insufficient_privilege).
+- **column_deny**: enforced at two levels: (1) visibility-level via `compute_user_visibility` / `build_user_context`, where denied columns are removed from the per-user schema at connect time; (2) defense-in-depth via the top-level `Projection` in `apply_projection_qualified`. Does NOT short-circuit the query. If all selected columns are stripped, returns SQLSTATE `42501` (insufficient_privilege).
 - **table_deny**: denied tables are removed from the catalog at connection time (404-not-403 principle). Queries fail with "table not found" rather than "access denied" to avoid leaking metadata about the existence of denied tables. Audit status is "error", not "denied".

 **Policy type encodes effect**: `column_deny` and `table_deny` are deny types (`policy_type.is_deny() == true`); the others are permit types. There is no separate `effect` field.
@@ -75,6 +75,10 @@ Policy CRUD handlers also call `state.proxy_handler.rebuild_contexts_for_datasou

 **Cache invalidation**: call `policy_hook.invalidate_datasource(&name)` after any policy or datasource mutation. Call `policy_hook.invalidate_user(&user_id)` after user tenant/deactivation changes. Also call `proxy_handler.rebuild_contexts_for_datasource(&name)` after policy mutations so active connections immediately see the updated schema (column visibility changes without reconnect).

+**Enforcement order in `apply_policies`**: (1) `apply_column_mask_at_scan`: mask Projection above TableScan, (2) `apply_row_filters`: Filter below mask Projection but above TableScan, (3) `apply_projection_qualified`: top-level Projection for allow/deny. The mask pass must run before the filter pass so that `transform_up` places the Filter between TableScan and the mask Projection. This ensures row filters evaluate against raw (unmasked) data. Swapping this order is a security bug; see vector 32 in `docs/permission-security-tests.md`.
+
+**Column-level policies must be enforced at scan level**: All column-level policies (deny, mask, and any future types) MUST be enforced at the `TableScan` level (visibility-level for deny, `transform_up` Projection for mask) to prevent CTE/subquery alias bypass. `SubqueryAlias` and CTE nodes change the DFSchema qualifier from the real table name to the alias, causing top-level-only matching to miss. Top-level `apply_projection_qualified` is defense-in-depth only.
+
 **Audit logging**: after each query, `PolicyHook` spawns a task via `tokio::spawn` to insert a `query_audit_log` row asynchronously. The row captures `original_query`, `rewritten_query`, `policies_applied` (JSON with name+version snapshot), `client_ip`, and `client_info` (application_name from pgwire startup params).