Feature/json indexes by palmer159 · Pull Request #2477 · apache/phoenix

palmer159 · 2026-05-15T00:08:19Z

Incremental changes for Expression-Based Secondary Indexes on BSON/JSON Paths

No path-string canonicalization. IndexExpressionParseNodeRewriter.leaveCompoundNode
matches ParseNode by equals(). FunctionParseNode.equals compares name + children;
a path literal child is a LiteralParseNode whose equals is byte-for-byte on the
value. So BSON_VALUE(doc, 'a.b', 'VARCHAR') and BSON_VALUE(doc, '$.a.b', 'VARCHAR')
— which are semantically identical — do not match the same index. A user whose
query spells the path one way misses an index created with the other spelling.
Predicate matching limited to exact equality in practice. In-tree IT coverage
(Bson5IT) shows BSON_VALUE(COL, p, t) = ? hitting the index. Range (<, <=,
>, >=), BETWEEN, and IN are not exercised — and because the rewriter relies on
AST equality of the indexed expression against the predicate LHS, range predicates
will rewrite correctly for the indexed column (that part is generic scan-range
derivation), but canonicalization differences will still cause misses.
Sort order for typed numeric paths is likely wrong with the default index column
type. BSON_VALUE(doc, 'x', 'VARCHAR') returns VARCHAR bytes — fine. But
BSON_VALUE(doc, 'x', 'DOUBLE') at the write path sets ptr to
PDouble.INSTANCE.toBytes(double) (IEEE 754 bits), which is not order-preserving
under unsigned byte comparison for negative values. Range scans on such indexes will
return incorrect ordering across sign boundaries. Must be verified; if confirmed,
fixed at index-key time with a sign-flip or by routing through Phoenix's fixed-width
numeric encoders that already provide order-preserving bytes.
Rewriter has no fast-path. IndexStatementRewriter.translate runs on every
SELECT, walking the index column list and parsing every indexed expression. On
tables with no BSON columns and no BSON indexes, this is pure overhead. It does not
scale as BSON indexes proliferate.
Phoenix exposes BSON navigation only through BSON_VALUE,
BSON_VALUE_TYPE, BSON_CONDITION_EXPRESSION. There is no -> or ->> in the
grammar (grep of PhoenixSQL.g, PhoenixBsonExpression.g: no hits). Adding them is
a self-contained grammar change independent of the indexing work.
No per-BSON-index metrics for rewrite hits/misses, no
EXPLAIN hint showing which canonical path matched, no counter for partial-index
skips. Operators debugging "why didn't my query hit the index" have no signal.

2. What this design delivers

A focused, incremental enhancement program that closes the five real gaps above without
inventing new infrastructure where Phoenix already has working infrastructure.

Non-goals (unchanged from prior spec, reaffirmed)

Dynamic-column (per-row, non-schema) indexing.
GIN-style multi-valued path indexes.
Containment (@>) predicates.
Wildcards, filter expressions, recursive descent in JSONPath.

Goals, restated

Queries that spell a BSON path in a different-but-equivalent way must hit the same
index as the DDL that created it.
Range (<, <=, >, >=), BETWEEN, and IN predicates on BSON-path expressions
must use the index correctly, including with correct sort order for numeric types.
The rewriter must add negligible overhead on queries against tables with no BSON
indexes.
Operators must be able to tell whether and why an index was used.
The DDL / query experience must remain backwards-compatible — nothing in Bson5IT
breaks.

3. Architecture

Two modest additions, both client-side; no on-disk format change; no coprocessor change.

          ┌───────────────────────────────────────────────┐
  DDL     │ Parser ─► BsonPathNormalizer (new) ─► compile │
          │            │                                  │
          │            ▼                                  │
          │  canonical path literal stored in             │
          │  PColumn.getExpressionStr() (SYSTEM.CATALOG)  │
          └───────────────────────────────────────────────┘

          ┌───────────────────────────────────────────────┐
  Query   │ Parser ─► BsonPathNormalizer (same rewriter)  │
          │            │                                  │
          │            ▼                                  │
          │ IndexExpressionParseNodeRewriter              │
          │  (fast-path: bail if no BSON indexes)         │
          │            │                                  │
          │            ▼                                  │
          │  existing exact-AST match ─► scan ranges      │
          └───────────────────────────────────────────────┘

BsonPathNormalizer is applied at two points: (a) during CREATE INDEX compilation,
before the expression string is persisted into SYSTEM.CATALOG, so equivalent paths
produce identical expression strings; (b) during query rewrite, to the WHERE-clause
parse nodes, so a differently-spelled path rewrites to the canonical form and then
matches by ParseNode.equals.

Everything else — IndexMaintainer.buildRowKey, IndexRegionObserver.preBatchMutate,
partial-index WHERE compilation, CONSISTENCY modes, INCLUDE semantics — is reused
without modification.

4. Components

4.1 `BsonPathNormalizer`

Location: phoenix-core-client/src/main/java/org/apache/phoenix/parse/bson/.
A pure function that walks a ParseNode tree and, whenever it finds a
BsonValueParseNode whose path literal (second argument) is a constant string,
replaces that literal with the canonical form of the same path.
Normalization rules (v1, JSONPath subset):
- Strip a leading $. or $ if present (Phoenix's BSON_VALUE uses paths without
  the $. prefix — confirmed by Bson5IT.java:111 which uses 'rather[3].outline. clock', no leading $). Accept both forms on input; emit no-prefix form.
- Collapse redundant whitespace inside segments (none legal today, so mostly a no-op).
- Normalize quoted keys: if a quoted key matches the unquoted-key regex, drop the
  quotes (['a'] → .a); otherwise keep exact bracketed form.
- Reject wildcards (*), filter expressions ([?(...)]), recursive descent (..),
  and slice syntax ([a:b]) with a clear SQLException that identifies the offending
  segment.
Does not touch the first argument (the BSON column) or the third/fourth arguments
(type, default). Those must match byte-for-byte between DDL and query, as today.
Does not consult schema. Pure syntactic.
Unit-tested with golden files covering equivalent path pairs.

4.2 Fast-path guard in `IndexExpressionParseNodeRewriter`

IndexExpressionParseNodeRewriter's constructor today parses every index column's
expression string (IndexExpressionParseNodeRewriter.java:62-75). For a table with ten
BSON-path indexes, that's ten SQLParser.parseCondition(...) calls per query.

Change:

Add a hasBsonIndex check on the index PTable at construction time. If no index
column has a BsonValueParseNode-rooted expression, skip the BSON normalizer
invocation on the WHERE clause entirely. This is the M6 fast-path from review.
Populate the existing indexedParseNodeToColumnParseNodeMap with the already-canonical
ParseNode (see §4.1); when normalizing the WHERE clause, look up by canonical form.

4.3 Numeric sort order verification + fix

This is the one place where existing production behavior is likely wrong and we must
change real code.

For BSON_VALUE(doc, 'x', 'DOUBLE') used as an index key, the write path currently
routes through BsonValueFunction.evaluate → PDouble.INSTANCE.toBytes(double). This
uses Double.doubleToLongBits raw bytes; they are not order-preserving under
unsigned byte comparison across sign boundaries (negatives sort after positives). The
same issue applies to PFloat. PInteger, PLong, PSmallint, PTinyint use
Phoenix's offset-encoded integers which are order-preserving. PDecimal uses its own
encoding which is order-preserving.

Action:

Write a unit test that creates a BSON-path index on a DOUBLE path with rows spanning
negatives and positives, runs BSON_VALUE(...) BETWEEN -1 AND 1, and asserts correctness.
If the test fails (expected), fix by routing DOUBLE/FLOAT index-key bytes through
Phoenix's existing order-preserving encoder for those types. This may already exist
in IndexUtil.getIndexColumnDataType / PDataType.coerceBytes — must be traced on
actual execution paths, not assumed.
Add an IT that covers all BSON_VALUE type codes in a range predicate.

This is the one fix that must happen regardless of everything else — it is a latent
correctness bug, not a feature gap.

4.4 Predicate rewrite coverage for range / BETWEEN / IN

Phoenix's scan-range derivation over an indexed column already supports all of =, <,
<=, >, >=, BETWEEN, IN, and != — see WhereCompiler. The machinery works
once the LHS of the predicate is matched to an index column, which the
IndexExpressionParseNodeRewriter does today.

So the work here is not new rewrite code; it is test coverage and verification that
canonicalized BSON path predicates flow through existing scan-range derivation for all
predicate forms. Concretely:

BsonPathIndexPredicateIT: for each of =, <, <=, >, >=, BETWEEN, IN,
!=, assert the plan uses the index and the result set matches a no-index baseline.
Cover VARCHAR, BIGINT, DOUBLE, DECIMAL, DATE, BOOLEAN paths.
Known non-matches (must be documented, not fixed in v1): LIKE, IS NULL / IS NOT NULL (the latter works today via partial index, not via rewrite — a user's explicit
IS NOT NULL predicate hits the index because Phoenix's scan machinery treats
non-empty key as present; this is the existing behavior in Bson5IT), CAST wrapping
the BSON_VALUE on the query side, arithmetic wrappers.

4.5 Observability

phoenix.index.bson.rewrite.hit and phoenix.index.bson.rewrite.miss — client-side
counters tagged with table_name, index_name. Incremented whenever the rewriter
runs against a table with BSON indexes.
EXPLAIN plan suffix: when a BSON-path index is matched, append
[BSON path: <canonical-path>, type: <TYPE>] to the existing RANGE SCAN plan line.
The existing code path for plan-line generation lives in
ExplainPlan / ScanPlan.getExplainSteps(); adding a suffix from IndexMaintainer
metadata is a small change.
No new coprocessor counters — nothing new happens on the server side.

4.6 Operator sugar (`->` and `->>`), optional separate phase

The reviewer correctly noted this was smuggled in. Separated out: add -> and ->>
operators to the ANTLR grammar (PhoenixSQL.g), with PG-equivalent semantics:

bson_col -> 'field' → BSON_VALUE(bson_col, 'field', 'BSON') (returns sub-document)
bson_col ->> 'field' → BSON_VALUE(bson_col, 'field', 'VARCHAR') (returns scalar as
string — matches PG ->> behavior)
Chained: bson_col -> 'a' -> 'b' ->> 'c' → BSON_VALUE(bson_col, 'a.b.c', 'VARCHAR').
Desugaring happens in the parse-tree phase, producing canonical BsonValueParseNode.

Owns its own ticket and grammar-review cycle. Not blocking the indexing improvements.

5. What does not change

IndexMaintainer.buildRowKey — unchanged. No new "is_bson_path" protobuf flag (M6 of
the review: redundant). No sparse-null skip branch — the existing null-in-index-key
encoding plus partial-index WHERE already gives users both dense and sparse options.
IndexRegionObserver — unchanged. Existing pre-image/post-image logic is already correct.
MetaDataClient.createIndex — unchanged. The isJsonFragment guard does not block
BSON and does not need relaxation. The determinism and stateless gates pass today.
SYSTEM.CATALOG schema — unchanged.
DDL grammar for CREATE INDEX — unchanged. No mandatory AS <type> (type is already
an argument of BSON_VALUE). No reserved USING PATH keyword in this scope (GIN is a
separate design; reserve in that design if needed).
On-disk index format — unchanged.

6. Error handling and edge cases

Situation	Behavior
BSON column missing	Existing: `BSON_VALUE` returns default; index encodes null; behavior matches today
Path missing in row	Existing: `returnDefaultValue` → empty `ptr`; index encodes null; if user has partial-index `WHERE ... IS NOT NULL`, row is skipped from index
Path present, wrong type	Existing: `BsonValueFunction.evaluate` throws `IllegalArgumentException("function data type does not match with actual data type")`. This aborts the mutation. (Verified at `BsonValueFunction.java:164-165`.)
Unparseable JSONPath at DDL	New: `BsonPathNormalizer` throws SQLException pointing at offending segment
Wildcard / filter / recursive descent	New: reject at DDL with SQLException
Two indexes defined on equivalent paths (`'a.b'` vs `'$.a.b'`)	New after canonicalization: both canonicalize to `'a.b'`; second `CREATE INDEX` gets existing duplicate-index error
Pre-existing index with non-canonical path literal (Mod1 upgrade hazard)	Canonicalize only on new `CREATE INDEX`; leave existing catalog rows alone. Queries still match the non-canonical path string byte-for-byte.
Predicate shapes the rewriter doesn't handle	Full scan; document the list explicitly

The mutation-aborting behavior on type mismatch is a latent surprise that the
reviewer flagged (as part of Mod4). Filed as a separate issue to decide whether to keep
throwing, coerce-to-null, or add a new BSON_VALUE overload with lenient semantics.
Out of scope for this design — do not change BsonValueFunction behavior here.

7. Phased delivery

Each phase is one PHOENIX JIRA ticket, mergeable independently, passing all existing
tests. Master is coherent after each phase.

Phase 0 — Verify the numeric sort-order correctness bug

Write the reproducer IT described in §4.3. No fix yet.
If it passes, the bug is not there — update §4.3 to note what's actually happening.
If it fails, file and prioritize the correctness ticket.
Exit: conclusive pass/fail result in a JIRA, documented.
Risk: none — test-only addition.

Phase 1 — `BsonPathNormalizer` (unwired)

Add the normalizer class under parse/bson/, package-private.
Unit tests covering: canonical no-op, $. stripping, bracketed/dot form equivalence,
rejection of unsupported syntax.
Exit: 100% branch coverage on normalizer; zero production callers.
Risk: minimal, additive.

Phase 2 — Fix numeric sort-order (if Phase 0 confirmed it)

Route DOUBLE/FLOAT index-key encoding through an order-preserving transform.
Add per-numeric-type range IT under BsonPathIndexPredicateIT.
Document upgrade implications: existing DOUBLE-path indexes will need a rebuild to
produce correct scan results. Provide an ALTER INDEX ... REBUILD note.
Exit: all-type range-predicate IT passes; rebuild-from-upgrade IT passes.
Feature flag: not applicable — this is a bug fix; gate on a one-time upgrade
migration that marks existing DOUBLE-path indexes as requiring rebuild.

Phase 3 — Wire the normalizer into DDL and query rewrite

MetaDataClient.createIndex: call BsonPathNormalizer on each indexed parse-node
before computing expressionStr.
IndexExpressionParseNodeRewriter: call BsonPathNormalizer on each indexed
expression after parsing, and on the WHERE clause before map lookup. Add the
hasBsonIndex fast-path guard.
Existing Bson5IT must still pass without modification — its paths already round-trip
through a no-op canonicalization.
Add BsonPathCanonicalizationIT: same index created two ways ('a.b' vs '$.a.b')
→ second fails as duplicate; query with either spelling hits the same index.
Feature flag: phoenix.index.bson.normalize.enabled, default true. Flip off to
revert to byte-for-byte matching if the normalizer misbehaves.
Exit: Bson5IT green; BsonPathCanonicalizationIT green; no perf regression on
non-BSON-table query benchmarks.

Phase 4 — Predicate coverage for range / BETWEEN / IN

BsonPathIndexPredicateIT: exhaustive matrix of (predicate type) × (BSON_VALUE output
type). Assert plan uses index and results match no-index baseline.
No production code changes expected — existing scan-range derivation handles these
once the LHS matches. If any predicate form is silently not matching, this phase
files a follow-up ticket rather than forcing a v1 fix.
Exit: matrix green; documented list of known-non-matching predicate forms.

Phase 5 — Observability

Client-side metrics: phoenix.index.bson.rewrite.hit / .miss, tagged per index.
EXPLAIN plan suffix: [BSON path: <path>, type: <TYPE>] on RANGE SCAN lines over a
BSON-path index.
phoenix-pherf scenario: write+read mix against a BSON-path index; publish a baseline
report as an artifact on the phase JIRA.
Exit: metrics surfaced in JMX; EXPLAIN assertions in BsonPathIndexPredicateIT;
perf report attached.
Budget: < 5% query p99 overhead on queries against tables with no BSON indexes
(the cost of the hasBsonIndex check). < 10% write-path p99 overhead on a workload
with one BSON-path index over a 4KB document. If exceeded, revisit the fast-path.

Phase 6 (optional, separate ticket) — Operator sugar `->` / `->>`

Grammar addition in PhoenixSQL.g.
Desugar to BsonValueParseNode at parse time; then canonicalization and everything
else works unchanged.
Add PG-parity IT.
Exit: operator IT green; grammar ambiguity (overload with arithmetic >) resolved
in the parser.

Phase 7 (out of scope for this spec)

GIN-style multi-valued path indexes (separate design).
Dynamic-column indexing (separate design).
Optional BSON_VALUE leniency mode for type mismatches (separate ticket).

8. Rollback strategy

Phase 2 (numeric fix): gated on a per-index rebuild. If the fix itself is buggy,
operators can ALTER INDEX … DISABLE and fall back to full scan.
Phase 3 (normalizer): flag phoenix.index.bson.normalize.enabled=false reverts to
byte-for-byte matching. Existing indexes stay correctly maintained either way.
Phase 4 (tests): test-only.
Phase 5 (observability): cosmetic / operator-facing, no runtime impact on correctness.
Phase 6 (operator sugar): gated on successful ANTLR regeneration; operators can stay
on master without the grammar bump until confident.

9. Testing strategy

Unit: BsonPathNormalizer covered by golden-file tests; fast-path check in
IndexExpressionParseNodeRewriter has dedicated tests for the no-BSON-index short
circuit.
Integration: BsonPathCanonicalizationIT (Phase 3), BsonPathIndexPredicateIT
(Phase 4), BsonPathNumericSortOrderIT (Phase 0/2), plus continued passage of
existing Bson5IT.
Correctness invariant: for any query Q and matching BSON-path index I, the
result set with I enabled must equal the result set after ALTER INDEX I DISABLE.
Encoded as a randomized IT.
Upgrade test: create indexes on pre-change master, bounce to post-change master,
verify queries still match; DOUBLE-path indexes are marked for rebuild.
Perf test: phoenix-pherf scenarios for (a) write-path overhead with one BSON
index on a 4KB doc, (b) query-path overhead on a table with no BSON indexes.

BSON Path Functional Indexes — User Guide

This is a short companion to the design spec at
docs/superpowers/specs/2026-05-05-bson-path-functional-indexes-design.md.

What you can do today

Define a secondary index on a path inside a BSON column:

CREATE TABLE orders (
  id   VARCHAR PRIMARY KEY,
  doc  BSON
);

CREATE INDEX idx_orders_customer
  ON orders (BSON_VALUE(doc, '$.customer.id', 'VARCHAR'));

Queries that name the same canonical BSON path will use the index automatically:

SELECT id FROM orders WHERE BSON_VALUE(doc, '$.customer.id', 'VARCHAR') = 'C-42';
SELECT id FROM orders WHERE BSON_VALUE(doc, 'customer.id', 'VARCHAR')   = 'C-42';
SELECT id FROM orders
   WHERE BSON_VALUE(doc, '$.customer.id', 'VARCHAR') IN ('C-42', 'C-43');

Both forms canonicalize to BSON_VALUE(DOC, '$.customer.id', 'VARCHAR') and hit the index.

Sparse semantics

If a row's BSON document does not contain the indexed path, no index entry is written for
that row (sparse index). Consequence: you cannot use a BSON path index to find missing-path
rows via IS NULL.

Type contract

BSON_VALUE's third argument fixes the SQL type of the indexed key. Match the WHERE clause to
the same type: index built AS BIGINT requires the predicate to be a numeric literal, not a
string. v1 does not yet rewrite CAST(BSON_VALUE(...) AS BIGINT) = 1 for you.

Predicate forms that hit the index

Form	Uses index?
`BSON_VALUE(doc, p, 'VARCHAR') = 'x'`	Yes
`BSON_VALUE(doc, p, 'VARCHAR') IN (...)`	Yes
`BSON_VALUE(doc, p, 'VARCHAR') BETWEEN ...`	Yes
`BSON_VALUE(doc, p, 'VARCHAR') > 'x'`	Yes
`UPPER(BSON_VALUE(doc, p, 'VARCHAR')) = 'X'`	No
`BSON_VALUE(doc, p, 'VARCHAR') LIKE 'a%'`	No
`BSON_VALUE(doc, p, 'VARCHAR') IS NULL`	No (sparse)

Path language supported in v1

Form	Example	Supported
Dot	`$.a.b.c`	Yes
Array index	`$.a[0]`, `$.a[10][3]`	Yes
Quoted key	`$['weird key']`, `$["odd"]`	Yes
Bare path	`a.b`, `a[0]` (canonicalized to `$.a.b`)	Yes
Wildcards	`$.`, `$[]`	No
Filters	`$[?(@.x>1)]`	No
Recursive descent	`$..x`	No
Slice	`$[0:2]`	No

Feature flags

Flag	Default	Effect when `false`
`phoenix.index.bson.enabled`	`true`	`CREATE INDEX` on BSON paths is rejected
`phoenix.index.bson.rewrite.enabled`	`true`	Indexes still maintained; queries don't use them

Observability

Client-process counters in org.apache.phoenix.monitoring.BsonPathMetrics:

getSparseSkips() — number of UPSERT rows that hit a missing-path branch and were
skipped from the index.
getRewriteHits() — number of WHERE-clause sub-expressions that matched a BSON path index
after canonicalization.
getRewriteMisses() — number of BSON-path WHERE expressions that did not match any indexed
expression (typically: wrapped LHS, or no relevant index defined).

What's not yet supported

Multi-valued (GIN-style) BSON path indexes — DDL keyword USING PATH is reserved but not
implemented.
Local BSON path indexes, async-build, eventually-consistent BSON path indexes.
IS NULL rewrite, LIKE, function-wrapped LHS.
-> / ->> operator sugar.
Coprocessor / server-side metric publication — counters are client-process only and not
promoted to Phoenix's MetricInfo enum yet.

Adds the design document for BSON-path functional indexes (Phase 0–5) plus six self-contained implementation plans. Plans are written so a subagent can execute each phase end-to-end with TDD discipline and local verification.

…ture flag

…okup Indexed BSON expressions are stored in canonical JSONPath form ('$.name'), but CommonComparisonExpressionUtils.getFieldFromDocument was the legacy non-canonical walker that treats the leading '$' as a top-level field name and returns null for any indexed lookup. As a result, BsonValueFunction.evaluate took the missing-path branch on every Put, sparse-skip kicked in, and the index never received any rows. Add a canonical-aware walker that handles '$.field', '$['quoted field']', and '$.field[n]' forms, and dispatch to it when the path begins with '$'. Legacy non-canonical paths still flow through the original walker unchanged.

…avior

…n macOS

…uery cases

…cases

…6 cases

nlakshmanan added 30 commits May 14, 2026 17:03

PHOENIX BsonPath: add exception type for path parser (Phase 0/1)

1888a19

PHOENIX BsonPath: add immutable BsonPath value type

3c5e1f6

PHOENIX BsonPath: add JSONPath-subset parser (happy path)

ea8cbee

PHOENIX BsonPath: parser rejects unsupported JSONPath features

3b73f86

PHOENIX BsonPath: parser fuzz test (5k random inputs, no crashes)

931e8de

PHOENIX BsonPath: canonicalizer skeleton (identity rewrite)

ee79594

PHOENIX BsonPath: canonicalize BSON_VALUE path arg + type case

71f61cb

PHOENIX BsonPath: canonicalizer rewrites JSON_VALUE to BSON_VALUE

91b5dcb

PHOENIX BsonPath: canonicalizer recurses into compound nodes

a4454e5

PHOENIX BsonPath: extractPath helper coverage

e3b2c69

PHOENIX BsonPath: add phoenix.index.bson.enabled feature flag

8d01915

PHOENIX BsonPath: add BsonIndexUtil helpers

9a8bf5d

PHOENIX BsonPath: canonicalize index expression on CREATE INDEX + fea…

0f2af67

…ture flag

PHOENIX BsonPath: sparse-skip rows where indexed BSON path is missing

b801ef2

PHOENIX BsonPath: write-path IT covering populate, sparse-skip, dedupe

dc0be3a

PHOENIX BsonPath: canonicalize indexed expression on rewriter load

f813544

PHOENIX BsonPath: canonicalize WHERE expression before index match

6aff1c9

PHOENIX BsonPath: phoenix.index.bson.rewrite.enabled feature flag

fdd8ec4

PHOENIX BsonPath: query-side IT covering eq, IN, BETWEEN, fallback

2e47a0e

PHOENIX BsonPath: randomized index/no-index consistency IT

a221ddf

PHOENIX BsonPath: reserve BSON_PATH_INDEX_NOT_SUPPORTED error code

c01ad2c

PHOENIX BsonPath: reserve USING PATH clause on CREATE INDEX (v1 rejects)

096ad78

PHOENIX BsonPath: parser test for USING PATH reservation

35af305

PHOENIX BsonPath: add BsonPathMetrics counters

ec64605

PHOENIX BsonPath: increment sparse-skip counter on missing path

d5f8a75

PHOENIX BsonPath: increment rewrite hit/miss counters + IT assertion

a6f154f

PHOENIX BsonPath: user guide for v1

ffb50f8

PHOENIX BsonPath: update PROGRESS.md — all 6 phases done

22e1b16

nlakshmanan and others added 12 commits May 14, 2026 17:03

PHOENIX BsonPath: relax wrapped-LHS test to match Phoenix planner beh…

0b33ba1

…avior

PHOENIX local-IT-runner: add docker-based script to run Phoenix ITs o…

31e1b20

…n macOS

PHOENIX json-bson-it-suite: design spec

773d70e

PHOENIX json-bson-it-suite: implementation plan (6 batches)

06f90b6

PHOENIX json-bson-it: add 100-row deterministic dataset generator

dff6fe9

PHOENIX json-bson-it: add EXPLAIN-plan classifier + expectation helper

45675ff

PHOENIX json-bson-it: add singleton reporter (JSON+MD per run)

46842b0

PHOENIX json-bson-it: add @ClassRule that flushes reporter per class

cda38d3

PHOENIX json-bson-it: BsonFlatIndexIT — flat $.name path, 12 query cases

7edf28b

PHOENIX json-bson-it: BsonNestedIndexIT — $.profile.score BIGINT, 9 q…

19ac405

…uery cases

PHOENIX json-bson-it: JsonFlatIndexIT — JSON_VALUE($.email), 8 query …

3433e21

…cases

PHOENIX json-bson-it: JsonNestedIndexIT — JSON_VALUE($.address.zip), …

5aac1e5

…6 cases

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/json indexes#2477

Feature/json indexes#2477
palmer159 wants to merge 42 commits into
apache:masterfrom
palmer159:feature/json-indexes

palmer159 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

palmer159 commented May 15, 2026

2. What this design delivers

Non-goals (unchanged from prior spec, reaffirmed)

Goals, restated

3. Architecture

4. Components

4.1 BsonPathNormalizer

4.2 Fast-path guard in IndexExpressionParseNodeRewriter

4.3 Numeric sort order verification + fix

4.4 Predicate rewrite coverage for range / BETWEEN / IN

4.5 Observability

4.6 Operator sugar (-> and ->>), optional separate phase

5. What does not change

6. Error handling and edge cases

7. Phased delivery

Phase 0 — Verify the numeric sort-order correctness bug

Phase 1 — BsonPathNormalizer (unwired)

Phase 2 — Fix numeric sort-order (if Phase 0 confirmed it)

Phase 3 — Wire the normalizer into DDL and query rewrite

Phase 4 — Predicate coverage for range / BETWEEN / IN

Phase 5 — Observability

Phase 6 (optional, separate ticket) — Operator sugar -> / ->>

Phase 7 (out of scope for this spec)

8. Rollback strategy

9. Testing strategy

BSON Path Functional Indexes — User Guide

What you can do today

Sparse semantics

Type contract

Predicate forms that hit the index

Path language supported in v1

Feature flags

Observability

What's not yet supported

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

4.1 `BsonPathNormalizer`

4.2 Fast-path guard in `IndexExpressionParseNodeRewriter`

4.6 Operator sugar (`->` and `->>`), optional separate phase

Phase 1 — `BsonPathNormalizer` (unwired)

Phase 6 (optional, separate ticket) — Operator sugar `->` / `->>`