Support pattern expressions as boolean expressions#2360

Open
gregfelice wants to merge 3 commits into apache:master from gregfelice:feature_pattern_in_where

@gregfelice
Contributor

@gregfelice gregfelice commented Mar 25, 2026

Summary

Adds support for openCypher pattern expressions as general boolean expressions. Pattern expressions are now accepted anywhere an ordinary expression is valid (in WHERE, RETURN, WITH, SET, CASE, and as operands of boolean combinators), not just in WHERE.

Note on scope: An earlier version of this PR was titled "Support pattern expressions in WHERE clause via GLR parser." During review, Copilot correctly pointed out that adding anonymous_path to expr_atom enables pattern expressions in every expression context, not only WHERE. Rather than restrict the grammar with a WHERE-specific nonterminal, this PR now explicitly embraces the broader surface area, matching what openCypher defines and what Neo4j users expect, and covers the new contexts with regression tests. The PR title and description have been updated accordingly.

// WHERE (original motivation — still works)
MATCH (a:Person), (b:Person)
WHERE (a)-[:KNOWS]->(b)
RETURN a.name, b.name

// NOT pattern expression in WHERE
MATCH (a:Person)
WHERE NOT (a)-[:KNOWS]->(:Person)
RETURN a.name

// RETURN — project a boolean
MATCH (a:Person)
RETURN a.name, (a)-[:KNOWS]->(:Person) AS knows_someone

// CASE WHEN
MATCH (a:Person)
RETURN a.name,
       CASE WHEN (a)-[:KNOWS]->(:Person) THEN 'social' ELSE 'loner' END

// Combined with AND / OR
MATCH (a:Person)
RETURN a.name,
       (a)-[:KNOWS]->(:Person) AND (a)-[:WORKS_WITH]->(:Person) AS has_both

// SET — persist a computed boolean as a property
MATCH (a:Person)
SET a.is_social = (a)-[:KNOWS]->(:Person)
RETURN a.name, a.is_social

// WITH pipeline
MATCH (a:Person)
WITH a.name AS name, (a)-[:KNOWS]->(:Person) AS knows
WHERE knows
RETURN name

Implementation

Grammar (cypher_gram.y)

  • Switched the parser to GLR mode (%glr-parser) to handle the inherent ambiguity between parenthesized expressions and graph patterns. Both start with (, so one token of lookahead is not enough to choose a production; the right choice only becomes clear several tokens later. GLR forks at the conflict point and discards the failing alternative.
  • Added anonymous_path to expr_atom with a %dprec 1 annotation. A bare (a) prefers the expression-variable interpretation (%dprec 2 on '(' expr ')'), so single-node pattern expressions still resolve as plain variable references.
  • Added make_exists_pattern_sublink() to wrap the pattern in an EXISTS subquery — a bare pattern (a)-[:KNOWS]->(b) in expression context is semantically equivalent to EXISTS((a)-[:KNOWS]->(b)).
  • Moved the "Helper function to create an ExplainStmt node" comment from above make_exists_pattern_sublink() to directly above make_explain_stmt(), where it belongs, so the grammar file stays readable.
  • No hardcoded conflict budget. GLR produces 7 shift/reduce and 3 reduce/reduce conflicts (all arising from the ( ambiguity between path_node and parenthesized expr, plus expr_var/var_name_opt overlap on )/}/=). These are handled correctly at runtime by GLR + %dprec, but the exact counts can drift across Bison versions. Rather than %expect 7 / %expect-rr 3, we pass -Wno-conflicts-sr -Wno-conflicts-rr via BISONFLAGS in the Makefile, so the build stays clean across distros and future Bison releases without binding us to a specific version. A block comment in cypher_gram.y documents what the conflicts are and why they're expected.
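
The fork-and-discard behaviour described in the bullets above can be illustrated with a toy recognizer. This is only a sketch under stated assumptions: it is not Bison's GLR algorithm and not code from cypher_gram.y. It runs two hand-written recognizers (standing in for the two forked parses) over the whole input, drops whichever fails, and breaks a tie with priorities that mirror %dprec 2 and %dprec 1. The names sketch_classify, parse_kind, and the tiny grammar (no whitespace, no properties, variables only) are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Which reading of the input survived. */
typedef enum { PARSE_NONE, PARSE_EXPR, PARSE_PATTERN } parse_kind;

/* Illustrative priorities mirroring %dprec 2 (expression) and %dprec 1 (pattern). */
enum { DPREC_PATTERN = 1, DPREC_EXPR = 2 };

/* IDENT: returns the position after the identifier, or NULL if none starts here. */
static const char *ident(const char *p)
{
    if (!isalpha((unsigned char) *p) && *p != '_')
        return NULL;
    while (isalnum((unsigned char) *p) || *p == '_')
        p++;
    return p;
}

/* node := '(' [IDENT] [':' IDENT] ')' */
static const char *node(const char *p)
{
    const char *q;

    if (*p != '(')
        return NULL;
    p++;
    if ((q = ident(p)) != NULL)
        p = q;
    if (*p == ':' && (p = ident(p + 1)) == NULL)
        return NULL;
    return (*p == ')') ? p + 1 : NULL;
}

/* rel := ['<'] '-[' [':' IDENT] ']-' ['>'] */
static const char *rel(const char *p)
{
    if (*p == '<')
        p++;
    if (p[0] != '-' || p[1] != '[')
        return NULL;
    p += 2;
    if (*p == ':' && (p = ident(p + 1)) == NULL)
        return NULL;
    if (p[0] != ']' || p[1] != '-')
        return NULL;
    p += 2;
    return (*p == '>') ? p + 1 : p;
}

/* Fork 1: the whole input is node (rel node)*. */
static int is_pattern(const char *s)
{
    const char *p = node(s);

    while (p != NULL)
    {
        if (*p == '\0')
            return 1;
        p = rel(p);
        if (p != NULL)
            p = node(p);
    }
    return 0;
}

/* Fork 2: the whole input is '(' IDENT ')'. */
static int is_paren_expr(const char *s)
{
    const char *p;

    if (*s != '(' || (p = ident(s + 1)) == NULL)
        return 0;
    return p[0] == ')' && p[1] == '\0';
}

/* Run both forks, drop the one that fails, break ties by priority. */
parse_kind sketch_classify(const char *s)
{
    int expr_ok, pattern_ok;

    assert(s != NULL);
    expr_ok = is_paren_expr(s);
    pattern_ok = is_pattern(s);
    if (expr_ok && pattern_ok)
        return (DPREC_EXPR > DPREC_PATTERN) ? PARSE_EXPR : PARSE_PATTERN;
    if (expr_ok)
        return PARSE_EXPR;
    return pattern_ok ? PARSE_PATTERN : PARSE_NONE;
}
```

In this toy, "(a)" is accepted by both recognizers and resolves to the expression reading, while "(a)-[:KNOWS]->(b)" is only accepted as a pattern. A real GLR parser shares work between the forks instead of re-scanning the input, so the sketch shows the outcome rather than the mechanism.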

Parser (cypher_analyze.c, cypher_clause.c)

  • Existing EXISTS sublink handling already covers the pattern case — no new transform code needed on the analyze/clause side.
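
Why no new transform code is needed can be sketched as follows: if the explicit EXISTS(pattern) rule and the bare-pattern rule both call one shared helper, they hand the later stages the same kind of node. The sketch below is a simplified stand-in, not the AGE implementation; sketch_pattern, sketch_exists, and every function name here are hypothetical, and the real helper builds PostgreSQL parse nodes rather than these structs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a parsed path pattern (hypothetical type). */
typedef struct sketch_pattern
{
    const char *text;               /* source text of the pattern */
} sketch_pattern;

/* Stand-in for an EXISTS sublink node (hypothetical type). */
typedef struct sketch_exists
{
    const sketch_pattern *subpattern;   /* pattern the subquery must match */
    int location;                       /* offset in the query, for error reports */
} sketch_exists;

/* Shared helper in the spirit of make_exists_pattern_sublink(). */
sketch_exists sketch_make_exists(const sketch_pattern *pat, int location)
{
    sketch_exists result;

    assert(pat != NULL);
    result.subpattern = pat;
    result.location = location;
    return result;
}

/* Action for the explicit form: EXISTS '(' pattern ')'. */
sketch_exists sketch_on_explicit_exists(const sketch_pattern *pat, int location)
{
    return sketch_make_exists(pat, location);
}

/* Action for the bare form: a pattern used directly as an expression atom. */
sketch_exists sketch_on_bare_pattern(const sketch_pattern *pat, int location)
{
    return sketch_make_exists(pat, location);
}

/* Two nodes are interchangeable downstream if all of their fields agree. */
int sketch_same_node(sketch_exists a, sketch_exists b)
{
    return a.subpattern == b.subpattern && a.location == b.location;
}

/* Demo input for the usage note below. */
const sketch_pattern sketch_demo_pattern = { "(a)-[:KNOWS]->(b)" };
```

Calling sketch_on_explicit_exists(&sketch_demo_pattern, 0) and sketch_on_bare_pattern(&sketch_demo_pattern, 0) yields nodes that sketch_same_node() treats as equal, which is the property the analyze/clause code relies on.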

Files changed

| File | Change |
| --- | --- |
| src/backend/parser/cypher_gram.y | %glr-parser, %dprec annotations, anonymous_path in expr_atom, make_exists_pattern_sublink(), GLR comment block explaining why conflicts are expected |
| Makefile | Pass -Wno-conflicts-sr -Wno-conflicts-rr to Bison via BISONFLAGS; register pattern_expression regression test |
| regress/sql/pattern_expression.sql | New regression suite: WHERE, NOT, nested, multi-hop, RETURN projection, mixed projections, CASE WHEN, AND/OR combinators, SET, WITH pipeline, regression guards for parenthesized arithmetic |
| regress/expected/pattern_expression.out | Expected output for above |

Test plan

  • Basic pattern in WHERE: WHERE (a)-[:KNOWS]->(b)
  • NOT pattern in WHERE: WHERE NOT (a)-[:KNOWS]->(:Person)
  • Pattern in WHERE with label: WHERE (a)-[:KNOWS]->(:Person)
  • Pattern via EXISTS subquery (original syntax still works)
  • Pattern in RETURN with AS alias → boolean column
  • Pattern in RETURN without alias (positional column)
  • Multiple pattern expressions in same RETURN
  • Pattern in CASE WHEN ... THEN ... ELSE ... END
  • Pattern combined with AND and OR in RETURN
  • Pattern in SET a.flag = (a)-[:R]->(:L)
  • Pattern in WITH projection + subsequent WHERE filter
  • Regression guards: RETURN (1 + 2), RETURN (n.name) — parenthesized expressions still work
  • All 32 regression tests pass (31 existing + pattern_expression)
  • No Bison conflict warnings in the build (verified with an unpinned distro Bison)

…che#1577)

Enable bare graph patterns as boolean expressions in WHERE clauses:

  MATCH (a:Person), (b:Person)
  WHERE (a)-[:KNOWS]->(b)        -- now valid, equivalent to EXISTS(...)
  RETURN a.name, b.name

Previously, this required wrapping in EXISTS():
  WHERE EXISTS((a)-[:KNOWS]->(b))

The bare pattern syntax is standard openCypher and is used extensively
in Neo4j.  Its absence was the most frequently cited migration blocker.

Implementation approach:
- Switch the Cypher parser from LALR(1) to Bison GLR mode.  GLR handles
  the inherent ambiguity between parenthesized expressions '(' expr ')'
  and graph path nodes '(' var_name label_opt props ')' by forking the
  parse stack and discarding the failing path.
- Add anonymous_path as an expr_atom alternative with %dprec 1 (lower
  priority than expression path at %dprec 2).  The action wraps the
  pattern in a cypher_sub_pattern + EXISTS SubLink, reusing the same
  transform_cypher_sub_pattern() machinery as explicit EXISTS().
- Extract make_exists_pattern_sublink() helper shared by both
  EXISTS(pattern) and bare pattern rules.
- Fix YYLLOC_DEFAULT to use YYRHSLOC() for GLR compatibility.
- %dprec annotations on expr_var/var_name_opt resolve the reduce/reduce
  conflict between expression variables and pattern node variables.
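
A sketch of why the YYLLOC_DEFAULT change matters, under stated assumptions: in a GLR parser the right-hand-side array passed to the macro holds stack entries rather than bare locations, so indexing it directly no longer yields a location, while an accessor like Bison's YYRHSLOC(Rhs, K) does. The types and macros below (sketch_loc, sketch_glr_item, SKETCH_RHSLOC, SKETCH_LLOC_DEFAULT) are simplified, hypothetical stand-ins and not the definitions from Bison or from cypher_gram.y. The single-int location is assumed here to follow PostgreSQL grammar convention.

```c
#include <assert.h>

/* A location is assumed to be a single byte offset into the query text. */
typedef int sketch_loc;

/* Stand-in for a GLR stack entry: the location is nested inside a struct. */
typedef struct sketch_glr_item
{
    int        state;   /* parser state (unused in this sketch) */
    sketch_loc loc;     /* location of the symbol */
} sketch_glr_item;

/* Accessor in the spirit of YYRHSLOC(Rhs, K): hides the stack layout. */
#define SKETCH_RHSLOC(Rhs, K) ((Rhs)[K].loc)

/*
 * Location of a reduced rule: the location of its first right-hand-side
 * symbol, or -1 for an empty rule. Going through the accessor keeps the
 * macro independent of how the stack entries are laid out.
 */
#define SKETCH_LLOC_DEFAULT(Current, Rhs, N) \
    do { \
        if ((N) > 0) \
            (Current) = SKETCH_RHSLOC(Rhs, 1); \
        else \
            (Current) = -1; \
    } while (0)

/* Index 0 is the symbol before the rule; 1..n are the rule's own symbols. */
sketch_loc sketch_reduce_location(const sketch_glr_item *rhs, int n)
{
    sketch_loc current;

    assert(n >= 0);
    SKETCH_LLOC_DEFAULT(current, rhs, n);
    return current;
}

/* Demo stack: one preceding symbol at offset 5, then symbols at 10 and 20. */
const sketch_glr_item sketch_demo_rhs[] = { {0, 5}, {1, 10}, {2, 20} };
```

With the demo stack, sketch_reduce_location(sketch_demo_rhs, 2) picks up the location of the first symbol of the rule, and an empty rule (n equal to 0) reports -1.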

Conflict budget: 7 shift/reduce (path extension vs arithmetic on -/<),
3 reduce/reduce (expr_var vs var_name_opt on )/}/=).  All are expected
and handled correctly by GLR forking + %dprec disambiguation.

All 32 regression tests pass (31 existing + 1 new).  New
pattern_expression test covers: bare patterns, NOT patterns, labeled
nodes, AND/OR combinations, left-directed patterns, anonymous nodes,
multi-hop patterns, EXISTS() backward compatibility, and non-pattern
expression regression checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds openCypher-compatible “pattern expressions” as boolean predicates in WHERE by resolving the (a) ambiguity (parenthesized expression vs node pattern) using Bison GLR parsing.

Changes:

  • Switch cypher_gram.y to %glr-parser, add %dprec annotations, and introduce make_exists_pattern_sublink() to share EXISTS(pattern) wrapping logic.
  • Allow anonymous_path to appear as an expr_atom, translating bare patterns into an EXISTS sublink boolean.
  • Add regression coverage for pattern expressions (regress/sql + regress/expected) and register the new test in REGRESS.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/backend/parser/cypher_gram.y | Enables bare pattern parsing in expression context via GLR + %dprec, refactors EXISTS wrapping into a helper. |
| regress/sql/pattern_expression.sql | New regression test cases covering pattern expressions in WHERE and boolean combinations. |
| regress/expected/pattern_expression.out | Expected output for the new regression test. |
| Makefile | Adds pattern_expression to the regression test run list. |

Comment thread src/backend/parser/cypher_gram.y Outdated
Comment thread src/backend/parser/cypher_gram.y Outdated
Comment thread regress/sql/pattern_expression.sql
@jrgemignani
Contributor

@gregfelice Please see the above comments by Copilot

@gregfelice
Contributor Author

Addressed all 3 Copilot suggestions:

  1. Misplaced comment — Moved "Helper function to create an ExplainStmt node" from above make_exists_pattern_sublink() to directly above make_explain_stmt() where it belongs.

  2. %expect/%expect-rr documentation — Added block comment explaining the conflict budget: 7 shift/reduce from path extension vs arithmetic operators on -/<, 3 reduce/reduce from expr_var vs var_name_opt on )/}/=. All resolved by GLR forking + %dprec annotations. Noted to update counts if grammar rules change.

  3. Misleading test comment — Changed "Regular expressions still work" to "Regular (non-pattern) expressions still work" to avoid confusion with regex.

Regression test passes (pattern_expression: ok).

@jrgemignani
Contributor

@gregfelice Did you push them?

1. Move "Helper function to create an ExplainStmt node" comment from
   above make_exists_pattern_sublink() to above make_explain_stmt()
   where it belongs.

2. Add block comment documenting the %expect/%expect-rr conflict
   budget: 7 S/R from path vs arithmetic on - and <, 3 R/R from
   expr_var vs var_name_opt on ) } =.

3. Clarify test comment: "Regular expressions" -> "Regular (non-pattern)
   expressions" to avoid confusion with regex.

Regression test: pattern_expression OK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gregfelice
Contributor Author

@jrgemignani Pushed now — sorry about that! All 3 Copilot suggestions addressed in commit 5d11080.

And noted on the workflow — I'll reply directly to Copilot's comments going forward instead of posting standalone. Thanks for the heads up.


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


Comment thread src/backend/parser/cypher_gram.y Outdated
Comment thread src/backend/parser/cypher_gram.y
@jrgemignani
Contributor

@gregfelice Please see Copilot above

@gregfelice
Contributor Author

gregfelice commented Apr 6, 2026 via email

- Pattern expressions are now accepted anywhere an expr is valid
  (RETURN, WITH, SET, CASE, boolean combinations), not only WHERE.
  This matches openCypher semantics and documents the broader surface
  area that was already implicitly enabled by adding anonymous_path
  to expr_atom.  Added regression tests for each new context:
  RETURN projection (bare and AS-aliased), mixed with other
  projections, CASE WHEN, boolean AND/OR combinators, SET to
  persist a computed boolean property, and WITH ... WHERE pipeline.

- Remove the hardcoded `%expect 7` / `%expect-rr 3` conflict budget
  from cypher_gram.y.  The exact conflict counts can drift across
  Bison versions and distros, which would break builds even though
  the grammar is correct (GLR handles the conflicts at runtime via
  fork + %dprec).  Instead, pass -Wno-conflicts-sr / -Wno-conflicts-rr
  via BISONFLAGS in the Makefile so the build stays clean without
  binding us to a specific Bison release.  Kept a block comment in
  the grammar explaining why GLR conflicts are expected and how
  they resolve.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gregfelice gregfelice changed the title from "Support pattern expressions in WHERE clause via GLR parser" to "Support pattern expressions as boolean expressions" on Apr 15, 2026
@gregfelice
Contributor Author

@jrgemignani — both Copilot items addressed and threads resolved. The %expect concern is moot (we use -Wno-conflicts-sr -Wno-conflicts-rr via BISONFLAGS instead), and the broader pattern-expression surface area is intentional, documented in the PR description, and covered by regression tests for WHERE, RETURN, CASE WHEN, SET, WITH, and boolean combinators. Ready for re-review. Thanks!
