FlatExpression POC: data-oriented flat expression tree #511

Draft
Copilot wants to merge 3 commits into master from copilot/data-oriented-expression-optimization

Contributor

Copilot AI commented Apr 12, 2026

Explores the idea from #512: represent an expression tree as a single flat array of fat structs with integer index references instead of object-graph pointers. This enables stack allocation for small trees, trivial serialization, and O(n) structural equality without recursion.

Core types (src/FastExpressionCompiler/FlatExpression.cs)

  • Idx — 1-based int index into Nodes; default (It == 0) is the nil sentinel
  • ExpressionNode — sequential fat struct: NodeType, Type, Info, ConstantIndex, NextIdx (next sibling), ChildIdx (first child), ExtraIdx (second child slot)
  • ExpressionTree — holds nodes in SmallList<ExpressionNode, Stack16<…>, NoArrayPool<…>> (first 16 nodes on the call-stack) and closure constants in SmallList<object, Stack4<…>, …>; factory methods for Constant, Parameter, Unary, Binary, New, Call, Lambda, Conditional, Block
  • ToSystemExpression() — converts to System.Linq.Expressions so existing FEC compilation path is reachable
  • StructurallyEqual() — O(n) structural comparison via a single linear pass over the flat arrays; no recursive tree traversal needed
var tree = default(ExpressionTree);
var px = tree.Parameter(typeof(int), "x");
var py = tree.Parameter(typeof(int), "y");
var add = tree.Add(px, py, typeof(int));
tree.Lambda(typeof(Func<int, int, int>), body: add, parameters: [px, py]);

// Round-trip to System.Linq.Expressions and compile
var fn = ((Expression<Func<int, int, int>>)tree.ToSystemExpression()).Compile();
fn(4, 7); // 11
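
The index-based layout described above can be illustrated with a much smaller model. The following is only a sketch under stated assumptions: `FlatTree`, `Node`, and the `Add`/`ChildKinds` helpers are hypothetical names, a heap-backed `List<Node>` stands in for the PR's `SmallList<…, Stack16<…>, …>` storage, and a string `Kind` stands in for `NodeType`. It shows a 1-based `Idx` with `0` as nil, and children chained through `ChildIdx`/`NextIdx`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the PR's Idx / ExpressionNode / ExpressionTree.
public struct Idx
{
    public int It;                      // 1-based position in the node list; 0 (default) is nil
    public bool IsNil => It == 0;
}

public struct Node
{
    public string Kind;                 // stands in for NodeType
    public Idx ChildIdx;                // first child
    public Idx NextIdx;                 // next sibling
}

public sealed class FlatTree
{
    // Assumption: a plain List instead of stack-resident small-list storage.
    private readonly List<Node> _nodes = new List<Node>();

    public Node Get(Idx idx) => _nodes[idx.It - 1];

    // Appends a node and chains the given children through their NextIdx fields.
    public Idx Add(string kind, params Idx[] children)
    {
        var node = new Node { Kind = kind };
        if (children.Length > 0)
            node.ChildIdx = children[0];

        for (var i = 0; i < children.Length; i++)
        {
            var child = _nodes[children[i].It - 1];
            child.NextIdx = i + 1 < children.Length ? children[i + 1] : default;
            _nodes[children[i].It - 1] = child;   // Node is a struct, so write the copy back
        }

        _nodes.Add(node);
        return new Idx { It = _nodes.Count };     // 1-based index of the node just added
    }

    // Walks the sibling chain of a parent iteratively, without recursion.
    public List<string> ChildKinds(Idx parent)
    {
        var kinds = new List<string>();
        for (var c = Get(parent).ChildIdx; !c.IsNil; c = Get(c).NextIdx)
            kinds.Add(Get(c).Kind);
        return kinds;
    }
}
```

Adding `"x"`, `"y"`, and then `"Add"` with both as children yields indices 1, 2, and 3, and `ChildKinds` on the last index visits `x` then `y`. Note that `Add` overwrites each child's `NextIdx`, which is exactly the hazard discussed under the design insight below.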

Key design insight surfaced

Lambda parameters cannot be chained via NextIdx — the same parameter node may already have its NextIdx occupied as part of a New/Call argument chain. Lambda stores its parameters as Idx[] in Info instead. This is the central intrusive-list tension: one small Idx[] allocation per lambda avoids silent list corruption at construction time. A future optimisation could replace it with a (start, count) slice into a dedicated side array.
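
To make the corruption concrete, here is a minimal sketch that models only the `NextIdx` field as an `int[]` (1-based, `0` as nil). The class and method names are hypothetical and not taken from the PR. It links nodes 1, 3, 2 as a call's argument chain, then re-links nodes 1, 2 as lambda parameters, and contrasts that with the (start, count) side-array alternative mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class IntrusiveListDemo
{
    // next[i] plays the role of node i's NextIdx; slot 0 is unused so that 0 can mean nil.
    public static void Link(int[] next, params int[] chain)
    {
        for (var i = 0; i < chain.Length; i++)
            next[chain[i]] = i + 1 < chain.Length ? chain[i + 1] : 0;
    }

    // Follows the chain starting at 'first' until the nil sentinel.
    public static List<int> Walk(int[] next, int first)
    {
        var result = new List<int>();
        for (var i = first; i != 0; i = next[i])
            result.Add(i);
        return result;
    }

    // Chains the lambda parameters through the same NextIdx slots as the call arguments.
    public static List<int> ArgsAfterIntrusiveLambda()
    {
        var next = new int[4];
        Link(next, 1, 3, 2);        // call arguments: node 1 -> node 3 -> node 2
        Link(next, 1, 2);           // lambda parameters overwrite node 1's NextIdx
        return Walk(next, 1);       // the argument chain has silently lost node 3
    }

    // Stores the lambda parameters in a side list addressed by (start, count) instead.
    public static List<int> ArgsAfterSideArrayLambda()
    {
        var next = new int[4];
        var side = new List<int>();
        Link(next, 1, 3, 2);
        var start = side.Count;
        side.AddRange(new[] { 1, 2 });                // the lambda would keep (start, count: 2)
        var parameters = side.GetRange(start, 2);     // parameters are read back from the slice
        return Walk(next, 1);                         // the argument chain is untouched
    }
}
```

In this model `ArgsAfterIntrusiveLambda()` returns the chain 1, 2 while `ArgsAfterSideArrayLambda()` still returns 1, 3, 2, which is the reason a node can safely belong to only one intrusive sibling list at a time.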

Wins

  • ≤ 16 nodes → zero heap allocation (stack-resident via Stack16)
  • Trivially serializable: arrays of plain structs + integer refs
  • O(1) node access; O(n) structural equality without recursion
  • Closure constants collected at build time, mutable after build
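
The non-recursive equality claim can be sketched as an element-wise comparison of two node arrays. This is an illustration, not the PR's `StructurallyEqual()`: `FlatNode` and `FlatEquality.FlatEquals` are hypothetical names, the node carries only a subset of the fields listed above, and the sketch assumes both trees were built in the same node order so that equal trees produce identical arrays.

```csharp
using System;

// Hypothetical reduced node: only the fields needed to show the comparison.
public struct FlatNode : IEquatable<FlatNode>
{
    public int NodeType;
    public Type Type;
    public int ChildIdx;   // 1-based, 0 = nil
    public int NextIdx;    // 1-based, 0 = nil

    public bool Equals(FlatNode other) =>
        NodeType == other.NodeType && Type == other.Type &&
        ChildIdx == other.ChildIdx && NextIdx == other.NextIdx;

    public override bool Equals(object obj) => obj is FlatNode other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            var h = NodeType;
            h = h * 31 + (Type == null ? 0 : Type.GetHashCode());
            h = h * 31 + ChildIdx;
            return h * 31 + NextIdx;
        }
    }
}

public static class FlatEquality
{
    // One linear pass over both arrays: O(n) time, no recursion, no extra allocation.
    public static bool FlatEquals(FlatNode[] a, FlatNode[] b)
    {
        if (a.Length != b.Length)
            return false;
        for (var i = 0; i < a.Length; i++)
            if (!a[i].Equals(b[i]))
                return false;
        return true;
    }
}
```

Because child and sibling links are plain integers, comparing them compares the tree shape directly; two trees with the same shape but a different construction order would need a normalization step that this sketch does not attempt.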

Gaps / obstacles

  • Not API-compatible with System.Linq.Expressions without the adapter
  • Mutable struct: accidental by-value copy silently forks state
  • Info field boxes MethodBase/string — one allocation per call/new/parameter node
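
The by-value copy hazard is a general property of mutable C# structs and can be reproduced without any of the PR's types. In the sketch below, `TinyTree` and `CopyPitfall` are hypothetical names: the struct keeps its count inline, so passing it by value mutates a copy and the caller never sees the added item.

```csharp
using System;

// Hypothetical mutable struct with inline state, loosely modelled on a struct-based tree builder.
public struct TinyTree
{
    private int[] _items;
    private int _count;

    public int Count => _count;

    public void Add(int value)
    {
        if (_items == null)
            _items = new int[4];
        else if (_count == _items.Length)
            Array.Resize(ref _items, _count * 2);
        _items[_count++] = value;
    }
}

public static class CopyPitfall
{
    // Receives a copy: the Add below changes the copy's count, not the caller's.
    public static void AddByValue(TinyTree tree) => tree.Add(1);

    // Receives a reference to the caller's struct: the change is visible to the caller.
    public static void AddByRef(ref TinyTree tree) => tree.Add(1);
}
```

Starting from `default(TinyTree)`, calling `AddByValue` leaves the caller's `Count` at 0, while `AddByRef` raises it to 1. Passing such a struct by `ref` (or wrapping it in a class) is the usual way to avoid the fork.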

Tests (test/FastExpressionCompiler.UnitTests/FlatExpressionTests.cs)

22 tests covering node construction, parameter identity, structural equality, ToSystemExpression conversion, round-trip compile+invoke, and the mutable-closure-constant use case. Guarded with #if !LIGHT_EXPRESSION since FlatExpression.cs is not included in the LightExpression variant.

Copilot AI changed the title from "[WIP] Add data-oriented expression representation with optimization" to "FlatExpression POC: data-oriented flat expression tree" on Apr 12, 2026
Copilot AI requested a review from dadhi April 12, 2026 22:17
Owner

dadhi left a comment

Too many comments. Keep only important stuff explaining why/how and remove trivial what.

Contributor Author

Copilot AI commented Apr 12, 2026

Too many comments. Keep only important stuff explaining why/how and remove trivial what.

Done in d565578. Removed all comments that merely describe what the code does. Kept only the three non-obvious why/how notes: the Lambda params-in-Info design decision, the "// reset in case node was previously linked" safety note in LinkList, and the InfoEqual note about array reference equality. Source went from 738 → 455 lines, tests from 391 → 329.

Copilot AI requested a review from dadhi April 12, 2026 22:46

Development

Successfully merging this pull request may close these issues.

Data-oriented representation of Expression partly on stack and optimized for constant collection
