
fix: add 168 missing lexer tokens to keyword rule #186

Open

engalar wants to merge 2 commits into mendixlabs:main from engalar:fix/keyword-rule-missing-tokens

Conversation

Contributor

@engalar engalar commented Apr 10, 2026

Summary

  • The keyword rule in MDLParser.g4 allows lexer tokens to be used as identifiers (entity/attribute/enum names via qualifiedName). 168 word-type tokens were missing, causing parse failures when user-defined names matched keywords like Data, Filter, Match, Empty, Open, Container, Node, Activity, etc.
  • Reorganized the rule by category with alphabetical ordering for easy auditing
  • Removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI)
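For illustration only, the mechanism and the reorganized shape look roughly like this in ANTLR (token names, categories, and rule names here are examples, not the actual grammar):

```
// identifiers may be a plain ID or any token listed in keyword,
// so keyword-named entities/attributes/enum values still parse
identifier    : ID | keyword ;
qualifiedName : identifier ('.' identifier)* ;

keyword
    // DDL/DML
    : CREATE | DELETE | EXECUTE
    // UI / widgets
    | CONTAINER | NODE
    // data and control flow
    | DATA | FILTER | MATCH
    // ... remaining categories, alphabetical within each
    ;
```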

Before

-- Parse error: "extraneous input '.' expecting {',', ')'}"
$Issue = CREATE MyModule.OrderIssue (
    Status = MyModule.ENUM_IssueStatus.Open
);

After

-- All keyword-named enum values now parse correctly
$Issue = CREATE MyModule.OrderIssue (
    Status = MyModule.ENUM_IssueStatus.Open,     -- was broken
    Kind = MyModule.ENUM_Kind.Data,               -- was broken
    Mode = MyModule.ENUM_Mode.Filter,             -- was broken
    State = MyModule.ENUM_State.Empty,            -- was broken
    Type = MyModule.ENUM_Type.Container,          -- was broken
    Phase = MyModule.ENUM_Phase.Match,            -- was broken
    Method = MyModule.ENUM_M.Get                  -- was broken
);

Scope

Only the keyword rule in MDLParser.g4 changed (+ regenerated parser files). No executor, visitor, or AST changes.

Follow-up to the PR #174 code review, which identified this systemic gap.

Test plan

  • make test — all tests pass, including TestQuotedIdentifierInWidgetAttribute and TestShowPageMicroflowStyleArgsInWidget, which were temporarily broken during development by an accidentally dropped LAYOUT token
  • Smoke tested 15+ keyword-as-enum-value cases (Data, Filter, Match, Empty, Container, Open, Node, Activity, Condition, Get, Post, etc.)
  • No original keyword tokens dropped (verified via comm -23)
  • No duplicate tokens in final rule (verified via uniq -d)
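The last two checks can be sketched as shell commands; the token lists below are illustrative stand-ins, not the real grammar's keyword lists:

```shell
# Illustrative only: old/new keyword lists are stand-ins for tokens
# extracted from the old and new versions of the keyword rule.
tmpdir=$(mktemp -d) && cd "$tmpdir"
printf '%s\n' DATA FILTER MATCH | sort -u > old_keywords.txt
printf '%s\n' DATA EMPTY FILTER MATCH OPEN | sort -u > new_keywords.txt

# comm -23 prints lines present only in the first (old) file:
# any output here would mean an original token was dropped.
dropped=$(comm -23 old_keywords.txt new_keywords.txt)
[ -z "$dropped" ] && echo "no original tokens dropped"

# uniq -d on a sorted list prints repeated lines:
# any output here would mean a duplicate entry remains.
dups=$(sort new_keywords.txt | uniq -d)
[ -z "$dups" ] && echo "no duplicates in final rule"
```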

The keyword rule in MDLParser.g4 allows lexer tokens to be used as
identifiers (entity names, attribute names, enum values). Many tokens
were missing, causing parse failures when user-defined names matched
keywords like Data, Filter, Match, Empty, Open, Container, etc.

This adds all word-type lexer tokens to the keyword rule, organized
by category with alphabetical ordering for easy auditing. Also removes
duplicate entries (STRUCTURES, PAGING, EXECUTE) and a phantom UI token.
@github-actions

AI Code Review

What Looks Good

  • Clear problem/solution explanation: The PR effectively explains that 168 missing lexer tokens were causing parse failures when user-defined names matched keywords (like Data, Filter, Match), with concrete before/after examples showing the fix works.
  • Appropriate scope: The change is narrowly focused on fixing the keyword rule in MDLParser.g4 to resolve a systemic parsing gap identified in PR #174 (fix: add OPEN to keyword rule for enum value parsing), with no unnecessary modifications to executor, visitor, or AST layers (as correctly noted).
  • Maintainability improvements: Reorganized the keyword rule by category with alphabetical ordering for easier auditing, removed 3 duplicate entries and 1 phantom token (UI), improving code quality.
  • Thorough testing: Verified via make test (including previously failing tests), smoke-tested 15+ keyword-as-enum-value cases, confirmed no original tokens were dropped, and verified no duplicates remain.
  • Follows conventions: The PR correctly notes that generated parser files (mdl/grammar/parser/) are noise and should be skipped in review, aligning with project instructions.

Recommendation

Approve. This PR successfully resolves the parsing issue by adding missing tokens to the keyword rule, improves maintainability through reorganization and deduplication, and provides adequate test coverage. The scoped change appropriately addresses the foundation layer without requiring modifications to other pipeline components since it fixes core parsing functionality. No changes are needed.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

@github-actions

AI Code Review

What Looks Good

  • Bug Fix: Correctly addresses parse failures when user-defined names (entity/attribute/enum names) matched missing lexer tokens like Data, Filter, Match, etc. The before/after examples clearly demonstrate the fix works for enum values and other identifiers.
  • Thorough Reorganization: The keyword rule is now categorized by functional area (DDL/DML, Entity/Domain model, Types, etc.) with alphabetical ordering within each group, making future maintenance and auditing significantly easier.
  • Cleanup: Removed 3 duplicate entries (STRUCTURES, PAGING, EXECUTE) and 1 phantom token (UI) as stated, improving accuracy of the keyword list.
  • Verification: Explicitly confirmed no original tokens were dropped (comm -23) and no duplicates remain (uniq -d), showing due diligence.
  • Test Coverage: All tests pass including previously failing tests, plus manual smoke testing of 15+ keyword-as-enum-value cases.
  • Scoped Change: Limited to the grammar layer only (no AST/visitor/executor changes needed), which is appropriate for this type of parsing fix.
  • Generated Files: Properly regenerated and committed ANTLR parser files after grammar change, as required.
  • MDL Syntax Compliance: Maintains all existing MDL syntax patterns; the fix enables proper use of keywords as identifiers without altering the language design.

Recommendation

Approve. This is a well-executed bug fix that resolves a significant parsing gap while improving code maintainability through thoughtful reorganization. The change is minimal, focused, and thoroughly tested. No modifications to downstream pipeline components are needed or appropriate for this grammar-level fix.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

Collaborator

@ako ako left a comment


This is exactly the broad fix I recommended in the PR #174 review.

What's good

  • Solves the systemic problem: 168 word-type tokens missing from keyword is a significant gap. This affects enum values, attribute names, entity names, anywhere qualifiedName is used.
  • Categorical organization with 18 groups + alphabetical ordering makes future audits trivial — easy to find where to add new tokens
  • Cleanup: removes 3 duplicates (STRUCTURES, PAGING, EXECUTE) and 1 phantom (UI)
  • Pure grammar change — no executor/visitor/AST, lowest possible risk
  • Verification approach documented: comm -23 for missing tokens, uniq -d for duplicates

Concerns

No regression tests added. "Smoke tested 15+ cases" is good for confidence but doesn't prevent regression. A test file with CREATE ENUMERATION Test.E (Open, Data, Filter, Match, Empty, Container, Node, ...) would catch any future drift.

No automated lexer/keyword sync check. A make target that greps word-type tokens from MDLLexer.g4 and verifies each is present in the keyword rule (or explicitly excluded) would prevent this class of gap from recurring. Worth a follow-up issue.
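A minimal sketch of such a sync check; the grammar snippets below are stand-ins for MDLLexer.g4/MDLParser.g4, and the assumption that word-type tokens are ALL_CAPS lexer rule names is mine, not the project's:

```shell
# Work in a scratch directory with stand-in grammar files.
tmpdir=$(mktemp -d) && cd "$tmpdir"
cat > MDLLexer.g4 <<'EOF'
lexer grammar MDLLexer;
DATA : 'Data';
FILTER : 'Filter';
MATCH : 'Match';
EOF
cat > MDLParser.g4 <<'EOF'
parser grammar MDLParser;
keyword
    : DATA
    | FILTER
    ;
EOF

# Assumed convention: word-type tokens are ALL_CAPS rule names
# at the start of a line in the lexer grammar.
lexer_tokens=$(grep -oE '^[A-Z][A-Z_]*' MDLLexer.g4 | sort -u)

# Flag every lexer token that never appears in the parser grammar.
missing=""
for tok in $lexer_tokens; do
    grep -qw "$tok" MDLParser.g4 || missing="$missing $tok"
done
if [ -n "$missing" ]; then
    echo "tokens missing from keyword rule:$missing"
else
    echo "keyword rule in sync"
fi
```

With the stand-in files above, the check reports MATCH as missing; wired into a make target, a non-empty `$missing` would fail the build.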

PR #174 becomes a partial duplicate since this includes the OPEN fix. Should be closed when this merges.

LGTM.
