fix: add 168 missing lexer tokens to keyword rule#186
engalar wants to merge 2 commits into mendixlabs:main
Conversation
The keyword rule in MDLParser.g4 allows lexer tokens to be used as identifiers (entity names, attribute names, enum values). Many tokens were missing, causing parse failures when user-defined names matched keywords like Data, Filter, Match, Empty, Open, Container, etc. This adds all word-type lexer tokens to the keyword rule, organized by category with alphabetical ordering for easy auditing. Also removes duplicate entries (STRUCTURES, PAGING, EXECUTE) and a phantom UI token.
**AI Code Review**

**Recommendation:** Approve. This PR successfully resolves the parsing issue by adding missing tokens to the keyword rule, improves maintainability through reorganization and deduplication, and provides adequate test coverage. The scoped change appropriately addresses the foundation layer without requiring modifications to other pipeline components since it fixes core parsing functionality. No changes are needed. *Automated review via OpenRouter (Nemotron Super 120B) — workflow source*
**AI Code Review**

**Recommendation:** Approve. This is a well-executed bug fix that resolves a significant parsing gap while improving code maintainability through thoughtful reorganization. The change is minimal, focused, and thoroughly tested. No modifications to downstream pipeline components are needed or appropriate for this grammar-level fix. *Automated review via OpenRouter (Nemotron Super 120B) — workflow source*
**ako** left a comment
This is exactly the broad fix I recommended in the PR #174 review.
What's good
- Solves the systemic problem: 168 word-type tokens missing from `keyword` is a significant gap. This affects enum values, attribute names, entity names, anywhere `qualifiedName` is used.
- Categorical organization with 18 groups + alphabetical ordering makes future audits trivial — easy to find where to add new tokens.
- Cleanup: removes 3 duplicates (`STRUCTURES`, `PAGING`, `EXECUTE`) and 1 phantom (`UI`).
- Pure grammar change — no executor/visitor/AST changes, lowest possible risk.
- Verification approach documented: `comm -23` for missing tokens, `uniq -d` for duplicates.
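The `comm -23` audit described above can be reproduced as a small pipeline. The sketch below uses synthetic stand-ins for the real grammar files so it runs anywhere; it makes no assumption about the actual `.g4` formatting.

```shell
# Sketch of the missing-token audit: comm -23 prints lines present in the
# first (sorted) input but not the second. Synthetic token lists stand in
# for the lexer's word-type tokens and the keyword rule's entries.
set -eu

printf '%s\n' CONTAINER DATA EMPTY FILTER MATCH OPEN | sort > lexer_tokens.txt
printf '%s\n' DATA EMPTY MATCH | sort > keyword_tokens.txt

# Tokens defined in the lexer but absent from the keyword rule:
comm -23 lexer_tokens.txt keyword_tokens.txt   # CONTAINER, FILTER, OPEN
```

Both inputs must be sorted, which is why each list is piped through `sort` first.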
Concerns
No regression tests added. "Smoke tested 15+ cases" is good for confidence but doesn't prevent regression. A test file with CREATE ENUMERATION Test.E (Open, Data, Filter, Match, Empty, Container, Node, ...) would catch any future drift.
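As a sketch, the suggested regression fixture could be created along these lines. The path `testdata/keyword_identifiers.mdl` is hypothetical, not part of the PR; the enumeration members are the keyword-named identifiers listed above.

```shell
# Hypothetical fixture exercising keyword-named identifiers; the file path
# and any harness wiring are assumptions, not part of the PR.
mkdir -p testdata
cat > testdata/keyword_identifiers.mdl <<'EOF'
CREATE ENUMERATION Test.E (Open, Data, Filter, Match, Empty, Container, Node)
EOF
```

Any future token accidentally dropped from `keyword` would then fail this parse test instead of slipping through unnoticed.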
No automated lexer/keyword sync check. A make target that greps word-type tokens from `MDLLexer.g4` and verifies each appears in the `keyword` rule (or is explicitly excluded) would prevent this from happening again. Worth a follow-up issue.
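Such a sync check might look like the sketch below, run here against synthetic grammar fragments. The real token formatting in `MDLLexer.g4` and the exact shape of the `keyword` rule are assumptions.

```shell
set -eu

# Synthetic fragments standing in for MDLLexer.g4 / MDLParser.g4.
cat > lexer_fragment.g4 <<'EOF'
DATA: 'Data';
FILTER: 'Filter';
OPEN: 'Open';
EOF
cat > keyword_fragment.g4 <<'EOF'
keyword: DATA | FILTER;
EOF

# Token names defined in the lexer fragment (assumes NAME: '...'; lines).
grep -E '^[A-Z_]+:' lexer_fragment.g4 | cut -d: -f1 | sort -u > all.txt
# Token names referenced by the keyword rule.
grep -oE '[A-Z][A-Z_]*' keyword_fragment.g4 | sort -u > kw.txt

# A real make target would exit nonzero whenever this prints anything.
comm -23 all.txt kw.txt   # OPEN
```

An explicit exclusion list (tokens deliberately kept out of `keyword`) could be subtracted the same way before the final `comm`.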
PR #174 becomes a partial duplicate since this includes the OPEN fix. Should be closed when this merges.
LGTM.
Summary
- The `keyword` rule in `MDLParser.g4` allows lexer tokens to be used as identifiers (entity/attribute/enum names via `qualifiedName`). 168 word-type tokens were missing, causing parse failures when user-defined names matched keywords like `Data`, `Filter`, `Match`, `Empty`, `Open`, `Container`, `Node`, `Activity`, etc.
- Removes 3 duplicate entries (`STRUCTURES`, `PAGING`, `EXECUTE`) and 1 phantom token (`UI`).

Before
After
Scope
Only the `keyword` rule in `MDLParser.g4` changed (plus regenerated parser files). No executor, visitor, or AST changes.

Follow-up from the PR #174 code review, which identified this systemic gap.
Test plan
- `make test` — all tests pass (including the previously failing `TestQuotedIdentifierInWidgetAttribute` and `TestShowPageMicroflowStyleArgsInWidget`, which were broken during development by accidentally dropping `LAYOUT`).
- Verified no word-type lexer tokens are missing from `keyword` (`comm -23`).
- Verified no duplicate entries (`uniq -d`).
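The duplicate check in the test plan relies on `uniq -d`, which prints only the lines that repeat in sorted input. A minimal illustration using the duplicate names removed by this PR:

```shell
# uniq -d prints each line that occurs more than once in its sorted input.
printf '%s\n' DATA EXECUTE EXECUTE PAGING PAGING STRUCTURES STRUCTURES \
  | sort | uniq -d   # EXECUTE, PAGING, STRUCTURES
```

An empty result from this pipeline over the `keyword` rule's token list is what confirms the deduplication held.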