diff --git a/.gitignore b/.gitignore index 90bb357..82cfc6a 100644 --- a/.gitignore +++ b/.gitignore @@ -28,6 +28,7 @@ coverage dist dist-test +vendor/jsonic-expr/dist *.tsbuildinfo package-lock.json diff --git a/.npmignore b/.npmignore new file mode 100644 index 0000000..a9f3174 --- /dev/null +++ b/.npmignore @@ -0,0 +1,4 @@ +# package.json#files is the primary include allowlist; this file +# excludes a few artefacts that match those globs but shouldn't ship. +dist/tsconfig.tsbuildinfo +src/tsconfig.json diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..62bb87e --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,200 @@ +# Changelog + +## 2.0.0 + +Lands parenthesised sub-declarators (function pointers) and +top-level `static_assert` on the grammar path, vendors a patched +copy of `@jsonic/expr` so the comma-operator vs static_assert +comma-separator collision is solved at the source, and declares +the hybrid grammar + legacy-fallback architecture as the final +shape for this release line. + +### Added + +- `paren_inner_declarator` rule — inner declarator inside `( … )` + with pointer prefix + ID + array / function postfix support. + Wired into `init_declarator` so shapes like `int (*fp)(int);` + and `typedef int (*Fn)(int);` flow through the grammar. + `@looks-simple-decl` gains a paren-walk branch that recognises + `+ ( * + ID ) ( ? ) ;`. +- Top-level `static_assert(cond, msg)` and `_Static_assert(cond)` + dispatch through `external_declaration` into the existing + `static_assert_declaration` grammar rule. `@said-take-lparen` + now sets `rule.n.no_comma_op = 1` which propagates into the + cond / msg val sub-rules; the vendored `@jsonic/expr` honours + the flag by bailing on `,` rather than consuming it as the + comma operator. +- Vendored copy of `@jsonic/expr@2.2.0` under `vendor/jsonic-expr/`, + installed via `package.json` `file:` link. Patches add a + `n.no_comma_op` bail in both `val.close` and `expr.close`. 
The + bail matches `[INFIX]` with a src-equals-`,` cond so it works + with the C plugin's `PUNC_COMMA` lex (which is distinct from + jsonic-default `CA`). +- 4 new unit tests: 3 function-pointer shapes (variable / multi- + param / multi-pointer) plus a top-level static_assert with a + type-form sizeof in the cond. + +### Architecture decision + +The 0.2.0 release notes called the legacy `structure.ts` path "a +fallback for shapes the new grammar doesn't yet cover". 2.0.0 +formalises the hybrid as the **final** architecture rather than a +transitional one: + +- The grammar covers the common shapes — every variable / function + declaration, every C statement, every val-position construct, + every preprocessor directive, struct / union / enum bodies, + attribute specs in three forms, leading-position function + pointers and top-level `static_assert` (both new in 2.0). +- The legacy chomp + `structure.ts` post-processor remains as the + safety net for the long tail: K&R + `int f(a, b) int a; long b; { … }` parameter lists and any + complex declarator the dispatcher's lookahead doesn't accept. +- Both paths produce identical CST: `@looks-simple-decl` decides + which path runs, but the consumer sees one tree shape regardless. + +This mirrors how production C compilers (GCC, Clang) pair a +handwritten recursive-descent core with special-case handling for +historic / edge constructs. + +### Tests + +- 293 / 293 pass (89 unit + 100 csmith parse + 100 csmith fixture + + 4 suite scaffolding). + +### Known limitations (legacy chomp+structure path) + +- K&R parameter lists (`int f(a, b) int a; long b; { … }`) — rare + in modern code; csmith never generates them. +- Complex compound declarators beyond simple function pointers + (e.g. `int (*arr[N])(int);` arrays of function pointers, + `int (*(*fpp))(int);` pointer-to-function-pointer).
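The following tiny, self-contained Pratt parser illustrates the comma-operator bail. It is a sketch under stated assumptions, not the vendored `@jsonic/expr` code. Tokens are plain strings, and only `,`, `+` and `*` are infix operators. The names `parseExpr`, `parseStaticAssertArgs` and the `noCommaOp` parameter are hypothetical. When the flag is set, `,` ends the condition instead of being consumed as the comma operator, so it can serve as the `static_assert` argument separator.

```typescript
// Hypothetical minimal Pratt parser illustrating a comma-operator bail.
// Not the @jsonic/expr API: tokens are plain strings, any non-operator
// token is an atom, and only three infix operators exist.

type ExprNode = string | [string, ExprNode, ExprNode];

// Binding powers: the comma operator binds loosest, `*` tightest.
const INFIX_BP: { [op: string]: number | undefined } = { ",": 1, "+": 10, "*": 20 };

interface Cursor {
  toks: string[];
  pos: number;
}

function parseExpr(c: Cursor, minBp: number, noCommaOp: boolean): ExprNode {
  if (c.pos >= c.toks.length) throw new Error("unexpected end of input");
  let left: ExprNode = c.toks[c.pos++]; // atom: identifier or literal
  while (c.pos < c.toks.length) {
    const op = c.toks[c.pos];
    const bp = INFIX_BP[op];
    if (bp === undefined || bp < minBp) break;
    // The bail: with noCommaOp set, `,` ends the expression instead of
    // being consumed as the comma operator.
    if (op === "," && noCommaOp) break;
    c.pos++;
    const right = parseExpr(c, bp + 1, noCommaOp); // +1 makes ops left-associative
    left = [op, left, right];
  }
  return left;
}

// Splits the tokens between the parens of `static_assert( ... )` into a
// condition and an optional message (the one-argument form is C23).
function parseStaticAssertArgs(toks: string[]): { cond: ExprNode; msg?: ExprNode } {
  const c: Cursor = { toks, pos: 0 };
  const cond = parseExpr(c, 0, true);
  if (c.pos === toks.length) return { cond };
  if (toks[c.pos] !== ",") throw new Error(`unexpected token ${toks[c.pos]}`);
  c.pos++; // here `,` is an argument separator, not an operator
  const msg = parseExpr(c, 0, true);
  if (c.pos !== toks.length) throw new Error("trailing tokens after message");
  return { cond, msg };
}

// Usage: the same `,` is an operator without the flag, a separator with it.
console.log(JSON.stringify(parseExpr({ toks: ["a", ",", "b"], pos: 0 }, 0, false)));
console.log(JSON.stringify(parseStaticAssertArgs(["n", "*", "2", ",", '"too small"'])));
```

In C, a parenthesised `(a, b)` inside the condition is still a comma expression. A fuller version would therefore clear the flag when it descends into nested parentheses. This sketch has no parentheses at all.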
+ +## 1.0.0 + +Continues the grammar-driven migration: adds rules for tagged-type +specifiers, attribute specs, top-level preprocessor directives, +top-level GCC `__asm__`, and standalone struct / enum definitions. +Csmith fixtures regenerate against the updated CST shapes — most +tag definitions, attribute placements, and directives now flow +through the grammar instead of the legacy chomp+structure +post-processor. + +### Added + +- `struct_specifier`, `union_specifier`, `enum_specifier` rules + with `member_decl_list` / `struct_declaration` / + `struct_declarator` / `bitfield_width` (struct-with-body and + bitfields), `enumerator_list` / `enumerator` (enum body), and + C23 `enum E : int { … }` fixed-underlying-type support. +- `attribute_spec_gcc` (`__attribute__((…))`), + `attribute_spec_msvc` (`__declspec(…)`), + `attribute_spec_c23` (`[[ … ]]`), with `attribute_item` and + `attribute_argument_list`. Wired as leading specifiers and via + `spec_loop` for between-specifier placements. +- Top-level preprocessor directives: `define_directive` (with + `macro_parameter_list` and `macro_body`), `undef_directive`, + `include_directive` (angled / quoted / macro-form), + `conditional_directive` (#if / #ifdef / #ifndef / #elif / + #elifdef / #elifndef / #else / #endif), `simple_directive` + (#pragma / #error / #warning / #line and unknown directives). + Macro registration / un-registration on `cmeta.macros` happens + synchronously when `#define` / `#undef` parse, and pre-fetched + lookahead tokens are reclassified in place. +- Top-level GCC `__asm__` blocks dispatch into the existing + `asm_statement` rule (added in 0.2.0). +- `static_assert_declaration` grammar rule (used by struct-member + dispatch; top-level dispatch deferred pending comma-operator + gating in `@jsonic/expr`). +- `structureConditionalGroups` moved from `src/structure.ts` to + its own `src/conditional-groups.ts` module — self-contained, + no dependency on the rest of `structure.ts`. 
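The notes describe the fold in `structureConditionalGroups` only as collapsing `#if` … `#endif` runs into a `conditional_group` node. A stack-based sketch of that idea follows. The `CstNode` shape, the `directive` field and the choice to nest inner groups are assumptions for illustration, not the module's actual types or behaviour.

```typescript
// Hypothetical, simplified CST node; the real nodes carry more fields.
interface CstNode {
  kind: string;          // e.g. "conditional_directive", "declaration"
  directive?: string;    // on conditional_directive: "if", "else", "endif", ...
  children?: CstNode[];
}

const OPENERS = new Set(["if", "ifdef", "ifndef"]);

// Folds a flat list of translation-unit items into conditional_group
// nodes. Each #if / #ifdef / #ifndef opens a group, #endif closes the
// innermost one, and everything in between (including #elif / #else)
// becomes a child of the open group. Nesting is tracked with a stack.
function foldConditionalGroups(items: CstNode[]): CstNode[] {
  const root: CstNode[] = [];
  const stack: CstNode[][] = [root]; // top = children of the innermost group
  for (const item of items) {
    const top = stack[stack.length - 1];
    const d = item.kind === "conditional_directive" ? item.directive : undefined;
    if (d !== undefined && OPENERS.has(d)) {
      const children: CstNode[] = [item];
      top.push({ kind: "conditional_group", children });
      stack.push(children);
    } else if (d === "endif" && stack.length > 1) {
      top.push(item);
      stack.pop();
    } else {
      // #elif / #else, ordinary declarations, or a stray top-level #endif.
      top.push(item);
    }
  }
  return root; // an unterminated #if leaves its group open but in place
}
```

Each item is visited once, so the fold is linear in the number of top-level items.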
+ +### Tests + +- 289 / 289 pass (85 unit + 100 csmith parse + 100 csmith fixture + + 4 suite scaffolding). All csmith corpus files now flow + through the grammar for struct definitions, attribute specs, + and preprocessor directives. + +### Known limitations (still on the legacy chomp+structure path) + +- Top-level `static_assert(cond, msg);` — the `,` between cond + and msg conflicts with the comma operator in `C_OP_TABLE`. + Resolving cleanly needs flag-gated suppression of comma-op + inside the static_assert paren context. Struct-member + static_assert is handled by the new path. +- K&R parameter lists (`int f(a, b) int a; long b; { … }`). +- Complex declarators: function pointers (`int (*fp)(int);`) and + functions returning function pointers. + +Across the 100-file csmith corpus (fixtures regenerated), the new +path emits the same CST node kinds and fields as the legacy +chomp+structure output, so consumers that depended on the 0.2.0 +CST shape are unaffected; the only differences are in trivia +placement and the path the parser took to produce them. + +## 0.2.0 + +First public release of the grammar-driven parser. + +The parser is now structured as a hybrid: + +- `@jsonic/expr`-driven Pratt expression parsing with custom val + open-alts for C-only constructs (`sizeof ( type )`, cast, + compound literal, `_Generic`, GCC statement-expression, brace + initializer list, adjacent-string concatenation). +- Declarative grammar (in `c-grammar.jsonic`, embedded into + `src/c.ts` at build time) for declarations, function definitions, + and the full statement family (compound, if/else, while, do, + switch, for, labeled, jump, expression, asm, preprocessor-line). +- A legacy `structure.ts` post-processor as a fallback for shapes + the new grammar doesn't yet cover (struct / union / enum + specifiers, attribute specs in three forms, top-level + preprocessor directives, top-level GCC `__asm__`, + `static_assert`, K&R parameter lists, complex declarators).
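The head check that chooses between the grammar path and the legacy fallback can be sketched over a simplified stream of token kinds. This is an assumption-laden illustration, not `@looks-simple-decl` itself. The `Kind` names are invented, and the sketch inspects only the declaration head. Unlike the real dispatcher, it would therefore accept a K&R head such as `int f(a, b)`. It does reproduce the 2.0 paren-walk split: `int (*fp)(int);` is accepted, while `int (*arr[N])(int);` and `int (*(*fpp))(int);` fall back to the legacy path.

```typescript
// Simplified, hypothetical token kinds (the plugin's real names differ).
type Kind =
  | "STORAGE" | "TYPE" | "ID" | "STAR" | "LPAREN" | "RPAREN"
  | "LBRACKET" | "COMMA" | "SEMI" | "ASSIGN" | "OTHER";

// Tokens that may follow the declarator name in a simple declaration.
const AFTER_NAME = new Set<Kind>(["SEMI", "ASSIGN", "COMMA", "LBRACKET", "LPAREN"]);

// Accepts: STORAGE? TYPE+ ( STAR* ID <after-name> | LPAREN STAR+ ID RPAREN LPAREN )
function looksSimpleDecl(toks: Kind[]): boolean {
  let i = 0;
  if (toks[i] === "STORAGE") i++;
  const typeStart = i;
  while (toks[i] === "TYPE") i++;
  if (i === typeStart) return false; // need at least one type specifier
  if (toks[i] === "LPAREN") {
    // Paren-walk: only the plain function-pointer shape `( * ID ) (`.
    i++;
    if (toks[i] !== "STAR") return false;
    while (toks[i] === "STAR") i++;
    if (toks[i++] !== "ID") return false;
    if (toks[i++] !== "RPAREN") return false;
    return toks[i] === "LPAREN";
  }
  while (toks[i] === "STAR") i++; // pointer prefix
  if (toks[i++] !== "ID") return false;
  return AFTER_NAME.has(toks[i]);
}

function choosePath(toks: Kind[]): "grammar" | "legacy" {
  return looksSimpleDecl(toks) ? "grammar" : "legacy";
}
```

Because the check is a pure function over lookahead, rejecting a shape costs nothing beyond the peek. The rejected tokens are still parsed, just by the legacy path.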
+ +Both paths produce the same CST shape, so consumers see one tree +regardless of which path parsed a given external declaration. + +### Added + +- Grammar rules for every variable declaration form (storage class, + multi-keyword type, comma-separated declarators, pointer + array + postfix, function declarator, K&R-empty / `(void)` / + `( ID, …)` / abstract parameter shapes). +- Grammar rules for every C statement: `compound_statement`, + `expression_statement`, `jump_statement` (return / break / + continue / goto), `if_statement` with optional `else`, + `while_statement`, `do_statement`, `switch_statement`, + `for_statement` with `for_controls` / `for_init` / `for_cond` / + `for_iter` slots, `labeled_statement` (`case` / `default` / ID + label), `asm_statement` (qualifiers, template, four + colon-separated sections), `preprocessor_line`. +- Grammar rules for every val-position construct: cast, + compound literal, sizeof type-form, _Alignof, `_Generic`, GCC + statement-expression, brace initializer list with designated + members and indices, adjacent string-literal concatenation, + function calls and subscripts via `@jsonic/expr` paren-preval. +- Recognition of C23 keyword constants `nullptr`, `true`, `false` + as `literal_expression` atoms. +- 100-file CSmith corpus regression test (corpus and gzipped JSON + fixtures committed; `csmith` binary not required at test time). + +### Tests + +- 289 / 289 pass (85 unit + 100 csmith parse + 100 csmith fixture + + 4 suite scaffolding). + +### Known limitations + +- K&R-style parameter declarations and unguarded GCC + `__extern_inline` declarations parse to a `declKind: 'unknown'` + external declaration with the original tokens preserved as + children. +- Compound literals of struct types (`(struct point){ … }`) inside + function bodies are not yet structured as a single + `compound_literal` node; the surrounding declaration falls back + to the legacy chomp. 
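The `declKind: 'unknown'` fallback keeps source fidelity for two reasons: the original tokens become leaf children, and a depth-first walk of any CST yields its tokens in source order. The following minimal sketch shows that property. It assumes hypothetical `Leaf` / `Inner` shapes and an `external_declaration` kind name; the plugin's real node fields are not shown in these notes.

```typescript
// Hypothetical CST shapes: leaves carry source text, inner nodes carry children.
interface Leaf {
  kind: "token";
  text: string;
}
interface Inner {
  kind: string;
  declKind?: string;
  children: Cst[];
}
type Cst = Leaf | Inner;

// Depth-first, left-to-right walk; collects leaf text in source order.
function collectTokens(node: Cst, out: string[] = []): string[] {
  if ("text" in node) out.push(node.text);
  else for (const child of node.children) collectTokens(child, out);
  return out;
}

// Fallback for shapes the grammar can't structure (e.g. K&R parameter
// lists): wrap the original tokens, unchanged, in a declKind 'unknown' node.
function unknownDeclaration(texts: string[]): Inner {
  return {
    kind: "external_declaration",
    declKind: "unknown",
    children: texts.map((text): Leaf => ({ kind: "token", text })),
  };
}

const knr = ["int", "f", "(", "a", ")", "int", "a", ";", "{", "}"];
console.log(collectTokens(unknownDeclaration(knr)).join(" "));
```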
diff --git a/README.md b/README.md index e9237f9..b5a2f08 100644 --- a/README.md +++ b/README.md @@ -49,16 +49,60 @@ positions are preserved on token spans). token. Grammar rules and structuring code reference these names directly. -- **Coarse-grained jsonic grammar** (`src/c.ts`): `translation_unit` - opens an `extdecl_loop` that absorbs tokens into per-declaration - chomp nodes terminating at top-level `;` or `}` (with PP_NEWLINE - for directives). Directive lines get terminated separately so each - `#…` is its own external_declaration. - -- **Recursive-descent structuring** (`src/structure.ts`, - `src/expr.ts`): a post-pass over each chomped token list produces - the structured concrete-syntax tree. Walking depth-first yields the - original tokens in source order. +- **Declarative grammar** (`c-grammar.jsonic`): the rule shapes for + the entire C surface — translation unit, external declarations, + declarators, statements, expressions — live as a Jsonic-DSL + document, embedded at build time into `src/c.ts`. All conditions + and actions are bound to `@`-named refs in the TS plugin, so the + grammar file reads as structural intent and action logic stays + out of it. + +- **Pratt-style expressions** via [`@jsonic/expr`](https://www.npmjs.com/package/@jsonic/expr): + the `val` rule absorbs C atoms (`LIT_INT` / `LIT_FLOAT` / `LIT_CHAR` + / `LIT_STRING` / `ID` / `MACRO_NAME` / `TYPEDEF_NAME` / `KW_NULLPTR` + / `KW_TRUE` / `KW_FALSE`), then `@jsonic/expr`'s Pratt logic + drives infix / prefix / suffix operator precedence. Custom val + open-alts handle the C-only constructs that aren't simple + operators: `sizeof ( type )` / cast / compound literal / `_Generic` + / GCC statement-expression / brace initializer list / adjacent + string concatenation. + +- **Conditional-group folding** (`src/conditional-groups.ts`): a + translation-unit-level post-pass that collapses contiguous runs + of `#if`/`#ifdef` … `#elif`/`#else` … `#endif` into a single + `conditional_group` node.
Self-contained — operates only on + already-parsed `conditional_directive` nodes. + +- **Hybrid dispatch + legacy fallback** (`src/structure.ts`, + `src/expr.ts`): the `external_declaration` cascading wildcard + alts dispatch to `simple_declaration` whenever + `@looks-simple-decl` recognises the head, while preprocessor, + `__asm__`, and `static_assert` heads dispatch directly to their + typed sub-rules; otherwise the chomp loop falls through to a + recursive-descent post-processor in `structure.ts`. Shapes + covered by the new path: + - simple declarations (storage prefix, multi-keyword type, + pointer / array, function declarator, function definition) + - tagged-type specifiers (struct / union / enum, including + standalone definitions and C23 fixed-underlying-type enums) + - attribute specs (GCC / MSVC / C23, leading + between-specs + insertion points) + - top-level preprocessor directives (#define, #include, #if + family, #pragma / #error / #warning / #undef / #line) + - top-level GCC `__asm__` + - all expression and statement forms + + Shapes still on the legacy path: + - K&R parameter lists (`int f(a, b) int a; long b; { … }`) — + rare in modern code; csmith never generates them + - complex compound declarators beyond simple function pointers + (`int (*arr[N])(int);` arrays-of-fn-ptrs, + `int (*(*fpp))(int);` ptr-to-fn-ptr). Plain function pointers + `int (*fp)(int);` and top-level `static_assert(cond, msg);` + moved onto the grammar path in 2.0. + + Both paths produce identical CST shapes; the + `@jsonic/expr`-driven `val` handles initializer expressions in + either case. ## Concrete-syntax shapes @@ -100,7 +144,16 @@ translation_unit expression_statement, asm_statement, preprocessor_line ``` -### Expression shapes (Pratt-parsed, full C precedence) +### Expression shapes (Pratt-parsed via @jsonic/expr) + +Operator precedence is driven by `@jsonic/expr`'s Pratt machinery.
+The full C operator catalog (11 binary precedence levels, prefix / +suffix unary, ternary, assignment, comma, member access, and the +sizeof / _Alignof prefix forms) is registered as a single +`OpDef`-table at plugin-install time. The val rule absorbs C atoms +via custom open-alts; @jsonic/expr drives the precedence climb; +the `evaluate` callback converts the resulting S-expression into +the per-kind CST shapes below. ``` literal_expression { literalKind, value } @@ -196,6 +249,87 @@ for_controls for_iter { value: | empty } ``` +## Coverage and known limitations + +The parser handles every shape in the CSmith-generated regression +corpus (100 random C programs) plus a hand-curated stress sweep +(GCC `__attribute__`, C23 `nullptr` / `[[nodiscard]]` / `_BitInt`, +nested preprocessor `#if` chains, line-continuation in macro +bodies, function pointers, GCC inline assembly with operand +sections, struct bitfields with anonymous unions, designated and +indexed initialisers). + +Known fall-throughs that produce a `declKind: 'unknown'` external +declaration rather than a structured one (still parseable, source +fidelity preserved): + +- K&R-style parameter declarations (`int f(a, b) int a; long b; { … }`). +- GCC `__extern_inline` declarations gated on a `__USE_EXTERN_INLINES` +  feature macro that hasn't been `#define`d. + +A compound literal with a struct-tagged type, such as +`(struct point){ … }`, inside a function body is not yet structured +as a single `compound_literal` node, because the struct-tagged type +isn't in the new path's `SIMPLE_TYPE_HEAD` set. Top-level brace +initialisers on struct types (`struct point p = { … };`) work +because they go through the legacy fallback. + +## Architecture history + +The parser shipped through a 14-phase migration from a pure +chomp-and-post-process design to the current near-pure-grammar +hybrid: + +- **A** install `@jsonic/expr`; `val` accepts C atoms with the + evaluate callback emitting the public CST shapes.
+- **B** `simple_declaration` family + statement family — + `block_item` / `statement` / `expression_statement` / + `jump_statement` / `if`/`while`/`do`/`switch`/`for` / + `labeled_statement` / `asm_statement` / `preprocessor_line`. +- **C** `val` open-alts for type-name constructs: + `type_name` / `sizeof_type_form` / `cast_or_compound_literal` / + `initializer_list` (with `designation` / `designator`) / + `generic_selection` / `statement_expression` / `string_atom` / + structured `asm_statement`. +- **D** cutover gates: deep-lookahead body validation + (`fetchDeep()` drives `ctx.lex` directly so the body-supportedness + check walks past the closing `}` of any function body), all + unit tests passing on the new path, csmith fixtures regenerated. + Shipped as `0.2.0`. +- **F** struct / union / enum specifiers + members + bitfields + + enumerators, dispatched from `simple_declaration` / `spec_loop`. +- **G** attribute specs (3 forms × leading + between-specs + insertion points). +- **H** top-level preprocessor directives — define / undef / + include / conditional / pragma / error / warning / line — with + macro registration on `cmeta.macros`, header-name lex-mode + feedback, and the typed sub-rules wrapped under + `external_declaration`. +- **I** top-level GCC `__asm__`. (`static_assert` grammar rule + defined; top-level dispatch deferred pending comma-op gating.) +- **K** `structureConditionalGroups` extracted to its own + module — a self-contained translation-unit-level post-pass. +- **L** standalone struct / enum definitions through grammar + (`@looks-simple-decl` walks past tagged-type bodies). +- **N** ship `1.0.0`. +- **P** parenthesised sub-declarators (function pointers): + `paren_inner_declarator` rule + `@looks-simple-decl` paren-walk + branch. Shapes like `int (*fp)(int);` and + `typedef int (*Fn)(int);` flow through the grammar. 
+- **O** vendor `@jsonic/expr` under `vendor/jsonic-expr/` and + add a `n.no_comma_op` bail in `val.close` / `expr.close` that + matches the comma op by src. Top-level `static_assert(cond, msg)` + dispatches into the existing `static_assert_declaration` rule + with the flag set, so the `,` lands as a separator instead of + the comma operator. +- **N₂** ship `2.0.0` declaring the hybrid as the final + architecture. + +The legacy chomp + `structureExternalDeclaration` fallback +remains by design for the long-tail shapes — K&R parameter lists +and complex compound declarators beyond simple function pointers. +Both paths emit identical CST nodes, so consumers see one tree +regardless of which path produced it. + ## License MIT. Copyright (c) 2026 Richard Rodger and contributors. diff --git a/c-grammar.jsonic b/c-grammar.jsonic new file mode 100644 index 0000000..3d85776 --- /dev/null +++ b/c-grammar.jsonic @@ -0,0 +1,1682 @@ +# C parser grammar (declarative) +# +# Parsed by a vanilla Jsonic instance and passed to jsonic.grammar(). The +# rule skeleton lives here; all conditions and actions are bound to +# @-named refs supplied by ../src/c.ts so the structural intent of the +# grammar is readable without TypeScript noise. +# +# Token sets, lex matchers, and option flags (lex pipeline disable, +# IGNORE membership for trivia, etc.) are configured in c.ts before +# this grammar is loaded — putting them here would make the grammar +# self-modifying (it depends on the same dynamic ANY_C_TOKEN set it +# would define). 
+# +# Conventions: +# '@-bo' state action: before-open (auto-installed) +# '@-ao' state action: after-open +# '@-bc' state action: before-close +# '@-ac' state action: after-close +# '@' alt-level action / condition + +{ + rule: { + + # translation_unit + # bo: create the root node + # open: empty input → bail; else descend into extdecl_loop + # bc: fold #if … #endif sequences into conditional_group nodes + # close: end on EOF + translation_unit: { + open: [ + { s: '#ZZ' b: 1 g: 'tu-empty' } + { p: 'extdecl_loop' g: 'tu-loop' } + ] + close: [ + { s: '#ZZ' g: 'tu-end' } + ] + } + + # extdecl_loop + # r.node is inherited from translation_unit. bc pushes the + # completed external_declaration child onto translation_unit + # before deciding to recurse. + extdecl_loop: { + open: [ + { p: 'external_declaration' g: 'loop-one' } + ] + close: [ + { s: '#ZZ' b: 1 g: 'loop-end' } + { r: 'extdecl_loop' g: 'loop-more' } + ] + } + + # external_declaration + # + # Dispatch: PP_HASH heads go to preprocessor_directive, + # static_assert / __asm__ heads go to their typed sub-rules, and + # heads that @looks-simple-decl accepts go to simple_declaration + # (with val for initializers via @jsonic/expr). Anything else + # falls through to the legacy chomp path that absorbs tokens for + # post-process structuring. + external_declaration: { + open: [ + { s: '#ZZ' b: 1 g: 'extdecl-eof' } + # Phase H: PP_HASH dispatches to preprocessor_directive. + { s: 'PP_HASH PP_HASH' c: '@is-first-iter' b: 2 + p: 'preprocessor_directive' a: '@mark-new-path' + g: 'extdecl-pp-2' } + { s: 'PP_HASH #ANY_C_TOKEN' c: '@is-first-iter' b: 2 + p: 'preprocessor_directive' a: '@mark-new-path' + g: 'extdecl-pp' } + # Phase O: top-level static_assert dispatches into the + # static_assert_declaration grammar rule. The cond / msg + # vals are pushed with n.no_comma_op set so the vendored + # @jsonic/expr's expr.close bails at `,` rather than + # treating it as the comma operator.
+ { s: 'KW_STATIC_ASSERT' c: '@is-first-iter' b: 1 + p: 'static_assert_declaration' a: '@mark-new-path' + g: 'extdecl-sa' } + { s: 'KW__STATIC_ASSERT' c: '@is-first-iter' b: 1 + p: 'static_assert_declaration' a: '@mark-new-path' + g: 'extdecl-sa-1' } + # Phase I.2: top-level GCC __asm__ block. + { s: 'KW_ASM' c: '@is-first-iter' b: 1 + p: 'asm_statement' a: '@mark-new-path' + g: 'extdecl-asm' } + { s: 'KW___ASM' c: '@is-first-iter' b: 1 + p: 'asm_statement' a: '@mark-new-path' + g: 'extdecl-asm-1' } + { s: 'KW___ASM__' c: '@is-first-iter' b: 1 + p: 'asm_statement' a: '@mark-new-path' + g: 'extdecl-asm-2' } + # Phase B2.3 dispatch: cascading wildcard-token alts. Each one + # matches a fixed number of tokens to force lookahead, then the + # @looks-simple-decl cond validates the actual shape — optional + # storage prefix, 1+ simple type specifiers, an ID, and a `;` or + # `=` terminator. b: N back-steps all matched tokens so + # simple_declaration sees them as t0..t(N-1). + # Longest alts first so multi-keyword forms win over shorter + # shapes that would have stopped at the wrong ID. + # Gate: only on the first iteration of an external_declaration + # so the chomp's r:-recursion doesn't re-fire mid-declaration. 
+ { s: '#ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN' + c: '@looks-simple-decl' b: 6 + p: 'simple_declaration' a: '@mark-new-path' g: 'extdecl-new-decl-6' } + { s: '#ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN' + c: '@looks-simple-decl' b: 5 + p: 'simple_declaration' a: '@mark-new-path' g: 'extdecl-new-decl-5' } + { s: '#ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN' + c: '@looks-simple-decl' b: 4 + p: 'simple_declaration' a: '@mark-new-path' g: 'extdecl-new-decl-4' } + { s: '#ANY_C_TOKEN #ANY_C_TOKEN #ANY_C_TOKEN' + c: '@looks-simple-decl' b: 3 + p: 'simple_declaration' a: '@mark-new-path' g: 'extdecl-new-decl-3' } + { s: '#ANY_C_TOKEN' a: '@absorb-token' g: 'extdecl-tok' } + ] + close: [ + { c: '@new-path' a: '@finalize-new-path' g: 'extdecl-new-end' } + { s: '#ZZ' b: 1 a: '@finalize-extdecl' g: 'extdecl-finish-eof' } + { c: '@just-closed-and-decl-ahead' a: '@finalize-extdecl' g: 'extdecl-finish-block' } + { c: '@terminated' a: '@finalize-extdecl' g: 'extdecl-finish' } + { r: 'external_declaration' g: 'extdecl-more' } + ] + } + + # simple_declaration (phase B2: any-length specifier list + + # comma-separated init-declarator-list) + # + # Recognises: + # ? + (, )* ; + # where each init_declarator is `ID (= val)?`. Initializer + # expressions descend into val (which @jsonic/expr's plugin install + # has wired up for full C precedence). + # + # Output: a CST node of kind 'declaration' with declaredName set, + # children laid out as + # [declaration_specifiers, init_declarator_list, ';']
+ simple_declaration: { + open: [ + # Phase G: leading attribute specs dispatch into spec_loop + # which then consumes the attribute-spec sub-rule plus any + # following storage / type / tagged specifiers. + { s: 'PUNC_LBRACKET PUNC_LBRACKET' c: '@as23-adjacent-open' + b: 2 p: 'spec_loop' g: 'simple-decl-attr-c23' } + { s: 'KW___ATTRIBUTE__' b: 1 p: 'spec_loop' + g: 'simple-decl-attr-gcc' } + { s: 'KW___ATTRIBUTE' b: 1 p: 'spec_loop' + g: 'simple-decl-attr-gcc-1' } + { s: 'KW___DECLSPEC' b: 1 p: 'spec_loop' + g: 'simple-decl-attr-msvc' } + { s: '#STORAGE_PREFIX' a: '@absorb-spec-storage' p: 'spec_loop' + g: 'simple-decl-storage' } + # Tagged-type heads dispatch into struct_specifier / + # enum_specifier (phase F.5). spec_loop's bc relays the + # returned node onto u.specs. + { s: 'KW_STRUCT' b: 1 p: 'struct_specifier' + g: 'simple-decl-struct' } + { s: 'KW_UNION' b: 1 p: 'struct_specifier' + g: 'simple-decl-union' } + { s: 'KW_ENUM' b: 1 p: 'enum_specifier' + g: 'simple-decl-enum' } + { s: '#SIMPLE_TYPE_HEAD' a: '@absorb-spec-type' p: 'spec_loop' + g: 'simple-decl-type' } + ] + close: [ + # Function-definition completion: compound_statement returned + # and rule.u.fnBody is set — finalise as function_definition. + { c: '@fn-body-done' a: '@simple-decl-finalize-fn' + g: 'simple-decl-fn-end' } + # Function-definition body: after init_declarator captures + # the function declarator, `{` opens the body. Push + # compound_statement to absorb it; on return @fn-body-done + # above fires. + { s: 'PUNC_LBRACE' b: 1 p: 'compound_statement' + a: '@simple-decl-start-fn-body' + g: 'simple-decl-fn-body' } + # First declarator (after specs). Backstep the head token so + # init_declarator's open sees it; descend into the sub-rule. + # ID head: plain declarator. STAR head: pointer prefix. + # LPAREN head: parenthesised sub-declarator (function + # pointer), handled by the phase P alt below.
+ { s: 'ID' b: 1 p: 'init_declarator' g: 'simple-decl-first-decl' } + { s: 'PUNC_STAR' b: 1 p: 'init_declarator' g: 'simple-decl-first-decl-ptr' } + # Phase P: parenthesised sub-declarator (function pointer). + # Shape: `+ ( * ID ) ( ? ) ;` (or = init). + # @looks-simple-decl's paren-walk has already validated the + # shape; here we backstep `(` so init_declarator's open sees + # it and dispatches into paren_inner_declarator. + { s: 'PUNC_LPAREN' b: 1 p: 'init_declarator' + g: 'simple-decl-first-decl-paren' } + # Subsequent declarators after a comma. + { s: 'PUNC_COMMA' a: '@simple-decl-take-comma' p: 'init_declarator' + g: 'simple-decl-comma' } + # End of declaration (variable form). + { s: 'PUNC_SEMI' a: '@simple-decl-finalize' g: 'simple-decl-end' } + ] + } + + # spec_loop: absorbs zero or more specifier keywords (and tagged + # specifiers — struct / union / enum) and ends when the next token + # isn't another specifier. r.node is inherited from + # simple_declaration; @absorb-spec-* push refs into the + # declaration_specifiers scaffolding the parent rule set up in + # @simple_declaration-bo. Tagged specifiers are dispatched into + # their own sub-rules (struct_specifier / enum_specifier); + # @spec_loop-bc stitches the returned specifier node onto the + # owning declaration_specifiers list. + spec_loop: { + open: [ + # Phase G: attribute specs interleave freely with simple + # specifiers and tagged-type heads. 
+ { s: 'PUNC_LBRACKET PUNC_LBRACKET' c: '@as23-adjacent-open' + b: 2 p: 'attribute_spec_c23' g: 'spec-loop-attr-c23' } + { s: 'KW___ATTRIBUTE__' b: 1 p: 'attribute_spec_gcc' + g: 'spec-loop-attr-gcc' } + { s: 'KW___ATTRIBUTE' b: 1 p: 'attribute_spec_gcc' + g: 'spec-loop-attr-gcc-1' } + { s: 'KW___DECLSPEC' b: 1 p: 'attribute_spec_msvc' + g: 'spec-loop-attr-msvc' } + { s: '#SIMPLE_TYPE_HEAD' a: '@absorb-spec-type' g: 'spec-loop-type' } + { s: 'KW_STRUCT' b: 1 p: 'struct_specifier' g: 'spec-loop-struct' } + { s: 'KW_UNION' b: 1 p: 'struct_specifier' g: 'spec-loop-union' } + { s: 'KW_ENUM' b: 1 p: 'enum_specifier' g: 'spec-loop-enum' } + # If the next token isn't a specifier, fall through without + # consuming so the parent can pick up the declarator. + { s: [] g: 'spec-loop-empty' } + ] + close: [ + { s: 'PUNC_LBRACKET PUNC_LBRACKET' c: '@as23-adjacent-open' + b: 2 p: 'attribute_spec_c23' g: 'spec-loop-more-attr-c23' } + { s: 'KW___ATTRIBUTE__' b: 1 p: 'attribute_spec_gcc' + g: 'spec-loop-more-attr-gcc' } + { s: 'KW___ATTRIBUTE' b: 1 p: 'attribute_spec_gcc' + g: 'spec-loop-more-attr-gcc-1' } + { s: 'KW___DECLSPEC' b: 1 p: 'attribute_spec_msvc' + g: 'spec-loop-more-attr-msvc' } + { s: '#SIMPLE_TYPE_HEAD' b: 1 r: 'spec_loop' g: 'spec-loop-more' } + { s: 'KW_STRUCT' b: 1 p: 'struct_specifier' g: 'spec-loop-more-struct' } + { s: 'KW_UNION' b: 1 p: 'struct_specifier' g: 'spec-loop-more-union' } + { s: 'KW_ENUM' b: 1 p: 'enum_specifier' g: 'spec-loop-more-enum' } + { s: [] g: 'spec-loop-end' } + ] + } + + # init_declarator: pointer* ID (= val)? + # Each invocation builds its own init_declarator node. The + # parent simple_declaration's bc pushes it onto the + # init_declarator_list when the sub-rule completes. + # + # The rule re-enters itself once via r: after capturing the ID so + # the close state can run a second time to look for `=`. r.k.named + # latches across that recursion; the gate alt at the top of open + # accepts the re-entry without consuming any tokens. 
+ init_declarator: { + open: [ + # Re-entry after the ID was captured: skip open, fall through + # to close to handle `=` / array postfix / end. + { c: '@idecl-named' s: [] g: 'idecl-reentry' } + # Pointer prefix: back-step the `*`, descend into pointer_list + # which absorbs all the leading `*` tokens. + { s: 'PUNC_STAR' b: 1 p: 'pointer_list' g: 'idecl-ptrs' } + # Phase P: parenthesised sub-declarator. Capture the LPAREN + # onto the outer direct_declarator and descend into + # paren_inner_declarator, which builds an inner declarator + # node and attaches it to the outer direct_declarator before + # returning at `)`. + { s: 'PUNC_LPAREN' a: '@idecl-paren-open' + p: 'paren_inner_declarator' g: 'idecl-paren' } + # No pointer prefix, ID directly. + { s: 'ID' a: '@idecl-name' r: 'init_declarator' g: 'idecl-id' } + ] + close: [ + # Returning from paren_inner_declarator: consume the matching + # `)` and finalise the outer declarator. Then r:-recurse so + # the rest of close() can take any trailing postfix + # (function postfix for fn-pointers, array postfix for + # arrays of fn-pointers). + { s: 'PUNC_RPAREN' c: '@idecl-paren-pending' + a: '@idecl-paren-close' r: 'init_declarator' + g: 'idecl-paren-rparen' } + # Returning from pointer_list, capture the ID, then re-enter + # to check for postfix / initializer. + { s: 'ID' a: '@idecl-name' r: 'init_declarator' g: 'idecl-id-after-ptrs' } + # Array postfix `[ … ]` (one or more dimensions). Each one + # re-enters init_declarator so additional postfixes can stack. + { s: 'PUNC_LBRACKET' b: 1 p: 'array_postfix' + r: 'init_declarator' g: 'idecl-arr' } + # Function postfix `( … )` for function declarators. Re-enters + # init_declarator so trailing `[…]` (function returning array) + # or further postfixes can stack — though for now phase B3.1 + # only exercises ` ID ( … ) ;`. 
+ { s: 'PUNC_LPAREN' b: 1 p: 'function_postfix' + r: 'init_declarator' g: 'idecl-fn' } + { s: 'PUNC_ASSIGN' p: 'val' a: '@idecl-take-eq' g: 'idecl-eq' } + { s: [] g: 'idecl-end' } + ] + } + + # paren_inner_declarator (phase P): builds an inner declarator + # node for a parenthesised sub-declarator (function-pointer + # form). Mirrors init_declarator's pointer + ID + postfix logic + # but without `=` initializer handling, and stops at (without + # consuming) the matching `)` so the outer init_declarator can + # take it. The inner declarator is attached to the outer's + # direct_declarator by @pid-name. + paren_inner_declarator: { + open: [ + # Re-entry after the ID was captured: skip open. + { c: '@pid-named' s: [] g: 'pid-reentry' } + # Pointer prefix. + { s: 'PUNC_STAR' b: 1 p: 'pointer_list' g: 'pid-ptrs' } + # No pointer prefix, ID directly (rare but legal: `int (fp)(…)`). + { s: 'ID' a: '@pid-name' r: 'paren_inner_declarator' g: 'pid-id' } + ] + close: [ + # After pointer_list returns, capture the ID then re-enter. + { s: 'ID' a: '@pid-name' r: 'paren_inner_declarator' + g: 'pid-id-after-ptrs' } + # Stop before `)` so the outer init_declarator's close can + # consume it. + { s: 'PUNC_RPAREN' b: 1 g: 'pid-end-rparen' } + { s: [] g: 'pid-end' } + ] + } + + # array_postfix: `[ const-expr? ]` + # Inner expression is parsed via val (currently limited to forms + # @jsonic/expr handles; complex constant expressions involving + # casts will land cleanly once phase C lifts cast handling). + array_postfix: { + open: [ + { s: 'PUNC_LBRACKET' a: '@arr-open' g: 'arr-open' } + ] + close: [ + { s: 'PUNC_RBRACKET' a: '@arr-close' g: 'arr-end-empty' } + { p: 'val' g: 'arr-size' } + ] + } + + # pointer_list: absorbs one or more `*` tokens. Pushes a + # pointer node per `*` onto the parent init_declarator's + # declarator children. 
+  pointer_list: {
+    open: [
+      { s: 'PUNC_STAR' a: '@absorb-pointer' g: 'ptr' }
+    ]
+    close: [
+      { s: 'PUNC_STAR' b: 1 r: 'pointer_list' g: 'ptr-more' }
+      { s: [] g: 'ptr-end' }
+    ]
+  }
+
+  # function_postfix: `( … )` after the declarator name.
+  # Phase B3.1 covers the simplest forms: empty `()`, explicit
+  # `(void)`, and one or more concrete parameter declarations.
+  function_postfix: {
+    open: [
+      { s: 'PUNC_LPAREN' a: '@fn-open' g: 'fn-open' }
+    ]
+    close: [
+      # Closing `)`: matched straight away for an empty `()` list,
+      # and again after parameter_type_list returns (its ptl-end
+      # alt stops before the `)`).
+      { s: 'PUNC_RPAREN' a: '@fn-close' g: 'fn-end-empty' }
+      # Otherwise descend into the parameter list, then re-enter
+      # close (where the matching `)` is consumed).
+      { p: 'parameter_type_list' g: 'fn-params' }
+    ]
+  }
+
+  # parameter_type_list: 1+ comma-separated parameter_declarations.
+  parameter_type_list: {
+    open: [
+      { p: 'parameter_declaration' g: 'ptl-first' }
+    ]
+    close: [
+      { s: 'PUNC_COMMA' a: '@ptl-comma' p: 'parameter_declaration'
+        g: 'ptl-more' }
+      { s: 'PUNC_RPAREN' b: 1 a: '@ptl-attach-and-end' g: 'ptl-end' }
+    ]
+  }
+
+  # parameter_declaration: spec+ ID?, i.e. one or more declaration
+  # specifiers, then an optional declarator name (which may carry
+  # leading `*`s). `void` alone is the C convention for "no
+  # parameters" and is captured here as a single-spec parameter.
+  parameter_declaration: {
+    open: [
+      # Re-entry after a pointer prefix was absorbed: skip the
+      # type-spec dispatch and fall through to close to handle
+      # additional `*` or the ID.
+      { c: '@param-reentered' s: [] g: 'param-reentry' }
+      { s: '#SIMPLE_TYPE_HEAD' a: '@param-spec' p: 'param_spec_loop'
+        g: 'param-type' }
+    ]
+    close: [
+      # Pointer prefix on the parameter declarator. Each `*` becomes
+      # a pointer node on the declarator; we recurse to keep
+      # absorbing more `*` and finally the optional ID.
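+      #
+      # Illustrative walk for the parameter `char **argv`, read off
+      # the alts in this rule and in param_spec_loop. It assumes
+      # `char` matches #SIMPLE_TYPE_HEAD and that @param-pointer sets
+      # the flag @param-reentered checks:
+      #   `char`   param-type: push param_spec_loop
+      #   `*`      param-spec-empty, param-spec-end; then param-ptr,
+      #            re-enter
+      #   `*`      param-reentry, param-ptr, re-enter
+      #   `argv`   param-reentry, param-id
+      # A lone `void` takes param-type, an empty param_spec_loop
+      # pass, and then param-end at the `)`.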
+      { s: 'PUNC_STAR' a: '@param-pointer'
+        r: 'parameter_declaration' g: 'param-ptr' }
+      { s: 'ID' a: '@param-name' g: 'param-id' }
+      { s: [] g: 'param-end' }
+    ]
+  }
+
+  # param_spec_loop: zero or more additional type specifiers in a
+  # parameter's spec list.
+  param_spec_loop: {
+    open: [
+      { s: '#SIMPLE_TYPE_HEAD' a: '@param-spec' g: 'param-spec-more' }
+      { s: [] g: 'param-spec-empty' }
+    ]
+    close: [
+      { s: '#SIMPLE_TYPE_HEAD' b: 1 r: 'param_spec_loop' g: 'param-spec-loop' }
+      { s: [] g: 'param-spec-end' }
+    ]
+  }
+
+  # compound_statement: `{ … }`
+  # Phase B3.3+B4.2.1 wires this as a structured block: each item
+  # between the opening and closing braces is dispatched into the
+  # block_item sub-rule (declaration | statement). The `-bc` hook
+  # stitches each returned item onto compound_statement.children
+  # before re-entering the close loop.
+  compound_statement: {
+    open: [
+      { s: 'PUNC_LBRACE' a: '@cs-open' g: 'cs-open' }
+    ]
+    close: [
+      # Closing `}`: finalise.
+      { s: 'PUNC_RBRACE' a: '@cs-close' g: 'cs-end' }
+      # Any other token: dispatch to block_item. After block_item
+      # returns, close re-evaluates and we either match `}` or
+      # dispatch the next item.
+      { s: '#ANY_C_TOKEN' b: 1 p: 'block_item' g: 'cs-item' }
+    ]
+  }
+
+  # ---- statement-level rules (phase B4.2) ------------------------
+  #
+  # block_item, statement, expression_statement, and jump_statement
+  # are defined in the shapes the legacy `structure.ts` post-process
+  # produces (see parseBlockItem / parseStatement /
+  # parseJumpStatement / parseExpressionStatement). They were first
+  # added unwired; compound_statement now reaches them through its
+  # cs-item alt (phase B3.3+B4.2.1), alongside a gate that picks
+  # the function bodies the new grammar can fully cover.
+  #
+  # Defining the rule shapes before wiring them let the cutover
+  # phase focus on the gate logic rather than also designing rule
+  # shapes under deadline pressure.
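+  #
+  # Illustrative walk for the body `{ int n = 0; n++; return n; }`,
+  # read off compound_statement above and the rules below. It
+  # assumes `n` is not a typedef-name, so it does not match
+  # #SIMPLE_TYPE_HEAD:
+  #   `{`           cs-open
+  #   `int n = 0;`  cs-item, bi-decl-type (simple_declaration)
+  #   `n++;`        cs-item, bi-stmt, stmt-expr (expression_statement)
+  #   `return n;`   cs-item, bi-stmt, stmt-return (jump_statement:
+  #                 js-return, js-take-expr, js-end)
+  #   `}`           cs-end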
+
+  # block_item: declaration | statement.
+  # Dispatches on the head token: a recognised type-spec head
+  # (storage class, simple type keyword, typedef-name) goes through
+  # simple_declaration; anything else is a statement.
+  block_item: {
+    open: [
+      { s: '#STORAGE_PREFIX' b: 1 p: 'simple_declaration' g: 'bi-decl-storage' }
+      { s: '#SIMPLE_TYPE_HEAD' b: 1 p: 'simple_declaration' g: 'bi-decl-type' }
+      { p: 'statement' g: 'bi-stmt' }
+    ]
+    close: [
+      { s: [] g: 'bi-end' }
+    ]
+  }
+
+  # statement: dispatch on head token.
+  # Phase B4.2.1 covers expression_statement, jump_statement, the
+  # empty `;` statement, and nested compound_statement.
+  # Phase B4.2.2 adds if/while/do/switch (paren-condition statements).
+  # Phase B4.2.3 adds for_statement and labeled_statement (case /
+  # default / ID-label).
+  # Phase B4.2.4 adds asm_statement and preprocessor lines inside
+  # function bodies.
+  statement: {
+    open: [
+      # Nested block: `{ … }`
+      { s: 'PUNC_LBRACE' b: 1 p: 'compound_statement' g: 'stmt-cs' }
+      # Empty statement: `;`
+      { s: 'PUNC_SEMI' a: '@stmt-empty' g: 'stmt-empty' }
+      # Selection / iteration statements (paren-condition)
+      { s: 'KW_IF' b: 1 p: 'if_statement' g: 'stmt-if' }
+      { s: 'KW_WHILE' b: 1 p: 'while_statement' g: 'stmt-while' }
+      { s: 'KW_DO' b: 1 p: 'do_statement' g: 'stmt-do' }
+      { s: 'KW_SWITCH' b: 1 p: 'switch_statement' g: 'stmt-switch' }
+      { s: 'KW_FOR' b: 1 p: 'for_statement' g: 'stmt-for' }
+      # Labeled statements
+      { s: 'KW_CASE' b: 1 p: 'labeled_statement' g: 'stmt-case' }
+      { s: 'KW_DEFAULT' b: 1 p: 'labeled_statement' g: 'stmt-default' }
+      { s: 'ID PUNC_COLON' b: 2 p: 'labeled_statement' g: 'stmt-label' }
+      # Jump statements
+      { s: 'KW_RETURN' b: 1 p: 'jump_statement' g: 'stmt-return' }
+      { s: 'KW_BREAK' b: 1 p: 'jump_statement' g: 'stmt-break' }
+      { s: 'KW_CONTINUE' b: 1 p: 'jump_statement' g: 'stmt-continue' }
+      { s: 'KW_GOTO' b: 1 p: 'jump_statement' g: 'stmt-goto' }
+      # GCC inline asm
+      { s: 'KW_ASM' b: 1 p: 'asm_statement' g: 'stmt-asm' }
+      { s: 'KW___ASM' b: 1 p: 'asm_statement' g: 'stmt-asm-1' }
+      { s: 'KW___ASM__' b: 1 p: 'asm_statement' g: 'stmt-asm-2' }
+      # Preprocessor line inside a function body (rare but legal).
+      { s: 'PP_HASH' b: 1 p: 'preprocessor_line' g: 'stmt-pp' }
+      # Expression statement (default fallthrough)
+      { p: 'expression_statement' g: 'stmt-expr' }
+    ]
+    close: [
+      { s: [] g: 'stmt-end' }
+    ]
+  }
+
+  # expression_statement: `expr ;`
+  # Descends into val (the @jsonic/expr-driven expression rule) and
+  # then takes the trailing `;`. Empty `;` is handled by statement's
+  # PUNC_SEMI alt before this rule is entered.
+  expression_statement: {
+    open: [
+      { p: 'val' a: '@es-take-expr' g: 'es-expr' }
+    ]
+    close: [
+      { s: 'PUNC_SEMI' a: '@es-finalize' g: 'es-end' }
+    ]
+  }
+
+  # jump_statement:
+  #   return expr? ;
+  #   break ;
+  #   continue ;
+  #   goto ID ;
+  # The keyword sets jumpKind on the node; close-state alts decide
+  # whether to take a label (goto), an expression (return), or just
+  # the trailing `;`. The label alt re-enters via r: so a fresh
+  # close pass can match `;`; the expression alt pushes val, and
+  # close is evaluated again once val returns.
+  jump_statement: {
+    open: [
+      { c: '@js-reentry' s: [] g: 'js-reentry' }
+      { s: 'KW_RETURN' a: '@js-take-keyword' g: 'js-return' }
+      { s: 'KW_BREAK' a: '@js-take-keyword' g: 'js-break' }
+      { s: 'KW_CONTINUE' a: '@js-take-keyword' g: 'js-continue' }
+      { s: 'KW_GOTO' a: '@js-take-keyword' g: 'js-goto' }
+    ]
+    close: [
+      { s: 'PUNC_SEMI' a: '@js-finalize' g: 'js-end' }
+      { c: '@js-needs-label' s: 'ID' a: '@js-take-label'
+        r: 'jump_statement' g: 'js-take-label' }
+      { c: '@js-needs-expr' p: 'val' a: '@js-take-expr' g: 'js-take-expr' }
+    ]
+  }
+
+  # paren_condition: `( expr )`
+  # Used inside if/while/do/switch as the controlling expression
+  # wrapper. The legacy CST exposes the parens as concrete tokens
+  # alongside the expression child; this rule preserves that.
+  paren_condition: {
+    open: [
+      { s: 'PUNC_LPAREN' a: '@pc-open' g: 'pc-open' }
+    ]
+    close: [
+      { s: 'PUNC_RPAREN' a: '@pc-close' g: 'pc-end' }
+      { p: 'val' a: '@pc-take-expr' g: 'pc-expr' }
+    ]
+  }
+
+  # if_statement: `if ( cond ) then-stmt (else else-stmt)?`
+  # Multi-stage close: first take paren_condition, then the then-
+  # branch (any statement), then optionally `else` + else-branch.
+  if_statement: {
+    open: [
+      { s: 'KW_IF' a: '@if-take-keyword' g: 'if-kw' }
+    ]
+    close: [
+      { c: '@if-needs-cond' s: 'PUNC_LPAREN' b: 1
+        p: 'paren_condition' g: 'if-cond' }
+      { c: '@if-needs-then' p: 'statement' g: 'if-then' }
+      { c: '@if-needs-else-kw' s: 'KW_ELSE' a: '@if-take-else-kw'
+        g: 'if-else-kw' }
+      { c: '@if-needs-else-body' p: 'statement' g: 'if-else-body' }
+      { s: [] g: 'if-end' }
+    ]
+  }
+
+  # while_statement: `while ( cond ) body`
+  while_statement: {
+    open: [
+      { s: 'KW_WHILE' a: '@while-take-keyword' g: 'while-kw' }
+    ]
+    close: [
+      { c: '@while-needs-cond' s: 'PUNC_LPAREN' b: 1
+        p: 'paren_condition' g: 'while-cond' }
+      { c: '@while-needs-body' p: 'statement' g: 'while-body' }
+      { s: [] g: 'while-end' }
+    ]
+  }
+
+  # do_statement: `do body while ( cond ) ;`
+  do_statement: {
+    open: [
+      { s: 'KW_DO' a: '@do-take-keyword' g: 'do-kw' }
+    ]
+    close: [
+      { c: '@do-needs-body' p: 'statement' g: 'do-body' }
+      { c: '@do-needs-while' s: 'KW_WHILE' a: '@do-take-while'
+        g: 'do-while-kw' }
+      { c: '@do-needs-cond' s: 'PUNC_LPAREN' b: 1
+        p: 'paren_condition' g: 'do-cond' }
+      { c: '@do-needs-semi' s: 'PUNC_SEMI' a: '@do-take-semi'
+        g: 'do-end' }
+      { s: [] g: 'do-fallthrough' }
+    ]
+  }
+
+  # switch_statement: `switch ( ctrl ) body`
+  switch_statement: {
+    open: [
+      { s: 'KW_SWITCH' a: '@switch-take-keyword' g: 'switch-kw' }
+    ]
+    close: [
+      { c: '@switch-needs-cond' s: 'PUNC_LPAREN' b: 1
+        p: 'paren_condition' g: 'switch-cond' }
+      { c: '@switch-needs-body' p: 'statement' g: 'switch-body' }
+      { s: [] g: 'switch-end' }
+    ]
+  }
+
+  # ---- for_statement family (phase B4.2.3) ------------------------
+  #
+  # for_statement   `for ( init ; cond ; iter ) body`
+  # for_controls    the `( … )` wrapper, with three slots
+  #   for_init      { value: declaration | expr | empty }
+  #   for_cond      { value: expr | empty }
+  #   for_iter      { value: expr | empty }
+  #
+  # The init slot can be a full declaration (which terminates with
+  # its own `;`) or an expression (in which case for_init takes the
+  # trailing `;` itself). The cond and iter slots are pure
+  # expressions; cond ends with `;`, iter ends at the closing `)`
+  # which for_controls then consumes.
+
+  for_statement: {
+    open: [
+      { s: 'KW_FOR' a: '@for-take-keyword' g: 'for-kw' }
+    ]
+    close: [
+      { c: '@for-needs-controls' s: 'PUNC_LPAREN' b: 1
+        p: 'for_controls' g: 'for-controls' }
+      { c: '@for-needs-body' p: 'statement' g: 'for-body' }
+      { s: [] g: 'for-end' }
+    ]
+  }
+
+  for_controls: {
+    open: [
+      { s: 'PUNC_LPAREN' a: '@fc-open' p: 'for_init' g: 'fc-open' }
+    ]
+    close: [
+      { c: '@fc-needs-cond' p: 'for_cond' g: 'fc-cond' }
+      { c: '@fc-needs-iter' p: 'for_iter' g: 'fc-iter' }
+      { s: 'PUNC_RPAREN' a: '@fc-close' g: 'fc-end' }
+    ]
+  }
+
+  for_init: {
+    open: [
+      # Empty init: bare `;`
+      { s: 'PUNC_SEMI' a: '@fi-empty-take-semi' g: 'fi-empty' }
+      # Declaration init (declaration eats its own trailing `;`)
+      { s: '#STORAGE_PREFIX' b: 1 p: 'simple_declaration'
+        a: '@fi-mark-decl' g: 'fi-decl-storage' }
+      { s: '#SIMPLE_TYPE_HEAD' b: 1 p: 'simple_declaration'
+        a: '@fi-mark-decl' g: 'fi-decl-type' }
+      # Expression init: take expression then `;`
+      { p: 'val' a: '@fi-mark-expr' g: 'fi-expr' }
+    ]
+    close: [
+      { c: '@fi-needs-semi' s: 'PUNC_SEMI' a: '@fi-take-semi' g: 'fi-semi' }
+      { s: [] g: 'fi-end' }
+    ]
+  }
+
+  for_cond: {
+    open: [
+      # Empty cond: bare `;`
+      { s: 'PUNC_SEMI' a: '@fcond-empty-take-semi' g: 'fcond-empty' }
+      # Expression cond: take expression then `;`
+      { p: 'val' a: '@fcond-mark-expr' g: 'fcond-expr' }
+    ]
+    close: [
+      { c: '@fcond-needs-semi' s: 'PUNC_SEMI'
+        a: '@fcond-take-semi' g: 'fcond-semi' }
+      { s: [] g: 'fcond-end' }
+    ]
+  }
+
+  for_iter: {
+    open: [
+      # Empty iter: backstep the `)` so for_controls can take it.
+      { s: 'PUNC_RPAREN' b: 1 a: '@fiter-empty' g: 'fiter-empty' }
+      # Expression iter: take expression up to `)`.
+      { p: 'val' a: '@fiter-mark-expr' g: 'fiter-expr' }
+    ]
+    close: [
+      { s: [] g: 'fiter-end' }
+    ]
+  }
+
+  # ---- asm_statement (phases B4.2.4 and C.8) ---------------------
+  #
+  # GCC inline asm: `__asm__ volatile? goto? ( template : … ) ;`.
+  # Phase B4.2.4 first captured the whole statement as a flat token
+  # list under an asm_statement node, without breaking out the
+  # qualifiers, template, or operand sections; that shape was
+  # enough to unblock the body-supportedness gate. Phase C.8 moves
+  # to the structured form described below.
+  # asm_statement (phase C.8, structured form):
+  #   * (