Move K&R function definitions onto the grammar path #4
K&R-style definitions (`int f(a, b) int a; long b; { ... }`) used to
fall through to the legacy chomp, where the chomp's top-level `;`
terminator fragmented them into multiple `declKind: 'unknown'`
external declarations. The validator now accepts the shape, the
grammar dispatches into a new `kr_declaration_list` rule between the
parameter-list `)` and the body `{`, and the result is a single
structured `function_definition` external declaration. The
declaration-list child preserves the source as flat token refs to
match the legacy CST shape.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Phase P landed simple function-pointer declarators on the grammar but
left the complex shapes (arrays of fn-pointers, fn returning
ptr-to-array, nested paren-forms, leading-pointer types with
paren-form declarators) on the legacy chomp + structure.ts path.
Worse, the validator silently accepted `int (*arr[3])(int);` while
the grammar emitted a structurally wrong CST (the inner `[3]` postfix
sat as a sibling of the inner declarator instead of inside it).

This change extends the grammar so all four shapes parse correctly:

    int (*arr[3])(int);      // inner array postfix on inner DD
    int (*get())[10];        // inner function postfix on inner DD
    int (*(*fpp))(int);      // nested paren-form (recursive PID)
    char *(*foo[3])(int);    // leading-pointer-type with paren-form

`paren_inner_declarator` now dispatches `array_postfix` /
`function_postfix` for inner postfixes (they attach to its own
direct_declarator via rule.parent.k.directDeclarator), recurses into
itself for nested paren-forms, and tracks paren-pending state
separately from `init_declarator`'s. `@pid-paren-close` performs the
declarator-attachment that `@pid-name` does for non-nested PIDs.
`init_declarator` close gains a paren-form alt gated on `!named` so
the leading-pointer-type case routes here instead of falling into
function_postfix.

The validator factors out a `walkParenFormDeclarator` helper that
recursively validates the new shapes.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
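To make the nesting concrete, here is a minimal sketch of how the four shapes read once each inner postfix attaches to the inner direct declarator. It is a standalone toy reader, not the project's grammar or CST: `describeDeclaration` and its output format are hypothetical, it assumes a single-keyword base type, and it skips parameter lists without parsing them.

```typescript
// Hypothetical toy declarator reader; illustrative only.
function describeDeclaration(src: string): string {
  const toks: string[] = src.match(/[A-Za-z_]\w*|\d+|[()\[\]*]/g) ?? [];
  let pos = 0;
  const baseType = toks[pos++]; // assumption: base type is one keyword

  // declarator := '*'* direct_declarator
  function declarator(): string[] {
    let stars = 0;
    while (toks[pos] === "*") { stars++; pos++; }
    const parts = directDeclarator();
    // Pointers bind looser than postfixes, so they are read afterwards.
    for (let i = 0; i < stars; i++) parts.push("pointer to");
    return parts;
  }

  // direct_declarator := (IDENT | '(' declarator ')') postfix*
  function directDeclarator(): string[] {
    let parts: string[];
    if (toks[pos] === "(") {
      pos++;                 // consume "("
      parts = declarator();  // nested paren-form
      pos++;                 // consume ")"
    } else {
      parts = [`${toks[pos++]} is`];
    }
    // Postfixes attach to THIS direct declarator, innermost first.
    while (toks[pos] === "[" || toks[pos] === "(") {
      if (toks[pos] === "[") {
        pos++;
        const size = toks[pos] !== "]" ? toks[pos++] : "";
        pos++;               // consume "]"
        parts.push(size ? `array of ${size}` : "array of");
      } else {
        let depth = 0;       // skip the parameter list unparsed
        do {
          if (toks[pos] === "(") depth++;
          else if (toks[pos] === ")") depth--;
          pos++;
        } while (depth > 0 && pos < toks.length);
        parts.push("function returning");
      }
    }
    return parts;
  }

  return [...declarator(), baseType].join(" ");
}

console.log(describeDeclaration("int (*arr[3])(int);"));
// prints "arr is array of 3 pointer to function returning int"
```

In this sketch the `[3]` is consumed inside the parenthesised declarator, before the closing `)`, which mirrors the "inside it, not a sibling" shape the change describes.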
Plain-C was already 100% covered by the grammar path; the chomp +
structureExternalDeclaration safety net was reachable only for
malformed input, and extended-mode (preprocessor, GCC __attribute__ /
__asm__, MSVC __declspec, in-body #-lines, etc.) depended on the
legacy post-process for anything the grammar didn't structure. Per
the upcoming clean-slate rewrite of extended mode, both go.

Removed:
- src/structure.ts (~1975 lines), src/conditional-groups.ts,
  src/expr.ts (the latter two were already dead code in the grammar
  path)
- The chomp wildcard alt + finalize-extdecl close alts in
  external_declaration; the wildcard cascade and @looks-simple-decl
  validator
- All extension grammar rules: preprocessor_directive /
  define_directive / undef_directive / include_directive /
  conditional_directive / simple_directive / macro_parameter_list /
  macro_body / header_form / preprocessor_line, asm_statement /
  asm_template / asm_section / asm_operand / asm_clobber /
  asm_label_ref, attribute_spec_gcc / attribute_spec_msvc
- COptions.extended, EXTENSION_RULES stripping, @extended-on /
  @extended-off / @plain-and-first-iter / @ext-and-first-iter /
  @plain-as23-and-first / @new-path / @mark-new-path / @absorb-token /
  @finalize-extdecl / @terminated / @just-closed-and-decl-ahead
- isFunctionBodySupported, fetchDeep, walkParenFormDeclarator,
  skipLeadingAttributes, skipTaggedSpec, UNSUPPORTED_BODY_TOKENS, plus
  the registerTypedefIfApplicable / finalizeExternalDeclaration /
  registerMacrosFromTree / firstNonTriviaIs /
  startsNewExternalDeclaration helpers and the legacy declarator-walk
  helpers (findDeclaredName, splitDeclarators, declaratorPart,
  findSpecBoundary, isSpecifierKw, matchClose) plus the unused
  TYPE_SPEC_KEYWORD_NAMES / STORAGE_CLASS_NAMES / etc. classification
  sets
- csmith corpus + fixture tests + generator
- test/spec/path-dispatch.tsv (single path → no dispatch to track)
- ~24 extension-feature tests in c.test.ts (preprocessor, GCC asm,
  GCC __attribute__, MSVC __declspec, conditional_group, macro tagging
  in #define bodies, ...) and the viaPath assertions

external_declaration's open now dispatches statically:
KW_STATIC_ASSERT / KW__STATIC_ASSERT into static_assert_declaration,
#SIMPLE_TYPE_HEAD / #STORAGE_PREFIX / KW__BITINT / `[[` into
simple_declaration. Anything else is a parse error.
external_declaration's close runs @finalize-new-path unconditionally.

Tests: 78/78 pass.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Brings back support for the five typed top-level preprocessor
directive families plus the conditional family, opt-in via a new
COptions.extended boolean. Plain mode stays the canonical default and
is byte-identical to the prior build (extension grammar rules are
stripped from the spec when the flag is off).

Scope this round: #define / #undef / #include / #pragma / #error /
#warning / #line, and #if / #elif / #else / #endif. Each #-line is its
own external_declaration containing a typed directive node;
conditional directives stay flat (no #if-group folding; consumers
that want grouped structure walk the tree themselves). #define
populates cmeta.macros and tags subsequent identifiers as MACRO_NAME
at lex time; #undef reverts. call_expression nodes whose callee was
tagged carry isMacro: true.

GCC __attribute__, MSVC __declspec, GCC __asm__, and in-body #-lines
are not in scope this round.

Implementation: grammar-only. The 9 preprocessor sub-rules
(preprocessor_directive dispatcher + define/undef/include/conditional/
simple typed sub-rules + macro_parameter_list / macro_body /
header_form helpers) are pasted from commit 84d19eb~1 into
c-grammar.jsonic. external_declaration.open gains two PP_HASH dispatch
alts gated on the new @ext-and-first-iter condition. All 54 referenced
@-actions / -conditions were already present in src/c.ts (left
orphaned by the prior teardown), so no new TS authoring is needed
beyond the option plumbing and the three extension gate refs
(@extended-on, @extended-off, @ext-and-first-iter).

Tests: 12 new under describe('extended-mode preprocessor'), covering
each directive shape, plain-mode rejection of #-lines, macro tagging
across #define/#undef cycles, and function-like macro call isMacro
tagging. All 90 (78 plain + 12 extended) pass.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
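The #define / #undef tagging behaviour can be sketched as a small lex-time table. This is an illustration of the idea only: `tagIdentifiers`, `TaggedToken`, and the line-based scan are hypothetical stand-ins, not the lexer in src/c.ts, and the sketch ignores macro bodies, strings, and comments.

```typescript
// Hypothetical sketch of lex-time macro tagging; names are illustrative.
type TaggedToken = { text: string; kind: "IDENT" | "MACRO_NAME" };

function tagIdentifiers(lines: string[]): TaggedToken[] {
  const macros = new Set<string>(); // stands in for cmeta.macros
  const out: TaggedToken[] = [];
  for (const line of lines) {
    const def = /^\s*#\s*define\s+([A-Za-z_]\w*)/.exec(line);
    const undef = /^\s*#\s*undef\s+([A-Za-z_]\w*)/.exec(line);
    if (def) { macros.add(def[1]); continue; }       // #define registers
    if (undef) { macros.delete(undef[1]); continue; } // #undef reverts
    // Identifiers are tagged against the table as it stands at this line.
    for (const text of line.match(/[A-Za-z_]\w*/g) ?? []) {
      out.push({ text, kind: macros.has(text) ? "MACRO_NAME" : "IDENT" });
    }
  }
  return out;
}

const tagged = tagIdentifiers([
  "#define MAX(a, b) ((a) > (b) ? (a) : (b))",
  "int x = MAX(1, 2);",
  "#undef MAX",
  "int y = MAX(3, 4);",
]);
console.log(tagged.filter((t) => t.text === "MAX").map((t) => t.kind));
// prints [ 'MACRO_NAME', 'IDENT' ]
```

Because tagging depends on the table state at the point each identifier is lexed, the same name can be a MACRO_NAME before an #undef and a plain identifier after it.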
Brings the two compiler-specific attribute forms back as grammar-only
extensions, alongside the existing C23 [[…]] form. Decoration points
match the prior implementation: leading position in
simple_declaration.open, plus interleaved alts in spec_loop.open and
spec_loop.close. external_declaration.open also gains direct
KW___ATTRIBUTE__ / KW___ATTRIBUTE / KW___DECLSPEC dispatch into
simple_declaration so leading-attribute external decls don't have to
go through a different head.

The two paren-form rules (attribute_spec_gcc with double-paren shape,
attribute_spec_msvc with single-paren shape) are pasted from
84d19eb~1; both delegate to the already-present attribute_item /
attribute_argument_list helpers shared with the C23 form. All
referenced @-actions (@asg-*, @asm2-*) were already in src/c.ts. New
to EXTENSION_RULES so the rules are stripped from the spec in plain
mode.

One small follow-on fix: spec_loop.close now also accepts
#STORAGE_PREFIX so a leading attribute followed by a storage class
keyword (`__attribute__((unused)) static int q;`) parses correctly.
@absorb-spec-storage / @absorb-spec-type are state-aware about which
slot (rule.o0 vs rule.c0) holds the matched token.

Out of scope this round: post-declarator attributes
(`int x __attribute__((aligned(8)));`), GCC __asm__, in-body #-lines,
#if folding.

Tests: 11 new under describe('extended-mode GCC __attribute__ and MSVC
__declspec'), covering leading + interleaved positions, MSVC
single-paren form, attribute argument lists,
keyword-as-attribute-name, multi-form coexistence with C23, and
plain-mode rejection. 101 / 101 pass (78 plain + 12 preprocessor + 11
attribute).
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Restores the six asm grammar rules (asm_statement, the state-machine
driver, plus asm_template, asm_section, asm_operand, asm_clobber,
asm_label_ref), pasted from 84d19eb~1. Dispatched from
external_declaration.open (KW_ASM / KW___ASM / KW___ASM__ gated on
@ext-and-first-iter) and from statement.open (same three keywords
gated on @extended-on for in-body asm).

All ~30 referenced @-actions (@asm-*, @asec-*, @aop-*, @acl-*,
@alr-*) were already present in src/c.ts. Added to EXTENSION_RULES so
the rules drop out of the spec in plain mode.

@finalize-new-path now wraps an asm_statement child as a single child
of external_declaration (matching static_assert_declaration) rather
than splicing its children; this preserves the asm_statement node
itself for consumers.

Tests: 6 new under describe('extended-mode GCC __asm__'): plain-mode
rejection, top-level template-only, volatile + clobbers, in-body asm
under compound_statement, asm goto with labels, full extended form
with outputs / inputs / clobbers. 107 / 107 pass (78 plain + 12
preprocessor + 11 attribute + 6 asm).

Out of scope this round: in-body #-lines, post-declarator attributes,
#if folding.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Restores the preprocessor_line rule for #-lines that appear inside
function bodies (mid-body #pragma, #ifdef, #error, etc.; rare but
legal C). Pasted from 84d19eb~1; dispatched from statement.open on
PP_HASH gated on @extended-on.

The five referenced @-actions (@preprocessor_line-bo, @pp-take-hash,
@pp-reentry, @pp-absorb, @pp-take-newline) were already present in
src/c.ts. Added to EXTENSION_RULES so the rule drops out of the spec
in plain mode.

The lexer's PP_HASH matcher remains line-start-gated, so the `#` must
be preceded only by whitespace since the last newline, the same
constraint as top-level directives.

Tests: 4 new under describe('extended-mode in-body
preprocessor_line'): plain-mode rejection, in-body #pragma, in-body
#ifdef/#endif as separate preprocessor_line nodes, and trailing
PP_NEWLINE preservation. 111 / 111 pass.

This completes the preprocessor coverage (top-level + in-body).
Remaining deferred items: post-declarator attributes and #if-group
folding.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Adds the previously-unsupported post-declarator attribute decoration
point. Now `int x __attribute__((aligned(8)));`, `void f(void)
__attribute__((noreturn));`, `int z [[deprecated]];`, and `int g
__declspec(thread);` all parse with the attribute attaching to the
init_declarator (between the declarator and any `=` initializer /
terminator).
Grammar: four new alts in init_declarator.close gated on a new
@idecl-named (so they only fire after the declarator is complete):
- KW___ATTRIBUTE__ / KW___ATTRIBUTE → attribute_spec_gcc, gated on
@idecl-named-and-extended.
- KW___DECLSPEC → attribute_spec_msvc, same gate.
- PUNC_LBRACKET PUNC_LBRACKET → attribute_spec_c23, gated on
@idecl-named-and-as23 (named + token-adjacency check; no
extended-mode requirement since [[…]] is plain C23).
The C23 alt is positioned BEFORE the single-token PUNC_LBRACKET
array_postfix alt so the 2-token lookahead wins on `[[`. Each alt
has `r: 'init_declarator'` so multiple attributes chain and the
trailing `=` / `,` / `;` is still picked up.
@init_declarator-bc gains a branch that pushes any returned
attribute_spec child onto rule.node.children, with a per-rule
attachedAttrs Set to prevent re-pushing on the next bc cycle.
Two new conditions: @idecl-named-and-extended,
@idecl-named-and-as23.
Tests: 10 new under describe('extended-mode post-declarator
attributes') — plain-mode rejection, GCC attr on variable / function
/ array, attr-with-initializer, multi-attr chaining, per-declarator
attrs in a multi-declarator decl, C23 [[…]] in this slot, MSVC
__declspec, regression for plain `int x[10]`. 121 / 121 pass.
Remaining deferred: #if-group folding (the hardest, still on the
roadmap).
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
Folds runs of #if/#ifdef/#ifndef … (#elif/#elifdef/#elifndef/#else)*
… #endif into a single conditional_group node, with one
conditional_branch per opening directive carrying its directive +
body items, plus a separate `endif` slot on the group. Nested #if
inside a branch produces a nested conditional_group (recursion in
the grammar, not via a post-pass). Stray #endif / unterminated #if
degrade gracefully — orphan directives stay flat as
external_declaration{conditional_directive}, unterminated groups
omit the `endif` field.
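The folded shape described above can be sketched with a small recursive fold over a flat list of lines. This only illustrates the resulting structure and the degrade-gracefully cases; it is not the grammar mechanism, and `fold`, `Group`, `Branch`, and the string-based items are hypothetical.

```typescript
// Hypothetical sketch of the folded shape; not the project's node types.
type Item = string | Group; // string = a flat directive or declaration
interface Branch { directive: string; children: Item[] }
interface Group { kind: "conditional_group"; branches: Branch[]; endif?: string }

const isIf = (s: string) => /^#\s*(if|ifdef|ifndef)\b/.test(s);
const isElse = (s: string) => /^#\s*(elif|elifdef|elifndef|else)\b/.test(s);
const isEndif = (s: string) => /^#\s*endif\b/.test(s);

function fold(lines: string[]): Item[] {
  let pos = 0;

  function items(insideGroup: boolean): Item[] {
    const out: Item[] = [];
    while (pos < lines.length) {
      const line = lines[pos];
      if (isIf(line)) { out.push(group()); continue; } // nested group
      // Boundary directive: stop without consuming it.
      if (insideGroup && (isElse(line) || isEndif(line))) break;
      out.push(line); // at top level, a stray #endif / #else stays flat
      pos++;
    }
    return out;
  }

  function group(): Group {
    const g: Group = { kind: "conditional_group", branches: [] };
    // The head directive and each #elif / #else open one branch.
    do {
      const directive = lines[pos++];
      g.branches.push({ directive, children: items(true) });
    } while (pos < lines.length && isElse(lines[pos]));
    // An unterminated group simply has no `endif` field.
    if (pos < lines.length && isEndif(lines[pos])) g.endif = lines[pos++];
    return g;
  }

  return items(false);
}

const tree = fold(["#ifdef A", "int a;", "#else", "int b;", "#endif", "int c;"]);
console.log(JSON.stringify(tree, null, 2));
```

Here the first element of `tree` is a group with two branches and an `endif`, followed by the flat `int c;` item.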
This was previously a tree post-pass (`structureConditionalGroups`,
removed in 84d19eb). Now it's pure grammar.
Two new grammar rules:
- conditional_group: state machine that takes the head directive,
delegates body absorption to cg_branch_body, then reads the next
directive (advance vs. close).
- cg_branch_body: absorber that pushes external_declarations and
nested conditional_groups onto the parent's curBranch.children;
stops without consuming at the next boundary directive.
extdecl_loop gains a 2-token-lookahead dispatch alt
(s: 'PP_HASH #ANY_C_TOKEN' c: '@cg-head-is-if-family' b: 2) that
routes #if-family heads into conditional_group; everything else
goes through external_declaration as before. Lookahead conditions
in close-state alts use the same s: + b: pattern to force-fetch
PP_HASH and the directive name (close-state `s: []` alts don't
auto-fetch ctx.t).
@conditional_group-bo distinguishes r:-recursion (preserve in-
progress group) from p:-descent (nested #if — fresh init) via
rule.prev: jsonic sets rule.prev to the previous instance only on
r:; p:-descent leaves it as NORULE. This is the same pattern
paren_inner_declarator uses for nested paren-forms.
EXTENSION_RULES gains conditional_group and cg_branch_body so
plain mode strips them entirely.
Tests: 9 new under describe('extended-mode #if conditional_group
folding') — basic if/endif, ifdef/else, include guards (ifndef +
multi-item body), nested groups, group followed by top-level decl,
stray #endif, unterminated #if, three-way fold's directive
identity. Updated the previously-flat-only test to assert the new
folded shape. 130 / 130 pass.
This completes the extended-mode roadmap. All deferred items from
the rewrite are now shipped: top-level preprocessor (with macro
tagging, #if folding), in-body #-lines, GCC __attribute__ + MSVC
__declspec (leading / interleaved / post-declarator), GCC __asm__
(top-level + in-body), post-declarator C23 attributes.
https://claude.ai/code/session_01DEdkKecwpq59ydTqZ7Aobv
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8eb7a2aeae
    # Stop at the body-opening `{` (don't consume — let
    # simple_declaration's PUNC_LBRACE alt drive
    # compound_statement).
    { s: 'PUNC_LBRACE' b: 1 g: 'kr-end' }
Track nested braces in K&R declaration list
The `kr_declaration_list` rule currently stops as soon as it sees any `PUNC_LBRACE`, but valid K&R parameter declarations may contain braces inside a type specifier (for example `int f(x) struct S { int a; } x; { ... }`). In that case this rule will terminate at the struct body opener instead of the real function-body opener, so the function definition is split at the wrong place and the resulting CST is malformed. The stop condition here needs brace-depth awareness (or declaration-aware parsing) rather than a raw first-`{` cutoff.
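One way to picture a declaration-aware stop condition is the sketch below. It is a standalone illustration over a plain token array, not a patch to the grammar: `findBodyOpen` is a hypothetical helper, and it assumes the tokens begin immediately after the parameter-list `)` and that every declaration in the list ends with a top-level `;`.

```typescript
// Hypothetical helper: index of the function-body `{` in the token run
// that follows the parameter-list `)`, or -1 if there is none.
function findBodyOpen(tokens: string[]): number {
  let depth = 0;          // nesting inside struct / union / enum bodies
  let atDeclStart = true; // true right after `)` and after a top-level `;`
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if (t === "{") {
      // Only a `{` that begins a new top-level item opens the body.
      if (depth === 0 && atDeclStart) return i;
      depth++;            // otherwise it opens a nested type body
    } else if (t === "}") {
      depth--;
    } else if (t === ";" && depth === 0) {
      atDeclStart = true; // a K&R parameter declaration just ended
      continue;
    }
    atDeclStart = false;
  }
  return -1;
}

// Tokens after the `)` in: int f(x) struct S { int a; } x; { ... }
const after = ["struct", "S", "{", "int", "a", ";", "}", "x", ";", "{", "}"];
console.log(findBodyOpen(after)); // prints 9
```

Depth counting alone would not be enough here, because the struct body's `{` is itself at depth 0; the sketch also requires the `{` to sit at a declaration boundary.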