Merged
47 commits
3915707
Move grammar skeleton into c-grammar.jsonic (declarative)
claude Apr 30, 2026
afec3b8
Drive binary operator precedence with @jsonic/expr's prattify
claude Apr 30, 2026
08150d5
Phase A: install @jsonic/expr on the main jsonic with C operators
claude Apr 30, 2026
4c3e7ac
Phase A probe: standalone @jsonic/expr smoke tests
claude Apr 30, 2026
908a776
Phase B1: declaration via real grammar; expressions through val (@jso…
claude Apr 30, 2026
1b2d6c7
Phase B2.1: broaden simple_declaration type head
claude Apr 30, 2026
61af995
Phase B2.2: storage-class prefix in simple_declaration
claude Apr 30, 2026
b7b60a3
Phase B2.3: multi-keyword type specifiers via spec_loop sub-rule
claude Apr 30, 2026
054009b
Phase B2.4: comma-separated init-declarator-list
claude Apr 30, 2026
86eab8c
Phase B2.5: pointer & array declarators in init_declarator
claude Apr 30, 2026
79bfb1e
README: document the in-progress grammar-driven migration
claude Apr 30, 2026
0dc1b8d
Phase B3.1: function declarations with simple parameter lists
claude Apr 30, 2026
9b6716e
Phase B4.1: compound_statement rule (balanced-brace token absorber)
claude Apr 30, 2026
59582ea
README: more granular migration plan (B3 split, B4.1 done, B4.2 detai…
claude Apr 30, 2026
0e6ca34
Phase B4.2.1: define block_item / statement / expression_statement /
claude Apr 30, 2026
fecea00
Phase B3.3: function definitions through grammar (with body gate)
claude May 1, 2026
6d69bd9
Phase B4.2.2: if / while / do / switch statement rules
claude May 1, 2026
57400a8
Fix: block_item / statement -bc must REPLACE rule.node
claude May 1, 2026
316f60f
Phase B4.2.3: for_statement family + labeled_statement rules
claude May 1, 2026
33123ff
Phase B4.2.4: asm_statement + preprocessor_line rules
claude May 1, 2026
5cca9f9
Phase B4.2.4: stitching fixes for activated rules
claude May 1, 2026
2086a68
README: update migration plan — B3.3+B4.x done, C+D detailed
claude May 1, 2026
435ad9f
Phase D pre-work: deep-lookahead body gate + val comma/colon gating
claude May 1, 2026
738353e
Phase C.1: dot/arrow left-associativity + sizeof/_Alignof prefix ops
claude May 1, 2026
5f80a3f
Phase C.2 + C.3: type_name, sizeof type-form, cast/compound_literal
claude May 1, 2026
3e6aedd
Phase C.4: initializer_list, initializer_item, designation, designator
claude May 1, 2026
83b5233
Phase C.5+C.6+C.7: _Generic, statement_expression, string concat
claude May 1, 2026
0c0a107
Phase C.8: structured asm_statement (qualifiers, template, sections)
claude May 1, 2026
eeef2bc
Phase D.1: regenerate csmith fixtures after C.4-C.8 grammar changes
claude May 1, 2026
cfa9d4b
README: Phase C complete, Phase D status (3/4 sub-tasks done)
claude May 1, 2026
bf98462
README: reframe Phase D as cutover-gates-met (hybrid architecture)
claude May 1, 2026
5071c0c
Phase E: recognise C23 nullptr / true / false as atom keywords
claude May 1, 2026
e9775ab
Phase E: ship-ready polish (README rewrite, CHANGELOG, v0.2.0)
claude May 1, 2026
94c9c59
Phase F: struct / union / enum specifiers in the grammar
claude May 1, 2026
fde1dd3
Phase G: attribute_spec rules (GCC / MSVC / C23) — leading position
claude May 1, 2026
76daaa6
Phase H: top-level preprocessor directives in the grammar
claude May 1, 2026
21a153e
Phase I: top-level __asm__ + static_assert grammar rule
claude May 1, 2026
8cc02ee
Phase K: extract structureConditionalGroups to its own module
claude May 1, 2026
05ce64e
Phase L (partial): standalone struct/enum definitions through grammar
claude May 1, 2026
db158bb
Remove accidentally-committed dbg.cjs
claude May 1, 2026
7a9102a
Phase N: ship 1.0.0
claude May 1, 2026
7855037
Phase O retrospective: clarify deferred-static_assert comment
claude May 1, 2026
5bc92a0
Phase P: parenthesised sub-declarators (function pointers) in grammar
claude May 1, 2026
4e41a36
Phase S: ship 2.0.0
claude May 1, 2026
abb4e40
Phase O: top-level static_assert via vendored @jsonic/expr
claude May 1, 2026
95572a6
Phase R lookahead-purity (partial): land Phase P real-fix
claude May 2, 2026
9a48271
Add path-dispatch spec: TSV-driven shape catalog
claude May 2, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ coverage

dist
dist-test
vendor/jsonic-expr/dist
*.tsbuildinfo

package-lock.json
4 changes: 4 additions & 0 deletions .npmignore
@@ -0,0 +1,4 @@
# package.json#files is the primary include allowlist; this file
# excludes a few artefacts that match those globs but shouldn't ship.
dist/tsconfig.tsbuildinfo
src/tsconfig.json
200 changes: 200 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,200 @@
# Changelog

## 2.0.0

Lands parenthesised sub-declarators (function pointers) and
top-level `static_assert` on the grammar path, vendors a patched
copy of `@jsonic/expr` so the comma-operator vs static_assert
comma-separator collision is solved at the source, and declares
the hybrid grammar + legacy-fallback architecture as the final
shape for this release line.

### Added

- `paren_inner_declarator` rule — inner declarator inside `( … )`
with pointer prefix + ID + array / function postfix support.
Wired into `init_declarator` so shapes like `int (*fp)(int);`
and `typedef int (*Fn)(int);` flow through the grammar.
`@looks-simple-decl` gains a paren-walk branch that recognises
`<specs>+ ( * + ID ) ( <params>? ) ;`.
- Top-level `static_assert(cond, msg)` and `_Static_assert(cond)`
dispatch through `external_declaration` into the existing
`static_assert_declaration` grammar rule. `@said-take-lparen`
now sets `rule.n.no_comma_op = 1` which propagates into the
cond / msg val sub-rules; the vendored `@jsonic/expr` honours
the flag by bailing on `,` rather than consuming it as the
comma operator.
- Vendored copy of `@jsonic/expr@2.2.0` under `vendor/jsonic-expr/`,
installed via `package.json` `file:` link. Patches add a
`n.no_comma_op` bail in both `val.close` and `expr.close`. The
bail matches `[INFIX]` with a src-equals-`,` cond so it works
with the C plugin's `PUNC_COMMA` lex (which is distinct from
jsonic-default `CA`).
- 4 new unit tests: 3 function-pointer shapes (variable / multi-
param / multi-pointer) plus a top-level static_assert with a
type-form sizeof in the cond.
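
The `no_comma_op` idea above can be shown in isolation. The following is a minimal sketch of a toy precedence-climbing parser, not the vendored `@jsonic/expr` code; `State`, `parseExpr`, `parseArgs`, and the `noCommaOp` field are hypothetical names chosen for illustration. It assumes the flag only needs to stop the operator loop from consuming `,`, so the caller can treat the comma as an argument separator.

```typescript
// Toy precedence-climbing parser. Illustrative only: this is NOT the
// @jsonic/expr implementation, and every name here is hypothetical.
type Expr = string | [string, Expr, Expr];

// Binding powers for a tiny operator subset; ',' binds loosest.
const INFIX: Record<string, number> = { ',': 1, '==': 5, '+': 7, '*': 8 };

interface State { toks: string[]; pos: number; noCommaOp: boolean }

function parseExpr(s: State, minBp = 0): Expr {
  let left: Expr = s.toks[s.pos++]; // atom (no error handling in this sketch)
  for (;;) {
    const op = s.toks[s.pos];
    const bp = op === undefined ? undefined : INFIX[op];
    if (bp === undefined || bp <= minBp) break;
    // Flag-gated bail: leave ',' unconsumed so the caller sees a separator.
    if (op === ',' && s.noCommaOp) break;
    s.pos++;
    left = [op, left, parseExpr(s, bp)]; // left-associative climb
  }
  return left;
}

// Parses the tokens between the parentheses of `static_assert( ... )`.
function parseArgs(toks: string[], noCommaOp: boolean): Expr[] {
  const s: State = { toks, pos: 0, noCommaOp };
  const args: Expr[] = [parseExpr(s)];
  while (s.toks[s.pos] === ',') {
    s.pos++; // skip the separator
    args.push(parseExpr(s));
  }
  return args;
}
```

With the flag set, `a == b , "m"` yields two arguments (condition and message); without it, the whole span is absorbed as one comma expression, which is the collision the patch avoids.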

### Architecture decision

The 1.0.0 release notes called the legacy `structure.ts` path "a
fallback for shapes the new grammar doesn't yet cover". 2.0.0
formalises the hybrid as the **final** architecture rather than a
transitional one:

- The grammar covers the common shapes — every variable / function
declaration, every C statement, every val-position construct,
every preprocessor directive, struct / union / enum bodies,
attribute specs in three forms, leading-position function
pointers (new in 2.0).
- The legacy chomp + `structure.ts` post-processor remains as the
safety net for the long tail: K&R `int f(a, b) int a; long b; { … }`
parameter lists and any complex declarator the dispatcher's
lookahead doesn't accept.
- Both paths produce identical CST: `@looks-simple-decl` decides
which path runs, but the consumer sees one tree shape regardless.

This is similar in spirit to how production C parsers such as GCC and
Clang pair a hand-written recursive-descent core with special-case
handling for historic / edge constructs.
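
As a rough illustration of predicate-gated dispatch, here is a minimal sketch under strong simplifying assumptions. It is not the project's code: `looksSimpleDecl` is a stand-in for `@looks-simple-decl` that accepts only `<type-keyword>+ ID ;`, and `parseViaGrammar`, `parseViaLegacy`, and the `'declaration'` value of `declKind` are hypothetical.

```typescript
// Hypothetical sketch of predicate-gated dispatch. Not the project's code:
// the real predicate and both parsers are far more involved.
interface ExtDecl { kind: 'external_declaration'; declKind: string; tokens: string[] }

const TYPE_KEYWORDS = new Set(['int', 'long', 'char', 'short', 'unsigned']);

// Stand-in lookahead check: accepts only `<type-keyword>+ ID ;`.
function looksSimpleDecl(toks: string[]): boolean {
  let i = 0;
  while (i < toks.length && TYPE_KEYWORDS.has(toks[i])) i++;
  return (
    i > 0 &&
    i === toks.length - 2 &&
    /^[A-Za-z_]\w*$/.test(toks[i]) &&
    toks[i + 1] === ';'
  );
}

// Both paths build the same node shape for the inputs they share.
function parseViaGrammar(toks: string[]): ExtDecl {
  return { kind: 'external_declaration', declKind: 'declaration', tokens: toks };
}

function parseViaLegacy(toks: string[]): ExtDecl {
  // Toy classification: anything ending in ';' counts as a declaration.
  const declKind = toks[toks.length - 1] === ';' ? 'declaration' : 'unknown';
  return { kind: 'external_declaration', declKind, tokens: toks };
}

function parseExternalDeclaration(
  toks: string[],
): { path: 'grammar' | 'legacy'; node: ExtDecl } {
  return looksSimpleDecl(toks)
    ? { path: 'grammar', node: parseViaGrammar(toks) }
    : { path: 'legacy', node: parseViaLegacy(toks) };
}
```

The consumer only ever reads `node`; `path` is exposed here purely so the routing decision can be inspected in a test.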

### Tests

- 293 / 293 pass (89 unit + 100 csmith parse + 100 csmith fixture
+ 4 suite scaffolding).

### Known limitations (legacy chomp+structure path)

- K&R parameter lists (`int f(a, b) int a; long b; { … }`) — rare
in modern code; csmith never generates them.
- Complex compound declarators beyond simple function pointers
(e.g. `int (*arr[N])(int);` arrays of function pointers,
`int (*(*fpp))(int);` pointer-to-function-pointer).

## 1.0.0

Continues the grammar-driven migration: adds rules for tagged-type
specifiers, attribute specs, top-level preprocessor directives,
top-level GCC `__asm__`, and standalone struct / enum definitions.
Csmith fixtures regenerate against the updated CST shapes — most
tag definitions, attribute placements, and directives now flow
through the grammar instead of the legacy chomp+structure
post-processor.

### Added

- `struct_specifier`, `union_specifier`, `enum_specifier` rules
with `member_decl_list` / `struct_declaration` /
`struct_declarator` / `bitfield_width` (struct-with-body and
bitfields), `enumerator_list` / `enumerator` (enum body), and
C23 `enum E : int { … }` fixed-underlying-type support.
- `attribute_spec_gcc` (`__attribute__((…))`),
`attribute_spec_msvc` (`__declspec(…)`),
`attribute_spec_c23` (`[[ … ]]`), with `attribute_item` and
`attribute_argument_list`. Wired as leading specifiers and via
`spec_loop` for between-specifier placements.
- Top-level preprocessor directives: `define_directive` (with
`macro_parameter_list` and `macro_body`), `undef_directive`,
`include_directive` (angled / quoted / macro-form),
`conditional_directive` (#if / #ifdef / #ifndef / #elif /
#elifdef / #elifndef / #else / #endif), `simple_directive`
(#pragma / #error / #warning / #line and unknown directives).
Macro registration / un-registration on `cmeta.macros` happens
synchronously when `#define` / `#undef` parse, and pre-fetched
lookahead tokens are reclassified in place.
- Top-level GCC `__asm__` blocks dispatch into the existing
`asm_statement` rule (added in 0.2.0).
- `static_assert_declaration` grammar rule (used by struct-member
dispatch; top-level dispatch deferred pending comma-operator
gating in `@jsonic/expr`).
- `structureConditionalGroups` moved from `src/structure.ts` to
its own `src/conditional-groups.ts` module — self-contained,
no dependency on the rest of `structure.ts`.
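
The macro registration step described in the directive entry above can be sketched as follows. This is a hedged illustration, not the plugin's implementation: the `Token` shape, `registerMacro`, and `unregisterMacro` are hypothetical, and only the `ID` / `MACRO_NAME` token names come from the README. It assumes the lexer has already produced some lookahead tokens by the time `#define` or `#undef` finishes parsing.

```typescript
// Hypothetical sketch; the Token shape and function names are illustrative.
interface Token { tin: 'ID' | 'MACRO_NAME'; src: string }

// Called when a `#define NAME ...` line finishes parsing.
function registerMacro(macros: Set<string>, name: string, lookahead: Token[]): void {
  macros.add(name);
  // Tokens lexed before the directive completed still carry the old class,
  // so they are reclassified in place rather than re-lexed.
  for (const t of lookahead) {
    if (t.tin === 'ID' && t.src === name) t.tin = 'MACRO_NAME';
  }
}

// Called when a `#undef NAME` line finishes parsing.
function unregisterMacro(macros: Set<string>, name: string, lookahead: Token[]): void {
  macros.delete(name);
  for (const t of lookahead) {
    if (t.tin === 'MACRO_NAME' && t.src === name) t.tin = 'ID';
  }
}
```

Mutating the buffered tokens keeps the lexer and parser in agreement without rewinding the input, at the cost of coupling the directive rules to the lookahead buffer.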

### Tests

- 289 / 289 pass (85 unit + 100 csmith parse + 100 csmith fixture
+ 4 suite scaffolding). All csmith corpus files now flow
through the grammar for struct definitions, attribute specs,
and preprocessor directives.

### Known limitations (still on the legacy chomp+structure path)

- Top-level `static_assert(cond, msg);` — the `,` between cond
and msg conflicts with the comma operator in `C_OP_TABLE`.
Resolving cleanly needs flag-gated suppression of comma-op
inside the static_assert paren context. Struct-member
static_assert is handled by the new path.
- K&R parameter lists (`int f(a, b) int a; long b; { … }`).
- Complex declarators: function pointers (`int (*fp)(int);`) and
functions returning function pointers.

CST shapes match the legacy chomp+structure output byte-for-byte
for the 100-file csmith corpus (fixtures regenerated). Consumers
that depended on the 0.2.0 CST shape see the same node kinds and
fields; the only differences are in subtle trivia placement and
the path the parser took to produce them.

## 0.2.0

First public release of the grammar-driven parser.

The parser is now structured as a hybrid:

- `@jsonic/expr`-driven Pratt expression parsing with custom val
open-alts for C-only constructs (`sizeof ( type )`, cast,
compound literal, `_Generic`, GCC statement-expression, brace
initializer list, adjacent-string concatenation).
- Declarative grammar (in `c-grammar.jsonic`, embedded into
`src/c.ts` at build time) for declarations, function definitions,
and the full statement family (compound, if/else, while, do,
switch, for, labeled, jump, expression, asm, preprocessor-line).
- A legacy `structure.ts` post-processor as a fallback for shapes
the new grammar doesn't yet cover (struct / union / enum
specifiers, attribute specs in three forms, top-level
preprocessor directives, top-level GCC `__asm__`,
`static_assert`, K&R parameter lists, complex declarators).

Both paths produce the same CST shape, so consumers see one tree
regardless of which path parsed a given external declaration.

### Added

- Grammar rules for every variable declaration form (storage class,
multi-keyword type, comma-separated declarators, pointer + array
postfix, function declarator, K&R-empty / `(void)` /
`(<type> ID, …)` / abstract parameter shapes).
- Grammar rules for every C statement: `compound_statement`,
`expression_statement`, `jump_statement` (return / break /
continue / goto), `if_statement` with optional `else`,
`while_statement`, `do_statement`, `switch_statement`,
`for_statement` with `for_controls` / `for_init` / `for_cond` /
`for_iter` slots, `labeled_statement` (`case` / `default` / ID
label), `asm_statement` (qualifiers, template, four
colon-separated sections), `preprocessor_line`.
- Grammar rules for every val-position construct: cast,
compound literal, sizeof type-form, _Alignof, `_Generic`, GCC
statement-expression, brace initializer list with designated
members and indices, adjacent string-literal concatenation,
function calls and subscripts via `@jsonic/expr` paren-preval.
- Recognition of C23 keyword constants `nullptr`, `true`, `false`
as `literal_expression` atoms.
- 100-file CSmith corpus regression test (corpus and gzipped JSON
fixtures committed; `csmith` binary not required at test time).

### Tests

- 289 / 289 pass (85 unit + 100 csmith parse + 100 csmith fixture
+ 4 suite scaffolding).

### Known limitations

- K&R-style parameter declarations and unguarded GCC
`__extern_inline` declarations parse to a `declKind: 'unknown'`
external declaration with the original tokens preserved as
children.
- Compound literals of struct types (`(struct point){ … }`) inside
function bodies are not yet structured as a single
`compound_literal` node; the surrounding declaration falls back
to the legacy chomp.
156 changes: 145 additions & 11 deletions README.md
@@ -49,16 +49,60 @@ positions are preserved on token spans).
token. Grammar rules and structuring code reference these names
directly.

- **Coarse-grained jsonic grammar** (`src/c.ts`): `translation_unit`
opens an `extdecl_loop` that absorbs tokens into per-declaration
chomp nodes terminating at top-level `;` or `}` (with PP_NEWLINE
for directives). Directive lines get terminated separately so each
`#…` is its own external_declaration.

- **Recursive-descent structuring** (`src/structure.ts`,
`src/expr.ts`): a post-pass over each chomped token list produces
the structured concrete-syntax tree. Walking depth-first yields the
original tokens in source order.
- **Declarative grammar** (`c-grammar.jsonic`): the rule shapes for
the entire C surface — translation unit, external declarations,
declarators, statements, expressions — live as a Jsonic-DSL
document, embedded at build time into `src/c.ts`. All conditions
and actions are bound to `@`-named refs in the TS plugin, so the
grammar file reads as structural intent and action logic stays
out of it.

- **Pratt-style expressions** via [`@jsonic/expr`](https://www.npmjs.com/package/@jsonic/expr):
the `val` rule absorbs C atoms (`LIT_INT` / `LIT_FLOAT` / `LIT_CHAR`
/ `LIT_STRING` / `ID` / `MACRO_NAME` / `TYPEDEF_NAME` / `KW_NULLPTR`
/ `KW_TRUE` / `KW_FALSE`), then `@jsonic/expr`'s pratt logic
drives infix / prefix / suffix operator precedence. Custom val
open-alts handle the C-only constructs that aren't simple
operators: `sizeof ( type )` / cast / compound literal / `_Generic`
/ GCC statement-expression / brace initializer list / adjacent
string concatenation.

- **Conditional-group folding** (`src/conditional-groups.ts`): a
translation-unit-level post-pass that collapses contiguous runs
of `#if`/`#ifdef` … `#elif`/`#else` … `#endif` into a single
`conditional_group` node. Self-contained — operates only on
already-parsed `conditional_directive` nodes.

- **Hybrid dispatch + legacy fallback** (`src/structure.ts`,
`src/expr.ts`): the `external_declaration` cascading wildcard
alts dispatch to `simple_declaration` (or to typed
preprocessor / asm / static_assert sub-rules) whenever
`@looks-simple-decl` recognises the head; otherwise the chomp
loop falls through to a recursive-descent post-processor in
`structure.ts`. Shapes covered by the new path:
- simple declarations (storage prefix, multi-keyword type,
pointer / array, function declarator, function definition)
- tagged-type specifiers (struct / union / enum, including
standalone definitions and C23 fixed-underlying-type enums)
- attribute specs (GCC / MSVC / C23, leading + between-specs
insertion points)
- top-level preprocessor directives (#define, #include, #if
family, #pragma / #error / #warning / #undef / #line)
- top-level GCC `__asm__`
- all expression and statement forms

Shapes still on the legacy path:
- K&R parameter lists (`int f(a, b) int a; long b; { … }`) —
rare in modern code; csmith never generates them
- complex compound declarators beyond simple function pointers
(`int (*arr[N])(int);` arrays-of-fn-ptrs,
`int (*(*fpp))(int);` ptr-to-fn-ptr). Plain function pointers
`int (*fp)(int);` and top-level `static_assert(cond, msg);`
moved onto the grammar path in 2.0.

Both paths produce identical CST shapes; the
`@jsonic/expr`-driven `val` handles initializer expressions in
either case.
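
The conditional-group folding pass listed above can be approximated with a stack. The sketch below is an assumption-laden illustration rather than the contents of `src/conditional-groups.ts`: the `CstNode` fields, the `directive` values, and the decision to nest inner groups are all hypothetical.

```typescript
// Hypothetical sketch of folding flat directive nodes into groups.
interface CstNode { kind: string; directive?: string; children?: CstNode[] }

const OPENERS = new Set(['if', 'ifdef', 'ifndef']);

function foldConditionalGroups(flat: CstNode[]): CstNode[] {
  const root: CstNode[] = [];
  // The innermost open child list is always the last stack entry.
  const stack: CstNode[][] = [root];
  for (const node of flat) {
    const isCond = node.kind === 'conditional_directive';
    const top = stack[stack.length - 1];
    if (isCond && OPENERS.has(node.directive ?? '')) {
      // Opener: start a new group and descend into it.
      const kids: CstNode[] = [node];
      top.push({ kind: 'conditional_group', children: kids });
      stack.push(kids);
    } else if (isCond && node.directive === 'endif' && stack.length > 1) {
      // Closer: keep the #endif inside its group, then ascend.
      top.push(node);
      stack.pop();
    } else {
      // #elif / #else / ordinary nodes stay at the current level.
      top.push(node);
    }
  }
  return root;
}
```

Because every input node is pushed exactly once and in order, a depth-first walk of the result still yields the original sequence, which matches the source-order property the README describes for the CST.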

## Concrete-syntax shapes

@@ -100,7 +144,16 @@ translation_unit
expression_statement, asm_statement, preprocessor_line
```

### Expression shapes (Pratt-parsed, full C precedence)
### Expression shapes (Pratt-parsed via @jsonic/expr)

Operator precedence is driven by `@jsonic/expr`'s pratt machinery.
The full C operator catalog (11 binary precedence levels, prefix /
suffix unary, ternary, assignment, comma, member access, and the
sizeof / _Alignof prefix forms) is registered as a single
`OpDef`-table at plugin-install time. The val rule absorbs C atoms
via custom open-alts; @jsonic/expr drives the precedence climb;
the `evaluate` callback converts the resulting S-expression into
the per-kind CST shapes below.

```
literal_expression { literalKind, value }
@@ -196,6 +249,87 @@ for_controls
for_iter { value: <expr> | empty }
```

## Coverage and known limitations

The parser handles every shape in the CSmith-generated regression
corpus (100 random C programs) plus a hand-curated stress sweep
(GCC `__attribute__`, C23 `nullptr` / `[[nodiscard]]` / `_BitInt`,
nested preprocessor `#if` chains, line-continuation in macro
bodies, function pointers, GCC inline assembly with operand
sections, struct bitfields with anonymous unions, designated and
indexed initialisers).

Known fall-throughs that produce a `declKind: 'unknown'` external
declaration rather than a structured one (still parseable, source
fidelity preserved):

- K&R-style parameter declarations (`int f(a, b) int a; long b; { … }`).
- GCC `__extern_inline` declarations gated on a `__USE_EXTERN_INLINES`
feature macro that hasn't been `#define`d.

A compound literal with a struct-tagged type, `(struct point){ … }`,
inside a function body is not yet structured as a single node,
because the struct-tagged type isn't in the new path's
`SIMPLE_TYPE_HEAD` set. Top-level brace initialisers on struct types
(`struct point p = { … };`) work because they go through the legacy
fallback.

## Architecture history

The parser shipped through a 14-phase migration from a pure
chomp-and-post-process design to the current near-pure-grammar
hybrid:

- **A** install `@jsonic/expr`; `val` accepts C atoms with the
evaluate callback emitting the public CST shapes.
- **B** `simple_declaration` family + statement family —
`block_item` / `statement` / `expression_statement` /
`jump_statement` / `if`/`while`/`do`/`switch`/`for` /
`labeled_statement` / `asm_statement` / `preprocessor_line`.
- **C** `val` open-alts for type-name constructs:
`type_name` / `sizeof_type_form` / `cast_or_compound_literal` /
`initializer_list` (with `designation` / `designator`) /
`generic_selection` / `statement_expression` / `string_atom` /
structured `asm_statement`.
- **D** cutover gates: deep-lookahead body validation
(`fetchDeep()` drives `ctx.lex` directly so the body-supportedness
check walks past the closing `}` of any function body), all
unit tests passing on the new path, csmith fixtures regenerated.
Shipped as `0.2.0`.
- **F** struct / union / enum specifiers + members + bitfields +
enumerators, dispatched from `simple_declaration` / `spec_loop`.
- **G** attribute specs (3 forms × leading + between-specs
insertion points).
- **H** top-level preprocessor directives — define / undef /
include / conditional / pragma / error / warning / line — with
macro registration on `cmeta.macros`, header-name lex-mode
feedback, and the typed sub-rules wrapped under
`external_declaration`.
- **I** top-level GCC `__asm__`. (`static_assert` grammar rule
defined; top-level dispatch deferred pending comma-op gating.)
- **K** `structureConditionalGroups` extracted to its own
module — a self-contained translation-unit-level post-pass.
- **L** standalone struct / enum definitions through grammar
(`@looks-simple-decl` walks past tagged-type bodies).
- **N** ship `1.0.0`.
- **P** parenthesised sub-declarators (function pointers):
`paren_inner_declarator` rule + `@looks-simple-decl` paren-walk
branch. Shapes like `int (*fp)(int);` and
`typedef int (*Fn)(int);` flow through the grammar.
- **O** vendor `@jsonic/expr` under `vendor/jsonic-expr/` and
add a `n.no_comma_op` bail in `val.close` / `expr.close` that
matches the comma op by src. Top-level `static_assert(cond, msg)`
dispatches into the existing `static_assert_declaration` rule
with the flag set, so the `,` lands as a separator instead of
the comma operator.
- **N₂** ship `2.0.0` declaring the hybrid as the final
architecture.

The legacy chomp + `structureExternalDeclaration` fallback
remains by design for the long-tail shapes — K&R parameter lists
and complex compound declarators beyond simple function pointers.
Both paths emit identical CST nodes, so consumers see one tree
regardless of which path produced it.

## License

MIT. Copyright (c) 2026 Richard Rodger and contributors.