Add C language parser plugin for Jsonic by rjrodger · Pull Request #1 · jsonicjs/c

rjrodger · 2026-04-30T15:38:04Z

This PR replaces the JSONC (JSON with Comments) plugin with a new C language parser plugin for Jsonic.

Summary

The repository has been transformed from a JSONC parser to a comprehensive C language parser. This includes a complete lexer, parser, and AST builder that handles C23 syntax with support for compiler extensions and macros.

Key Changes

New C Parser Implementation:

src/c.ts - Main plugin entry point that integrates the C parser with Jsonic
src/structure.ts - Post-processing pass that converts flat token lists into a structured concrete syntax tree using recursive-descent parsing
src/expr.ts - Pratt-style expression parser handling all C operator precedence levels (C23 §6.5)
src/matchers.ts - Focused lexer matchers for C tokens (whitespace, comments, preprocessor directives, identifiers, literals, punctuators)
src/tokens.ts - Token name catalog for all C tokens (keywords, punctuators, literals)
src/symbols.ts - Symbol and macro tables for resolving the identifier/typedef-name disambiguation problem

Removed JSONC Implementation:

Deleted src/jsonc.ts, go/jsonc.go, go/jsonc_test.go
Removed JSONC grammar file and embedding script
Removed JSONTestSuite test fixtures and related documentation

Configuration Updates:

Updated package.json to reflect C parser package (@jsonic/c)
Updated README.md with C parser documentation
Updated Makefile with C-specific build targets
Added .gitattributes for binary test fixture handling

Implementation Details

The C parser uses a two-pass approach:

Lexing phase: Tokenizes source while preserving trivia (comments, whitespace) and maintaining symbol tables for typedef-name disambiguation
Structuring phase: Recursive-descent parser that builds an AST while preserving source fidelity through token references

Key features:

Preserves macros and compiler extensions (GCC/Clang attributes)
Handles C23 syntax including new keywords and features
Maintains complete source location information (span tracking)
Separates trivia from grammar-level decisions for clean AST structure
Expression parsing with full operator precedence table

https://claude.ai/code/session_01Qjw28F24FXYwmDtUHnB4Gr

Convert this package from @jsonic/jsonc to @jsonic/c. Targets C23 with GCC/Clang/MSVC extensions; produces a concrete syntax tree preserving every token, comment, macro, and extension as-is.

- src/tokens.ts: token catalog covering C23 keywords, all extension keywords, every punctuator (one named token per literal form).
- src/symbols.ts: SymbolTable with full nested scopes (file, fn-proto, fn-body, block, struct-union, enum, for-init), MacroTable, and a LexMode flag bag, all bundled as CMeta on ctx.meta.cmeta so lex matchers and rule actions share the same state.
- src/matchers.ts: focused lex matchers, one job each: whitespace, line continuation, line/block comments, preprocessor directive opener (line-start gated), directive newline, header name (mode-gated), identifier with keyword/typedef-name/macro-name reclassification, integer (dec/hex/oct/binary, separators, suffixes), float (decimal + hex), char (with prefixes), string (with prefixes and raw R""), and a single longest-match punctuator dispatch.
- src/c.ts: plugin entry. Disables jsonic's built-in lexers so our matchers fully own tokenization, registers all token names so grammar rules can reference them, installs CMeta via parse.prepare.

Grammar covers translation_unit -> extdecl_loop -> external_declaration with a coarse-grained token chomper that recognises the typedef-name shape and registers it in the symbol table. Pre-lexed lookahead tokens are reclassified in place after typedef registration so subsequent matches see TYPEDEF_NAME immediately.

Smoke tests cover tokenization correctness (keyword vs identifier boundary, multi-char punctuator dispatch, comment trivia preservation) and the typedef disambiguation path end to end.
Subsequent slices will replace the chomper with proper declarator, declaration, statement, and expression grammar (the latter driven by @jsonic/expr); add directive nodes via @jsonic/directive with best-effort parsing of conditional groups; and surface the macro table to tag macro-call sites.

- Replace the single-name typedef heuristic with a real declarator walker. For each declarator in `typedef T1 *p, q[3], (*fn)(int);` the trailing identifier is found by stripping pointers, qualifiers, attribute groups, array/function postfixes, and recursing into parenthesised subdeclarators. Every declared name is registered.
- splitDeclarators splits the init-declarator-list at top-level commas, ignoring commas inside (), [], or {}.
- declaratorPart drops the initializer suffix at top-level `=` so initializer expressions don't pollute the name search.
- findSpecBoundary identifies where declaration-specifiers end: it walks past storage-classes, type-specifiers, qualifiers, function-specifiers, attribute groups, and a single TYPEDEF_NAME, plus the optional brace-balanced body of struct/union/enum specifiers.
- The chomper no longer auto-terminates at a top-level `}`. Instead it marks "just closed a brace" and only terminates if the next non-trivia token unambiguously begins a new external declaration (storage-class/type-specifier/qualifier keyword, attribute spec, TYPEDEF_NAME, preprocessor hash, or EOF). This lets `typedef struct { … } S;` and `enum E { … } var;` finish at the trailing `;`, while function definitions still terminate at `}` before the next top-level decl.

Tests cover multi-name typedef, pointer/array/function-pointer typedefs, struct-tag typedefs, struct-with-body typedefs, mixed declarator lists, function-definition-then-decl, and brace-bearing initializers.
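The top-level comma split described for splitDeclarators can be sketched roughly as follows. This is only an illustration under simplifying assumptions: tokens are plain strings rather than jsonic tokens, and the name `splitAtTopLevelCommas` is hypothetical, not the plugin's actual function.

```typescript
// Hypothetical sketch: split an init-declarator-list at top-level commas.
// Tokens are plain strings here; the real plugin operates on jsonic tokens.
function splitAtTopLevelCommas(tokens: string[]): string[][] {
  const parts: string[][] = [];
  let current: string[] = [];
  let depth = 0; // nesting depth across (), [] and {}
  for (const t of tokens) {
    if (t === "(" || t === "[" || t === "{") depth++;
    else if (t === ")" || t === "]" || t === "}") depth--;
    if (t === "," && depth === 0) {
      // A comma outside all brackets ends the current declarator.
      parts.push(current);
      current = [];
    } else {
      current.push(t);
    }
  }
  parts.push(current);
  return parts;
}

// Declarator list of: typedef T1 *p, q[3], (*fn)(int, char);
const declarators = splitAtTopLevelCommas([
  "*", "p", ",", "q", "[", "3", "]", ",",
  "(", "*", "fn", ")", "(", "int", ",", "char", ")",
]);
const rendered = declarators.map((d) => d.join(" "));
console.log(rendered); // [ '* p', 'q [ 3 ]', '( * fn ) ( int , char )' ]
```

The comma inside the function-pointer parameter list stays with its declarator because the depth counter is non-zero at that point.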

Move comment and line-continuation tokens into the IGNORE token set so the parser proper never sees them, then capture them in a sub-lex hook that attaches the buffered trivia to the next non-trivia token via tkn.use.leading. The chomper drains use.leading into the AST in source order ahead of each absorbed token, preserving comments verbatim without any grammar rule needing to mention them.

- Split trivia into PRESERVE (block/line comments, line continuations) and DROP (whitespace, jsonic's #LN/#CM). Only PRESERVE flows to use.leading.
- pendingTrivia lives on CMeta so it travels with parser state.
- jsonic.sub({lex}) registers the hook once at plugin install.

Tests: trivia ordering through declarations, line-continuation preservation, whitespace remains absent from the AST.
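A minimal sketch of the leading-trivia idea, assuming simplified token objects. The token names (`COMMENT_BLOCK`, `WS`, and so on) and the `attachTrivia` helper are hypothetical stand-ins; the plugin itself does this through a jsonic sub-lex hook and `tkn.use.leading`.

```typescript
// Hypothetical token shape; the real plugin stores trivia on tkn.use.leading.
interface Tok {
  name: string;
  src: string;
  leading: Tok[];
}

// Assumed token names, for illustration only.
const PRESERVE = new Set(["COMMENT_BLOCK", "COMMENT_LINE", "LINE_CONT"]);
const DROP = new Set(["WS"]);

function attachTrivia(raw: { name: string; src: string }[]): Tok[] {
  const out: Tok[] = [];
  let pending: Tok[] = []; // buffered PRESERVE trivia awaiting a real token
  for (const r of raw) {
    if (DROP.has(r.name)) continue; // whitespace never reaches the tree
    const tok: Tok = { name: r.name, src: r.src, leading: [] };
    if (PRESERVE.has(r.name)) {
      pending.push(tok);
      continue;
    }
    tok.leading = pending; // hand buffered trivia to the next real token
    pending = [];
    out.push(tok);
  }
  return out;
}

// Source: /* doc */ int /* mid */ x;
const toks = attachTrivia([
  { name: "COMMENT_BLOCK", src: "/* doc */" },
  { name: "WS", src: " " },
  { name: "KW_INT", src: "int" },
  { name: "WS", src: " " },
  { name: "COMMENT_BLOCK", src: "/* mid */" },
  { name: "WS", src: " " },
  { name: "ID", src: "x" },
  { name: "PUNC_SEMI", src: ";" },
]);
const summary = toks
  .map((t) => t.leading.map((l) => l.src).join("") + "|" + t.src)
  .join(" ");
console.log(summary); // /* doc */|int /* mid */|x |;
```

Draining `leading` ahead of each token when building the tree then reproduces comments in source order without any grammar rule mentioning them.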

Add a post-processing pass over the chomped token list (src/structure.ts) that produces structured concrete-syntax nodes for declarations and function definitions. Walking the resulting tree depth-first yields the original token sequence in order, so source fidelity is preserved while clients gain a shape they can actually navigate.

The external_declaration close action now calls structureExternalDeclaration(tokens). On a clean parse it replaces the flat token-ref children with structured children:

  external_declaration { declKind: 'declaration' }
    declaration_specifiers
      <token-refs for storage-class / type-specifier / qualifier>
      struct_specifier|union_specifier|enum_specifier
        (with optional member_decl_list / enumerator_list still flat for now)
      attribute_spec (__attribute__((...)) / __declspec(...))
    init_declarator_list
      init_declarator { declaredName }
        declarator
          pointer* (with qualifiers + attribute specs)
          direct_declarator { declaredName }
            <ID or parenthesised subdeclarator>
            array_postfix*
            function_postfix*
          asm_label?
          attribute_spec*
        '=' initializer?
      ',' init_declarator ...
    ';'

  external_declaration { declKind: 'function_definition' }
    declaration_specifiers
    declarator
    kr_declaration_list? (K&R old-style parameter declarations)
    compound_statement (body still flat tokens for now)

The TokenStream class hides PRESERVED trivia from grammar decisions but emits the trivia tokens in order as siblings of the next real token's ref. This keeps comments and line continuations in the right place inside the structured tree.

When structure can't recognise the shape (e.g. preprocessor lines, raw expression statements at top level) the chomper's flat token-ref list is retained and declKind is set to 'unknown'.

Tests cover: simple int x = 1, multi-decl int a,b=2,c, pointer/array/function declarators, function-definition with compound_statement, struct-with-body, enum with C23 fixed underlying type, and __attribute__ on a declaration.
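The fidelity property (a depth-first walk yields the original tokens in order) can be illustrated with a small sketch. The `CstNode` shape and the `sourceTokens` helper are assumptions made for this example and are not the plugin's actual types.

```typescript
// Hypothetical node shape: leaves carry `src`, interior nodes carry `children`.
interface CstNode {
  kind: string;
  src?: string;
  children?: CstNode[];
}

// Depth-first, left-to-right walk collecting every leaf's source text.
function sourceTokens(node: CstNode, out: string[] = []): string[] {
  if (node.src !== undefined) {
    out.push(node.src);
    return out;
  }
  for (const child of node.children ?? []) sourceTokens(child, out);
  return out;
}

// Hand-built tree for: /* n */ int x = 1;
const tok = (src: string): CstNode => ({ kind: "token", src });
const tree: CstNode = {
  kind: "external_declaration",
  children: [
    { kind: "declaration_specifiers", children: [tok("/* n */"), tok("int")] },
    {
      kind: "init_declarator_list",
      children: [
        {
          kind: "init_declarator",
          children: [
            { kind: "declarator", children: [tok("x")] },
            tok("="),
            { kind: "initializer", children: [tok("1")] },
          ],
        },
      ],
    },
    tok(";"),
  ],
};
const flat = sourceTokens(tree).join(" ");
console.log(flat); // /* n */ int x = 1 ;
```

A check of this kind is one way a client could confirm that structuring has not dropped or reordered any token.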

Replace the opaque brace-balanced member_decl_list / enumerator_list of slice 4 with proper member parsing.

struct/union bodies now contain struct_declaration nodes, each with:

- specifier_qualifier_list (declaration_specifiers reused, renamed)
- struct_declarator_list of struct_declarator nodes
- trailing ';'

struct_declarator carries:

- the declarator (with declaredName)
- optional bitfield_width (`: const-expr`)
- optional trailing attribute_spec

This handles `struct S { unsigned f : 1; int : 7; int n; };` cleanly, including anonymous bitfields. static_assert at member level (C23 + GCC) becomes a static_assert_declaration node.

enum bodies now contain enumerator nodes, each with:

- declaredName
- optional [[…]] / __attribute__((…)) attribute_spec (C23)
- optional initializer (constant-expression, opaque for now)

The trailing comma after the last enumerator is preserved.

Tests: three-field struct, bitfield + anonymous bitfield, enum with initializer and trailing comma.

Replace the flat-token compound_statement of slice 4 with proper block-item parsing. A compound_statement now contains:

- declaration nodes (when the head is a specifier or static_assert)
- statement nodes, dispatched by leading token

Statement kinds modelled (each preserves all source tokens):

- if_statement (with paren_condition + then-stmt + optional else)
- switch_statement
- while_statement
- do_statement (do … while (…) ;)
- for_statement (for_controls captures the parenthesised header)
- jump_statement (goto/continue/break/return; jumpKind set)
- labeled_statement (case / default / label; labelKind/labelName)
- expression_statement
- compound_statement (recursive)
- asm_statement (GCC __asm__ / asm with optional qualifiers)
- preprocessor_line (PP_HASH … PP_NEWLINE, opaque)

Also factored out parseDeclaration so the same shape used at the top level is reused inside blocks.

Tests: function with mixed decl+expr+return body, if/else+while+for in one body, switch with case+default labels, goto+label round trip, do/while.

Each #-line on the input now becomes its own external_declaration containing one structured directive node, instead of being absorbed into the surrounding code by the chomper.

Directive node kinds and their structured fields:

- include_directive { includeForm, headerKind, headerName }
- define_directive { macroName, macroKind ('object-like' | 'function-like'), macroParams?, macroVariadic? }
- undef_directive { macroName }
- conditional_directive { directive: 'if'|'ifdef'|'ifndef'|'elif'|'elifdef'|'elifndef'|'else'|'endif' }
- pragma_directive (opaque body)
- error_directive (opaque body)
- warning_directive (opaque body)
- line_directive (opaque body)
- unknown_directive (any other #-form)

The function-like distinction in #define checks adjacency: only `NAME(` with no whitespace between produces a function-like macro. Parameter list parsing pulls out parameter names and detects the variadic ellipsis.

The chomper now terminates an external_declaration at PP_NEWLINE when the first non-trivia token is PP_HASH, so a directive line plus a following declaration land as two separate external_declarations instead of one giant chomp.

Define/undef directives populate ctx.meta.cmeta.macros so future slices can tag macro-call sites. Conditional groups (#if … #endif) are intentionally left as a flat sequence of directive + declaration nodes for now; collapsing them into a single nested group with branches comes in a follow-up slice.

Tests: angled and quoted #include, object-like and function-like #define (with variadic), #if/#endif sequencing, #pragma/#error, #undef.
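The adjacency rule for function-like macros can be sketched as below. The `sI` (start index) and `len` fields follow the token fields mentioned later in this PR; the `PosTok` type and `macroKind` function are hypothetical names used only for illustration.

```typescript
// Hypothetical positioned token: sI is the start offset, len the length.
interface PosTok {
  src: string;
  sI: number;
  len: number;
}

// A macro is function-like only when "(" starts exactly where the name ends.
function macroKind(
  name: PosTok,
  next: PosTok | undefined
): "object-like" | "function-like" {
  const adjacent =
    next !== undefined && next.src === "(" && next.sI === name.sI + name.len;
  return adjacent ? "function-like" : "object-like";
}

// "#define F(x) x"  : name F at offset 8, "(" at offset 9
const kindF = macroKind({ src: "F", sI: 8, len: 1 }, { src: "(", sI: 9, len: 1 });
// "#define G (x)"   : name G at offset 8, "(" at offset 10 (space between)
const kindG = macroKind({ src: "G", sI: 8, len: 1 }, { src: "(", sI: 10, len: 1 });
// "#define H"       : no following token at all
const kindH = macroKind({ src: "H", sI: 8, len: 1 }, undefined);

console.log(kindF, kindG, kindH); // function-like object-like object-like
```

Because whitespace tokens are dropped before structuring, comparing offsets is the simplest way to recover whether the parenthesis touched the name.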

function_postfix nodes are no longer opaque '(' … ')' chunks. Each one now contains either:

  parameter_type_list
    parameter_declaration { declaredName? }
      declaration_specifiers
      declarator | abstract_declarator
    parameter_variadic (the '...' marker, also sets parameter_type_list.variadic = true)
  identifier_list (K&R-style identifier-only list)

Special cases handled:

- `()`: empty postfix, returned as-is
- `(void)`: collapsed into a single parameter_declaration whose spec list contains the lone void
- K&R `(a, b, c)`: detected by lookahead (every comma-separated item is exactly one ID, ending with ')')

Abstract vs concrete declarator: parseParameterDeclaration tries the concrete form first; if no declaredName surfaces it backtracks and re-parses as abstract. This keeps `int qsort(void *, size_t, ...)` clean (no spurious declaredName on the size_t parameters) once size_t is registered as a typedef.

Tests: void prototype, named ANSI parameters with declaredName extraction, variadic ellipsis, abstract parameters across a typedef boundary, and K&R-style identifier_list.
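The K&R lookahead described above might look roughly like this. The `Kind` alphabet and the `looksLikeIdentifierList` name are assumptions for the sketch; it inspects the token kinds that follow the opening parenthesis.

```typescript
// Hypothetical, reduced token-kind alphabet for the lookahead.
type Kind = "ID" | "TYPEDEF_NAME" | "KEYWORD" | "COMMA" | "RPAREN" | "OTHER";

// True when the kinds after "(" form: ID ("," ID)* ")"
function looksLikeIdentifierList(kinds: Kind[]): boolean {
  let i = 0;
  for (;;) {
    if (kinds[i] !== "ID") return false; // each item must be exactly one ID
    i++;
    if (kinds[i] === "RPAREN") return true; // list closed cleanly
    if (kinds[i] !== "COMMA") return false; // anything else: not K&R style
    i++;
  }
}

// int f(a, b, c)        -> K&R identifier list
const knr = looksLikeIdentifierList(["ID", "COMMA", "ID", "COMMA", "ID", "RPAREN"]);
// int f(int a, int b)   -> ANSI parameter list
const ansi = looksLikeIdentifierList(["KEYWORD", "ID", "COMMA", "KEYWORD", "ID", "RPAREN"]);
// int f(size_t)         -> typedef-name, so an abstract ANSI parameter
const abstractParam = looksLikeIdentifierList(["TYPEDEF_NAME", "RPAREN"]);

console.log(knr, ansi, abstractParam); // true false false
```

The third case shows why typedef registration matters: once `size_t` lexes as TYPEDEF_NAME rather than ID, a lone type name is not mistaken for a K&R parameter.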

Identifiers previously seen in a #define now lex as MACRO_NAME instead of ID, mirroring the typedef-name path. The grammar accepts MACRO_NAME wherever it accepts ID (added a small isIdLike helper and replaced the relevant call sites in structure.ts), so structuring is unchanged while clients can distinguish macro references from ordinary identifiers without consulting the macro table themselves. The identifier matcher consults ctx.meta.cmeta.macros after the typedef check. After a #define directive is structured into a define_directive node, registerMacrosFromTree calls reclassifyAsMacro which walks the lexer's pre-fetched lookahead (ctx.t and lex.pnt.token) mutating any matching ID tokens to MACRO_NAME — so the very first post-#define occurrence already carries the correct token name even when jsonic's lookahead got there first. #undef removes the entry from the macro table; subsequent uses of the name re-emerge as plain ID. Tests cover all three transitions.
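The three transitions (before #define, after #define, after #undef) can be sketched with a simplified classifier. The `Tables` class and `classifyIdentifier` function are hypothetical; the lookup order (typedef check first, then the macro table) follows the description above.

```typescript
// Hypothetical stand-ins for the symbol and macro tables on ctx.meta.cmeta.
class Tables {
  typedefs = new Set<string>();
  macros = new Set<string>();
}

type IdTokenName = "TYPEDEF_NAME" | "MACRO_NAME" | "ID";

// Typedef check first, then the macro table, else a plain identifier.
function classifyIdentifier(word: string, t: Tables): IdTokenName {
  if (t.typedefs.has(word)) return "TYPEDEF_NAME";
  if (t.macros.has(word)) return "MACRO_NAME";
  return "ID";
}

const tables = new Tables();
const before = classifyIdentifier("MAX", tables); // no #define seen yet
tables.macros.add("MAX");                         // #define MAX 10
const during = classifyIdentifier("MAX", tables);
tables.macros.delete("MAX");                      // #undef MAX
const after = classifyIdentifier("MAX", tables);

console.log(before, during, after); // ID MACRO_NAME ID
```

This sketch covers only fresh lexing; the in-place reclassification of already-fetched lookahead tokens described above is a separate step.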

Inside expression_statement, initializer, and jump_statement bodies, the post-chomp pass now promotes ID/MACRO_NAME-followed-by-(args) sequences into nested call_expression nodes. The grammar context for calls is identical regardless of statement form, so a single structureCallsInPlace helper handles all three.

  call_expression { callee, isMacro }
    <callee token>
    argument_list
      '('
      <recursively-structured tokens>
      ')'

isMacro is set from the callee token's tname (true for MACRO_NAME), giving consumers a syntactic flag that distinguishes a macro invocation from a real function call without re-querying the macro table.

Recursion is handled by structuring the argument list's interior as a synthetic node and inlining the result, so g(f(1), h(2)) produces three nested call_expression nodes.

Tests: simple call (isMacro false), macro invocation (isMacro true), nested calls inside arguments.

Add a translation_unit-level post-pass that folds the flat sequence of #if/#ifdef/#ifndef … (#elif…)* (#else)? … #endif directives into a single conditional_group node containing typed branches. Best-effort: unmatched #endif or unterminated #if leaves the surrounding flat sequence untouched so the rest of the tree stays intact.

Output shape:

  conditional_group
    branches: [
      conditional_branch {
        branchKind: 'if'|'ifdef'|'ifndef'|'elif'|'elifdef'|'elifndef'|'else',
        directive: <external_declaration containing the directive>,
        body: [<external_declaration | conditional_group>...],
        children: [directive, ...body]   // depth-first walk fidelity
      },
      ...
    ]
    endif: <external_declaration containing the #endif directive>
    children: [...branches, endif]

Nested #if … #endif inside a branch are recursively grouped by re-running structureConditionalGroups on the branch body. The pass also re-runs inside any preserved children (e.g. function bodies) so preprocessor groups that live mid-function get the same treatment.

Tests: simple if/endif fold, three-way if/elif/else, nested ifdef inside an outer ifdef, and best-effort handling of a stray #endif (left flat).
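A simplified sketch of the folding pass, under stated assumptions: items are reduced to directives and declarations, the `Item`/`Branch` types and the `fold` function are hypothetical, and for brevity the group omits the directive and #endif nodes that the real conditional_group keeps for walk fidelity.

```typescript
type Item =
  | { kind: "directive"; name: string }
  | { kind: "decl"; text: string }
  | { kind: "group"; branches: Branch[] };
interface Branch {
  branchKind: string;
  body: Item[];
}

const OPENERS = new Set(["if", "ifdef", "ifndef"]);
const MIDDLES = new Set(["elif", "elifdef", "elifndef", "else"]);

function isOpener(it: Item): boolean {
  return it.kind === "directive" && OPENERS.has(it.name);
}

// Index of the #endif matching the opener at `start`, or -1 if unterminated.
function findEndif(items: Item[], start: number): number {
  let depth = 0;
  for (let i = start; i < items.length; i++) {
    const it = items[i]!;
    if (isOpener(it)) depth++;
    else if (it.kind === "directive" && it.name === "endif" && --depth === 0) return i;
  }
  return -1;
}

function fold(items: Item[]): Item[] {
  const out: Item[] = [];
  for (let i = 0; i < items.length; i++) {
    const it = items[i]!;
    const end = isOpener(it) ? findEndif(items, i) : -1;
    if (end < 0) {
      out.push(it); // not an opener, stray #endif, or unterminated #if: leave flat
      continue;
    }
    const branches: Branch[] = [];
    let depth = 0; // nesting depth relative to this group's own level
    for (let j = i; j < end; j++) {
      const cur = items[j]!;
      if (cur.kind === "directive") {
        if (j > i && OPENERS.has(cur.name)) depth++;
        else if (cur.name === "endif") depth--;
        if (depth === 0 && (j === i || MIDDLES.has(cur.name))) {
          branches.push({ branchKind: cur.name, body: [] }); // new branch at our level
          continue;
        }
      }
      branches[branches.length - 1]!.body.push(cur);
    }
    for (const b of branches) b.body = fold(b.body); // group nested conditionals
    out.push({ kind: "group", branches });
    i = end; // skip past the matching #endif
  }
  return out;
}

const d = (name: string): Item => ({ kind: "directive", name });
const decl = (text: string): Item => ({ kind: "decl", text });
const folded = fold([
  d("ifdef"), decl("int x;"),
  d("ifdef"), decl("int y;"), d("endif"),
  d("else"), decl("int z;"),
  d("endif"),
  d("endif"), // stray: stays flat
]);
console.log(JSON.stringify(folded));
```

In this example the result has two top-level items: one group with an `ifdef` branch (containing a nested group) and an `else` branch, followed by the stray `endif` directive left untouched.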

Add src/expr.ts: a hand-rolled Pratt-style parser covering the full C operator-precedence table from C23 §6.5. All expression contexts (expression_statement bodies, jump_statement (return/goto) operands, init_declarator initializers) now flow through it instead of being absorbed as flat tokens with a post-pass.

Output shapes (all preserve every source token via depth-first children):

  literal_expression { literalKind, value }
  identifier_expression { name }
  paren_expression
  call_expression { callee, isMacro, argument_list }
  subscript_expression { target, index_list }
  member_expression { object, op ('.'|'->'), memberName }
  postfix_unary_expression { target, op ('++'|'--') }
  unary_expression { op, operand }   // ++/--/+/-/!/~/*/&/sizeof/_Alignof/...
  cast_expression { typeName, operand }
  binary_expression { op, left, right }   // 11 precedence levels
  conditional_expression { cond, then, else }
  assignment_expression { left, op, right }   // right-assoc
  comma_expression
  generic_selection
  statement_expression   // GCC ({ ... })
  compound_literal { typeName, initializer_list }

Implementation notes:

- Cast vs paren-expression vs compound-literal disambiguation peeks one token past `(`. Type-name detection accepts type keywords and TYPEDEF_NAMEs (the typedef-name table is already populated by the earlier slices).
- sizeof / _Alignof on a parenthesised type-name produce a type_name operand; on an expression they recurse into parseUnary.
- Adjacent string literals are folded into a single literal_expression.
- Macro-call detection moves into parsePostfix's call branch: when the immediate target is an identifier_expression whose token was MACRO_NAME, isMacro=true. The slice-10 post-pass is no longer needed for these contexts and was removed.

Tests: precedence 1+2*3, right-assoc assignment chain, ternary, postfix subscript+member chain, prefix -/!/* unary, typedef-name cast, sizeof on expr and on type-name, adjacent-string concatenation.
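For readers unfamiliar with the technique, here is a minimal precedence-climbing (Pratt-style) sketch. It is not src/expr.ts: it handles only a handful of binary operators, simple assignment, and parentheses, works on string tokens, and renders the result as an S-expression. The precedence numbers and the `parseExpr` name are assumptions chosen for the example.

```typescript
// Illustrative binding powers (higher binds tighter); not the plugin's table.
const BINARY = new Map<string, number>([
  ["*", 10], ["/", 10], ["+", 9], ["-", 9],
  ["<", 7], ["==", 6], ["&&", 3], ["||", 2],
]);
const ASSIGN_PREC = 1;

function parseExpr(tokens: string[]): string {
  let pos = 0;

  function primary(): string {
    const t: string | undefined = tokens[pos++];
    if (t === undefined) throw new Error("unexpected end of input");
    if (t === "(") {
      const inner = expr(0);
      pos++; // skip the closing ")"
      return inner;
    }
    return t; // literal or identifier
  }

  function expr(minPrec: number): string {
    let left = primary();
    for (;;) {
      const op: string | undefined = tokens[pos];
      if (op === undefined) return left;
      if (op === "=") {
        if (ASSIGN_PREC < minPrec) return left;
        pos++;
        // Right-associative: parse the right side at the same precedence.
        left = `(= ${left} ${expr(ASSIGN_PREC)})`;
        continue;
      }
      const prec: number | undefined = BINARY.get(op);
      if (prec === undefined || prec < minPrec) return left;
      pos++;
      // Left-associative: the right side must bind strictly tighter.
      left = `(${op} ${left} ${expr(prec + 1)})`;
    }
  }

  return expr(0);
}

console.log(parseExpr(["1", "+", "2", "*", "3"])); // (+ 1 (* 2 3))
console.log(parseExpr(["a", "=", "b", "=", "c"])); // (= a (= b c))
console.log(parseExpr(["a", "-", "b", "-", "c"])); // (- (- a b) c)
```

The only difference between left- and right-associative operators is the minimum precedence passed to the recursive call, which is what makes the approach compact for a table as large as C's.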

Parse two header- and source-shaped fragments end to end and verify that the structural CST matches expectations. Catches regressions where a single feature works in isolation but breaks under composition.

Header-shape coverage:

- #ifndef … #define … #endif wrapping the whole file (folds into a single conditional_group with one ifndef branch),
- #include <…> (angled),
- eight typedef declarations (signed/unsigned char/short/int/long long), all registered as typedef-names,
- three function-like and object-like #define directives whose macro names land in the macro table,
- typedef of a struct-with-body (struct vec → vec_t), three int32_t members,
- two function prototypes using the freshly-registered typedef-names,
- C23 fixed-underlying-type enum (`enum status : int`) with two enumerators plus trailing comma.

Source-shape coverage:

- #include "vec.h",
- three function definitions (sign, vec_add, vec_dot),
- if-statements, multiple return values, member access via . and ->,
- long chained binary_expression on vec_dot's return.

Both tests verify the structural shape (e.g. conditional_group has the expected branchKind, struct has the expected member count, top-level return chain is rooted at +) rather than just "doesn't throw", giving confidence that real-world C will round-trip through the parser.

initializer_list contents are no longer opaque. Each item is parsed as an initializer_item that may carry a leading designation:

  initializer_item { designation?, value }
    designation
      member_designator { memberName }   // .x
      index_designator                   // [n]
    value: initializer | <expression-node>

Nested initializer-lists are recognised: each item's value may itself be an initializer wrapping another initializer_list, so 2D arrays (`{ {1,2}, {3,4} }`) and nested-struct initializers structure the whole way down. The leading PUNC_ASSIGN of a designation is captured on the designation node so source fidelity is preserved.

_Static_assert / static_assert split into typed children:

  static_assert_declaration { condition, message? }

condition is parsed via the Pratt parser so the boolean expression is fully structured; the optional second argument lands as a literal_expression (or whatever expression form).

Top-level static_assert at the translation-unit level is now dispatched explicitly in structureExternalDeclaration (it isn't a declaration-specifier head).

Tests: .field designators, [index] designators, nested initializer lists, static_assert with both arguments, and bare static_assert without a message.

_Generic( ctrl, T1: e1, T2: e2, default: eD ) is no longer an opaque balanced-paren node. Output:

  generic_selection
    controlling: generic_controlling_expression { expression }
    associations: [
      generic_association { associationKind: 'type'|'default', typeName?: type_name, value: <expression-node> },
      ...
    ]

The controlling expression and each association's value run through the Pratt parser so binary operators, calls, identifiers etc. all land structured. The type-name slot still holds an opaque token list because a full type-name parser lives in structure.ts and consuming it from inside an expression context would be circular for now; preserved verbatim.

Test: _Generic(x, int:1, double:2, default:0), checking three associations with the right kinds and structured values.

attribute_spec is no longer an opaque ((…)) chunk. Each attribute item inside the parens becomes a typed sub-node:

  attribute_spec { attributeForm: 'gcc'|'msvc'|'unknown', items }
    attribute_item { attributeName, attributePrefix?, argumentList? }
      attribute_argument_list   // structured args via Pratt

Form distinction:

- GCC __attribute__((items)) uses double parentheses
- MSVC __declspec(items) uses single parentheses

The attribute name slot is permissive: identifiers, typedef-names, macro-names, and even C reserved words are accepted (so things like __attribute__((const)) and __attribute__((noreturn)) parse the same way). C23 namespaced form `prefix::name` is recognised and split into attributePrefix + attributeName.

Each argument is parsed with the Pratt expression parser so e.g. __attribute__((format(printf, 1, 2))) yields three structured expression arguments instead of opaque tokens.

Tests: GCC __attribute__ with bare name + format(...) + nonnull(...), MSVC __declspec(dllexport), and __attribute__((const)) using a keyword as the name slot.

asm_statement is no longer an opaque (...) block. The body now splits along its colons into typed sections, each member structured:

  asm_statement { qualifiers: ['volatile'|'inline'|'goto'...] }
    asm_template { expression }   // string literal expr
    asm_outputs                   // optional
      asm_operand { asmName?, constraint { value }, value { expression } }
      ...
    asm_inputs                    // optional
      asm_operand ...
    asm_clobbers                  // optional
      asm_clobber { value }       // string literal
      ...
    asm_labels                    // optional
      asm_label_ref { labelName } // identifier
      ...

Output and input operand sub-shapes are identical (same C grammar): optional [asm-name] in brackets, a string-literal constraint, and a parenthesised C expression that the Pratt parser structures.

Trailing empty sections (e.g. `: : : "cc"` with empty outputs and inputs) are fine: each ':' opens a new section regardless of whether the previous one had items, and the section's children remain empty. Walking depth-first still yields the original tokens in order including the colons.

Tests: bare template, full extended form (output, two inputs, clobbers), `__asm__ goto` with labels section, and operand with [asm-name] prefix.
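The colon-driven section split can be sketched as follows. This is an illustration only: tokens are plain strings, `splitAsmSections` is a hypothetical name, and unlike the real node the sketch drops the colon tokens instead of keeping them as children.

```typescript
// Section kinds in the order the colons introduce them.
const SECTIONS = ["asm_template", "asm_outputs", "asm_inputs", "asm_clobbers", "asm_labels"];

// Split the tokens between the outer "(" and ")" at top-level colons.
function splitAsmSections(body: string[]): Map<string, string[]> {
  const out = new Map<string, string[]>();
  let index = 0;
  let depth = 0; // colons inside (...) or [...] belong to operand expressions
  let current: string[] = [];
  out.set(SECTIONS[index]!, current);
  for (const t of body) {
    if (t === "(" || t === "[") depth++;
    else if (t === ")" || t === "]") depth--;
    if (t === ":" && depth === 0 && index < SECTIONS.length - 1) {
      // Each top-level colon opens the next section, even if the last was empty.
      index++;
      current = [];
      out.set(SECTIONS[index]!, current);
    } else {
      current.push(t);
    }
  }
  return out;
}

// Body of: __asm__ volatile ("nop" : : "r" (a ? b : c) : "cc");
const sections = splitAsmSections([
  '"nop"', ":", ":", '"r"', "(", "a", "?", "b", ":", "c", ")", ":", '"cc"',
]);
for (const [name, toks] of sections) console.log(name, toks.join(" "));
```

Here the outputs section is present but empty, the ternary's colon stays inside the input operand, and no labels section is created because no fourth colon appears.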

C23 introduces a new attribute syntax sitting alongside GCC's __attribute__ and MSVC's __declspec. The lexer emits `[` and `]` as single PUNC_LBRACKET / PUNC_RBRACKET tokens, so detection requires an adjacency check against the source positions:

  isC23AttributeOpen(ts):
    PUNC_LBRACKET at offset 0,
    PUNC_LBRACKET at offset 1,
    second.sI === first.sI + first.len   // no chars between

A new parseC23AttributeSpec produces an attribute_spec node with attributeForm: 'c23', sharing the parseAttributeItem shape from slice 16. parseAnyAttributeSpec dispatches on the head token between gcc / msvc / c23 forms so callers don't have to.

Hooked in at every relevant site:

- parseDeclarationSpecifiers accepts a leading [[…]] block (declaration head can be the attribute itself).
- structureExternalDeclaration's head dispatch recognises [[…]] as starting a declaration.
- parseDeclaration (block-item path) likewise.
- parseEnumerator now uses parseAnyAttributeSpec, replacing the ad-hoc inline handling.

Items inside [[…]] support all C23 forms:

- Plain identifier -> attribute_item { attributeName }
- Namespaced prefix::name -> { attributePrefix, attributeName }
- With argument list -> + attribute_argument_list (Pratt-parsed)

Tests: [[nodiscard]] on a function decl, [[gnu::pure]] namespaced, [[deprecated("reason")]] with a string-literal argument, and a [[deprecated]] applied to an enumerator inside an enum body.
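The adjacency check can be written out as a small runnable sketch. The signature here differs from the one in the commit (which takes a TokenStream): this version takes an array and an index, and the `Tok` interface is a hypothetical reduction of the real token type to the `name`, `sI`, and `len` fields.

```typescript
// Hypothetical reduced token: name, start offset (sI), and length (len).
interface Tok {
  name: string;
  sI: number;
  len: number;
}

// True when two PUNC_LBRACKET tokens sit back to back with no characters between.
function isC23AttributeOpen(toks: Tok[], at: number): boolean {
  const first: Tok | undefined = toks[at];
  const second: Tok | undefined = toks[at + 1];
  return (
    first !== undefined &&
    second !== undefined &&
    first.name === "PUNC_LBRACKET" &&
    second.name === "PUNC_LBRACKET" &&
    second.sI === first.sI + first.len
  );
}

// "[[nodiscard]]": brackets at offsets 0 and 1
const attr = isC23AttributeOpen(
  [
    { name: "PUNC_LBRACKET", sI: 0, len: 1 },
    { name: "PUNC_LBRACKET", sI: 1, len: 1 },
    { name: "ID", sI: 2, len: 9 },
  ],
  0
);
// "[ [x" with a space: brackets at offsets 0 and 2
const spaced = isC23AttributeOpen(
  [
    { name: "PUNC_LBRACKET", sI: 0, len: 1 },
    { name: "PUNC_LBRACKET", sI: 2, len: 1 },
    { name: "ID", sI: 3, len: 1 },
  ],
  0
);
console.log(attr, spaced); // true false
```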

The header of a for-loop is no longer captured as one opaque balanced paren. for_controls now contains three typed slots, each populated with the structured form of its expression or declaration:

  for_controls
    init: for_init { value: declaration | <expression-node> | (empty) }
    cond: for_cond { value: <expression-node> | (empty) }
    iter: for_iter { value: <expression-node> | (empty) }

init dispatches between declaration form (when the head is a specifier, static_assert, or C23 [[…]]) and expression form, consuming the trailing `;` in the expression form so subsequent slots see the right boundary. The declaration form's terminating ';' is part of the declaration node itself.

Empty slots (`for (;;)`) keep their `;`s as direct token children so source fidelity is preserved while .value is undefined.

Tests: full `for (int i = 0; i < 10; i++)` declaration init form, expression init form `for (i = 0; …)`, and the empty `for (;;)` infinite-loop shape.

Build a 100-file regression corpus with the csmith random C program generator (seeds 1..100). Each file's structured CST is captured as a gzipped JSON fixture; the test suite re-parses the corpus and asserts the result is byte-identical to the fixture.

Layout:

  test/csmith-corpus/seed-NNN.c            csmith output, committed
  test/csmith-fixtures/seed-NNN.json.gz    golden CST, committed
  test/csmith-fixture.ts                   fixture serializer
  test/csmith-gen.ts                       corpus + fixture generator CLI
  test/csmith.test.ts                      regression test runner

Approach:

1. csmith.h's stdint typedefs (int8_t, uint64_t, size_t, FILE, …) are pre-registered in the parser via meta.cmeta before each parse, since the parser doesn't expand `#include`. Without this, e.g. `static int32_t g_2 = 6L;` would parse `int32_t` as the declared name. With the pre-registration, every csmith program structures cleanly with zero `unknown` declarations.
2. The fixture serializer (toFixture) walks `kind`, `children`, and a stable whitelist of scalar metadata. Convenience cross-references that the parser exposes for ergonomic access (.left/.right/.target/.value when it points to a node, etc.) are dropped to avoid duplication, which would otherwise blow up JSON.stringify exponentially. Trivia tokens (block/line comments, line continuations) are also dropped, since csmith's preamble alone is a 1KB block comment per file.
3. The fixture is gzipped at level 9. Average size: ~70 KB per file, so the 100-fixture suite takes ~7.7 MB. Plain JSON would be ~3 MB per file; uncompressed cross-references would have been ~50 MB per file.
4. The test harness has two assertions per seed:
   - parse-cleanly: every external_declaration must structure (no declKind === 'unknown'),
   - fixture-match: re-running the parser yields byte-identical fixture JSON to what's committed.

Regenerating after a deliberate parser change:

  npx tsc --build src test
  node dist-test/csmith-gen.js fixtures   # rebuilds *.json.gz

Result: 285 tests, 200 from csmith (100 parse + 100 fixture), 85 from the existing unit suite. All pass.
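The whitelist-based serialization described in step 2 could be sketched like this. The `CstNode` shape, the contents of `SCALAR_KEYS`, and the trivia kind names are assumptions made for the example; only the overall idea (keep `kind`, `children`, and whitelisted scalars; drop node-valued cross-references and trivia) comes from the commit message. The gzip step is omitted.

```typescript
// Hypothetical node shape: arbitrary extra properties are allowed.
interface CstNode {
  kind: string;
  children?: CstNode[];
  [key: string]: unknown;
}

// Assumed whitelist and trivia kinds, for illustration only.
const SCALAR_KEYS = ["declKind", "declaredName", "name", "op"];
const TRIVIA_KINDS = new Set(["COMMENT_BLOCK", "COMMENT_LINE", "LINE_CONT"]);

function toFixture(node: CstNode): Record<string, unknown> {
  const out: Record<string, unknown> = { kind: node.kind };
  for (const key of SCALAR_KEYS) {
    const v = node[key];
    // Only scalars survive; node-valued shortcuts like .left/.right are skipped.
    if (typeof v === "string" || typeof v === "number" || typeof v === "boolean") {
      out[key] = v;
    }
  }
  if (node.children) {
    out.children = node.children
      .filter((c) => !TRIVIA_KINDS.has(c.kind))
      .map(toFixture);
  }
  return out;
}

// a /* note */ + b, with .left/.right pointing at nodes already in children
const left: CstNode = { kind: "identifier_expression", name: "a" };
const right: CstNode = { kind: "identifier_expression", name: "b" };
const root: CstNode = {
  kind: "binary_expression",
  op: "+",
  children: [left, { kind: "COMMENT_BLOCK" }, right],
  left,
  right,
};
const fixtureJson = JSON.stringify(toFixture(root));
console.log(fixtureJson);
```

Each node then appears exactly once in the serialized output, which is the property that keeps the fixture size proportional to the tree rather than to the number of paths through it.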

Split test/csmith-gen.ts into:

  test/csmith-common.ts   pure helpers (STDINT_TYPEDEFS, parseCsmithSource, path constants). Importing has zero side effects.
  test/csmith-gen.ts      CLI only: imports common + csmith-fixture, runs corpus/fixture generation guarded by require.main === module.

test/csmith.test.ts now imports from csmith-common, never from csmith-gen. The test runner therefore never reaches code that calls execSync('csmith ...') or mkdirSync at module-load time.

Verified by removing the csmith binary from PATH and rerunning the full suite: 285/285 pass. Restoring the binary still allows `node dist-test/csmith-gen.js all` to regenerate the corpus.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 890bf3b989


chatgpt-codex-connector · 2026-04-30T15:41:52Z

@@ -22,16 +22,19 @@
    "watch": "tsc --build src test -w",
    "build": "node embed-grammar.js && tsc --build src test",


Remove deleted embed step from build command

The build script still invokes node embed-grammar.js, but this commit deletes embed-grammar.js, so npm run build fails immediately with MODULE_NOT_FOUND on a clean checkout and the package cannot be compiled or tested through the documented workflow. Please either restore the script or remove this step from build/embed scripts.


The earlier slice-1 cleanup removed embed-grammar.js (the JSONC plugin embedded its grammar at build time; the C plugin doesn't need that step) and the go/ directory, but two surfaces still referenced them:

- package.json's build/embed scripts ran `node embed-grammar.js`, which fails on a fresh checkout with MODULE_NOT_FOUND.
- .github/workflows/build.yml had a build-go job that ran `go build ./...` in a directory that no longer exists.

Strip both. The build script is now simply `tsc --build src test`.

Verified by running npm clean / install / build / test on a fresh node_modules: 285/285 still pass.

Windows CI was failing for seed 100's fixture-match because git's autocrlf rewrites the .c corpus to CRLF on checkout, while the fixtures (built on Linux) encode the parser's output for LF source. The divergence in the token .src strings then propagates into the fixture-byte comparison.

Two fixes, applied together:

- .gitattributes pins .c (and other text files) to `text eol=lf`, so future Windows checkouts keep LF regardless of core.autocrlf.
- normaliseEol() in test/csmith.test.ts collapses any \r\n / \r sequences in the corpus to \n before parsing. Belt-and-suspenders: if a Windows clone slipped past the gitattributes (e.g. cloned before this commit landed), the test still passes.

Verified locally by injecting CRLF into seed-001.c and rerunning: both `parse seed 001` and `fixture seed 001` still pass.
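A line-ending normaliser of the kind described could be as small as the following. This is a sketch of the idea, not the contents of test/csmith.test.ts; only the function name is taken from the commit message.

```typescript
// Collapse CRLF and lone CR to LF so token .src strings match LF-built fixtures.
function normaliseEol(src: string): string {
  return src.replace(/\r\n?/g, "\n");
}

const crlfSource = "int a;\r\nint b;\rint c;\n";
const normalised = normaliseEol(crlfSource);
console.log(JSON.stringify(normalised)); // "int a;\nint b;\nint c;\n"
```

The single pattern `\r\n?` matches a CR optionally followed by LF, so a CRLF pair becomes one newline rather than two.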

claude added 23 commits April 30, 2026 13:34

Update README to reflect the structured CST capabilities

0916eeb

Update README: backlog items now structured

235d0d2

chatgpt-codex-connector Bot reviewed Apr 30, 2026

claude added 2 commits April 30, 2026 15:43

rjrodger merged commit 36567bf into main Apr 30, 2026
3 checks passed

rjrodger merged 25 commits into main from claude/c-parser-concrete-ast-1B23d
