Skip to content

spec: generic syntax-highlight definition mechanism (#9955)#10129

Open
lonexreb wants to merge 4 commits intowarpdotdev:masterfrom
lonexreb:spec/9955-syntax-highlight-mechanism
Open

spec: generic syntax-highlight definition mechanism (#9955)#10129
lonexreb wants to merge 4 commits intowarpdotdev:masterfrom
lonexreb:spec/9955-syntax-highlight-mechanism

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 5, 2026

Summary

Spec for issue #9955 — adding a new language to Warp's syntax highlighting today requires changes in 5+ places in crates/languages/src/lib.rs (SUPPORTED_LANGUAGES array, language_by_filename, to_arborium_name, get_arborium_highlight_query, plus a grammars/<lang>/ folder), all of which require modifying compiled Rust code and shipping a Warp release. This blocks the most-requested kind of community contribution: "I use $LANG and would happily contribute the highlighting definition."

Investigation

  • The closed registry: SUPPORTED_LANGUAGES: [&str; 32] (lib.rs:23).
  • Hand-written match statements at language_by_filename (lib.rs:115), to_arborium_name (lib.rs:226), get_arborium_highlight_query (lib.rs:239).
  • Grammar substrate: arborium::tree_sitter::{Language, Query} (lib.rs:7) — tree-sitter is the right substrate; this spec preserves it.
  • Per-language directories at crates/languages/grammars/<lang>/ already exist with config.yaml, identifiers.scm, indents.scm — embedded via RustEmbed.
  • Adding a language Warp doesn't yet support requires either upstream arborium support OR vendoring a tree-sitter grammar.

What's in the spec

product.md — 8 testable behavior invariants (B1–B8), 7 acceptance criteria (A1–A7), explicit non-goals (no TextMate-style regex grammars, no native dynamic-library loading, no sub-language injection in V1, no hot-reload), and 6 risks with concrete TECH decisions.

tech.md — picks the three-source discovery architecture, the language.toml schema, the loader, the file-association index, and the staged migration strategy. 5+4 unit tests plus 3 integration tests with a real tree-sitter fixture.

Architectural choices

  • Three-source discovery in priority order: Hardcoded > Bundled > User-local. V1 adds discovery beside the existing hardcoded path; the 32 existing languages keep working unchanged. No flag day — each language migrates in its own independently-revertable PR.
  • Schema-driven language.toml — one contract a contributor learns; everything else is standard tree-sitter files (highlights.scm, indents.scm, optional identifiers.scm).
  • WASM-only for runtime-loaded user grammars. Native dynamic libraries (.so/.dylib/.dll) explicitly rejected with a clear error and no dlopen call exists on the loader path. Bundled grammars can be either WASM or a Cargo dependency on a Rust grammar crate.
  • Validation with clear failure modes: a malformed grammar does NOT break Warp startup. Surfaces via log::error! + Settings → Editor → Languages notification. Other grammars load normally.
  • Tree-sitter substrate preserved. The issue's references to TextMate / Sublime / Midnight Commander are treated as community-distribution exemplars (the goal is contributor velocity), not as recommended technology.
  • Per-language migration template for the 32 existing languages, each migration as a single mechanical PR. The bundled_parsers.rs compile-time map is the only hand-edited file for adding bundled grammars (with an open question about whether to use inventory-style auto-registration to eliminate even that).

Test plan

  • Schema unit T1–T5 (parse, validate, reject native_lib, reject ABI mismatch, reject schema_version mismatch)
  • Loader unit T6–T9 (failed grammar surfaces as LoadFailure not panic, collision dedup, ABI mismatch reporting)
  • Integration IT1–IT3 with a real tree-sitter-toml fixture (load via env var, malformed grammar isolation, registry merge)
  • Regression: existing crates/languages/src/lib_tests.rs and crates/syntax_tree/src/queries/*_tests.rs pass unchanged
  • Manual: drop a ~/.warp/grammars/zig/ directory with language.toml, highlights.scm, grammar.wasm; restart; .zig files render
  • Manual: attempt to load grammar.so; verify clear error, no dlopen

Open questions for maintainer review

  1. WASM grammars require a tree-sitter version supporting WasmStore. Verify against current Cargo.lock.
  2. User-local grammar dir: $XDG_CONFIG_HOME/warp/grammars/ (when set) vs ~/.warp/grammars/ (fallback). Confirm precedence.
  3. Should the Languages settings page allow disabling individual loaded languages? Spec defers this to a follow-up.
  4. Should bundled_parsers.rs use an inventory-style auto-registration pattern to eliminate the only remaining hand-edit? Adds a build-time crate dep but removes the last manual step.

Closes (spec-only) #9955 — implementation PR will follow once spec direction is confirmed. The 32 existing languages will migrate via independent follow-up PRs, one per language.

Adds product.md + tech.md for issue warpdotdev#9955: a contributor-friendly
mechanism for adding new languages to Warp's syntax highlighting
without modifying compiled Rust code and without releasing Warp.

Investigation: today, adding a language requires changes in 5+
places in crates/languages/src/lib.rs (SUPPORTED_LANGUAGES array,
language_by_filename match, to_arborium_name match,
get_arborium_highlight_query match, plus a grammars/<lang>/ folder).
The closed registry blocks the most-requested kind of community
contribution: "I use $LANG and would happily contribute the
highlighting definition." That contribution today requires touching
the internal arborium crate dependency and shipping a Warp release.

Spec proposes:
- Three-source discovery (compile-time hardcoded, bundled directory,
  user-local directory) with explicit priority order. Hardcoded >
  bundled > user-local. Staged migration: V1 adds the discovery
  layer beside the hardcoded path; existing 32 languages keep
  working unchanged. No flag day.
- Schema-driven language.toml (display_name, internal_name,
  comment_prefix, indent_unit, file_associations [extensions,
  filenames, shebangs, aliases], brackets, parser, ts_abi). One
  contract a contributor learns; everything else is standard
  tree-sitter files.
- WASM-only for runtime-loaded user grammars; native dynamic
  libraries (.so/.dylib/.dll) explicitly rejected with a clear
  error message and no dlopen call exists on the loader path.
- Validation with clear failure modes: a malformed grammar does
  NOT break Warp startup; surfaces via log + Settings page
  notification. Other grammars load normally.
- Settings > Editor > Languages page lists all loaded grammars,
  their source, and any failures.
- Tree-sitter substrate preserved (rejects switching to
  TextMate/Sublime regex grammars referenced in the issue as
  community-distribution exemplars only, not as recommended tech).
- Per-language migration template for the 32 existing languages,
  each as an independently revertable PR.

Test plan covers five schema-validation unit tests, four loader
unit tests (including ABI mismatch and collision dedup), and
three integration tests with a real tree-sitter grammar fixture.

Six risks called out (WASM perf cost, schema versioning, capture-
name standard set, ABI mismatch detection, sub-language injection,
theme integration) with concrete TECH decisions or recommended
follow-ups for each.

Four open questions for maintainer review on tree-sitter version
prerequisites, XDG path fallback, per-language disable in V1, and
inventory-style auto-registration for bundled parsers.
@cla-bot cla-bot Bot added the cla-signed label May 5, 2026
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 5, 2026

@lonexreb

I'm starting a first review of this spec-only pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@github-actions github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label May 5, 2026
Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

The spec defines a tree-sitter-based registry for bundled and user-local syntax highlighting definitions, with a product contract, loader architecture, migration plan, and tests. The direction addresses the linked issue, but several requirements contradict each other or leave security-critical implementation details unresolved.

Concerns

  • The goal promises no compiled Rust changes/no release for both bundled and user-local paths, while the bundled path still ships in Warp and the tech spec requires Cargo and parser-map changes.
  • The migration plan alternates between a single mechanical migration PR and independently revertable per-language migrations.
  • The loader/API model does not cleanly represent failed grammar loads, yet the product requires failed grammars to appear in diagnostics.
  • The product allows missing highlights.scm, but the tech loader rejects highlight-query load failures.

Security

  • User-local WASM is treated as sufficient sandboxing without specifying resource limits or host capabilities for untrusted parsers loaded at startup.
  • Failure diagnostics log and display full grammar directory paths even though the telemetry section identifies paths as PII.

Verdict

Found: 0 critical, 6 important, 1 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH9955/product.md Outdated
## Goal

A contributor can add a new language to Warp's syntax highlighting
**without modifying compiled Rust code and without releasing Warp**,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This goal conflates the bundled and user-local paths: a source-tree bundled grammar still requires a Warp release, and the tech spec later requires Cargo/parser-map changes. Split the goal so only user-local grammars satisfy the no-release/no-compiled-code path.

Comment thread specs/GH9955/product.md Outdated
match statements are replaced with a registry-driven lookup. The
existing 32 languages get their associations migrated from the
match statements to per-language `language.toml` files in a
single mechanical PR (this spec calls out that PR as a follow-up,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This says the 32 languages migrate in a single mechanical PR, but B4 and the tech spec require independently revertable one-language migrations. Pick one migration strategy so implementers do not plan incompatible rollout paths.

Comment thread specs/GH9955/product.md Outdated
libraries (`.so`, `.dylib`, `.dll`) are explicitly rejected and
never loaded. The WASM is loaded via tree-sitter's existing WASM
runtime. WASM provides the sandboxing that makes user-local
grammars safe.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] WASM sandboxing alone does not define the safety contract for untrusted parsers loaded at startup. Specify host capabilities plus CPU/memory/input-size limits or timeout behavior so a malformed grammar cannot hang or exhaust Warp.

Comment thread specs/GH9955/product.md Outdated
A grammar directory that fails to load (malformed `language.toml`,
WASM that fails to instantiate, `highlights.scm` that fails to
parse against the grammar) does NOT break Warp startup. Instead:
- A `log::error!` fires with the directory path and the failure
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] Logging the full grammar directory path conflicts with the tech spec's path-is-PII telemetry rule and can expose usernames or private project paths in shared logs. Require redaction or basename-only diagnostics for logs, with full paths only in local UI if needed.

Comment thread specs/GH9955/tech.md Outdated

# User-local grammars: WASM file path relative to the grammar dir.
# Mutually exclusive with [parser.rust_crate].
wasm = "grammar.wasm"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 [SUGGESTION] Split this schema block into separate bundled and user-local examples; as written, the canonical example sets both rust_crate and wasm even though the comments say they are mutually exclusive.

Comment thread specs/GH9955/tech.md Outdated
pub struct LoadedLanguage {
pub language: Arc<Language>,
pub source: LanguageSource,
pub failure: Option<LoadFailure>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] LoadedLanguage cannot represent a grammar that fails before a Language is constructed because language is mandatory, yet failed grammars must be returned and listed in Settings. Define a separate LoadResult/FailedGrammar variant before implementation.

Comment thread specs/GH9955/tech.md Outdated
Reject if `WasmStore` reports an ABI mismatch with the host's
`tree_sitter::TREE_SITTER_LANGUAGE_VERSION`.
3. Compile `highlights.scm` against the resolved grammar via
`Query::new`. On failure: record `LoadFailure`, return.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] The product spec says a valid grammar with missing highlights.scm still loads without coloring, but this loader step treats highlight-query failure as LoadFailure and returns. Specify missing-file handling separately from invalid-query handling.

Bot review (warpdotdev#10129) identified 6 important + 1 suggestion concerns:

- Goal conflated bundled and user-local paths: bundled still ships
  in Warp and the tech spec requires Cargo / parser-map changes.
- Migration plan alternated between "single mechanical PR" and
  "independently revertable per-language migrations."
- LoadedLanguage couldn't represent grammars that fail before a
  Language is constructed — yet failed grammars must appear in the
  Settings page diagnostics.
- Product allowed missing highlights.scm to load without coloring,
  but the tech loader treated highlight-query failure as
  LoadFailure and returned. Missing-file vs invalid-query were
  conflated.
- WASM treated as sufficient sandboxing without specifying
  resource limits or host capabilities for untrusted parsers.
- Logs and notifications used the full grammar directory path
  while telemetry called paths PII.
- Schema example set both `rust_crate` and `wasm` even though
  comments said they were mutually exclusive.

Fixes:

- Goal split into G1 (user-local: no Warp release) and G2 (bundled:
  no hand-written match arms but does ship with Warp). The original
  issue's actual unblocking outcome — contributor velocity — is now
  served by both paths even though only G1 satisfies "no release."
- Single migration strategy: one PR per language, independently
  revertable, V1 migrates zero. The product spec is now consistent.
- New LoadResult sum type with Loaded(LoadedLanguage) and
  Failed(FailedGrammar) variants. FailedGrammar carries a
  best-effort internal_name (None if even the TOML parse failed)
  and a typed LoadFailureReason. Settings page lists both.
- Missing-vs-invalid split: missing optional .scm files produce
  LoadWarning::HighlightsScmMissing/etc. and the grammar still
  loads. Present-but-invalid queries produce LoadResult::Failed
  and the grammar is rejected (shipping a broken query is worse
  than no query). Tests T10/T11/T12 cover both.
- WASM safety contract: no host capabilities, parse timeout
  (100ms default), input-size cap (8MiB default), runtime memory
  cap (64MiB default), instantiation timeout (5s). All tunable
  via env vars. Worker-thread isolation called out as a known
  V1 limitation. Tests T13/T14/IT5 cover the limits.
- Path redaction: logs use directory basenames only; full paths
  appear ONLY in the local Settings UI (not log output, not
  telemetry). Telemetry payloads enumerated explicitly.
- Schema example split into bundled-only (`rust_crate`) and
  bundled-or-user-local (`wasm`) shapes. validate() now also
  rejects rust_crate in user-local grammars and rejects unknown
  top-level keys.

Each correction includes a "Correction (review warpdotdev#10129)" callout so
reviewers can cross-reference the bot's concerns.
@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 5, 2026

Pushed corrections addressing all 6 important + 1 suggestion concerns from the bot review:

  • Goal split into G1 (user-local: no Warp release required) and G2 (bundled: no hand-written match arms but does ship with Warp). The original issue's actual unblocking outcome — contributor velocity — is served by both paths even though only G1 satisfies "no release."
  • Single migration strategy: one PR per language, independently revertable, V1 migrates zero. Removed the contradictory "single mechanical PR" wording.
  • New LoadResult sum type with Loaded(LoadedLanguage) and Failed(FailedGrammar) variants. FailedGrammar carries a best-effort internal_name (None if TOML parse failed) and a typed LoadFailureReason. Failed grammars are first-class in Settings.
  • Missing-vs-invalid split: missing optional .scm files produce LoadWarning::HighlightsScmMissing and the grammar still loads (B6's "minimum viable" path); present-but-invalid queries produce LoadResult::Failed and the grammar is rejected (shipping a broken query is worse than no query). T10/T11/T12 cover both.
  • WASM safety contract: no host capabilities (rejected at load), parse timeout (100ms), input-size cap (8MiB), runtime memory cap (64MiB), instantiation timeout (5s). Tunable via env vars. Worker-thread isolation called out as a known V1 limitation. T13/T14/IT5 cover the limits.
  • Path redaction: logs use directory basenames only. Full paths appear ONLY in the local Settings UI. Telemetry payloads enumerated — no paths anywhere.
  • Schema example split into bundled-only (rust_crate) and bundled-or-user-local (wasm) shapes. validate() now also rejects rust_crate in user-local grammars and rejects unknown top-level keys to surface contributor typos.

Each correction includes a "Correction (review #10129)" callout in the spec for cross-reference. Ready for re-review.

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 5, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec defines a schema-driven mechanism for bundled and user-local tree-sitter grammars, including migration strategy, validation, settings visibility, and privacy/security constraints. The direction aligns with the linked issue, but several design details need to be resolved before implementation can proceed safely.

Concerns

  • The WASM safety contract bounds parser execution but not execution of untrusted highlight/indent/symbol queries.
  • The missing-highlights.scm behavior conflicts with the existing Language API constraints the spec says will remain unchanged.
  • The core user-local WASM path is still gated on an unresolved dependency/version question.

Security

  • User-supplied .scm queries need explicit CPU/memory/output bounds or a fallback path, not just parser timeouts.

Verdict

Found: 0 critical, 3 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH9955/product.md
filesystem, network, or process capabilities to grammar code by
design. The loader rejects any WASM module that attempts to import
symbols outside tree-sitter's required exports.
- **CPU bound — parse timeout.** Each parse invocation is gated by
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] The safety contract only bounds parser execution; untrusted highlights.scm, indents.scm, and identifiers.scm queries can also consume CPU/memory or emit excessive matches. Specify query execution limits (for example match limits, viewport/input caps, timeout/fallback behavior) before treating user-local grammars as bounded.

Comment thread specs/GH9955/product.md

The existing `editor.indent_unit` per-language settings, the
`renderer.theme` highlight color mappings, and any other downstream
consumer of `Language` continues to work. The `Language` struct
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This conflicts with B6's missing-highlights.scm path: the current Language has a mandatory highlight_query, and consumers construct highlight state from it unconditionally. Either make the query optional across the API or specify an empty-query fallback so a no-coloring language can actually be constructed.

Comment thread specs/GH9955/tech.md

## Open questions for maintainer review

1. WASM grammars require a tree-sitter version that supports
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This is a core feasibility dependency for the G1 user-local path, not an optional follow-up. Resolve the WasmStore version question in the spec and include the concrete dependency/upgrade plan if Warp's current tree-sitter stack does not already support it.

…t spec

Round-2 bot review identified 3 remaining concerns:

- The WASM safety contract bounded parser execution but not
  query execution. Tree-sitter Query::matches / Query::captures
  runs in a separate code path with its own potential
  pathologies and was unbounded.
- "Missing highlights.scm loads without coloring" conflicts with
  the actual Language API — pub highlight_query: Query (NOT
  Option<Query>) at crates/languages/src/lib.rs. A Language
  cannot be constructed without one.
- The user-local WASM path is gated on an unresolved tree-sitter
  version question. Warp's bundled grammars come through the
  internal arborium crate (version 2); confirming WASM support
  needs maintainer input.

Fixes:

- Added WARP_GRAMMAR_QUERY_TIMEOUT_MS (default 50ms) bound on
  query execution via a QueryCursor wrapper polling an
  AtomicBool from a watchdog thread. Added
  WARP_GRAMMAR_MAX_QUERY_CAPTURES (default 100k) cap on per-buffer
  query output to bound memory for pathological highlight queries.
  Both apply to highlight, indent, and identifiers queries
  uniformly.
- Missing-highlights.scm now synthesizes an empty highlight query
  via Query::new(grammar, "") instead of leaving the field unset.
  Tree-sitter accepts empty source (zero patterns); matches at
  runtime return zero captures so no coloring is applied. The
  Language API stays unchanged, preserving B8. Indent and
  identifiers queries are already Option<Query> in the codebase
  so they keep their None semantics for missing files.
- G1 (user-local grammars, no Warp release) is explicitly deferred
  out of V1 until the arborium / tree-sitter version question
  resolves. V1 ships only G2 (bundled-grammar discovery). User-
  local WASM is wired through the loader as
  LoadResult::Failed { reason: UserLocalWasmNotYetSupported } so
  the API shape stabilizes without enabling the path. A follow-up
  PR flips the gate once the WASM-tree-sitter version is
  confirmed.

Each correction includes a "Correction (re-review warpdotdev#10129)" callout.
@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 5, 2026

Round-2 corrections pushed (commit 05b4a80):

  • Query execution bounds: the parse timeout bounded parsing, not query matching. Added WARP_GRAMMAR_QUERY_TIMEOUT_MS (default 50ms per buffer per query type) via a QueryCursor wrapper polling an AtomicBool from a watchdog thread, plus WARP_GRAMMAR_MAX_QUERY_CAPTURES (default 100k) for memory bound. Applied uniformly to highlight, indent, and identifiers queries.
  • Missing-highlights.scm vs. Language API: I was wrong — pub struct Language { ..., pub highlight_query: Query } is not Option<Query>. Corrected design: synthesize an empty Query::new(grammar, "") for missing files (tree-sitter accepts zero patterns), so the Language API stays unchanged (preserving B8) and matches just return zero captures. Indent and identifiers queries are already Option<Query> in the codebase so they keep their None semantics for missing.
  • WASM tree-sitter version unresolved: explicitly deferred G1 (user-local grammars, no Warp release) out of V1. V1 ships only G2 (bundled-grammar discovery layer). User-local WASM is wired through the loader as LoadResult::Failed { reason: UserLocalWasmNotYetSupported } so the API shape stabilizes without enabling the path. A follow-up PR flips the gate once the arborium / tree-sitter version question resolves.

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 5, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec proposes a product and technical design for schema-driven syntax-highlight grammar discovery, including bundled grammar definitions and an eventual user-local WASM grammar path.

Concerns

  • The V1 scope is internally inconsistent: the goal section defers user-local WASM grammars, while the behavior contract, acceptance criteria, architecture, and tests still require user-local grammars to load and render in V1.
  • The bundled grammar path still has wording that understates the required Rust/Cargo changes, which would mislead follow-up implementers and contributors.
  • The diagnostic surface is left as either a CLI command or settings page in one section, while acceptance criteria require the settings page, so the required deliverable should be made singular.

Security

  • Telemetry includes user-controlled internal_name values for grammar load events. For user-local grammars, this can disclose private project/customer language names to analytics unless the spec restricts, hashes, or omits that field for user-local sources.

Verdict

Found: 1 critical, 2 important, 1 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH9955/product.md Outdated
`.nim` files render with syntax highlighting in Warp.
No edits to `lib.rs` were required.

A2. A user drops `~/.warp/grammars/zig/` containing
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [CRITICAL] A2 requires user-local WASM grammars to render after restart, but G1 explicitly says V1 does not ship this path and should return UserLocalWasmNotYetSupported; make the user-local acceptance criteria/tests follow-up-only or bring G1 back into V1 and resolve the WasmStore dependency question.

Comment thread specs/GH9955/product.md

**Bundled:** A `crates/languages/grammars/<lang>/` directory is
discovered at compile time via the existing `RustEmbed` mechanism.
A new language directory is the only required Rust change; no
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This says the directory is the only required Rust change, but G2 and tech.md still require a Cargo dependency or bundled_parsers.rs parser-map entry; reword this to promise no lib.rs match-arm edits rather than no other Rust/Cargo changes.

Comment thread specs/GH9955/tech.md Outdated
> across both surfaces.

**Telemetry events** (sent to Warp's analytics):
- `grammar_loaded`: one-time at startup. Payload: `internal_name`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] internal_name is user-controlled for user-local grammars and may reveal private project/customer names in analytics; specify that user-local values are omitted, allowlisted, or hashed before telemetry is emitted.

Comment thread specs/GH9955/product.md Outdated

### B7 — Discoverability of installed grammars

A new command `warp_grammars list` (or a settings-page surface,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 [SUGGESTION] Pick one required diagnostic surface here; A7 requires Settings → Editor → Languages, so leaving warp_grammars list as an alternative makes the implementation target ambiguous.

…t (round 3)

Round-3 bot review identified 1 critical + 2 important + 1 security:

- V1 scope inconsistent: goal section deferred user-local WASM,
  but acceptance criteria A2/A4/A5 still required user-local
  grammars to load and render in V1.
- Bundled grammar G2 wording understated the actual Rust/Cargo
  changes ("no edits to lib.rs were required" was too narrow).
- Diagnostic surface offered "CLI command OR settings page" while
  A7 required the settings page.
- Telemetry sent user-controlled `internal_name` for user-local
  grammars, which can disclose private/customer language names.

Fixes:

- A2/A4/A5 rewritten for V1 (bundled-only). Original user-local
  criteria preserved as A2.future / A4.future / A5.future for the
  follow-up PR. A7 unchanged (Settings page required).
- G2 expanded to honestly list the Cargo changes (Cargo.toml +
  bundled_parsers.rs entry per language) while clarifying these
  are mechanical one-line additions, not the five-place hand-coded
  match-statement spread the issue was asking us to remove.
- B7 narrowed: Settings → Editor → Languages page is the required
  deliverable. CLI command is a non-V1 follow-up.
- Telemetry split by source: Hardcoded/Bundled keep internal_name
  (public, ship in binary); UserLocal events strip internal_name
  entirely. Aggregate counts of user-local adoption without
  learning which grammars individuals installed. reason_kind alone
  is enough to identify systemic failure modes.

Each correction includes a "Correction (re-review warpdotdev#10129)" callout.
@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 5, 2026

Round-3 corrections pushed (commit 9a5a5c0):

  • V1 scope: A2/A4/A5 rewritten for V1 (bundled-only). The user-local criteria are preserved as A2.future / A4.future / A5.future for the follow-up PR that flips the gate.
  • G2 honesty: expanded to list the actual Cargo changes (Cargo.toml + one bundled_parsers.rs entry per language), while clarifying these are mechanical one-line additions in two places — not the five-place hand-coded match spread the issue was asking us to remove.
  • Diagnostic surface: narrowed to Settings → Editor → Languages page as the required V1 deliverable. CLI command is a follow-up.
  • Telemetry user-controlled internal_name (security): split events by source. Hardcoded/Bundled keep internal_name (public, ships in Warp). UserLocal events strip internal_name entirely — aggregate counts only, no disclosure of customer-named DSLs to analytics.

/oz-review

@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 5, 2026

@lonexreb

I'm re-reviewing this spec-only pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This spec defines a path toward schema-driven syntax-highlight grammars while preserving the existing hardcoded languages. The bundled-language direction is generally scoped, but the V1 boundary is still internally inconsistent after deferring user-local WASM support.

Concerns

  • The product spec still describes user-local grammars as loaded in V1 even though the goals and acceptance criteria say they must surface as UserLocalWasmNotYetSupported.
  • The tech spec cannot represent the required deferred user-local state because LoadFailureReason omits the required variant and the loader algorithm still instantiates WASM unconditionally.
  • The integration tests still assert future G1 behavior, which would force implementation of a path the product spec says is not in V1.

Security

  • The user-local WASM safety contract documents that same-thread parser hangs can still starve syntax-tree refresh. Before user-local grammar loading is enabled, the spec needs a fail-closed or isolated execution requirement rather than accepting this as a V1 limitation.

Verdict

Found: 1 critical, 5 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH9955/product.md

**Bundled:** A `crates/languages/grammars/<lang>/` directory is
discovered at compile time via the existing `RustEmbed` mechanism.
A new language directory is the only required Rust change; no
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This still says the directory is the only required change, but G2 above requires a Cargo.toml dependency and a bundled_parsers.rs entry for Rust-crate parsers. Update B2 so the product contract matches the documented contributor steps.

Comment thread specs/GH9955/product.md

**User-local:** A `~/.warp/grammars/<lang>/` (or
`$XDG_CONFIG_HOME/warp/grammars/<lang>/`) directory is discovered at
startup. User-local grammars are loaded after bundled grammars and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] V1 defers user-local parser loading to UserLocalWasmNotYetSupported, but this behavior contract says user-local grammars are loaded after bundled and collision-deduped. Rewrite B2 as detect-and-report-only for V1, moving loading and override semantics to the future G1 criteria.

Comment thread specs/GH9955/product.md
- **Startup-load timeout.** WASM module instantiation is wrapped in
a 5-second hard timeout. A grammar that fails to instantiate in
time is treated as a load failure (B6).
- **No worker isolation in V1.** All parsers share the editor
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] The safety contract cannot accept same-thread hangs for user-local grammars: a malicious or broken grammar can still starve syntax-tree refresh despite the timeout. Gate G1 on worker isolation or killable execution, or state that any non-cancellable hang disables the grammar before user-local loading is enabled.

Comment thread specs/GH9955/tech.md
pub schema_version: Option<u32>,
}

pub enum LoadFailureReason {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 [CRITICAL] A2 requires LoadResult::Failed { reason: UserLocalWasmNotYetSupported }, but this enum has no such variant. Add the variant and specify where the loader returns it, otherwise the V1 acceptance criteria cannot be implemented as written.

Comment thread specs/GH9955/tech.md
- `rust_crate`: look up via the compile-time `bundled_parsers.rs`
map. On miss: return `LoadResult::Failed { reason:
ParserCrateNotFound }`.
- `wasm`: `tree_sitter::WasmStore::load_language(&wasm_bytes)`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] The loading algorithm still instantiates every wasm parser, but V1 must not load user-local WASM. Add a source-specific branch before parser resolution that returns UserLocalWasmNotYetSupported for UserLocal, and clarify whether bundled WASM is also gated or supported in V1.

Comment thread specs/GH9955/tech.md
tree-sitter-toml grammar shrunk to a minimal subset), point
`WARP_USER_GRAMMAR_DIR` env var at it, call
`discover_grammars()`. Assert the language returns as
`LoadResult::Loaded(...)` and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] IT1 still requires a user-local grammar to load and register, contradicting A2's V1 behavior. Change the V1 integration test to assert Failed(UserLocalWasmNotYetSupported) and no language_by_filename association; move the loaded-parser assertion to the G1 follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed external-contributor Indicates that a PR has been opened by someone outside the Warp team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant