docs: tag bare BrightScript code fences + add brightscript-fence-required lint rule #54

Draft
bblietz wants to merge 1 commit into rokudev:v2.0 from bblietz:fix/brightscript-fence-sweep

bblietz commented May 20, 2026

Summary

Audit step 2 from dev-doc-review-report-2026-05-16.md §16. Tags every bare ``` fence whose body is confidently BrightScript so the renderer can apply syntax highlighting, AND ships a docs-lint rule (`brightscript-fence-required`) to prevent regression.

The new classifier is the same module used by both the one-shot sweep tool and the new lint rule, so the rule's notion of "looks like BrightScript" is exactly what produced this PR's tags.

Tag breakdown

568 fences auto-tagged across 154 files:

| Language | Count | Notes |
| --- | --- | --- |
| brightscript | 499 | 315 strong-signal, 143 directory-context, 8 hand-patched (deep-indented / blockquote fences missed by sweep regex; lint rule caught them via mdast) |
| c | 25 | Debugger-protocol struct/enum in socket-based-debugger.md. Tagging as brightscript would actively miscolor them. |
| text | 22 | Roku channel manifest snippets, BRS debug dumps (`<Component: roAssociativeArray> = { ... }`), URL templates |
| json | 7 | |
| http | 6 | |
| xml | 1 | |

76 fences are intentionally left bare. These are Robot Framework test files (*** Settings ***), single-line pkg:/... paths, RAF API schema descriptions (adPods : [{viewed : Boolean, ...}]), math notation (M = C(-1) S R C T), and release-notes literal dumps (a from ' {...}). Leaving them bare prevents the renderer from trying to highlight them as something they aren't.

New files

  • .github/scripts/docs-lint/lib/code-fence-classifier.mjs -- shared classifier returning {lang, confidence, signals}. Conservative; prefers unknown over guessing. Used by both the one-shot sweep tool AND the lint rule below.
  • .github/scripts/docs-lint/rules/code-fences.mjs -- new brightscript-fence-required lint rule: any bare fence whose body classifies as BrightScript with confidence >= 0.7 is reported as an error. Wired into index.mjs RULES array.
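As a rough illustration of how a rule like this could work, the sketch below scans markdown for fences, keeps the bare ones, and reports those a classifier scores as BrightScript at or above the 0.7 threshold. It is not the code in code-fences.mjs: the real rule walks an mdast tree, whereas this version uses a line scanner, and `findFences`, `lintBareFences`, `stripContainerPrefix`, and the shape of the finding object are all hypothetical names chosen for the example. The classifier is passed in as a function so the sketch stays self-contained.

```javascript
// Threshold taken from the PR text; everything else here is an assumption.
const ACT_THRESHOLD = 0.7;

// Drop leading indentation and blockquote markers so fences nested inside
// list items or blockquotes are still recognised as fence delimiters.
function stripContainerPrefix(line) {
  return line.replace(/^(?:\s*>)*\s*/, "");
}

// Minimal line scanner: returns { line, info, body } for each closed fence.
function findFences(markdown) {
  const fences = [];
  let open = null;
  markdown.split("\n").forEach((raw, index) => {
    const line = stripContainerPrefix(raw);
    if (open === null) {
      const m = /^(`{3,})\s*([^`\s]*)/.exec(line);
      if (m) open = { line: index + 1, ticks: m[1].length, info: m[2], body: [] };
    } else if (new RegExp("^`{" + open.ticks + ",}\\s*$").test(line)) {
      fences.push({ line: open.line, info: open.info, body: open.body.join("\n") });
      open = null;
    } else {
      open.body.push(raw);
    }
  });
  return fences;
}

// classify(body) is expected to return { lang, confidence, signals }.
function lintBareFences(markdown, classify) {
  return findFences(markdown)
    .filter((fence) => fence.info === "")
    .map((fence) => ({ fence, result: classify(fence.body) }))
    .filter(({ result }) => result.lang === "brightscript" && result.confidence >= ACT_THRESHOLD)
    .map(({ fence, result }) => ({
      rule: "brightscript-fence-required",
      severity: "error",
      line: fence.line,
      message: `bare fence looks like BrightScript (confidence ${result.confidence})`,
    }));
}
```

Because the scanner strips any amount of indentation, it accepts more than CommonMark strictly allows; an AST-based walk, as the PR describes, avoids that looseness.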

How the classifier works

BrightScript signals (one high-specificity signal or any two signals -> tag as brightscript):

  • Statement keywords: sub/function/end sub/end function/endif/endwhile/endfor
  • as <Type> annotations (String, Integer, Boolean, Object, Dynamic, Float, ...)
  • CreateObject("ro...") -- case-insensitive on the Ro prefix
  • m.top/m.global/m.video and any m.field reference, assignment, or method call
  • ro-prefixed object names (roSGNode, roMessagePort, roByteArray, ...)
  • BRS REPL prompt (BrightScript> )
  • BRS line comments (')
  • Hex literals (&h...)
  • Library "...brs" statement
  • Roku_Ads() framework call
  • observeFieldScoped / callFunc SceneGraph node methods
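The scoring described in the commit message (two or more signals give 0.95, one high-specificity signal gives 0.85, one weak signal alone stays under 0.7) can be sketched roughly as follows. The signal table is a small, hypothetical subset of the list above: the regular expressions, the strong/weak split, and the 0.5 value for a lone weak signal are assumptions made for illustration, not values taken from code-fence-classifier.mjs.

```javascript
// Hypothetical signal table; patterns and strong/weak labels are guesses.
const BRS_SIGNALS = [
  { name: "statement-keyword", strong: true,
    re: /^\s*(?:end\s+(?:sub|function)|endif|endwhile|endfor|(?:sub|function)\s+\w+\s*\()/im },
  { name: "create-object", strong: true, re: /\bCreateObject\(\s*"ro\w+"/i },
  { name: "m-reference", strong: true, re: /\bm\.(?:top|global)\b/i },
  { name: "repl-prompt", strong: true, re: /^BrightScript> /m },
  { name: "as-type", strong: false,
    re: /\bas\s+(?:String|Integer|Boolean|Object|Dynamic|Float)\b/i },
  { name: "ro-object", strong: false, re: /\bro[A-Z]\w+/ },
  { name: "line-comment", strong: false, re: /^\s*'/m },
  { name: "hex-literal", strong: false, re: /&h[0-9a-f]+/i },
];

// Returns the same { lang, confidence, signals } shape the PR describes.
function scoreBrightScript(body) {
  const hits = BRS_SIGNALS.filter((signal) => signal.re.test(body));
  let confidence = 0;
  if (hits.length >= 2) {
    confidence = 0.95;
  } else if (hits.length === 1) {
    // 0.5 is an assumed placeholder for "below the 0.7 act-threshold".
    confidence = hits[0].strong ? 0.85 : 0.5;
  }
  return {
    lang: confidence >= 0.7 ? "brightscript" : "unknown",
    confidence,
    signals: hits.map((signal) => signal.name),
  };
}

// Two signals fire here (statement keyword and m.top), so confidence is 0.95.
console.log(scoreBrightScript("sub init()\n  m.top.setFocus(true)\nend sub"));
```

A body such as `adIface.setAdUrl(myAdUrl)` hits none of these patterns and stays `unknown`, which is the gap the directory-context pass is meant to cover.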

Non-BrightScript filters (decisive -- prevents BRS misclassification):

  • XML opener (<?xml, SceneGraph component tags with attribute/close-bracket boundary -- explicitly NOT matching <Component: ...> BRS debug-dump syntax)
  • C struct/enum declarations + C int types (debugger protocol)
  • Shebang or shell prompts
  • HTTP request lines (GET /path OR POST https://...)
  • Roku channel manifest key=value lines (title=, mm_icon_focus_hd=, requires_*_drm=, ...)
  • Single URL line / long base64-like token
  • HLS m3u8 directives (#EXT-X-...)
  • Single CamelCase identifier per line (ECP button names like ChannelUp / ChannelDown)
  • Robot Framework markers (*** Settings ***, *** Variables ***, ...)
  • RAF schema descriptions (key : Type shape definitions)
  • Component callback output dumps (init(): v2 style)
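One way to picture the filters above is as an ordered list of checks that runs before any BrightScript scoring, where the first match wins. The sketch below covers only a few of the filters; the filter names, the regular expressions, and the `firstNonBrsFilter` helper are hypothetical. Note how the XML pattern requires whitespace, `/`, or `>` right after the tag name, so `<Component: ...>` debug dumps do not match it.

```javascript
// Hypothetical, partial filter list; checked in order, first match wins.
const NON_BRS_FILTERS = [
  // BRS debug-dump syntax looks like a tag but is not XML.
  { name: "brs-debug-dump", re: /^\s*<Component:\s/m },
  // XML declaration, or a tag name followed by whitespace, "/" or ">".
  { name: "xml", re: /^\s*(?:<\?xml\b|<\/?[A-Za-z]\w*(?=[\s/>]))/ },
  { name: "c", re: /^\s*(?:typedef\s+)?(?:struct|enum)\b|\bu?int(?:8|16|32|64)_t\b/m },
  { name: "shell", re: /^(?:#!|\$ )/m },
  { name: "http", re: /^(?:GET|POST)\s+(?:\/|https?:\/\/)/m },
  { name: "manifest", re: /^(?:title|mm_icon_focus_hd|requires_\w+_drm)=/m },
  { name: "hls", re: /^#EXT-X-/m },
  { name: "robot-framework", re: /^\*\*\* (?:Settings|Variables) \*\*\*/m },
];

// Returns the name of the first matching filter, or null if none match.
function firstNonBrsFilter(body) {
  const hit = NON_BRS_FILTERS.find((filter) => filter.re.test(body));
  return hit ? hit.name : null;
}

console.log(firstNonBrsFilter("<Component: roAssociativeArray> = {\n  a: 1\n}"));
console.log(firstNonBrsFilter("GET /query/apps HTTP/1.1"));
```

Returning a filter name rather than a language keeps the decision about which tag to apply (or whether to leave the fence bare) separate from detection.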

Directory-context pass (sweep tool only -- the lint rule is path-blind):

In confirmed BRS directories (REFERENCES/brightscript/, REFERENCES/scenegraph/, DEVELOPER/core-concepts/, DEVELOPER/advertising/, DEVELOPER/media-playback/, DEVELOPER/getting-started/, DEVELOPER/dev-tools/, DEVELOPER/discovery/, DEVELOPER/voice/, DEVELOPER/performance-guide/), fences that don't hit strong classifier signals AND don't contain explicit non-BRS content (above) are promoted to brightscript with directory-context justification. Catches 1-line BRS method calls like adIface.setAdUrl(myAdUrl) that have no distinctive BRS keyword on their own but live in unambiguously-BRS docs.
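The promotion step can be sketched as a small function that runs after classification. The directory list is copied from the paragraph above; the function name `promoteByDirectory`, its parameters, and the example file paths are hypothetical, and the sketch assumes the classifier result and the non-BRS filter outcome are computed elsewhere and passed in.

```javascript
// Directory list from the PR description.
const BRS_DIRS = [
  "REFERENCES/brightscript/",
  "REFERENCES/scenegraph/",
  "DEVELOPER/core-concepts/",
  "DEVELOPER/advertising/",
  "DEVELOPER/media-playback/",
  "DEVELOPER/getting-started/",
  "DEVELOPER/dev-tools/",
  "DEVELOPER/discovery/",
  "DEVELOPER/voice/",
  "DEVELOPER/performance-guide/",
];

// classification: { lang, confidence, signals } from the classifier.
// nonBrsReason: name of a matched non-BRS filter, or null if none matched.
function promoteByDirectory(filePath, classification, nonBrsReason) {
  if (classification.lang !== "unknown") return classification; // already decided
  if (nonBrsReason !== null) return classification; // explicit non-BRS content
  const normalized = filePath.replace(/\\/g, "/");
  const inBrsDir = BRS_DIRS.some((dir) => normalized.includes(dir));
  if (!inBrsDir) return classification;
  return {
    ...classification,
    lang: "brightscript",
    signals: [...classification.signals, "directory-context"],
  };
}

// Hypothetical path: a one-line method call in an advertising doc is promoted.
const weak = { lang: "unknown", confidence: 0, signals: [] };
console.log(promoteByDirectory("docs/DEVELOPER/advertising/example.md", weak, null));
```

Keeping this step out of the classifier itself is what lets the lint rule stay path-blind: the rule only ever sees the fence body.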

Test plan

  • node .github/scripts/docs-lint/index.mjs shows ZERO brightscript-fence-required errors after the sweep
  • Lint baseline preserved: same 3 pre-existing findings in REFERENCES/brightscript/interfaces/ifscreen.md (1 pipe-no-blank-above error + 2 html-blank-between-tags warnings)
  • Dry-run sample of 25 dir-context promotions reviewed; 0 false positives in final classifier config
  • All 8 lint-flagged fences my sweep regex missed (blockquote-prefixed > ``` and deep-indented ``` inside list items) hand-patched and re-verified
  • CI run on this PR

Out of scope

  • Audit's roku-pay-best_practices underscore mismatch claim -- already fixed upstream; not visible in current v2.0.
  • One pre-existing malformed fence in scenegraph control-nodes docs (createobject("roprogramguide")``` -- info string has code in it, looks like a missing close fence). Flagged for manual review; not in scope here.
  • The 76 truly-ambiguous fences left bare (Robot Framework, schemas, paths, release-notes literals).

Related

Part of the May 16 audit cleanup plan. Step 1 (typos + _order.yaml + empty-link) is in flight as PR #52. Next sweep PRs in the queue: terminology canonicalization (~25 hits), broken-absolute-link sweep (252 remaining).

…ired lint rule

Audit step 2 of dev-doc-review-report-2026-05-16.md. Tags every bare ```
fence whose body is confidently BrightScript so the renderer can apply
syntax highlighting, and ships a docs-lint rule to prevent regression.

Sweep results (across docs/DEVELOPER, docs/REFERENCES, docs/SPECIFICATIONS,
docs/THE ROKU CHANNEL):

- Total bare fences identified: 644 (regex + mdast, aware of 0-3 space
  indentation and blockquote prefixes)
- Auto-tagged 568 fences in 154 files:
  - brightscript: 499 (315 strong-signal, 143 directory-context, 8 hand-patched
    for deep-indented or blockquote fences the regex sweep missed)
  - c: 25 (debugger protocol struct/enum in socket-based-debugger.md;
    tagging as `brightscript` would have actively miscolored them)
  - text: 22 (Roku channel manifest snippets, BRS debug dumps like
    `<Component: roAssociativeArray> = { ... }`, URL templates)
  - json: 7
  - http: 6
  - xml: 1
- Left bare: 76 fences with truly ambiguous content (Robot Framework test
  files, single-line `pkg:/...` paths, RAF API schema descriptions, math
  notation, release-notes literal dumps)

New tooling:

- `.github/scripts/docs-lint/lib/code-fence-classifier.mjs` - shared
  classifier returning {lang, confidence, signals}. Conservative; prefers
  'unknown' over guessing. Used by the one-shot sweep tool AND the new
  lint rule below.
- `.github/scripts/docs-lint/rules/code-fences.mjs` - new
  `brightscript-fence-required` rule: any bare fence whose body classifies
  as BrightScript with confidence >= 0.7 is reported as an error.
- `.github/scripts/docs-lint/index.mjs` - wired the new rule into the
  RULES array.

Classifier signals (BrightScript): `sub`/`function`/`end sub`/`end function`/
`endif`/`endwhile`/`endfor` keywords; `as Type` annotations (String, Integer,
Boolean, Object, Dynamic, Float, etc.); `CreateObject("ro...")`; `m.top`/
`m.global`/`m.video`/`m.field` references; ro-prefixed object names
(roSGNode, roMessagePort, roByteArray, ...); BRS REPL prompt
(`BrightScript> `); BRS line comments (`'`); hex literals (`&h...`);
`Library "...brs"` statement; `Roku_Ads()` framework call;
`observeFieldScoped` / `callFunc` methods. Two or more signals -> 0.95
confidence; one high-specificity signal -> 0.85; one weak signal alone
stays below the 0.7 act-threshold.

Classifier signals (non-BRS): XML opener (`<?xml`, SceneGraph component
tags); C struct/enum declarations (the debugger-protocol pages);
shebang/shell prompts; HTTP request lines (`GET /path`, `POST https://...`);
Roku channel manifest key=value lines; HLS m3u8 directives (`#EXT-X-...`);
list of single CamelCase identifiers (ECP button names). The classifier
deliberately routes `<Component: ...>` BRS debug-dump format to `text`
not `xml` -- audit caught those as false-positives during dry-run.

Directory-context pass (sweep tool only, NOT the lint rule):

In confirmed BRS directories (REFERENCES/brightscript/, REFERENCES/scenegraph/,
DEVELOPER/core-concepts/, DEVELOPER/advertising/, DEVELOPER/media-playback/,
DEVELOPER/getting-started/, DEVELOPER/dev-tools/, DEVELOPER/discovery/,
DEVELOPER/voice/, DEVELOPER/performance-guide/), fences that don't hit
strong classifier signals AND don't contain explicit non-BRS content are
promoted to brightscript with `directory-context` justification. Catches
1-line BRS method calls like `adIface.setAdUrl(myAdUrl)` that have no
distinctive BRS keywords on their own but live in unambiguously-BRS docs.

Lint baseline preserved: docs-lint shows the same 3 findings as before
(all in `REFERENCES/brightscript/interfaces/ifscreen.md`, unchanged) and
ZERO `brightscript-fence-required` errors after the sweep.

Out of scope (separate follow-up):

- Audit's `roku-pay-best_practices` underscore mismatch claim - already
  fixed upstream before this PR; not visible.
- One pre-existing malformed fence with code in the info string
  (`createobject("roprogramguide")```` not properly closed) - flagged
  for manual review; not in scope here.
- The remaining 76 truly-ambiguous fences (Robot Framework, schema
  descriptions, paths, release-notes literals) - intentionally left
  bare so the renderer treats them as plain text rather than trying to
  syntax-highlight them as BRS or another language.