feat(core-agent): add stripUnreadableSymbols for TTS text sanitization by vi70x4 · Pull Request #14 · airi-os/core

vi70x4 · 2026-06-06T21:21:38Z

Extends stripMarkdownFromSpeech with additional passes to strip emoji, decorative Unicode, standalone special chars, math operators, and repeated punctuation from TTS input text. All passes are configurable via options with sensible defaults (strip everything by default).

Streaming control tokens (<|ACT|>, <|DELAY|>, <|CALL|>) are preserved via Private Use Area placeholder extraction.

Closes: extends plaintext-response-format spec

Summary

Extends the plaintext-response-format spec by adding a new stripUnreadableSymbols function that strips emoji, decorative Unicode, standalone special characters, math operators, and repeated punctuation from TTS input text — complementing the existing stripMarkdownFromSpeech function.

What changed

New file: packages/core-agent/src/runtime/unreadable-symbols-stripper.ts
- stripUnreadableSymbols(text, options?) — 6-pass sanitizer (Markdown + emoji + decorative Unicode + standalone special chars + math operators + repeated punctuation collapsing)
- StripUnreadableSymbolsOptions interface with 5 configurable boolean flags (all default true)
- Streaming control tokens (<|ACT|>, <|DELAY|>, <|CALL|>) preserved via Private Use Area placeholder extraction
- stripMarkdownFromSpeech remains exported unchanged (backward compatibility)
New file: packages/core-agent/src/runtime/unreadable-symbols-stripper.test.ts
- 63 tests covering all stripping categories, token preservation, options behavior, edge cases, and backward compatibility
Modified: packages/core-agent/src/runtime/chat-orchestrator-runtime.ts
- Both stripMarkdownFromSpeech calls (streaming path + final categorization path) replaced with stripUnreadableSymbols
Modified: packages/core-agent/src/index.ts
- Added exports for stripUnreadableSymbols and StripUnreadableSymbolsOptions
New spec: .roo/specs/unreadable-symbols-stripper/ (requirements.md, design.md, tasks.md)
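As a rough illustration of how the five flags might be resolved against their defaults, here is a minimal sketch. The flag names and the `{ ...DEFAULT_OPTIONS, ...options }` merge come from the diff quoted later in this thread; the body of `DEFAULT_OPTIONS` is assumed from the "all default true" description, and `resolveOptions` is a hypothetical helper that exists only for this example.

```typescript
// Option flags as they appear in the quoted diff (opts.stripEmoji, etc.).
interface StripUnreadableSymbolsOptions {
  stripEmoji?: boolean
  stripDecorativeUnicode?: boolean
  stripStandaloneSpecialChars?: boolean
  stripMathOperators?: boolean
  collapseRepeatedPunctuation?: boolean
}

// Assumed contents: the PR description only says every flag defaults to true.
const DEFAULT_OPTIONS: Required<StripUnreadableSymbolsOptions> = {
  stripEmoji: true,
  stripDecorativeUnicode: true,
  stripStandaloneSpecialChars: true,
  stripMathOperators: true,
  collapseRepeatedPunctuation: true,
}

// Hypothetical helper: the same spread merge the quoted function performs inline.
function resolveOptions(
  options?: StripUnreadableSymbolsOptions,
): Required<StripUnreadableSymbolsOptions> {
  return { ...DEFAULT_OPTIONS, ...options }
}

// Only the flag that is passed changes; the other four keep their defaults.
const keepPunctuation = resolveOptions({ collapseRepeatedPunctuation: false })
console.log(keepPunctuation.collapseRepeatedPunctuation, keepPunctuation.stripEmoji)
```

Because the merge is a plain spread, callers only need to name the passes they want to switch off.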

How tested

pnpm -F @proj-airi/core-agent typecheck — passed
pnpm -F @proj-airi/core-agent exec vitest run — 164 tests passed (14 test files)

Summary by Sourcery

Introduce a configurable TTS text sanitization utility and integrate it into the chat orchestrator to replace Markdown-only stripping.

New Features:

Add a new stripUnreadableSymbols function with configurable options for removing unreadable symbols from TTS input text and preserving streaming control tokens.
Export stripUnreadableSymbols and its StripUnreadableSymbolsOptions type from the core-agent public API.

Enhancements:

Update the chat orchestrator runtime to use the new stripUnreadableSymbols sanitizer instead of the Markdown-only stripper for both streaming and final categorization paths.

Documentation:

Add design, requirements, and task specs for the unreadable symbols stripper module.

Tests:

Add comprehensive unit tests for stripUnreadableSymbols covering symbol categories, options behavior, token preservation, edge cases, and backward compatibility.

mergeguards · 2026-06-06T21:21:41Z

MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories.

sourcery-ai · 2026-06-06T21:21:44Z

Reviewer's Guide

Adds a new configurable TTS sanitization utility stripUnreadableSymbols, wires it into the chat orchestrator in place of stripMarkdownFromSpeech, exports it from core-agent, and documents the behavior with a dedicated spec and tests.

Flow diagram for stripUnreadableSymbols sanitization pipeline

flowchart TD
  A[Input text with streaming tokens and markdown] --> B[extractStreamingTokens]
  B --> C[safeText]
  C --> D[stripMarkdownFromSpeech]
  D --> E[Pass 2: stripEmoji]
  E --> F[Pass 3: stripDecorativeUnicode]
  F --> G[Pass 4: stripStandaloneSpecialChars]
  G --> H[Pass 5: stripMathOperators]
  H --> I[Pass 6: collapseRepeatedPunctuation]
  I --> J[Collapse multiple spaces]
  J --> K[restoreStreamingTokens]
  K --> L[Trim result]
  L --> M[Sanitized TTS text output]

File-Level Changes

Change	Details	Files
Introduce stripUnreadableSymbols TTS sanitization utility with configurable passes and streaming token preservation.	Define StripUnreadableSymbolsOptions with 5 boolean flags defaulting to true. Implement stripUnreadableSymbols to first delegate to stripMarkdownFromSpeech, then run emoji, decorative Unicode, standalone special char, math-operator, and repeated-punctuation passes in sequence. Add extractStreamingTokens/restoreStreamingTokens helpers using Private Use Area placeholders to protect <	...
Add comprehensive tests for the unreadable symbols stripper behavior and options.	Cover emoji, decorative Unicode, standalone special chars, math operators, repeated punctuation collapsing, and their combinations. Verify streaming control tokens like <	ACT
Wire stripUnreadableSymbols into the chat orchestrator runtime instead of stripMarkdownFromSpeech for TTS speech text.	Import stripUnreadableSymbols alongside stripMarkdownFromSpeech in chat-orchestrator-runtime. Replace the streaming path call that sanitized categorizer.filterToSpeech(...) with stripUnreadableSymbols. Replace the final categorization speech sanitization with stripUnreadableSymbols while leaving reasoning handling unchanged.	`packages/core-agent/src/runtime/chat-orchestrator-runtime.ts`
Expose the new stripper utility and options type from the core-agent public API.	Export stripUnreadableSymbols from the package index next to stripMarkdownFromSpeech. Export the StripUnreadableSymbolsOptions type for external configuration.	`packages/core-agent/src/index.ts`
Document the unreadable symbols stripper design, requirements, and implementation tasks.	Describe the multi-pass architecture, Unicode ranges used, and streaming token protection strategy in the design doc. Capture functional requirements for each stripping category and integration expectations in requirements.md. Record the concrete implementation and testing tasks and their completion status in tasks.md.	`.roo/specs/unreadable-symbols-stripper/design.md` `.roo/specs/unreadable-symbols-stripper/requirements.md` `.roo/specs/unreadable-symbols-stripper/tasks.md`

deepsource-io · 2026-06-06T21:22:28Z

DeepSource Code Review

We reviewed changes in 3bdd1ac...77f95c8 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

Important

Some issues found as part of this review are outside of the diff in this pull request and aren't shown in the inline review comments due to GitHub's API limitations. You can see those issues on the DeepSource dashboard.

PR Report Card

Overall Grade Focus Area: Reliability	Security Reliability Complexity Hygiene

Feedback

Logging pattern in browser-facing code

The same console usage appears across multiple browser runtime files, which is why it shows up as nine occurrences of the same reliability issue.
Might be worth deciding when/where you want logging in these paths, since TTS and chat runtimes are going to be user-facing and potentially noisy if this sticks around.

Code Review Summary

Analyzer	Updated (UTC)	Details
JavaScript	Jun 6, 2026 11:25p.m.	Review ↗
Shell	Jun 6, 2026 11:25p.m.	Review ↗
C & C++	Jun 6, 2026 11:25p.m.	Review ↗

sourcery-ai

Hey - I've left some high level feedback:

The streaming token placeholder implementation (TOKEN_PLACEHOLDER_BASE = '\uE0000', wrapped around the index) deviates from the spec’s null-byte approach and relies on a specific Private Use Area codepoint; consider switching to a delimiter that cannot plausibly appear in user text (e.g. a \x00-wrapped index) to avoid accidental collisions and simplify the restore regex.
The stripMathOperators regex uses a lookbehind (?<=\s), which can be brittle across JS runtimes and is slightly inconsistent with the boundary-based approach documented in the design; you might want to rework this to use an explicit (^|\s)-style grouping so behavior is both more portable and easier to reason about at string boundaries.
The emoji/decorative Unicode passes chain many overlapping and partially duplicated ranges with multiple replace calls; it may be easier to maintain and reason about if you consolidate these into a smaller number of well-documented ranges (or a single compiled regex) that more directly reflects the categories described in the design.
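To make the second point concrete, the sketch below puts the lookbehind form and the `(^|\s)` grouping form side by side. The operator character class is copied from the `stripMathOperators` pass quoted in this PR; the two wrapper function names are hypothetical. The lookbehind pattern is built with the `RegExp` constructor so that a runtime without lookbehind support fails when the pattern is constructed rather than when the file is parsed.

```typescript
// Regex source for the operator class used by the quoted stripMathOperators pass:
// [+=\-<>&^~|\\/%]+
const OPERATOR_CLASS = '[+=\\-<>&^~|\\\\/%]+'

// Original form: start of string, or a position preceded by whitespace.
const withLookbehind = new RegExp(`(?:^|(?<=\\s))${OPERATOR_CLASS}(?=\\s|$)`, 'g')

// Suggested form: capture the leading boundary and put it back with $1.
const withGroup = new RegExp(`(^|\\s)${OPERATOR_CLASS}(?=\\s|$)`, 'g')

// Hypothetical wrappers, named only for this comparison.
function stripWithLookbehind(text: string): string {
  return text.replace(withLookbehind, '')
}

function stripWithGroup(text: string): string {
  return text.replace(withGroup, '$1')
}

// Standalone operators are removed; operators attached to words are kept.
const samples = ['a + b', 'C++ and A&B', '= x', 'a + - b', 'x >= y', '-']
for (const sample of samples) {
  console.log(JSON.stringify(stripWithLookbehind(sample)), JSON.stringify(stripWithGroup(sample)))
}
```

On these inputs the two forms produce the same output. The trailing boundary is a lookahead in both, so neither consumes the whitespace after the operator, and the later "collapse multiple spaces" step is still needed.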

gemini-code-assist

Code Review

This pull request introduces the stripUnreadableSymbols utility to sanitize LLM response text for TTS by stripping emoji, decorative Unicode, standalone special characters, math operators, and collapsing repeated punctuation. It integrates this utility into the chat orchestrator runtime and exports it from the core agent package. Feedback on the implementation highlights a bug in the Unicode escape sequence for the token placeholder base, which evaluates to a two-character string instead of a single character. Additionally, it is recommended to combine the numerous consecutive .replace() calls for emoji and decorative Unicode stripping into single regular expressions to improve streaming performance and eliminate redundant range matches.
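The escape-sequence bug mentioned above can be checked directly. A `\u` escape without braces consumes exactly four hex digits, so `'\uE0000'` is U+E000 followed by a literal `0`. The variable names below are illustrative only.

```typescript
// '\uXXXX' takes exactly four hex digits; the fifth digit is ordinary text.
const asWritten = '\uE0000'

// U+E000 is the first code point of the BMP Private Use Area.
const singlePuaChar = '\uE000'

console.log(asWritten.length) // 2
console.log(asWritten.charCodeAt(0).toString(16)) // "e000"
console.log(asWritten[1]) // "0"
console.log(asWritten === singlePuaChar + '0') // true
```

A placeholder built from this constant therefore carries an extra `0` on each side of the index, which is the two-character behaviour the review refers to.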

gemini-code-assist · 2026-06-06T21:23:39Z

+  if (opts.stripDecorativeUnicode) {
+    // U+2190-U+21FF: Arrows
+    result = result.replace(/[\u{2190}-\u{21FF}]/gu, '')
+    // U+2500-U+257F: Box drawing
+    result = result.replace(/[\u{2500}-\u{257F}]/gu, '')
+    // U+2580-U+259F: Block elements
+    result = result.replace(/[\u{2580}-\u{259F}]/gu, '')
+    // U+25A0-U+25FF: Geometric shapes
+    result = result.replace(/[\u{25A0}-\u{25FF}]/gu, '')
+    // Specific decorative chars: © ® ™ § ¶ † ‡ • ‣ ⁃
+    result = result.replace(/[©®™§¶†‡•‣⁃]/g, '')
+    // U+2100-U+214F: Letterlike symbols
+    result = result.replace(/[\u{2100}-\u{214F}]/gu, '')
+  }


Similarly to the emoji pass, we can combine all decorative Unicode ranges and characters into a single regular expression to reduce the number of .replace() calls from 6 to 1, improving performance during streaming.

if (opts.stripDecorativeUnicode) {
  result = result.replace(/[\u{2190}-\u{21FF}\u{2500}-\u{257F}\u{2580}-\u{259F}\u{25A0}-\u{25FF}©®™§¶†‡•‣⁃\u{2100}-\u{214F}]/gu, '')
}

deepsource-io · 2026-06-06T21:23:59Z

@@ -21,6 +21,7 @@ import { createChatHooks } from './agent-hooks'
 import { useLlmmarkerParser } from './llm-marker-parser'
 import { categorizeResponse, createStreamingCategorizer } from './response-categoriser'
 import { stripMarkdownFromSpeech } from './markdown-stripper'


'stripMarkdownFromSpeech' is defined but never used

Unused variables are generally considered a code smell and should be avoided.

deepsource-io · 2026-06-06T21:23:59Z

+function extractStreamingTokens(text: string): { processed: string, tokens: string[] } {
+  const tokens: string[] = []
+  const processed = text.replace(/<\|[^|]+\|>/g, (match) => {
+    const index = tokens.length
+    tokens.push(match)
+    return `${TOKEN_PLACEHOLDER_BASE}${index.toString(36)}${TOKEN_PLACEHOLDER_BASE}`
+  })
+  return { processed, tokens }
+}


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

It is considered a best practice to avoid 'polluting' the global scope with variables that are intended to be local to the script. Global variables created from a script can produce name collisions with global variables created from another script, which will usually lead to runtime errors or unexpected behavior. It is mostly useful for browser scripts.

deepsource-io · 2026-06-06T21:23:59Z

+function restoreStreamingTokens(text: string, tokens: string[]): string {
+  return text.replace(
+    new RegExp(`${TOKEN_PLACEHOLDER_BASE}([0-9a-z]+)${TOKEN_PLACEHOLDER_BASE}`, 'g'),
+    (_, indexStr) => tokens[Number.parseInt(indexStr, 36)] ?? '',
+  )
+}


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io · 2026-06-06T21:23:59Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+
+  // Pass 2: Strip emoji and Unicode pictographic symbols
+  if (opts.stripEmoji) {
+    // Remove variation selectors, ZWJ, keycap combining chars first
+    result = result.replace(/\uFE0F/gu, '')
+    result = result.replace(/\u200D/gu, '')
+    result = result.replace(/\u20E3/gu, '')
+
+    // Remove emoji Unicode ranges
+    // U+1F300-U+1F9FF: Misc symbols, emoticons, transport, supplemental
+    result = result.replace(/[\u{1F300}-\u{1F9FF}]/gu, '')
+    // U+2600-U+26FF: Misc symbols
+    result = result.replace(/[\u{2600}-\u{26FF}]/gu, '')
+    // U+2700-U+27BF: Dingbats
+    result = result.replace(/[\u{2700}-\u{27BF}]/gu, '')
+    // U+1F3FB-U+1F3FF: Skin tone modifiers
+    result = result.replace(/[\u{1F3FB}-\u{1F3FF}]/gu, '')
+    // U+1F1E0-U+1F1FF: Regional indicator symbols (flags)
+    result = result.replace(/[\u{1F1E0}-\u{1F1FF}]/gu, '')
+    // U+1F600-U+1F64F: Emoticons (faces)
+    result = result.replace(/[\u{1F600}-\u{1F64F}]/gu, '')
+    // U+1F680-U+1F6FF: Transport and map symbols
+    result = result.replace(/[\u{1F680}-\u{1F6FF}]/gu, '')
+    // U+1FA00-U+1FAFF: Extended-A and beyond
+    result = result.replace(/[\u{1FA00}-\u{1FAFF}]/gu, '')
+    // U+2702-U+27B0: Dingbats (additional)
+    result = result.replace(/[\u{2702}-\u{27B0}]/gu, '')
+    // U+231A-U+231B: Watch, hourglass
+    result = result.replace(/[\u{231A}-\u{231B}]/gu, '')
+    // U+23E9-U+23F3: Media controls, clocks
+    result = result.replace(/[\u{23E9}-\u{23F3}]/gu, '')
+    // U+23F8-U+23FA: Media controls
+    result = result.replace(/[\u{23F8}-\u{23FA}]/gu, '')
+    // U+25AA-U+25AB, U+25B6, U+25C0, U+25FB-U+25FE: Geometric shapes
+    result = result.replace(/[\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}]/gu, '')
+    // U+2614-U+2615: Umbrella, hot beverage
+    result = result.replace(/[\u{2614}-\u{2615}]/gu, '')
+    // U+2648-U+2653: Zodiac
+    result = result.replace(/[\u{2648}-\u{2653}]/gu, '')
+    // U+267F, U+2693, U+26A1, U+26AA-U+26AB, U+26BD-U+26BE, U+26C4-U+26C5, U+26CE, U+26D4, U+26EA, U+26F2-U+26F3, U+26F5, U+26FA, U+26FD: Misc
+    result = result.replace(/[\u{267F}\u{2693}\u{26A1}\u{26AA}-\u{26AB}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}-\u{26F3}\u{26F5}\u{26FA}\u{26FD}]/gu, '')
+    // U+2934-U+2935: Arrows
+    result = result.replace(/[\u{2934}-\u{2935}]/gu, '')
+    // U+2B05-U+2B07: Arrows
+    result = result.replace(/[\u{2B05}-\u{2B07}]/gu, '')
+    // U+2B1B-U+2B1C, U+2B50, U+2B55: Geometric shapes
+    result = result.replace(/[\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}]/gu, '')
+    // U+3030, U+303D, U+3297, U+3299: CJK symbols
+    result = result.replace(/[\u{3030}\u{303D}\u{3297}\u{3299}]/gu, '')
+  }
+
+  // Pass 3: Strip decorative Unicode (arrows, box-drawing, shapes, dingbats)
+  if (opts.stripDecorativeUnicode) {
+    // U+2190-U+21FF: Arrows
+    result = result.replace(/[\u{2190}-\u{21FF}]/gu, '')
+    // U+2500-U+257F: Box drawing
+    result = result.replace(/[\u{2500}-\u{257F}]/gu, '')
+    // U+2580-U+259F: Block elements
+    result = result.replace(/[\u{2580}-\u{259F}]/gu, '')
+    // U+25A0-U+25FF: Geometric shapes
+    result = result.replace(/[\u{25A0}-\u{25FF}]/gu, '')
+    // Specific decorative chars: © ® ™ § ¶ † ‡ • ‣ ⁃
+    result = result.replace(/[©®™§¶†‡•‣⁃]/g, '')
+    // U+2100-U+214F: Letterlike symbols
+    result = result.replace(/[\u{2100}-\u{214F}]/gu, '')
+  }
+
+  // Pass 4: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Consumes the surrounding whitespace to avoid double spaces.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences surrounded by whitespace or boundaries.
+    // Uses lookbehind for whitespace/start and lookahead for whitespace/end.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    // Consumes the trailing whitespace to avoid double spaces.
+    result = result.replace(/(?:^|(?<=\s))[+=\-<>&^~|\\/%]+(?=\s|$)/g, '')
+  }
+
+  // Pass 6: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io · 2026-06-06T21:23:59Z

+ * stripUnreadableSymbols('Price is $5!!! Really???', { collapseRepeatedPunctuation: false })
+ * // -> 'Price is $5!!! Really???'
+ */
+export function stripUnreadableSymbols(


`stripUnreadableSymbols` has a cyclomatic complexity of 6 with "medium" risk

A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.

deepsource-io · 2026-06-06T21:23:59Z

+    // U+25A0-U+25FF: Geometric shapes
+    result = result.replace(/[\u{25A0}-\u{25FF}]/gu, '')
+    // Specific decorative chars: © ® ™ § ¶ † ‡ • ‣ ⁃
+    result = result.replace(/[©®™§¶†‡•‣⁃]/g, '')


Use the 'u' flag with regular expressions

It is recommended to use the u flag with regular expressions.
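For context on what the `u` flag changes, here is a small sketch. The code points for the ten decorative characters are the ones listed in the later revision of this diff; everything else (names, sample strings) is illustrative. All ten characters are in the Basic Multilingual Plane, so for this particular character class the flag should not change the result; it matters once a pattern has to treat an astral character such as an emoji as a single unit.

```typescript
// © ® ™ § ¶ † ‡ • ‣ ⁃ written as escapes.
const DECORATIVE = [
  '\u00A9', '\u00AE', '\u2122', '\u00A7', '\u00B6',
  '\u2020', '\u2021', '\u2022', '\u2023', '\u2043',
]

// Each one is a single UTF-16 code unit, i.e. a BMP character.
const allBmp = DECORATIVE.every(ch => ch.length === 1)

// The same class with and without the u flag.
const withoutU = /[\u00A9\u00AE\u2122\u00A7\u00B6\u2020\u2021\u2022\u2023\u2043]/g
const withU = /[\u00A9\u00AE\u2122\u00A7\u00B6\u2020\u2021\u2022\u2023\u2043]/gu

const sample = 'Acme\u2122 \u00A9 2026'
const sameResult = sample.replace(withoutU, '') === sample.replace(withU, '')

// Where the flag does matter: U+1F600 is a surrogate pair in UTF-16.
const face = '\u{1F600}'
const dotWithoutU = /^.$/.test(face) // false: "." sees two code units
const dotWithU = /^.$/u.test(face) // true: "." sees one code point

console.log(allBmp, sameResult, dotWithoutU, dotWithU)
```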

- Switch streaming token placeholders from PUA codepoints to null-byte delimiters (\x00) to match the spec and avoid collision risk
- Rework math operator regex to use explicit (^|\s) grouping instead of lookbehind for better portability across JS runtimes
- Consolidate emoji/decorative Unicode ranges into fewer, well-documented regex calls (2 passes each instead of ~20 individual replace calls)
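The updated helpers are not quoted in this thread, so the following is only a sketch of what a null-byte-delimited variant could look like. It keeps the function names, the token regex, and the base-36 index from the earlier hunks and swaps the delimiter; treat the details as assumptions rather than the committed code.

```typescript
// Assumed delimiter: a NUL byte on each side of the base-36 token index.
const TOKEN_DELIMITER = '\x00'

function extractStreamingTokens(text: string): { processed: string, tokens: string[] } {
  const tokens: string[] = []
  // Same token pattern as the original helper: <|NAME|>
  const processed = text.replace(/<\|[^|]+\|>/g, (match) => {
    const index = tokens.length
    tokens.push(match)
    return `${TOKEN_DELIMITER}${index.toString(36)}${TOKEN_DELIMITER}`
  })
  return { processed, tokens }
}

function restoreStreamingTokens(text: string, tokens: string[]): string {
  // A fixed literal regex replaces the RegExp built from a constant.
  return text.replace(
    /\x00([0-9a-z]+)\x00/g,
    (_, indexStr: string) => tokens[parseInt(indexStr, 36)] ?? '',
  )
}

// Round trip: strip "<", "|" and ">" in between to show the tokens are shielded.
const { processed, tokens } = extractStreamingTokens('a | b <|ACT|> c <|DELAY|>')
const stripped = processed.replace(/[|<>]/g, '')
console.log(restoreStreamingTokens(stripped, tokens))
```

One caveat of any in-band placeholder is that the intermediate passes must leave the delimiter and the index characters untouched; a pass that changed letter case, for example, could corrupt indices above 9.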

…regex calls

Deduplicate overlapping ranges (e.g. \u{1F600}-\u{1F64F} is a subset of \u{1F300}-\u{1F9FF}, \u{2614}-\u{2615} is a subset of \u{2600}-\u{26FF}) and combine variation selectors, ZWJ, and keycap chars into the same regex. Reduces from ~23 individual .replace() calls to 2 for all emoji/decorative symbol stripping.
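The subset claims in this commit message can be verified mechanically. The sketch below uses two hypothetical helpers, `isCoveredBy` and `findRedundant`, over the first six emoji ranges that appear in the diff. One thing it surfaces is that \u{1F3FB}-\u{1F3FF} (skin tone modifiers) also lies inside \u{1F300}-\u{1F9FF}; inside a character class that overlap is harmless, just redundant.

```typescript
type Range = readonly [number, number]

// True when every code point of `inner` is also inside `outer`.
function isCoveredBy(inner: Range, outer: Range): boolean {
  return inner[0] >= outer[0] && inner[1] <= outer[1]
}

// Ranges that are fully contained in some other range of the list.
function findRedundant(ranges: readonly Range[]): Range[] {
  return ranges.filter((candidate, i) =>
    ranges.some((other, j) => i !== j && isCoveredBy(candidate, other)),
  )
}

// First six emoji ranges from the diff, in order.
const EMOJI_RANGES: Range[] = [
  [0x1F300, 0x1F9FF], // Misc symbols, emoticons, transport, supplemental
  [0x1F1E0, 0x1F1FF], // Regional indicator symbols
  [0x1F3FB, 0x1F3FF], // Skin tone modifiers
  [0x1FA00, 0x1FAFF], // Extended-A
  [0x2600, 0x26FF], // Misc symbols
  [0x2700, 0x27BF], // Dingbats
]

for (const [start, end] of findRedundant(EMOJI_RANGES)) {
  console.log(`U+${start.toString(16).toUpperCase()}-U+${end.toString(16).toUpperCase()} is redundant`)
}
```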

deepsource-io · 2026-06-06T21:28:10Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+
+  // Pass 2: Strip emoji and Unicode pictographic symbols
+  if (opts.stripEmoji) {
+    // Remove variation selectors, ZWJ, keycap combining chars first
+    result = result.replace(/\uFE0F/gu, '')
+    result = result.replace(/\u200D/gu, '')
+    result = result.replace(/\u20E3/gu, '')
+
+    // Emoji ranges: emoticons, faces, transport, misc symbols, dingbats,
+    // skin tones, regional indicators, extended-A/B, supplemental symbols
+    result = result.replace(
+      /[\u{1F300}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{1FA00}-\u{1FAFF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}\u{1F3FB}-\u{1F3FF}\u{1F1E0}-\u{1F1FF}\u{2702}-\u{27B0}\u{231A}-\u{231B}\u{23E9}-\u{23F3}\u{23F8}-\u{23FA}\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}\u{2614}-\u{2615}\u{2648}-\u{2653}\u{267F}\u{2693}\u{26A1}\u{26AA}-\u{26AB}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}-\u{26F3}\u{26F5}\u{26FA}\u{26FD}\u{2934}-\u{2935}\u{2B05}-\u{2B07}\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}\u{3030}\u{303D}\u{3297}\u{3299}]/gu,
+      '',
+    )
+  }
+
+  // Pass 3: Strip decorative Unicode (arrows, box-drawing, shapes, dingbats)
+  if (opts.stripDecorativeUnicode) {
+    // Arrows, box-drawing, block elements, geometric shapes, letterlike symbols
+    result = result.replace(
+      /[\u{2190}-\u{21FF}\u{2500}-\u{257F}\u{2580}-\u{259F}\u{25A0}-\u{25FF}\u{2100}-\u{214F}]/gu,
+      '',
+    )
+    // Specific decorative chars: © ® ™ § ¶ † ‡ • ‣ ⁃
+    result = result.replace(/[©®™§¶†‡•‣⁃]/g, '')
+  }
+
+  // Pass 4: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Consumes the surrounding whitespace to avoid double spaces.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences at string boundaries or surrounded by whitespace.
+    // Uses explicit (^|\s) grouping for boundary matching — more portable than lookbehind.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    result = result.replace(/(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 6: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io · 2026-06-06T21:28:10Z

+    // Emoji ranges: emoticons, faces, transport, misc symbols, dingbats,
+    // skin tones, regional indicators, extended-A/B, supplemental symbols
+    result = result.replace(
+      /[\u{1F300}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{1FA00}-\u{1FAFF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}\u{1F3FB}-\u{1F3FF}\u{1F1E0}-\u{1F1FF}\u{2702}-\u{27B0}\u{231A}-\u{231B}\u{23E9}-\u{23F3}\u{23F8}-\u{23FA}\u{25AA}-\u{25AB}\u{25B6}\u{25C0}\u{25FB}-\u{25FE}\u{2614}-\u{2615}\u{2648}-\u{2653}\u{267F}\u{2693}\u{26A1}\u{26AA}-\u{26AB}\u{26BD}-\u{26BE}\u{26C4}-\u{26C5}\u{26CE}\u{26D4}\u{26EA}\u{26F2}-\u{26F3}\u{26F5}\u{26FA}\u{26FD}\u{2934}-\u{2935}\u{2B05}-\u{2B07}\u{2B1B}-\u{2B1C}\u{2B50}\u{2B55}\u{3030}\u{303D}\u{3297}\u{3299}]/gu,


Unexpected modified Emoji in character class

Unicode includes the characters which are made with multiple code points. RegExp character class syntax (/[abc]/) cannot handle characters which are made by multiple code points as a character; those characters will be dissolved to each code point. Probably the most important concept about Unicode in JavaScript is to treat strings as sequences of code units, as they really are. The confusion appears when the developer thinks that strings are composed of graphemes (or symbols), ignoring the code unit sequence concept.
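A short sketch of what "dissolved to each code point" means for this stripper. The sample sequences and function names are illustrative; the range and the ZWJ code point are taken from the diff. When the goal is deletion, matching each code point separately is usually acceptable, provided the joiner characters are part of the class too.

```typescript
// Family emoji: man + ZWJ + woman + ZWJ + girl (five code points, one glyph).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'

// Thumbs up + medium skin tone modifier (two code points, one glyph).
const thumbsUpMediumSkin = '\u{1F44D}\u{1F3FD}'

// Range only: each pictograph is removed, but the joiners are left behind.
function stripRangeOnly(text: string): string {
  return text.replace(/[\u{1F300}-\u{1F9FF}]/gu, '')
}

// Range plus ZWJ (U+200D), as the quoted emoji pass does.
function stripRangeAndJoiners(text: string): string {
  return text.replace(/[\u{1F300}-\u{1F9FF}\u200D]/gu, '')
}

console.log(Array.from(family).length) // 5
console.log(stripRangeOnly(family).length) // 2 (two stray joiners)
console.log(stripRangeAndJoiners(family).length) // 0
console.log(stripRangeOnly(thumbsUpMediumSkin).length) // 0
```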

deepsource-io · 2026-06-06T21:29:07Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+
+  // Pass 2: Strip emoji, pictographic symbols, and decorative Unicode.
+  // Ranges are deduplicated: \u{1F300}-\u{1F9FF} already covers
+  // \u{1F600}-\u{1F64F} (emoticons) and \u{1F680}-\u{1F6FF} (transport),
+  // so those subsets are omitted. \u{2600}-\u{26FF} covers \u{2614}-\u{2615},
+  // \u{2648}-\u{2653}, etc. Variation selectors and ZWJ are included in the
+  // same regex to minimize .replace() calls.
+  if (opts.stripEmoji || opts.stripDecorativeUnicode) {
+    // Build a combined character class from all needed ranges
+    const ranges: string[] = []
+
+    if (opts.stripEmoji) {
+      // Variation selectors, ZWJ, keycap combining chars
+      ranges.push('\uFE0F', '\u200D', '\u20E3')
+      // Emoji & pictographic symbols (deduplicated — no subsets of the above)
+      ranges.push(
+        '\\u{1F300}-\\u{1F9FF}', // Misc symbols, emoticons, transport, supplemental
+        '\\u{1F1E0}-\\u{1F1FF}', // Regional indicator symbols (flags)
+        '\\u{1F3FB}-\\u{1F3FF}', // Skin tone modifiers
+        '\\u{1FA00}-\\u{1FAFF}', // Extended-A and beyond
+        '\\u{2600}-\\u{26FF}', // Misc symbols (covers \u{2614}-\u{2615}, \u{2648}-\u{2653}, etc.)
+        '\\u{2700}-\\u{27BF}', // Dingbats
+        '\\u{231A}-\\u{231B}', // Watch, hourglass
+        '\\u{23E9}-\\u{23F3}', // Media controls, clocks
+        '\\u{23F8}-\\u{23FA}', // Media controls
+        '\\u{25AA}-\\u{25AB}', '\\u{25B6}', '\\u{25C0}', '\\u{25FB}-\\u{25FE}', // Geometric shapes
+        '\\u{2934}-\\u{2935}', // Arrows
+        '\\u{2B05}-\\u{2B07}', // Arrows
+        '\\u{2B1B}-\\u{2B1C}', '\\u{2B50}', '\\u{2B55}', // Geometric shapes
+        '\\u{3030}', '\\u{303D}', '\\u{3297}', '\\u{3299}', // CJK symbols
+      )
+    }
+
+    if (opts.stripDecorativeUnicode) {
+      // Arrows, box-drawing, block elements, geometric shapes, letterlike symbols
+      ranges.push(
+        '\\u{2190}-\\u{21FF}', // Arrows
+        '\\u{2500}-\\u{257F}', // Box drawing
+        '\\u{2580}-\\u{259F}', // Block elements
+        '\\u{25A0}-\\u{25FF}', // Geometric shapes
+        '\\u{2100}-\\u{214F}', // Letterlike symbols
+        '\u00A9', '\u00AE', '\\u{2122}', // © ® ™
+        '\\u{00A7}', '\\u{00B6}', '\\u{2020}', '\\u{2021}', // § ¶ † ‡
+        '\\u{2022}', '\\u{2023}', '\\u{2043}', // • ‣ ⁃
+      )
+    }
+
+    result = result.replace(new RegExp(`[${ranges.join('')}]`, 'gu'), '')
+  }
+
+  // Pass 3: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Keeps the leading whitespace via $1 and leaves trailing whitespace in place;
+    // the final space-collapsing step removes any double spaces this creates.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 4: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences at string boundaries or surrounded by whitespace.
+    // Uses explicit (^|\s) grouping for boundary matching — more portable than lookbehind.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    result = result.replace(/(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}
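
The boundary behaviour of the Pass 4 pattern is easy to check in isolation. The sketch below copies the Pass 4 regex from the diff into a small helper; `stripStandaloneOperators` and `STANDALONE_OPERATORS` are hypothetical names used only for illustration, and the helper assumes the space-collapsing step runs immediately afterwards.

```typescript
// Hypothetical helper for illustration; the regex is copied from Pass 4 in the diff.
const STANDALONE_OPERATORS = /(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g

function stripStandaloneOperators(text: string): string {
  // $1 keeps the leading whitespace (or the empty match at the start of the string);
  // the lookahead leaves the trailing whitespace untouched.
  const stripped = text.replace(STANDALONE_OPERATORS, '$1')
  // Same clean-up the PR applies near the end: collapse runs of spaces.
  return stripped.replace(/ {2,}/g, ' ')
}

console.log(stripStandaloneOperators('a + b'))       // operator between spaces is dropped
console.log(stripStandaloneOperators('x >= y'))      // multi-character runs are dropped too
console.log(stripStandaloneOperators('C++ and A&B')) // operators glued to text are kept
```

Because the operator run must be preceded by whitespace (or the string start) and followed by whitespace (or the string end), tokens such as `C++` and `A&B` pass through unchanged.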


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

It is considered a best practice to avoid 'polluting' the global scope with variables that are intended to be local to the script. Global variables created from a script can produce name collisions with global variables created from another script, which will usually lead to runtime errors or unexpected behavior. It is mostly useful for browser scripts.

deepsource-io · 2026-06-06T21:29:07Z

+ * stripUnreadableSymbols('Price is $5!!! Really???', { collapseRepeatedPunctuation: false })
+ * // -> 'Price is $5!!! Really???'
+ */
+export function stripUnreadableSymbols(


`stripUnreadableSymbols` has a cyclomatic complexity of 8 with "medium" risk

A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.

…tch em/en dashes, arrows, and other symbols

Em dash (U+2014), en dash (U+2013), and other punctuation in the General Punctuation block were leaking through to TTS. Adding \u{2000}-\u{206F} to the decorative Unicode ranges ensures these symbols are stripped along with arrows and box-drawing characters.
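
To see what the added range covers, here is a minimal sketch that isolates it. `stripGeneralPunctuation` is a hypothetical helper, not part of the PR; the regex is built with `new RegExp(..., 'gu')` in the same way the diff builds its combined character class.

```typescript
// The General Punctuation block (U+2000 to U+206F), isolated for illustration.
const GENERAL_PUNCTUATION = new RegExp('[\\u{2000}-\\u{206F}]', 'gu')

function stripGeneralPunctuation(text: string): string {
  return text.replace(GENERAL_PUNCTUATION, '')
}

// Em dash (U+2014) and horizontal ellipsis (U+2026) are removed.
console.log(JSON.stringify(stripGeneralPunctuation('wait \u2014 what\u2026')))
// The typographic apostrophe (U+2019) sits in the same block, so it is removed as well.
console.log(JSON.stringify(stripGeneralPunctuation('don\u2019t')))
// ASCII punctuation lies outside the block and is left alone.
console.log(JSON.stringify(stripGeneralPunctuation('wait - what...')))
```

The block also holds typographic quotes and several space characters, so it may be worth checking how contractions and quoted text sound after this pass.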

deepsource-io · 2026-06-06T21:59:17Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+
+  // Pass 2: Strip emoji, pictographic symbols, and decorative Unicode.
+  // Ranges are deduplicated: \u{1F300}-\u{1F9FF} already covers
+  // \u{1F600}-\u{1F64F} (emoticons) and \u{1F680}-\u{1F6FF} (transport),
+  // so those subsets are omitted. \u{2600}-\u{26FF} covers \u{2614}-\u{2615},
+  // \u{2648}-\u{2653}, etc. Variation selectors and ZWJ are included in the
+  // same regex to minimize .replace() calls.
+  if (opts.stripEmoji || opts.stripDecorativeUnicode) {
+    // Build a combined character class from all needed ranges
+    const ranges: string[] = []
+
+    if (opts.stripEmoji) {
+      // Variation selectors, ZWJ, keycap combining chars
+      ranges.push('\uFE0F', '\u200D', '\u20E3')
+      // Emoji & pictographic symbols (deduplicated — no subsets of the above)
+      ranges.push(
+        '\\u{1F300}-\\u{1F9FF}', // Misc symbols, emoticons, transport, supplemental
+        '\\u{1F1E0}-\\u{1F1FF}', // Regional indicator symbols (flags)
+        '\\u{1F3FB}-\\u{1F3FF}', // Skin tone modifiers
+        '\\u{1FA00}-\\u{1FAFF}', // Extended-A and beyond
+        '\\u{2600}-\\u{26FF}', // Misc symbols (covers \u{2614}-\u{2615}, \u{2648}-\u{2653}, etc.)
+        '\\u{2700}-\\u{27BF}', // Dingbats
+        '\\u{231A}-\\u{231B}', // Watch, hourglass
+        '\\u{23E9}-\\u{23F3}', // Media controls, clocks
+        '\\u{23F8}-\\u{23FA}', // Media controls
+        '\\u{25AA}-\\u{25AB}', '\\u{25B6}', '\\u{25C0}', '\\u{25FB}-\\u{25FE}', // Geometric shapes
+        '\\u{2934}-\\u{2935}', // Arrows
+        '\\u{2B05}-\\u{2B07}', // Arrows
+        '\\u{2B1B}-\\u{2B1C}', '\\u{2B50}', '\\u{2B55}', // Geometric shapes
+        '\\u{3030}', '\\u{303D}', '\\u{3297}', '\\u{3299}', // CJK symbols
+      )
+    }
+
+    if (opts.stripDecorativeUnicode) {
+      // Arrows, box-drawing, block elements, geometric shapes, letterlike symbols,
+      // general punctuation (em/en dashes, typographic quotes, ellipsis, etc.)
+      ranges.push(
+        '\\u{2190}-\\u{21FF}', // Arrows
+        '\\u{2500}-\\u{257F}', // Box drawing
+        '\\u{2580}-\\u{259F}', // Block elements
+        '\\u{25A0}-\\u{25FF}', // Geometric shapes
+        '\\u{2100}-\\u{214F}', // Letterlike symbols
+        '\\u{2000}-\\u{206F}', // General punctuation (em/en dashes, quotes, ellipsis, etc.)
+        '\u00A9', '\u00AE', '\\u{2122}', // © ® ™
+        '\\u{00A7}', '\\u{00B6}', '\\u{2020}', '\\u{2021}', // § ¶ † ‡
+        '\\u{2022}', '\\u{2023}', '\\u{2043}', // • ‣ ⁃
+      )
+    }
+
+    result = result.replace(new RegExp(`[${ranges.join('')}]`, 'gu'), '')
+  }
+
+  // Pass 3: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Keeps the leading whitespace via $1 and leaves trailing whitespace in place;
+    // the final space-collapsing step removes any double spaces this creates.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 4: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences at string boundaries or surrounded by whitespace.
+    // Uses explicit (^|\s) grouping for boundary matching — more portable than lookbehind.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    result = result.replace(/(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}
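
The `extractStreamingTokens` and `restoreStreamingTokens` helpers are not shown in this hunk. The following is only a sketch of how Private Use Area placeholder extraction could work, under the assumption that each token is swapped for a single code point starting at U+E000; the names `extractTokensSketch`, `restoreTokensSketch`, `TOKEN_PATTERN`, and the optional `:payload` part of the pattern are all hypothetical.

```typescript
// Assumption: placeholders are single code points taken from the BMP Private Use Area.
const PUA_START = 0xE000

// Assumption: tokens look like <|ACT|>, <|DELAY|>, <|CALL|>, optionally with a ":payload".
const TOKEN_PATTERN = /<\|(?:ACT|DELAY|CALL)(?::[^|]*)?\|>/g

function extractTokensSketch(text: string): { processed: string, tokens: string[] } {
  const tokens: string[] = []
  const processed = text.replace(TOKEN_PATTERN, (match) => {
    tokens.push(match)
    // The n-th token becomes the n-th Private Use Area code point.
    return String.fromCharCode(PUA_START + tokens.length - 1)
  })
  return { processed, tokens }
}

function restoreTokensSketch(text: string, tokens: string[]): string {
  // Map each placeholder back to its token; unknown placeholders are left as-is.
  return text.replace(/[\uE000-\uF8FF]/g, ch => tokens[ch.charCodeAt(0) - PUA_START] ?? ch)
}

const extracted = extractTokensSketch('Hi <|ACT|> there!!!')
// Any sanitizing pass can run here without seeing the "<", "|" or ">" characters.
const sanitized = extracted.processed.replace(/!{3,}/g, '!')
console.log(restoreTokensSketch(sanitized, extracted.tokens))
```

The point of the placeholder is that none of the stripping ranges or the standalone-symbol patterns shown in the diff match Private Use Area code points, so the token survives every pass and is put back at the end.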


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

- Add stripMarkdown option to TtsInputChunkOptions and SpeechPipelineOptions
- Add stripMarkdownFromText function to strip Markdown formatting
- Fix starsUnclosed to detect unclosed ** (bold) patterns by counting **
- Fix early return in segmenter to also check stripMarkdown option
- Pass stripMarkdown to segmenter in speech-pipeline.ts
- Add tests for stripMarkdown option in speech-pipeline.test.ts

Fixes TTS reading **bold** markers as STARSTARboldSTARSTAR when split across chunks.

deepsource-io · 2026-06-06T22:32:37Z

+function stripMarkdownFromText(text: string): string {
+  let result = text
+
+  // Code fences (```...```) — must run before inline code
+  result = result.replace(/^```.*\n([\s\S]*?)^```$/gm, '$1')
+
+  // Inline code (`code`) — preserve inner text
+  result = result.replace(/`([^`]+)`/g, '$1')
+
+  // Bold (**text**) — preserve inner text
+  result = result.replace(/\*\*([^*]+?)\*\*/g, '$1')
+
+  // Strikethrough (~~text~~) — preserve inner text
+  result = result.replace(/~~([^~]+?)~~/g, '$1')
+
+  // Headings (# Heading) — remove # markers at line start, preserve text
+  result = result.replace(/^#{1,6}\s+/gm, '')
+
+  // Bullet lists (- item or * item) — remove marker at line start, preserve text
+  result = result.replace(/^[-*]\s+/gm, '')
+
+  // Numbered lists (1. item) — remove number+dot at line start, preserve text
+  result = result.replace(/^\d+\.\s+/gm, '')
+
+  // Blockquotes (> quote) — remove > marker at line start, preserve text
+  result = result.replace(/^>\s+/gm, '')
+
+  // Italic (*text*) — preserve inner text
+  result = result.replace(/\*([^*]+?)\*/g, '$1')
+
+  // Italic (_text_) — preserve inner text
+  result = result.replace(/_([^_]+?)_/g, '$1')
+
+  // Links [text](url) — preserve link text only
+  result = result.replace(/\[([^\]]+?)\]\([^)]+?\)/g, '$1')
+
+  // Horizontal rules (---, ***, ___) — remove entirely
+  result = result.replace(/^---+$/gm, '')
+  result = result.replace(/^\*\*\*+$/gm, '')
+  result = result.replace(/^___+$/gm, '')
+
+  return result
+}
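
The order of the passes in `stripMarkdownFromText` matters, in particular that bold runs before italic. A small sketch, using the two regexes copied from the function above (the constant names are hypothetical), shows what happens if the order is swapped:

```typescript
// Regexes copied from stripMarkdownFromText; names are for illustration only.
const BOLD = /\*\*([^*]+?)\*\*/g
const ITALIC = /\*([^*]+?)\*/g

const sample = '**bold** and *italic*'

// Order used in the diff: bold first, then italic.
const boldFirst = sample.replace(BOLD, '$1').replace(ITALIC, '$1')

// Swapped order: the italic pattern pairs up single stars taken from the bold markers.
const italicFirst = sample.replace(ITALIC, '$1').replace(BOLD, '$1')

console.log(JSON.stringify(boldFirst))
console.log(JSON.stringify(italicFirst))
```

With bold first, both kinds of emphasis are unwrapped cleanly; with italic first, stray `*` characters are left in the text for the TTS engine to read.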


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io · 2026-06-06T22:32:37Z

+        async tts(request) {
+          ttsRequests.push(request)
+          return request.text
+        },


Found `async` function without any `await` expressions

A function that does not contain any await expressions should not be async (except for some edge cases in TypeScript which are discussed below). Asynchronous functions in JavaScript behave differently than other functions in two important ways:

Remove any remaining ** markers that survived the markdown pass. This handles split markers like **bold + text** where neither chunk has a complete **...** pattern.
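
A minimal sketch of the split-marker case described above, assuming the text arrives in two streamed chunks that are sanitized independently. The paired-bold regex is the one from `stripMarkdownFromText`; the chunk contents and variable names are made up for illustration.

```typescript
// Paired-bold pattern: only matches when both ** markers are in the same string.
const PAIRED_BOLD = /\*\*([^*]+?)\*\*/g

// Hypothetical chunks: the opening and closing markers land in different chunks.
const chunks = ['This is **bo', 'ld** text']

// The paired pattern finds nothing to unwrap in either chunk.
const pairedOnly = chunks.map(chunk => chunk.replace(PAIRED_BOLD, '$1'))

// The fallback from this commit removes any leftover ** unconditionally.
const withFallback = pairedOnly.map(chunk => chunk.replace(/\*\*/g, ''))

console.log(pairedOnly.join(''))
console.log(withFallback.join(''))
```

The fallback only targets double stars, so a single `*` that is glued to a word is not touched by it.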

deepsource-io · 2026-06-06T23:20:42Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+
+  // Aggressive star stripping for TTS: remove any ** that survived the markdown pass.
+  // This handles split markers like **bold + text** where neither chunk has complete **...**.
+  result = result.replace(/\*\*/g, '')
+
+  // Pass 2: Strip emoji, pictographic symbols, and decorative Unicode.
+  // Ranges are deduplicated: \u{1F300}-\u{1F9FF} already covers
+  // \u{1F600}-\u{1F64F} (emoticons) and \u{1F680}-\u{1F6FF} (transport),
+  // so those subsets are omitted. \u{2600}-\u{26FF} covers \u{2614}-\u{2615},
+  // \u{2648}-\u{2653}, etc. Variation selectors and ZWJ are included in the
+  // same regex to minimize .replace() calls.
+  if (opts.stripEmoji || opts.stripDecorativeUnicode) {
+    // Build a combined character class from all needed ranges
+    const ranges: string[] = []
+
+    if (opts.stripEmoji) {
+      // Variation selectors, ZWJ, keycap combining chars
+      ranges.push('\uFE0F', '\u200D', '\u20E3')
+      // Emoji & pictographic symbols (deduplicated — no subsets of the above)
+      ranges.push(
+        '\\u{1F300}-\\u{1F9FF}', // Misc symbols, emoticons, transport, supplemental
+        '\\u{1F1E0}-\\u{1F1FF}', // Regional indicator symbols (flags)
+        '\\u{1F3FB}-\\u{1F3FF}', // Skin tone modifiers
+        '\\u{1FA00}-\\u{1FAFF}', // Extended-A and beyond
+        '\\u{2600}-\\u{26FF}', // Misc symbols (covers \u{2614}-\u{2615}, \u{2648}-\u{2653}, etc.)
+        '\\u{2700}-\\u{27BF}', // Dingbats
+        '\\u{231A}-\\u{231B}', // Watch, hourglass
+        '\\u{23E9}-\\u{23F3}', // Media controls, clocks
+        '\\u{23F8}-\\u{23FA}', // Media controls
+        '\\u{25AA}-\\u{25AB}', '\\u{25B6}', '\\u{25C0}', '\\u{25FB}-\\u{25FE}', // Geometric shapes
+        '\\u{2934}-\\u{2935}', // Arrows
+        '\\u{2B05}-\\u{2B07}', // Arrows
+        '\\u{2B1B}-\\u{2B1C}', '\\u{2B50}', '\\u{2B55}', // Geometric shapes
+        '\\u{3030}', '\\u{303D}', '\\u{3297}', '\\u{3299}', // CJK symbols
+      )
+    }
+
+    if (opts.stripDecorativeUnicode) {
+      // Arrows, box-drawing, block elements, geometric shapes, letterlike symbols,
+      // general punctuation (em/en dashes, typographic quotes, ellipsis, etc.)
+      ranges.push(
+        '\\u{2190}-\\u{21FF}', // Arrows
+        '\\u{2500}-\\u{257F}', // Box drawing
+        '\\u{2580}-\\u{259F}', // Block elements
+        '\\u{25A0}-\\u{25FF}', // Geometric shapes
+        '\\u{2100}-\\u{214F}', // Letterlike symbols
+        '\\u{2000}-\\u{206F}', // General punctuation (em/en dashes, quotes, ellipsis, etc.)
+        '\u00A9', '\u00AE', '\\u{2122}', // © ® ™
+        '\\u{00A7}', '\\u{00B6}', '\\u{2020}', '\\u{2021}', // § ¶ † ‡
+        '\\u{2022}', '\\u{2023}', '\\u{2043}', // • ‣ ⁃
+      )
+    }
+
+    result = result.replace(new RegExp(`[${ranges.join('')}]`, 'gu'), '')
+  }
+
+  // Pass 3: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Keeps the leading whitespace via $1 and leaves trailing whitespace in place;
+    // the final space-collapsing step removes any double spaces this creates.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 4: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences at string boundaries or surrounded by whitespace.
+    // Uses explicit (^|\s) grouping for boundary matching — more portable than lookbehind.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    result = result.replace(/(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}
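
The thresholds in Pass 5 differ per character, which is easy to miss when reading the diff. The sketch below lifts the five replacements into a standalone helper (`collapsePunctuationSketch` is a hypothetical name) so the cut-offs can be tried directly.

```typescript
// Same five replacements as Pass 5, in the same order.
function collapsePunctuationSketch(text: string): string {
  return text
    .replace(/!{3,}/g, '!')        // three or more "!" become one
    .replace(/\?{3,}/g, '?')       // three or more "?" become one
    .replace(/\.{4,}/g, '\u2026')  // four or more "." become an ellipsis (U+2026)
    .replace(/-{3,}/g, '\u2014')   // three or more "-" become an em dash (U+2014)
    .replace(/~{2,}/g, '~')        // two or more "~" become one
}

console.log(collapsePunctuationSketch('Really???')) // collapsed
console.log(collapsePunctuationSketch('No!!'))      // below the threshold, unchanged
console.log(collapsePunctuationSketch('Hmm...'))    // three dots are left alone
console.log(collapsePunctuationSketch('Hmm....'))   // four dots become an ellipsis
console.log(collapsePunctuationSketch('a---b'))     // becomes an em dash
```

Note that U+2026 and U+2014 both fall inside the \u{2000}-\u{206F} range stripped in Pass 2. Since Pass 5 runs after Pass 2, the characters it introduces stay in the output.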


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io · 2026-06-06T23:22:56Z

+export function stripUnreadableSymbols(
+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  console.debug('[TTS DEBUG] stripUnreadableSymbols INPUT:', JSON.stringify(text))
+  const opts: Required<StripUnreadableSymbolsOptions> = { ...DEFAULT_OPTIONS, ...options }
+
+  // Protect streaming control tokens from stripping
+  const { processed: safeText, tokens } = extractStreamingTokens(text)
+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+  console.debug('[TTS DEBUG] after stripMarkdownFromSpeech:', JSON.stringify(result))
+
+  // Aggressive star stripping for TTS: remove any ** that survived the markdown pass.
+  // This handles split markers like **bold + text** where neither chunk has complete **...**.
+  result = result.replace(/\*\*/g, '')
+  console.debug('[TTS DEBUG] after aggressive star strip:', JSON.stringify(result))
+
+  // Pass 2: Strip emoji, pictographic symbols, and decorative Unicode.
+  // Ranges are deduplicated: \u{1F300}-\u{1F9FF} already covers
+  // \u{1F600}-\u{1F64F} (emoticons) and \u{1F680}-\u{1F6FF} (transport),
+  // so those subsets are omitted. \u{2600}-\u{26FF} covers \u{2614}-\u{2615},
+  // \u{2648}-\u{2653}, etc. Variation selectors and ZWJ are included in the
+  // same regex to minimize .replace() calls.
+  if (opts.stripEmoji || opts.stripDecorativeUnicode) {
+    // Build a combined character class from all needed ranges
+    const ranges: string[] = []
+
+    if (opts.stripEmoji) {
+      // Variation selectors, ZWJ, keycap combining chars
+      ranges.push('\uFE0F', '\u200D', '\u20E3')
+      // Emoji & pictographic symbols (deduplicated — no subsets of the above)
+      ranges.push(
+        '\\u{1F300}-\\u{1F9FF}', // Misc symbols, emoticons, transport, supplemental
+        '\\u{1F1E0}-\\u{1F1FF}', // Regional indicator symbols (flags)
+        '\\u{1F3FB}-\\u{1F3FF}', // Skin tone modifiers
+        '\\u{1FA00}-\\u{1FAFF}', // Extended-A and beyond
+        '\\u{2600}-\\u{26FF}', // Misc symbols (covers \u{2614}-\u{2615}, \u{2648}-\u{2653}, etc.)
+        '\\u{2700}-\\u{27BF}', // Dingbats
+        '\\u{231A}-\\u{231B}', // Watch, hourglass
+        '\\u{23E9}-\\u{23F3}', // Media controls, clocks
+        '\\u{23F8}-\\u{23FA}', // Media controls
+        '\\u{25AA}-\\u{25AB}', '\\u{25B6}', '\\u{25C0}', '\\u{25FB}-\\u{25FE}', // Geometric shapes
+        '\\u{2934}-\\u{2935}', // Arrows
+        '\\u{2B05}-\\u{2B07}', // Arrows
+        '\\u{2B1B}-\\u{2B1C}', '\\u{2B50}', '\\u{2B55}', // Geometric shapes
+        '\\u{3030}', '\\u{303D}', '\\u{3297}', '\\u{3299}', // CJK symbols
+      )
+    }
+
+    if (opts.stripDecorativeUnicode) {
+      // Arrows, box-drawing, block elements, geometric shapes, letterlike symbols,
+      // general punctuation (em/en dashes, typographic quotes, ellipsis, etc.)
+      ranges.push(
+        '\\u{2190}-\\u{21FF}', // Arrows
+        '\\u{2500}-\\u{257F}', // Box drawing
+        '\\u{2580}-\\u{259F}', // Block elements
+        '\\u{25A0}-\\u{25FF}', // Geometric shapes
+        '\\u{2100}-\\u{214F}', // Letterlike symbols
+        '\\u{2000}-\\u{206F}', // General punctuation (em/en dashes, quotes, ellipsis, etc.)
+        '\u00A9', '\u00AE', '\\u{2122}', // © ® ™
+        '\\u{00A7}', '\\u{00B6}', '\\u{2020}', '\\u{2021}', // § ¶ † ‡
+        '\\u{2022}', '\\u{2023}', '\\u{2043}', // • ‣ ⁃
+      )
+    }
+
+    result = result.replace(new RegExp(`[${ranges.join('')}]`, 'gu'), '')
+  }
+
+  // Pass 3: Strip standalone special characters
+  if (opts.stripStandaloneSpecialChars) {
+    // Matches standalone special chars surrounded by whitespace or at string boundaries.
+    // Keeps the leading whitespace via $1 and leaves trailing whitespace in place;
+    // the final space-collapsing step removes any double spaces this creates.
+    result = result.replace(/(^|\s)[*#@|\\/~^`]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 4: Strip standalone math/operator symbols
+  if (opts.stripMathOperators) {
+    // Matches standalone operator sequences at string boundaries or surrounded by whitespace.
+    // Uses explicit (^|\s) grouping for boundary matching — more portable than lookbehind.
+    // Does NOT match when adjacent to non-whitespace characters (e.g., C++, A&B).
+    result = result.replace(/(^|\s)[+=\-<>&^~|\\/%]+(?=\s|$)/g, '$1')
+  }
+
+  // Pass 5: Collapse repeated punctuation
+  if (opts.collapseRepeatedPunctuation) {
+    result = result.replace(/!{3,}/g, '!')
+    result = result.replace(/\?{3,}/g, '?')
+    result = result.replace(/\.{4,}/g, '…')
+    result = result.replace(/-{3,}/g, '—')
+    result = result.replace(/~{2,}/g, '~')
+  }
+
+  // Collapse multiple spaces into one
+  result = result.replace(/ {2,}/g, ' ')
+
+  // Restore streaming control tokens
+  result = restoreStreamingTokens(result, tokens)
+
+  // Trim leading/trailing whitespace from the overall result
+  return result.trim()
+}


Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

- Log in chat-orchestrator-runtime.ts (onLiteral, filterToSpeech, stripUnreadableSymbols)
- Log in unreadable-symbols-stripper.ts (input, after markdown, after aggressive strip)
- Log in pipeline-runtime.ts (applyToken)
- Log in streaming-pipeline.ts (appendText)
- Log in tts-session.ts (appendText segmenter path)

deepsource-io · 2026-06-06T23:27:13Z

        onLiteral: async (literal) => {
          if (shouldAbort()) return

+          console.log('[TTS DEBUG] onLiteral received:', JSON.stringify(literal))


Avoid using console in code that runs on the browser

It is considered a best practice to avoid the use of any console methods in JavaScript code that will run on the browser.

NOTE: If your repository contains a server side project, you can add "nodejs" to the environment property of analyzer meta in .deepsource.toml.
This will prevent this issue from getting raised.
Documentation for the analyzer meta can be found here.
Alternatively, you can silence this issue for your repository as shown here.

If a specific console call is meant to stay for other reasons, you can add a skipcq comment to that line.
This will inform other developers about the reason behind the log's presence, and prevent DeepSource from flagging it.

deepsource-io · 2026-06-06T23:27:13Z


-          const speechOnly = stripMarkdownFromSpeech(categorizer.filterToSpeech(literal, streamPosition))
+          const filtered = categorizer.filterToSpeech(literal, streamPosition)
+          console.log('[TTS DEBUG] after filterToSpeech:', JSON.stringify(filtered))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

+          const filtered = categorizer.filterToSpeech(literal, streamPosition)
+          console.log('[TTS DEBUG] after filterToSpeech:', JSON.stringify(filtered))
+          const speechOnly = stripUnreadableSymbols(filtered)
+          console.log('[TTS DEBUG] after stripUnreadableSymbols:', JSON.stringify(speechOnly))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

+  text: string,
+  options?: StripUnreadableSymbolsOptions,
+): string {
+  console.log('[TTS DEBUG] stripUnreadableSymbols INPUT:', JSON.stringify(text))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

+
+  // Pass 1: Strip Markdown syntax (always run)
+  let result = stripMarkdownFromSpeech(safeText)
+  console.log('[TTS DEBUG] after stripMarkdownFromSpeech:', JSON.stringify(result))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

+  // Aggressive star stripping for TTS: remove any ** that survived the markdown pass.
+  // This handles split markers like **bold + text** where neither chunk has complete **...**.
+  result = result.replace(/\*\*/g, '')
+  console.log('[TTS DEBUG] after aggressive star strip:', JSON.stringify(result))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

  return {
    appendText(text: string) {
      if (text.length === 0) return
+      console.log('[TTS STREAMING] appendText:', JSON.stringify(text))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z

    intentId: intent.intentId,
-    appendText: intent.writeLiteral,
+    appendText: (text) => {
+      console.log('[TTS SESSION] appendText (segmenter path):', JSON.stringify(text))


Avoid using console in code that runs on the browser

deepsource-io · 2026-06-06T23:27:13Z


    const applyToken = (payload: SpeechIntentTokenPayload, writer: (intent: IntentHandle, value?: string) => void) => {
      if (!payload || payload.originId === originId) return
+      console.log('[TTS PIPELINE] applyToken:', JSON.stringify(payload.value))


Avoid using console in code that runs on the browser

sourcery-ai Bot reviewed Jun 6, 2026

gemini-code-assist Bot reviewed Jun 6, 2026

deepsource-io Bot reviewed Jun 6, 2026

fix(unreadable-symbols-stripper): add aggressive star stripping for TTS

d961c3f

Remove any remaining ** markers that survived the markdown pass. This handles split markers like **bold + text** where neither chunk has a complete **...** pattern.

deepsource-io Bot reviewed Jun 6, 2026

debug: add TTS debug logging to trace star star issue

930d47a

deepsource-io Bot reviewed Jun 6, 2026

vi70x3 closed this Jun 6, 2026

deepsource-io Bot reviewed Jun 6, 2026

Conversation

vi70x4 commented Jun 6, 2026 • edited by sourcery-ai Bot

Summary

What changed

How tested

Summary by Sourcery

Uh oh!

mergeguards Bot commented Jun 6, 2026

sourcery-ai Bot commented Jun 6, 2026 • edited

deepsource-io Bot commented Jun 6, 2026 • edited

sourcery-ai Bot left a comment

gemini-code-assist Bot left a comment

gemini-code-assist Bot Jun 6, 2026

deepsource-io Bot Jun 6, 2026

'stripMarkdownFromSpeech' is defined but never used

deepsource-io Bot Jun 6, 2026

Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io Bot Jun 6, 2026

Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io Bot Jun 6, 2026

Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io Bot Jun 6, 2026

`stripUnreadableSymbols` has a cyclomatic complexity of 6 with "medium" risk

deepsource-io Bot Jun 6, 2026

Use the 'u' flag with regular expressions
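
A short sketch of why this finding matters for emoji stripping: without the `u` flag a regular expression works on UTF-16 code units, so an emoji outside the Basic Multilingual Plane counts as two characters. The patterns below are illustrative and are not the ones used in this PR; the `RegExp` constructor form is equivalent to writing the literals with a trailing `u`.

```typescript
const emoji = "\u{1F600}"; // 😀, stored as two UTF-16 code units

// "." matches one code unit without "u" and one code point with it.
const dotWithoutU = new RegExp("^.$");
const dotWithU = new RegExp("^.$", "u");
const withoutU = dotWithoutU.test(emoji); // false
const withU = dotWithU.test(emoji);       // true

// Unicode property escapes such as \p{Extended_Pictographic} require "u".
const pictographic = new RegExp("\\p{Extended_Pictographic}", "gu");
const stripped = ("Hi " + emoji + "!").replace(pictographic, ""); // "Hi !"
```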

mergeguards Bot commented Jun 6, 2026

mergeguards Bot commented Jun 6, 2026

deepsource-io Bot Jun 6, 2026

Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io Bot Jun 6, 2026

Unexpected modified Emoji in character class
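
The following sketch shows what this kind of finding typically points at, assuming the flagged pattern places an emoji with a skin-tone modifier inside square brackets. A character class treats the base emoji and the modifier as two independent alternatives, not as one sequence. The variable names are illustrative and the PR's actual pattern is not shown on this page.

```typescript
const thumbsUp = "\u{1F44D}";           // 👍 base emoji
const mediumTone = "\u{1F3FD}";         // 🏽 skin-tone modifier
const modified = thumbsUp + mediumTone; // 👍🏽 rendered as one glyph

// Inside [...], the two code points become separate alternatives.
const asClass = new RegExp("[" + modified + "]", "gu");
// Outside a class, the pattern matches the two-code-point sequence as a unit.
const asSequence = new RegExp(modified, "gu");

const classMatches = (modified.match(asClass) || []).length;       // 2
const sequenceMatches = (modified.match(asSequence) || []).length; // 1
const classHitsBase = (thumbsUp.match(asClass) || []).length;      // 1
const sequenceHitsBase = (thumbsUp.match(asSequence) || []).length; // 0
```

Writing the modified emoji as a sequence or as an alternation keeps the pattern matching what it visually appears to match.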

deepsource-io Bot Jun 6, 2026

Unexpected function declaration in the global scope, wrap in an IIFE for a local variable, assign as global property for a global variable

deepsource-io Bot Jun 6, 2026
