Skip to content

Fix !s command replacing characters inside custom Discord emoji IDs#55

Closed
davelindsay84 wants to merge 18 commits into
mainfrom
fix/replacer-bugs
Closed

Fix !s command replacing characters inside custom Discord emoji IDs#55
davelindsay84 wants to merge 18 commits into
mainfrom
fix/replacer-bugs

Conversation

@davelindsay84

@davelindsay84 davelindsay84 commented Apr 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Custom Discord emoji (e.g. <a:aniblobsweat:488851906022825677>) contain numeric IDs that could accidentally match a user's search term in !s
  • !s 6/bo would turn <a:aniblobsweat:488851906022825677> into <a:aniblobsweat:48885190bo22825bo777> in the bot's output
  • Adds extractEmoji() to strip custom emoji out before replacement runs and re-insert them afterward — same pattern as the existing extractUsers() and extractUrls()

Test plan

  • New regression test: !s 6/bo against a message containing only custom emoji does not corrupt the emoji IDs
  • New unit test: extractEmoji() correctly extracts both animated (<a:name:id>) and static (<:name:id>) emoji
  • All existing replacer tests still pass

🤖 Generated with Claude Code

David Lindsay and others added 18 commits April 22, 2026 12:40
- Split on first / only so search phrases containing / (e.g. AC/DC) work correctly
- Guard isBlockedSearchPhrase against undefined replacement to prevent crash when no / is used
- Fix message skip filter to use === 0 so only !s commands are skipped, not any message containing !s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Custom emoji like <a:aniblobsweat:48885190**6**022825677> contain numeric IDs
that could match a user's search term, mangling the emoji in the output.
Adds extractEmoji() to pull emoji out before string replacement runs,
mirroring the existing extractUsers() and extractUrls() pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces separate extractUsers/extractEmoji with a single extractDiscordEntities
covering user mentions (<@id>, <@!id>), role mentions (<@&id>), channel mentions
(<#id>), custom emoji (<:name:id>, <a:name:id>), and timestamps (<t:unix:format>).

Extraction now also happens in replaceFirstMessage so entity IDs are treated as
opaque placeholders during string replacement, not just during markdown processing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Placeholder collision: replace |{|url|}| and |{|entity|}| with Unicode
   private-use characters (U+E001/U+E002) which cannot appear in Discord
   messages, eliminating false matches (e.g. !s url/link corrupting re-insertion).

2. URL regex false positives: require an alphabetic TLD (2+ letters) so version
   numbers (v1.2.3), prices (5.00), and decimals (3.14) are no longer extracted
   as URLs and made unsearchable.

3. Markdown link URLs lost: deMarkDown now expands [text](url) to "text url"
   and extracts URLs before removeMd runs, so the URL survives into output
   instead of being silently stripped.

4. LESS_THAN/GREATER_THAN single-replace: switch to /g regex flag so all
   angle-bracket pairs are restored in multi-line messages, not just the first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Source:
- config.js: fix variable name typos (messesageFetchCount, realestOneBlockedPercentneBlockedPercent),
  rewrite JSDoc as concise inline comments, add Token documentation
- index.js: remove duplicate ready handler, clean up brace/spacing style, simplify
  getScore call to use return value directly
- scoring.js: rename schemaCreated flag, prepare insertStmt once inside ensureSchema()
  instead of on every processScores call, fix getTrending label to respect the limit
  parameter, replace findIndex antipattern with .some()
- replacer.js: rename replaceAll → replaceOccurrences to avoid shadowing the built-in,
  document the vertical-tab bold-marker trick, standardize comment style
- one-blocked-message.js: fix triggerPecentage typo, consistent brace style, cleaner JSDoc
- package.json: move jest/nodemon to devDependencies, move remove-markdown to dependencies

Tests:
- scoring.test.js: fix process.enve typo (env reset was silently broken), remove
  unnecessary async/await on synchronous functions, remove duplicate multiline test,
  update getScore calls to use return value
- config.test.js: fix AlloeConfigDump/AllowConigDump description typos, add Token test
- one-blocked-message.test.js: remove empty beforeAll, add DisableOneBlockedMessage test,
  extract makeQuery helper, add regular-user trigger threshold test
- replacer.test.js: import updated exports after rename

Docs:
- CLAUDE.md: new file documenting commands, module map, the full replacer pipeline
  (including why two extraction passes are needed), placeholder rationale,
  env vars, and testing patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tecture

Converts unicodeToMerica and deMarkDown from String prototype extensions to
named exported functions (normalizeUnicode, stripMarkdown) so the module
surface is explicit, testable, and free of global side effects. Companion
renames: extractGtAndLt → escapeAngleBrackets, replaceAll → replaceOccurrences,
isBlockedSearchPhrase → isBlockedPhrase. Functions are now declared in strict
dependency order. JSDoc is complete and accurate throughout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bugs fixed:
- escapeAngleBrackets: greedy <(.*)> matched from first < to last > on a
  line, corrupting strings like "<foo> and <bar>". Fixed with <([^>]*)>.
- Bold marker collapse: .replace('\v\v','') only removed the first adjacent
  pair, leaving dangling ** on 3+ consecutive matches. Fixed with /\v\v/g.

Hygiene:
- index.js: indexOf()===0 → startsWith() throughout.
- scoring.js: all prepared statements hoisted into ensureSchema() so each is
  compiled once; sparkle regex compiled once at module scope.

Tests:
- Two regression tests added for each bug, using inputs that specifically
  expose the failure mode (same-line multi-bracket, 3+ consecutive matches).
- Misleading RegExp-as-string test replaced with a proper no-match test plus
  a separate type-guard test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tency

scoring.js: extract a pure parseScoreLine() helper so the parsing logic is
separate from the DB write. Switches processScores to for...of and hoists
the author toString() call out of the loop.

replacer.test.js: reorganize into four focused describe blocks
(splitReplaceCommand, extractUrls, extractDiscordEntities, replaceFirstMessage)
so failures point directly at the broken layer. Fix author field throughout
the shared messages array — using a plain string relied on undefined .bot
being falsy by coincidence. Add missing assertion to the markdown-strip test.
Rewrite test descriptions to be precise and imperative.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hout

replacer.js:
- Hoist ENTITY_RE, URL_RE, and BLOCKED_PHRASE_RES to module scope so they
  are compiled exactly once rather than on every call. BLOCKED_PHRASE_RES
  also replaces the per-call new RegExp() in isBlockedPhrase.
- Remove the dead ignoreCase parameter from replaceOccurrences — every call
  site passed true; the false branch was unreachable. Function now always
  matches literally and case-insensitively, as all callers required.
- isBlockedPhrase now uses re.test() instead of phrase.match(re) — test()
  is the correct API when only a boolean is needed.

scoring.js:
- Replace lazy ensureSchema() with module-level eager initialization. All
  statement variables become const. Jest's resetModules: true gives each test
  a fresh module anyway, so the test isolation guarantee is unchanged.
- processScores no longer calls ensureSchema() per invocation.

index.js:
- Rename initialQuery → message (matches Discord.js convention and the rest
  of the codebase). Rename messages → history to avoid shadowing the
  module-level term. Inline getScore() return into the send call.
- Replace string concatenation with template literals throughout.

scoring.test.js:
- Add getTrending overlap test: exercises the filter(r => !top.includes(r))
  false branch, which fires when a phrase appears in both the top-N slice
  and the bottom-N slice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLAUDE.md: replace all stale function names (unicodeToMerica → normalizeUnicode,
deMarkDown → stripMarkdown); update pipeline steps to match current code;
add section on module-scope constants and their restart implication; clarify
{LESS_THAN}/{GREATER_THAN} as string tokens (not PUA); document eager
scoring.js initialization and its effect on test isolation; raise test
instructions to first-class step; note parseScoreLine as internal.

README.md: remove incorrect 'npm install -g discord.js dotenv' instruction
(these are project dependencies, not globals); add npm test step; clean up
formatting.

scoring.js: add comment explaining why reference equality is correct in the
getTrending bottom filter.

jest.config.json: raise coverage thresholds from 59/77/66/66 to 65/88/73/73
to match actual coverage — the old values would not catch a meaningful
regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ests, fix cosmetic issues

- Extract applyReplacement() from replaceFirstMessage for single-responsibility clarity
- scoring.test.js: restructure into nested describe blocks (getScore/processScores/getTrending)
- one-blocked-message.test.js: rename makeQuery/query → makeMessage/message throughout
- one-blocked-message.js: convert string concat to template literal
- scoring.js: correct SPARKLE_RE comment (parseScoreLine, not processScores)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
replacer.js:
- Replace {LESS_THAN}/{GREATER_THAN} string tokens with PUA characters
  \uE003/\uE004 — now all four placeholder kinds are PUA-consistent and
  immune to a user typing the token text literally in Discord
- Remove redundant outer capture group from URL_RE (match() with /g flag
  returns full-match strings; inner groups are never read)
- Rename afterEntities → withoutEntities for consistency with stripMarkdown

scoring.js:
- Replace SPARKLE_RE.test()+match() double-execution with a single exec()
- Extract 7-day window to SEVEN_DAYS_MS module constant
- Rename shadowed `rows` parameter in fmt() to `entries`

CLAUDE.md: replace prose placeholder description with a four-row table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
replacer.js:
- Replace 5-chain .replace() in normalizeUnicode with a single compiled
  regex + Map — one pass through the string, and new mappings require
  exactly one new line in the table with no function changes
- Extract reinsert() helper — replaces the ?.forEach mutation pattern at
  both call sites with a named reduce, making the extract→transform→restore
  lifecycle explicit and purely functional
- Fix stripMarkdown JSDoc step 6 to match actual restoration order

scoring.js:
- Format CREATE TABLE as multi-line SQL with aligned column definitions
- Fix parseScoreLine @returns property order to match the code

index.js:
- Use channel alias consistently; remove stray message.channel.send

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…en tests

scoring.js:
- SPARKLE_RE: use non-capturing groups for the sparkle delimiters — groups 1
  and 3 were captured but never read; phrase is now sparkle[1] not sparkle[2]

index.js:
- Parse command and check isBlockedPhrase before fetching message history;
  blocked commands previously paid an unnecessary Discord API call
- Replace regex prefix-strip in !score handler with slice('!score '.length)

config.js:
- Pass explicit radix 10 to Number.parseInt

replacer.test.js:
- Tighten two entity-protection tests from not.toHaveBeenCalledWith(specific
  corruption) to not.toHaveBeenCalled() — the search term exists only inside
  entity IDs, so no match should be found at all
- Update stale regression comment that referenced old {GREATER_THAN} token

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
reactions.js (new):
- SQLite table with PRIMARY KEY (message_id, reactor_id, emoji); at most
  one row per combination at any time
- recordReaction(): INSERT OR IGNORE — silently discards duplicate add events;
  excludes self-reactions at insert time
- removeReaction(): DELETE — if the user re-adds, a fresh row is inserted;
  the count at query time always reflects reality
- getLeaderboard(): returns ranked mentions via <@author_id> for Discord
  rendering; returns "Who is one <emoji> message" for zero results
- parseLeaderCommand(): handles Unicode emoji and custom emoji (<:name:id>,
  <a:name:id>); returns null when no emoji is provided

index.js:
- Add GuildMessageReactions intent and Partials (Message, Channel, Reaction)
  so reactions on uncached messages are received and resolved
- messageReactionAdd / messageReactionRemove event handlers with partial fetch
- !leader command handler

jest.config.json:
- Exclude index.js from coverage — it is a Discord client entrypoint whose
  business logic lives entirely in the tested modules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lvedReaction helper

- reactions.js: rewrite recordReaction JSDoc to explain PRIMARY KEY / INSERT OR IGNORE
  invariant; fix extra whitespace; getLeaderboard returns array spread + join
- reactions.test.js: hoist thumbsUp/makeReaction to outer describe scope, eliminating
  duplication across recordReaction, removeReaction, and getLeaderboard suites
- index.js: extract withResolvedReaction(reaction, user, { fetchMessage, handler })
  helper; collapse both reaction event handlers to single-line registrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- reactions.js: add proxyAuthors Map and registerProxyMessage(messageId, authorId);
  recordReaction checks the map first so reactions to bot !s replies credit the
  command issuer rather than the bot; self-reaction guard compares against the
  credited author, not the bot
- replacer.js: replaceFirstMessage becomes async and returns Promise<Message|null>
  (null = no match) instead of a boolean; uses for...of instead of every() to
  support await; callers can now obtain the sent Message to register it as a proxy
- index.js: await replaceFirstMessage, call registerProxyMessage on success
- tests: update replacer suite for async + null/not-null assertions; add
  registerProxyMessage tests to reactions suite (118 tests total, up from 116)
- CLAUDE.md: document proxy message mechanism and updated return type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@davelindsay84

Copy link
Copy Markdown
Collaborator Author

Superseded by #56 which includes all these commits plus additional features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant