iWhatty/text-sanctifier

text-sanctifier

Brutal text normalizer and invisible Unicode scrubber for modern web projects. ~1.66 KB gzipped.

Features

  • Purges zero-width Unicode garbage
  • Normalizes line endings (CRLF, CR, LF) to LF
  • Collapses unwanted spaces and paragraphs
  • Nukes control characters (if enabled)
  • Smart normalization of typographic junk (quotes, dashes, bullets, full-width punctuation)
  • Keyboard-only filtering (retain printable ASCII + full emoji sequences)
    • Preserves ZWJ emoji clusters (👨‍👩‍👧‍👦)
    • Preserves VS16 emoji presentation variants (✌️, ‼️)
  • Configurable via fine-grained flags or ready-made presets
  • Includes strict, loose, and keyboard-only modes
  • Deterministic RegExp usage (no global lastIndex state leaks)
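As a rough illustration of three of the features above (zero-width purging, line-ending normalization, and avoiding `lastIndex` state leaks), here is a minimal sketch in plain JavaScript. It is not text-sanctifier's actual source: the helper names and the exact set of zero-width code points are assumptions chosen for the example.

```javascript
// Illustrative sketch only; NOT the library's implementation.
// Assumed set of zero-width code points: U+200B (zero-width space),
// U+200C (ZWNJ), U+2060 (word joiner), U+FEFF (BOM).
// U+200D (ZWJ) is left alone here because emoji clusters depend on it.
const ZERO_WIDTH_GLOBAL = /[\u200B\u200C\u2060\uFEFF]/g;

function normalizeNewlines(text) {
  // `\r\n?` matches CRLF as one unit, so a Windows line ending
  // becomes a single LF rather than two.
  return text.replace(/\r\n?/g, '\n');
}

function stripZeroWidth(text) {
  // String.prototype.replace resets lastIndex for global patterns,
  // so reusing the shared regex here is safe.
  return text.replace(ZERO_WIDTH_GLOBAL, '');
}

const cleaned = stripZeroWidth(normalizeNewlines('a\u200Bb\r\nc\rd'));
// cleaned === 'ab\nc\nd'

// The lastIndex pitfall: a global regex remembers where .test() stopped.
const stateful = /\u200B/g;
const input = 'x\u200By';
const first = stateful.test(input);  // true, lastIndex is now 2
const second = stateful.test(input); // false, the search resumed at index 2

// One way to stay deterministic: use a non-global regex for detection.
function hasZeroWidth(text) {
  return /[\u200B\u200C\u2060\uFEFF]/.test(text);
}
```

Calling `hasZeroWidth(input)` repeatedly returns the same answer each time, because a non-global pattern never advances `lastIndex`.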

Install

pnpm add text-sanctifier

Quick start

Custom config:

import { summonSanctifier } from 'text-sanctifier';

const clean = summonSanctifier({
  purgeInvisibleChars: true,
  purgeEmojis: true,
  collapseSpaces: true,
  collapseNewLines: true,
  preserveParagraphs: true,
  finalTrim: true,
});

const output = clean(rawText);

Strict preset:

const output = summonSanctifier.strict(rawText);

Loose preset:

const output = summonSanctifier.loose(rawText);

Keyboard-only (no emojis):

const output = summonSanctifier.keyboardOnly(userInput);

Keyboard-only (with emojis):

const output = summonSanctifier.keyboardOnlyEmoji(commentText);

Security notes

  • Not an HTML/XSS sanitizer. This library normalizes and filters plain text.
  • If you need to render untrusted content, render it as text (e.g. textContent), not HTML (innerHTML).
  • If you need to sanitize HTML, use a dedicated HTML sanitizer (e.g. DOMPurify / sanitize-html).
  • Like any text-processing library, it can be put under CPU pressure (DoS) by extremely large untrusted inputs; consider enforcing input size limits in high-risk environments.
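The size-limit advice above could be applied with a small wrapper around whatever cleaning function you use. This is a sketch under stated assumptions: `withSizeLimit` is a hypothetical helper (not part of text-sanctifier), the 100000-character limit is arbitrary, and a trivial stand-in takes the place of a real sanitizer so the snippet runs on its own.

```javascript
// Hypothetical helper; not exported by text-sanctifier.
function withSizeLimit(sanitize, maxLength) {
  return function limited(text) {
    if (typeof text !== 'string') {
      throw new TypeError('Expected a string');
    }
    // Reject oversized input before any regex work is done.
    if (text.length > maxLength) {
      throw new RangeError(`Input exceeds ${maxLength} characters`);
    }
    return sanitize(text);
  };
}

// Stand-in sanitizer for the example; in practice this would be the
// function returned by summonSanctifier(...) or one of its presets.
const standIn = (s) => s.trim();
const safeClean = withSizeLimit(standIn, 100000);

const shortResult = safeClean('  hello  ');
```

In a browser, the cleaned string should then be assigned via `element.textContent`, not `innerHTML`, in line with the notes above.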

API

summonSanctifier(options?: SanctifyOptions): (text: string) => string

Creates a reusable text-cleaning function from an options object.
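The factory shape (options in, reusable function out) can be pictured with the sketch below. The option names are taken from the quick start, but their exact semantics here are assumptions made for illustration, and `makeCleaner` is a hypothetical stand-in, not the library's code.

```javascript
// Hypothetical stand-in for the factory pattern; the real
// summonSanctifier may interpret these flags differently.
function makeCleaner({
  collapseSpaces = false,
  collapseNewLines = false,
  preserveParagraphs = true,
  finalTrim = false,
} = {}) {
  // Options are read once; the returned closure can be reused freely.
  return function clean(text) {
    let out = text;
    if (collapseSpaces) {
      // Assumption: runs of spaces/tabs collapse to a single space.
      out = out.replace(/[ \t]{2,}/g, ' ');
    }
    if (collapseNewLines) {
      // Assumption: with preserveParagraphs, one blank line survives
      // between paragraphs; without it, every run becomes one LF.
      out = preserveParagraphs
        ? out.replace(/\n{3,}/g, '\n\n')
        : out.replace(/\n{2,}/g, '\n');
    }
    if (finalTrim) {
      out = out.trim();
    }
    return out;
  };
}

const clean = makeCleaner({
  collapseSpaces: true,
  collapseNewLines: true,
  preserveParagraphs: true,
  finalTrim: true,
});

const result = clean('  a   b\n\n\n\nc  ');
// result === 'a b\n\nc'
```

Building the function once and calling it many times keeps per-call work limited to the replacements that were actually enabled.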

summonSanctifier.strict

Aggressively purges emojis, control characters, extra spacing, and newlines.

summonSanctifier.loose

Gently normalizes spacing and newlines while preserving emojis and paragraphs.

summonSanctifier.keyboardOnly

Restricts to printable ASCII only (removes emojis).

summonSanctifier.keyboardOnlyEmoji

Restricts to printable ASCII + full emoji sequences. Preserves ZWJ emoji clusters and emoji presentation variants.
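To make "printable ASCII + full emoji sequences" concrete, here is a simplified sketch using Unicode property escapes. It is an approximation, not the library's matcher: `keyboardOnlyEmojiSketch` is a hypothetical name, keeping `\n` is an assumption, and skin-tone modifiers, flags, and keycap sequences are deliberately not handled.

```javascript
// Simplified emoji sequence: a pictographic base, an optional VS16
// (U+FE0F), then any number of ZWJ-joined (U+200D) pictographic parts.
const EMOJI_SEQ =
  /\p{Extended_Pictographic}\uFE0F?(?:\u200D\p{Extended_Pictographic}\uFE0F?)*/u;

// Keep either a whole emoji sequence or one printable ASCII char.
// Assumption for this sketch: LF is kept as well.
const KEEP = new RegExp(`${EMOJI_SEQ.source}|[\\x20-\\x7E\\n]`, 'gu');

function keyboardOnlyEmojiSketch(text) {
  // match() with a global regex returns every kept token in order;
  // everything between tokens is dropped.
  return (text.match(KEEP) ?? []).join('');
}

// Family emoji built from four people joined by ZWJ.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
// Victory hand followed by VS16 (emoji presentation).
const victory = '\u270C\uFE0F';

const filtered = keyboardOnlyEmojiSketch(
  'h\u00E9llo ' + family + ' ' + victory + '\u200B'
);
// filtered === 'hllo ' + family + ' ' + victory
```

Because the ZWJ is only accepted between two pictographic characters, a stray U+200D elsewhere in the text is still removed while the cluster stays intact.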

inspectText(text: string): UnicodeTrashReport

Returns a structural report of control codes, invisible chars, newline styles, and more.

import { inspectText } from 'text-sanctifier';

const report = inspectText(input);

/*
{
  hasControlChars: true,
  hasInvisibleChars: true,
  hasMixedNewlines: false,
  newlineStyle: 'LF',
  hasEmojis: true,
  hasNonKeyboardChars: false,
  summary: [
    'Control characters detected.',
    'Invisible Unicode characters detected.',
    'Emojis detected.',
    'Consistent newline style: LF'
  ]
}
*/

Use inspectText to preflight text content before rendering, storing, or linting. It's a diagnostic tool to help inform sanitization needs. Pass the report to getRecommendedSanctifierOptions(report) to auto-generate config flags for summonSanctifier().
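For intuition about the newline fields in the report, the sketch below shows one way such a check could be written. It is not how inspectText is implemented: `detectNewlineStyle` is a hypothetical helper, and the `'CRLF'`, `'CR'`, and `null` values are assumptions (the example report only shows `'LF'`).

```javascript
// Hypothetical helper illustrating newline-style detection.
function detectNewlineStyle(text) {
  // Count each style separately so CRLF is not double-counted
  // as one CR plus one LF.
  const crlf = (text.match(/\r\n/g) ?? []).length;
  const cr = (text.match(/\r(?!\n)/g) ?? []).length;
  const lf = (text.match(/(?<!\r)\n/g) ?? []).length;

  const present = [['CRLF', crlf], ['CR', cr], ['LF', lf]]
    .filter(([, count]) => count > 0)
    .map(([name]) => name);

  return {
    // Assumption: null when there are no newlines or the styles are mixed.
    newlineStyle: present.length === 1 ? present[0] : null,
    hasMixedNewlines: present.length > 1,
  };
}

const unixReport = detectNewlineStyle('a\nb\n');
const mixedReport = detectNewlineStyle('a\r\nb\nc');
```

Here `unixReport` reports a single `'LF'` style, while `mixedReport` flags mixed newlines because both CRLF and a bare LF occur.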


Notes

Runtime requirements

Requires a modern JavaScript runtime with ES2020+ support:

  • Node.js 14+
  • Modern evergreen browsers

Package and build info

  • Source (src/): ES2020+ ESM modules with JSDoc
  • Browser build (dist/): Minified ESM bundle for <script type="module">
  • Tree-shake friendly: ships sideEffects: false
  • Zero transpilation: no built-in polyfills or runtime overhead
  • Bundler ready: works with Vite, Rollup, Webpack, Parcel, esbuild

License

Licensed under AGPL-3.0 with WATT3D Additional Terms. See LICENSE and ADDITIONAL_TERMS.md. Commercial AI/model-training use requires compliance with those terms or a separate WATT3D license. © WATT3D.
