Brutal text normalizer and invisible Unicode scrubber for modern web projects. ~1.66 KB gzipped.
- Purges zero-width Unicode garbage
- Normalizes line endings (CRLF, CR, LF) to LF
- Collapses unwanted spaces and paragraphs
- Nukes control characters (if enabled)
- Smart normalization of typographic junk (quotes, dashes, bullets, full-width punctuation)
- Keyboard-only filtering (retain printable ASCII + full emoji sequences)
- Preserves ZWJ emoji clusters (👨👩👧👦)
- Preserves VS16 emoji presentation variants (✌️, ‼️)
- Configurable via fine-grained flags or ready-made presets
- Includes strict, loose, and keyboard-only modes
- Deterministic RegExp usage (no global lastIndex state leaks)
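The purging, newline and keyboard-only features above can be illustrated in plain JavaScript. This sketch is not text-sanctifier's code: the helper names (normalizeNewlines, purgeInvisible, keyboardOnlyWithEmoji) and the exact set of invisible code points are assumptions for illustration. It processes text one grapheme cluster at a time with Intl.Segmenter (Node.js 16+, slightly newer than the library's stated baseline). This is one way to keep ZWJ clusters and VS16 variants whole while stripping stray zero-width characters everywhere else.

```javascript
// Illustrative sketch only: these helpers are hypothetical and are NOT
// text-sanctifier's implementation. Requires Intl.Segmenter (Node.js 16+).

// Assumed example set of invisible code points:
// U+200B ZWSP, U+200C ZWNJ, U+200D ZWJ, U+2060 WORD JOINER, U+FEFF BOM.
// This global regex is only used with replace(), which resets lastIndex itself.
const INVISIBLE = /[\u200B\u200C\u200D\u2060\uFEFF]/gu;

// Non-global regexes: safe to reuse with test() (no lastIndex carry-over).
const PICTOGRAPHIC = /\p{Extended_Pictographic}/u;
const PRINTABLE_ASCII = /^[\x20-\x7E\n]+$/;

const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// CRLF and lone CR become LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, '\n');
}

// Strip invisible characters, except inside emoji grapheme clusters, so ZWJ
// sequences and VS16 presentation variants are left untouched.
function purgeInvisible(text) {
  let out = '';
  for (const { segment } of segmenter.segment(text)) {
    out += PICTOGRAPHIC.test(segment) ? segment : segment.replace(INVISIBLE, '');
  }
  return out;
}

// Keep printable ASCII, LF, and whole emoji clusters; drop every other grapheme.
function keyboardOnlyWithEmoji(text) {
  let out = '';
  for (const { segment } of segmenter.segment(normalizeNewlines(text))) {
    if (PICTOGRAPHIC.test(segment) || PRINTABLE_ASCII.test(segment)) {
      out += segment;
    }
  }
  return out;
}
```

One known limitation of this sketch: keycap sequences such as 1️⃣ start with an ASCII digit rather than a pictographic code point, so keyboardOnlyWithEmoji drops them.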
pnpm add text-sanctifier
Custom config:
import { summonSanctifier } from 'text-sanctifier';
const clean = summonSanctifier({
  purgeInvisibleChars: true,
  purgeEmojis: true,
  collapseSpaces: true,
  collapseNewLines: true,
  preserveParagraphs: true,
  finalTrim: true,
});
const output = clean(rawText);
Strict preset:
const output = summonSanctifier.strict(rawText);
Loose preset:
const output = summonSanctifier.loose(rawText);
Keyboard-only (no emojis):
const output = summonSanctifier.keyboardOnly(userInput);
Keyboard-only (with emojis):
const output = summonSanctifier.keyboardOnlyEmoji(commentText);
Security notes:
- Not an HTML/XSS sanitizer. This library normalizes and filters plain text.
- If you need to render untrusted content, render it as text (e.g. textContent), not as HTML (innerHTML).
- If you need to sanitize HTML, use a dedicated HTML sanitizer (e.g. DOMPurify or sanitize-html).
- As with any text-processing library, extremely large untrusted inputs can be used to apply CPU pressure (DoS); consider enforcing input size limits in high-risk environments.
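The input size advice can be applied by checking length before any sanitizer runs. This is a minimal sketch: guardedClean and the 10,000-character default are hypothetical choices, not part of text-sanctifier, and length is measured in UTF-16 code units (String.prototype.length), not in user-visible characters.

```javascript
// Hypothetical guard: MAX_INPUT_CHARS and guardedClean are illustrative names,
// not part of text-sanctifier. Pick a limit that fits your application.
const MAX_INPUT_CHARS = 10000;

// Reject non-string or oversized input before running any sanitizer over it.
function guardedClean(clean, input, maxChars = MAX_INPUT_CHARS) {
  if (typeof input !== 'string') {
    throw new TypeError('input must be a string');
  }
  if (input.length > maxChars) {
    throw new RangeError(`input exceeds ${maxChars} UTF-16 code units`);
  }
  return clean(input);
}
```

For example: guardedClean((t) => summonSanctifier.strict(t), userInput). Wrapping the preset in an arrow function avoids any assumption about whether the preset depends on its this binding.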
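summonSanctifier(options): Creates a reusable sanitizer function from an options object.
summonSanctifier.strict(text): Aggressively purges emojis, control characters, extra spacing, and newlines.
summonSanctifier.loose(text): Gently normalizes spacing and newlines while preserving emojis and paragraphs.
summonSanctifier.keyboardOnly(text): Restricts output to printable ASCII only (removes emojis).
summonSanctifier.keyboardOnlyEmoji(text): Restricts output to printable ASCII plus full emoji sequences. Preserves ZWJ emoji clusters and emoji presentation variants.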
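To illustrate the difference between purging extra newlines and preserving paragraphs, here is a hypothetical collapseWhitespace(text, options) sketch. It reuses the documented flag names (collapseSpaces, collapseNewLines, preserveParagraphs, finalTrim), but the semantics shown, such as keeping at most one blank line when preserveParagraphs is on, are assumptions and not necessarily the library's behavior.

```javascript
// Hypothetical sketch: collapseWhitespace is NOT text-sanctifier's code.
// Option names mirror the documented flags; their semantics here are assumptions.
function collapseWhitespace(text, {
  collapseSpaces = true,
  collapseNewLines = true,
  preserveParagraphs = true,
  finalTrim = true,
} = {}) {
  // Normalize CRLF and lone CR to LF first.
  let out = text.replace(/\r\n?/g, '\n');

  if (collapseSpaces) {
    // Runs of spaces/tabs become one space; spaces right before a LF are dropped.
    out = out.replace(/[ \t]+/g, ' ').replace(/ +\n/g, '\n');
  }

  if (collapseNewLines) {
    // preserveParagraphs keeps at most one blank line (a paragraph break);
    // otherwise every run of newlines is folded into a single LF.
    out = preserveParagraphs
      ? out.replace(/\n{3,}/g, '\n\n')
      : out.replace(/\n+/g, '\n');
  }

  return finalTrim ? out.trim() : out;
}
```

With the input 'Hello   world \r\n\r\n\r\n\r\nNext\tline  ', the defaults yield 'Hello world\n\nNext line', while preserveParagraphs: false yields 'Hello world\nNext line'.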
inspectText(text): Returns a structural report of control codes, invisible characters, newline styles, and more.
import { inspectText } from 'text-sanctifier';
const report = inspectText(input);
/*
{
hasControlChars: true,
hasInvisibleChars: true,
hasMixedNewlines: false,
newlineStyle: 'LF',
hasEmojis: true,
hasNonKeyboardChars: false,
summary: [
'Control characters detected.',
'Invisible Unicode characters detected.',
'Emojis detected.',
'Consistent newline style: LF'
]
}
*/
Use inspectText to preflight text content before rendering, storing, or linting it. It is a diagnostic tool that helps you decide which sanitization options you need. Pass the report to getRecommendedSanctifierOptions(report) to auto-generate config flags for summonSanctifier().
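A rough idea of how report fields such as newlineStyle, hasMixedNewlines and hasControlChars can be derived is sketched below. miniInspect is a hypothetical helper, not inspectText; which characters count as control characters and the 'none'/'mixed' labels are assumptions. The non-global test regex also illustrates the deterministic-RegExp point from the feature list: a /g regex used with test() keeps lastIndex between calls, while a non-global one does not.

```javascript
// Hypothetical sketch, NOT inspectText's implementation: it only mimics the
// newlineStyle, hasMixedNewlines and hasControlChars fields of the sample report.

// Assumption: "control characters" means C0 controls other than tab, LF and CR, plus DEL.
// Non-global regex, so repeated test() calls never depend on a leftover lastIndex.
const CONTROL = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/;

function miniInspect(text) {
  // match() with a global regex always scans from the start of the string.
  const crlf = (text.match(/\r\n/g) || []).length;
  const cr = (text.match(/\r(?!\n)/g) || []).length;  // lone CR
  const lf = (text.match(/(?<!\r)\n/g) || []).length; // LF not preceded by CR

  const styles = [];
  if (crlf) styles.push('CRLF');
  if (cr) styles.push('CR');
  if (lf) styles.push('LF');

  return {
    hasControlChars: CONTROL.test(text),
    hasMixedNewlines: styles.length > 1,
    // 'none' and 'mixed' are assumed labels for the no-newline and mixed cases.
    newlineStyle: styles.length === 1 ? styles[0] : styles.length === 0 ? 'none' : 'mixed',
  };
}
```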
Requires a modern JavaScript runtime with ES2020+ support:
- Node.js 14+
- Modern evergreen browsers
- Source (src/): ES2020+ ESM modules with JSDoc
- Browser build (dist/): minified ESM bundle for <script type="module">
- Tree-shake friendly: ships sideEffects: false
- Zero transpilation: no built-in polyfills or runtime overhead
- Bundler ready: works with Vite, Rollup, Webpack, Parcel, esbuild
Licensed under AGPL-3.0 with WATT3D Additional Terms. See LICENSE and ADDITIONAL_TERMS.md. Commercial AI/model-training use requires compliance with those terms or a separate WATT3D license. © WATT3D.