Kaomoji dictionaries for HeliBoard and AOSP keyboards

Kaomoji (顔文字) are Japanese-style emoticons built from text characters, e.g. ¯╲_(ツ)_╱¯ or (╯°□°)╯︵┻━┻ (see Wikipedia).

Build

git submodule update --init --recursive
./build_all.sh

build_all.sh produces two .dict files per locale:

| File | Tags used |
| --- | --- |
| `kaomoji_en.dict` | locale-specific (`en`) |
| `kaomoji_en_all_locales.dict` | all locales merged (`en` + `da` + ...) |

The _all_locales variant has more trigger words per Kaomoji at the cost of mixing languages, so a Danish tag can trigger an English Kaomoji suggestion.

Requires java on PATH.

Format

A single kaomoji.json contains all locales with per-locale tags and descriptions:

{
  "locales": ["en", "da"],
  "description": {
    "en": "English Kaomoji dictionary",
    "da": "Dansk Kaomoji-ordbog"
  },
  "version": 1,
  "kaomoji": {
    "(◕‿◕)": {
      "en": ["happy", "cute"],
      "da": ["glad", "sød"]
    },
    "(╯°□°)╯︵┻━┻": {
      "*": ["flip"],
      "en": ["tableflip", "rage"],
      "da": ["bordvæltning", "raseri"]
    }
  }
}

A special "*" locale holds tags shared by all locales. These are prepended to each locale's own tags. Use --no-star-locale to exclude them.
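The prepending rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the code in build_kaomoji_dict.py; the function name `tags_for_locale` and its `use_star` parameter are hypothetical.

```python
def tags_for_locale(entry, locale, use_star=True):
    """Return the trigger words for one kaomoji entry in one locale.

    Hypothetical helper: shared "*" tags come first, then the locale's own.
    Passing use_star=False mimics what --no-star-locale is described as doing.
    """
    tags = []
    if use_star:
        tags.extend(entry.get("*", []))
    tags.extend(entry.get(locale, []))
    return tags


entry = {
    "*": ["flip"],
    "en": ["tableflip", "rage"],
    "da": ["bordvæltning", "raseri"],
}
print(tags_for_locale(entry, "en"))                  # ['flip', 'tableflip', 'rage']
print(tags_for_locale(entry, "da", use_star=False))  # ['bordvæltning', 'raseri']
```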

Build one locale at a time:

./build_kaomoji_dict.py kaomoji.json --locale en
./build_kaomoji_dict.py --locale da

Note: the script defaults to kaomoji.json if no input file is given.

Or use --all-locales to merge all locales' tags into a single dictionary with more trigger words for each Kaomoji:

./build_kaomoji_dict.py --locale en --all-locales
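As a rough picture of what merging means here, the sketch below collects the tags of every locale for one Kaomoji into a single list without repeats. The name `merged_tags` and the ordering (shared "*" tags, then the requested locale, then the remaining locales) are assumptions made for illustration; the actual script may order or filter tags differently.

```python
def merged_tags(entry, primary_locale):
    """Merge the tags of all locales for one kaomoji entry.

    Hypothetical helper. Assumed order: "*" first, then primary_locale,
    then every other locale in the order it appears in the entry.
    """
    others = [loc for loc in entry if loc not in ("*", primary_locale)]
    merged = {}  # a dict keeps insertion order and drops repeated tags
    for loc in ["*", primary_locale] + others:
        for tag in entry.get(loc, []):
            merged.setdefault(tag, None)
    return list(merged)


entry = {
    "*": ["flip"],
    "en": ["tableflip", "rage"],
    "da": ["bordvæltning", "raseri"],
}
print(merged_tags(entry, "en"))
# ['flip', 'tableflip', 'rage', 'bordvæltning', 'raseri']
```

With a merged list like this, typing the Danish word "raseri" on an English layout would still suggest the Kaomoji, which is the trade-off mentioned in the Build section.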

Version is not written back to kaomoji.json by default. Use --bump to increment the version in the JSON file after building:

./build_kaomoji_dict.py kaomoji.json --locale en --bump
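Conceptually, bumping amounts to reading the JSON, incrementing "version", and writing the file back. The following is a minimal sketch under that assumption; `bump_version` is a hypothetical name, and the indentation and trailing newline used when writing are guesses rather than the script's actual output format.

```python
import json
import tempfile
from pathlib import Path


def bump_version(path):
    """Increment the top-level "version" field of a JSON file in place."""
    file = Path(path)
    data = json.loads(file.read_text(encoding="utf-8"))
    data["version"] = int(data.get("version", 0)) + 1
    # ensure_ascii=False keeps the Kaomoji readable instead of \u escapes.
    text = json.dumps(data, ensure_ascii=False, indent=2) + "\n"
    file.write_text(text, encoding="utf-8")
    return data["version"]


# Demo on a throwaway file so no real kaomoji.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "kaomoji.json"
    demo.write_text('{"version": 1, "kaomoji": {}}', encoding="utf-8")
    print(bump_version(demo))  # 2
```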

Unicode Word Joiners (U+2060) can be inserted between the characters of each Kaomoji to try to prevent line breaks in the suggestion strip. Use --word-joiner to enable this (disabled by default).

./build_kaomoji_dict.py kaomoji.json --word-joiner
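The effect of the option can be illustrated with plain string joining. This sketch assumes "character" means a single Unicode code point, which is how Python iterates over a `str`; the helper names are hypothetical and the real script may treat combining sequences differently.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: zero width, forbids a line break


def add_word_joiners(kaomoji):
    """Insert U+2060 between every pair of adjacent code points."""
    return WORD_JOINER.join(kaomoji)


def strip_word_joiners(text):
    """Undo add_word_joiners, e.g. to compare against the original."""
    return text.replace(WORD_JOINER, "")


face = "(◕‿◕)"
joined = add_word_joiners(face)
print(len(face), len(joined))              # 5 9
print(strip_word_joiners(joined) == face)  # True
```

The joined string looks identical on screen, but it is longer, so anything that matches on the exact Kaomoji text needs to strip the joiners first.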

Use --sanitize-input to clean up the input JSON in place:

./build_kaomoji_dict.py kaomoji.json --sanitize-input

The process lowercases all tags, strips punctuation and whitespace, removes duplicates (within each locale and across locales), and promotes tags shared by all locales to "*".

Before:

{
  "(◕‿◕)": {
    "*": ["SMILE"],
    "en": ["Happy!", "Cute", "cute"],
    "da": ["happy", "GLAD"],
    "es": ["¿Sonrisa?", "happy"]
  },
  "¯╲_(ツ)_╱¯": ["SHRUG!", "shrug", "SHRUG"]
}

After:

{
  "(◕‿◕)": {
    "*": ["happy", "smile"],
    "en": ["cute"],
    "da": ["glad"],
    "es": ["sonrisa"]
  },
  "¯╲_(ツ)_╱¯": ["shrug"]
}

All changes visible here:

- Lowercased: "Cute" becomes "cute", "GLAD" becomes "glad", "SMILE" becomes "smile".
- Punctuation and whitespace stripped: "Happy!" becomes "happy" (exclamation mark removed), "¿Sonrisa?" becomes "sonrisa" (¿ and ? removed), "SHRUG!" becomes "shrug" (exclamation mark removed).
- Deduplicated: in en, ["Happy!", "Cute", "cute"] collapses to "happy" and "cute"; "happy" is then promoted to "*", so only "cute" remains.
- Star promotion: "happy" appears in en, da, and es, so it is moved to "*".
- Star preserved: the existing "SMILE" in "*" is kept as "smile".
- Star first: the "*" locale is always placed first in the output.
- Flat entry handled: ["SHRUG!", "shrug", "SHRUG"] is lowercased, stripped of punctuation, and deduplicated to ["shrug"].
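The sanitizing rules can be reproduced with a short Python sketch that turns the Before example into the After example. It is an approximation written for this README, not the script's implementation: the function names are hypothetical, "punctuation" is assumed to mean every Unicode category starting with P, the "*" list is sorted only because that matches the example, and promotion is assumed to need at least two locales.

```python
import unicodedata


def clean_tag(tag):
    """Lowercase a tag and drop whitespace and Unicode punctuation."""
    return "".join(
        ch for ch in tag.lower()
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def dedupe(tags):
    """Clean every tag, then drop empty results and repeats (first wins)."""
    return list(dict.fromkeys(t for t in map(clean_tag, tags) if t))


def sanitize_entry(entry):
    """Sanitize one kaomoji entry, given as a flat list or a per-locale dict."""
    if isinstance(entry, list):
        return dedupe(entry)
    star = set(dedupe(entry.get("*", [])))
    locales = {loc: dedupe(tags) for loc, tags in entry.items() if loc != "*"}
    if len(locales) > 1:
        # Tags present in every locale are shared, so they move to "*".
        star |= set.intersection(*(set(tags) for tags in locales.values()))
    result = {"*": sorted(star)} if star else {}  # "*" is inserted first
    for loc, tags in locales.items():
        result[loc] = [t for t in tags if t not in star]
    return result


before = {
    "*": ["SMILE"],
    "en": ["Happy!", "Cute", "cute"],
    "da": ["happy", "GLAD"],
    "es": ["¿Sonrisa?", "happy"],
}
print(sanitize_entry(before))
# {'*': ['happy', 'smile'], 'en': ['cute'], 'da': ['glad'], 'es': ['sonrisa']}
print(sanitize_entry(["SHRUG!", "shrug", "SHRUG"]))  # ['shrug']
```

Note that under the category-P assumption a hyphenated tag such as "table-flip" would lose its hyphen; whether the real script does the same is not stated here.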

Merge with upstream emoji dictionaries

To get Kaomoji suggestions alongside the official upstream emoji entries, download the .combined wordlists for each locale:

wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_en.combined
wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_da.combined

Then run the merge step manually using --merge-combined / -m:

./build_kaomoji_dict.py --locale en --merge-combined emoji_en.combined \
                        --output kaomoji_en.dict
./build_kaomoji_dict.py --locale en --all-locales --merge-combined emoji_en.combined \
                        --output kaomoji_en_combined.dict

Kaomoji entries are appended to the upstream wordlist, producing a single .dict per locale with both emoji and Kaomoji. Both standalone and merged dictionaries use kaomoji:<locale> as the dictionary type prefix. Note that Kaomoji appear as text suggestions, not rendered emoji. They consist of multiple Unicode code points (e.g., (╯°□°)╯︵┻━┻), so HeliBoard displays them inline as text.

The merged description follows the format: <kaomoji_desc> [<all locales>] (<orig_desc> v<orig_version>).
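Read as a template, the format above could be filled in as follows. This is a guess at how the pieces fit together: `merged_description` is a hypothetical name, the comma separator inside the brackets is an assumption, and the upstream description and version in the example are made-up placeholder values.

```python
def merged_description(kaomoji_desc, locales, orig_desc, orig_version):
    """Build '<kaomoji_desc> [<all locales>] (<orig_desc> v<orig_version>)'."""
    locale_list = ", ".join(locales)  # separator is an assumption
    return f"{kaomoji_desc} [{locale_list}] ({orig_desc} v{orig_version})"


# Placeholder upstream values, for illustration only.
print(merged_description("English Kaomoji dictionary", ["en", "da"], "Emoji", 3))
# English Kaomoji dictionary [en, da] (Emoji v3)
```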

Note: build_all.sh produces standalone dicts only (kaomoji_en.dict and kaomoji_en_all_locales.dict). It does not merge with upstream combined files. Use the commands above for that.

Tests

Run all unit tests:

python -m pytest tests/

Run ./check.sh to run all linters (flake8, bandit, vulture, pylint, mypy, vermin, shellcheck) and unit tests (pytest).

Acknowledgments

Thanks to HeliBoard for making an awesome keyboard app, and to remi0s for aosp-dictionary-tools, which this project uses to build .dict files.
