netromdk/kaomojidict

Kaomoji dictionaries for HeliBoard and AOSP keyboards

Kaomoji (顔文字) are Japanese-style emoticons built from text characters, e.g. ¯╲_(ツ)_╱¯ or (╯°□°)╯︵┻━┻ (see Wikipedia).

Build

git submodule update --init --recursive
./build_all.sh

build_all.sh produces two .dict files per locale:

File                          Tags used
kaomoji_en.dict               locale-specific (en)
kaomoji_en_all_locales.dict   all locales merged (en + da + ...)

The _all_locales variant has more trigger words per Kaomoji at the cost of mixing languages, so a Danish tag can trigger an English Kaomoji suggestion.

Requires java on PATH.

Format

A single kaomoji.json contains all locales with per-locale tags and descriptions:

{
  "locales": ["en", "da"],
  "description": {
    "en": "English Kaomoji dictionary",
    "da": "Dansk Kaomoji-ordbog"
  },
  "version": 1,
  "kaomoji": {
    "(◕‿◕)": {
      "en": ["happy", "cute"],
      "da": ["glad", "sød"]
    },
    "(╯°□°)╯︵┻━┻": {
      "*": ["flip"],
      "en": ["tableflip", "rage"],
      "da": ["bordvæltning", "raseri"]
    }
  }
}

A special "*" locale adds tags shared by all locales. These are placed before each locale's specific tags. Use --no-star-locale to exclude them.
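The tag resolution described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: `tags_for_locale` and its `include_star` parameter are hypothetical names, with `include_star=False` standing in for --no-star-locale.

```python
def tags_for_locale(entry, locale, include_star=True):
    """Return the trigger words of one Kaomoji entry for a locale.

    `entry` maps locale codes (and optionally "*") to lists of tags.
    Hypothetical helper: include_star=False mimics --no-star-locale.
    """
    shared = entry.get("*", []) if include_star else []
    # Shared "*" tags come first, followed by the locale's own tags.
    return list(shared) + list(entry.get(locale, []))


entry = {
    "*": ["flip"],
    "en": ["tableflip", "rage"],
    "da": ["bordvæltning", "raseri"],
}
en_tags = tags_for_locale(entry, "en")
da_tags_no_star = tags_for_locale(entry, "da", include_star=False)
print(en_tags)
print(da_tags_no_star)
```

With the table-flip entry from the example, the English list starts with the shared "flip" tag, while the Danish list built without the star locale contains only the Danish tags.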

Build one locale at a time:

./build_kaomoji_dict.py kaomoji.json --locale en
./build_kaomoji_dict.py --locale da

Note: The input file defaults to kaomoji.json if none is given.

Or use --all-locales to merge all locales' tags into a single dictionary with more trigger words for each Kaomoji:

./build_kaomoji_dict.py --locale en --all-locales
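Conceptually, merging means that every tag from every locale becomes a trigger word for the same Kaomoji. The sketch below assumes tags are merged in the order the locales appear in the entry and that repeated tags are kept only once; `merged_tags` is a hypothetical helper, and the real tool may order its output differently.

```python
def merged_tags(entry):
    """Merge the tags of all locales in one Kaomoji entry.

    Assumption: the first occurrence of each tag wins, and order follows
    the order of locales in the entry (including "*" if present).
    """
    all_tags = (tag for tags in entry.values() for tag in tags)
    # dict.fromkeys preserves insertion order while dropping repeats.
    return list(dict.fromkeys(all_tags))


entry = {
    "en": ["happy", "cute"],
    "da": ["glad", "sød"],
}
merged = merged_tags(entry)
print(merged)
```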

The version is not written back to kaomoji.json by default. Use --bump to increment the version in the JSON file after building:

./build_kaomoji_dict.py kaomoji.json --locale en --bump
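A version bump of this kind amounts to a read-modify-write of the JSON file. The following is a minimal sketch under that assumption; `bump_version` is a hypothetical function, and the indentation used when writing the file back is a guess, not necessarily what the tool does.

```python
import json
import tempfile
from pathlib import Path


def bump_version(path):
    """Increment the top-level "version" field of a JSON file in place."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    data["version"] = int(data.get("version", 0)) + 1
    # ensure_ascii=False keeps Kaomoji readable instead of \uXXXX escapes.
    text = json.dumps(data, ensure_ascii=False, indent=2)
    path.write_text(text + "\n", encoding="utf-8")
    return data["version"]


# Demonstrate on a throwaway file rather than the real kaomoji.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "kaomoji.json"
    demo.write_text('{"version": 1, "kaomoji": {}}', encoding="utf-8")
    new_version = bump_version(demo)
    reloaded = json.loads(demo.read_text(encoding="utf-8"))

print(new_version)
```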

Unicode Word Joiners (U+2060) can be inserted between the characters of each Kaomoji to try to prevent line breaks in the suggestion strip. This is disabled by default; use --word-joiner to enable it:

./build_kaomoji_dict.py kaomoji.json --word-joiner
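The effect on a single Kaomoji can be illustrated in a few lines of Python. This is a sketch of the idea only: `add_word_joiners` is a hypothetical name, and it treats each Unicode code point as one character.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: zero-width, forbids a line break


def add_word_joiners(kaomoji):
    """Insert U+2060 between every pair of adjacent code points."""
    return WORD_JOINER.join(kaomoji)


original = "(◕‿◕)"
joined = add_word_joiners(original)
# The joiners are invisible, so compare lengths to see the difference.
print(len(original), len(joined))
```

Because the joiner is zero-width, the result looks identical on screen but is longer as a string: one joiner is added for every gap between two code points.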

Use --sanitize-input to clean up the input JSON: lowercase all tags, remove duplicates, and promote tags that appear in every locale to the shared "*" locale:

./build_kaomoji_dict.py kaomoji.json --sanitize-input

The input file is modified in place. Duplicates are removed both within each locale and across locales, and punctuation and whitespace are stripped from each tag.

Before:

{
  "(◕‿◕)": {
    "*": ["SMILE"],
    "en": ["Happy!", "Cute", "cute"],
    "da": ["happy", "GLAD"],
    "es": ["¿Sonrisa?", "happy"]
  },
  "¯╲_(ツ)_╱¯": ["SHRUG!", "shrug", "SHRUG"]
}

After:

{
  "(◕‿◕)": {
    "*": ["happy", "smile"],
    "en": ["cute"],
    "da": ["glad"],
    "es": ["sonrisa"]
  },
  "¯╲_(ツ)_╱¯": ["shrug"]
}

All changes visible here:

  • Lowercased, with punctuation and whitespace stripped:
    • "Happy!" becomes "happy" (exclamation mark removed)
    • "Cute"/"cute" becomes "cute"
    • "GLAD" becomes "glad"
    • "SMILE" becomes "smile"
    • "SHRUG!" becomes "shrug" (exclamation mark removed)
    • "¿Sonrisa?" becomes "sonrisa" (¿ and ? removed)
  • Deduplicated: in en, ["Happy!", "Cute", "cute"] collapses to {"cute", "happy"}; "happy" is then promoted to "*", so only "cute" remains
  • Star promotion: "happy" appears in en, da, and es, so it is moved to "*"
  • Star preserved: the existing "SMILE" in "*" is kept as "smile"
  • Star moved first: the "*" locale is always placed first in the output
  • Flat entry handled: ["SHRUG!", "shrug", "SHRUG"] is lowercased, punctuation-stripped, and deduplicated to ["shrug"]
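The rules listed above can be reproduced with a short Python sketch. It is not the tool's implementation: the function names are hypothetical, and it assumes that punctuation and whitespace are stripped only from the ends of a tag, that promotion requires at least two locales, and that tags are sorted alphabetically in the output.

```python
import unicodedata


def clean_tag(tag):
    """Lowercase a tag and strip punctuation/whitespace from both ends."""
    def is_junk(ch):
        # Unicode categories starting with "P" are punctuation.
        return ch.isspace() or unicodedata.category(ch).startswith("P")

    tag = tag.lower()
    start, end = 0, len(tag)
    while start < end and is_junk(tag[start]):
        start += 1
    while end > start and is_junk(tag[end - 1]):
        end -= 1
    return tag[start:end]


def clean_tags(tags):
    """Clean every tag, dropping duplicates and empty results."""
    return {cleaned for cleaned in map(clean_tag, tags) if cleaned}


def sanitize_entry(entry):
    """Sanitize one Kaomoji entry (a flat tag list or a per-locale dict)."""
    if isinstance(entry, list):
        return sorted(clean_tags(entry))

    star = clean_tags(entry.get("*", []))
    locales = {loc: clean_tags(tags)
               for loc, tags in entry.items() if loc != "*"}
    # Assumption: promotion only makes sense with two or more locales.
    if len(locales) >= 2:
        star |= set.intersection(*locales.values())

    result = {}
    if star:
        result["*"] = sorted(star)  # "*" is inserted first
    for loc, tags in locales.items():
        # Tags now covered by "*" are removed from the locale.
        result[loc] = sorted(tags - star)
    return result


before = {
    "(◕‿◕)": {
        "*": ["SMILE"],
        "en": ["Happy!", "Cute", "cute"],
        "da": ["happy", "GLAD"],
        "es": ["¿Sonrisa?", "happy"],
    },
    "¯╲_(ツ)_╱¯": ["SHRUG!", "shrug", "SHRUG"],
}
after = {kaomoji: sanitize_entry(entry) for kaomoji, entry in before.items()}
print(after)
```

Run on the "Before" data, this sketch yields the same structure as the "After" example, with "*" as the first key of the per-locale entry.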

Merge with upstream emoji dictionaries

To get Kaomoji suggestions alongside the official upstream emoji entries, download the .combined wordlists for each locale:

wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_en.combined
wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_da.combined

Then run the merge step manually using --merge-combined / -m:

./build_kaomoji_dict.py --locale en --merge-combined emoji_en.combined \
                        --output kaomoji_en.dict
./build_kaomoji_dict.py --locale en --all-locales --merge-combined emoji_en.combined \
                        --output kaomoji_en_combined.dict

Kaomoji entries are appended to the upstream wordlist, producing a single .dict per locale with both emoji and Kaomoji. Both standalone and merged dictionaries use kaomoji:<locale> as the dictionary type prefix. Note that Kaomoji appear as text suggestions, not rendered emoji. They consist of multiple Unicode code points (e.g., (╯°□°)╯︵┻━┻), so HeliBoard displays them inline as text.

The merged description follows the format: <kaomoji_desc> [<all locales>] (<orig_desc> v<orig_version>).

Note: build_all.sh produces standalone dicts only (kaomoji_en.dict and kaomoji_en_all_locales.dict). It does not merge with upstream combined files. Use the commands above for that.
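For reference, the two standalone builds per locale could be driven from Python roughly as follows. The contents of build_all.sh are not shown here, so this is only a sketch: `build_commands` is a hypothetical helper, and passing --output for the standalone builds is an assumption based on the merge commands above.

```python
def build_commands(locales, source="kaomoji.json"):
    """Return the argv lists for the two standalone builds of each locale."""
    commands = []
    for locale in locales:
        base = ["./build_kaomoji_dict.py", source, "--locale", locale]
        # Locale-specific dictionary, e.g. kaomoji_en.dict
        commands.append(base + ["--output", f"kaomoji_{locale}.dict"])
        # All-locales dictionary, e.g. kaomoji_en_all_locales.dict
        commands.append(
            base + ["--all-locales",
                    "--output", f"kaomoji_{locale}_all_locales.dict"]
        )
    return commands


commands = build_commands(["en", "da"])
for argv in commands:
    print(" ".join(argv))
```

The function only builds the command lines; to actually run them, each list could be passed to `subprocess.run(argv, check=True)`.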

Tests

Run all unit tests:

python -m pytest tests/

Run ./check.sh to run all linters (flake8, bandit, vulture, pylint, mypy, vermin, shellcheck) and unit tests (pytest).

Acknowledgments

Thanks to HeliBoard for making an awesome keyboard app, and to remi0s for aosp-dictionary-tools, which this project uses to build .dict files.
