Kaomoji (顔文字) are Japanese-style emoticons built from text characters,
e.g. `¯╲_(ツ)_╱¯` or `(╯°□°)╯︵┻━┻`. See Wikipedia for more.
```sh
git submodule update --init --recursive
./build_all.sh
```

`build_all.sh` produces two `.dict` files per locale:
| File | Tags used |
|---|---|
| `kaomoji_en.dict` | locale-specific (en) |
| `kaomoji_en_all_locales.dict` | all locales merged (en + da + ...) |
The _all_locales variant has more trigger words per Kaomoji at the cost of
mixing languages, so a Danish tag can trigger an English Kaomoji suggestion.
Requires `java` on `PATH`.
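The trade-off of the `_all_locales` variant can be sketched in a few lines of Python. This is not the project's code: `merged_tags` is a hypothetical name, and the sketch assumes each Kaomoji entry maps locale codes to tag lists (as in `kaomoji.json`), with the target locale's tags first, then the other locales in the order given, and duplicates dropped.

```python
def merged_tags(entry: dict[str, list[str]], locales: list[str], target: str) -> list[str]:
    """Hypothetical sketch: collect trigger words for one Kaomoji across locales.

    Assumes the target locale's tags come first, followed by the other
    locales in the given order; a repeated tag keeps its first position.
    """
    order = [target] + [loc for loc in locales if loc != target]
    tags: list[str] = []
    for loc in order:
        for tag in entry.get(loc, []):
            if tag not in tags:
                tags.append(tag)
    return tags


entry = {"en": ["happy", "cute"], "da": ["glad", "sød"]}
print(merged_tags(entry, ["en", "da"], "en"))  # ['happy', 'cute', 'glad', 'sød']
```

With this merge the Danish tag `glad` becomes a trigger in the English dictionary, which is the language mixing mentioned above.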
A single kaomoji.json contains all locales with per-locale tags and descriptions:
```json
{
  "locales": ["en", "da"],
  "description": {
    "en": "English Kaomoji dictionary",
    "da": "Dansk Kaomoji-ordbog"
  },
  "version": 1,
  "kaomoji": {
    "(◕‿◕)": {
      "en": ["happy", "cute"],
      "da": ["glad", "sød"]
    },
    "(╯°□°)╯︵┻━┻": {
      "*": ["flip"],
      "en": ["tableflip", "rage"],
      "da": ["bordvæltning", "raseri"]
    }
  }
}
```

A special `"*"` locale adds tags shared by all locales. These are prepended to
each locale's specific tags. Use `--no-star-locale` to exclude them.
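How the `"*"` tags combine with a locale's own tags can be sketched as follows. The helper `tags_for_locale` is hypothetical; it only illustrates the prepending rule and the effect of `--no-star-locale` (modelled here as `use_star=False`), and it assumes entries are locale-keyed objects like those in the example above.

```python
def tags_for_locale(entry: dict[str, list[str]], locale: str, use_star: bool = True) -> list[str]:
    """Hypothetical sketch: trigger words for one Kaomoji in one locale.

    Shared "*" tags are prepended to the locale's own tags; with
    use_star=False (mirroring --no-star-locale) they are left out.
    """
    shared = entry.get("*", []) if use_star else []
    # List concatenation builds a new list, so the input entry is not modified.
    return shared + entry.get(locale, [])


table_flip = {
    "*": ["flip"],
    "en": ["tableflip", "rage"],
    "da": ["bordvæltning", "raseri"],
}
print(tags_for_locale(table_flip, "en"))                  # ['flip', 'tableflip', 'rage']
print(tags_for_locale(table_flip, "da", use_star=False))  # ['bordvæltning', 'raseri']
```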
Build one locale at a time:

```sh
./build_kaomoji_dict.py kaomoji.json --locale en
./build_kaomoji_dict.py --locale da
```

Note: if no input file is given, `kaomoji.json` is used by default.
Or use `--all-locales` to merge all locales' tags into a single dictionary with
more trigger words for each Kaomoji:

```sh
./build_kaomoji_dict.py --locale en --all-locales
```

The version is not written back to `kaomoji.json` by default. Use `--bump` to
increment the version in the JSON file after building:
```sh
./build_kaomoji_dict.py kaomoji.json --locale en --bump
```

Unicode Word Joiners (U+2060) can be inserted between the characters of each
Kaomoji to try to prevent line breaks in the suggestion strip. Use
`--word-joiner` to enable this (disabled by default):
```sh
./build_kaomoji_dict.py kaomoji.json --word-joiner
```

Use `--sanitize-input` to clean up the input JSON: lowercase all tags, remove
duplicates, and promote tags that appear in every locale to the shared `"*"`
locale:
```sh
./build_kaomoji_dict.py kaomoji.json --sanitize-input
```

The input file is modified in place. The process lowercases all tags, strips
punctuation and whitespace, removes duplicates (within each locale and across
locales), and promotes tags shared by all locales to `"*"`.
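The sanitizing rules can be approximated in Python. This is a sketch, not the script's implementation: the function names are hypothetical, "punctuation" is assumed to mean any Unicode character in a `P*` category, whitespace is assumed to be removed anywhere in a tag, output lists are assumed to be sorted (consistent with the example that follows), and promotion to `"*"` is assumed to require at least two non-star locales.

```python
import unicodedata


def clean_tag(tag: str) -> str:
    """Lowercase a tag and drop punctuation (Unicode category P*) and whitespace."""
    return "".join(
        ch
        for ch in tag.lower()
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


def clean_tags(tags: list[str]) -> set[str]:
    """Clean every tag, dropping duplicates and tags that end up empty."""
    return {cleaned for cleaned in map(clean_tag, tags) if cleaned}


def sanitize_entry(entry):
    """Sanitize one Kaomoji entry: a flat tag list or a locale -> tags mapping."""
    if isinstance(entry, list):  # flat entry: just clean and deduplicate
        return sorted(clean_tags(entry))
    locales = {loc: clean_tags(tags) for loc, tags in entry.items()}
    star = locales.pop("*", set())
    # Assumption: promote tags present in every non-star locale (needs 2+ locales).
    if len(locales) >= 2:
        star |= set.intersection(*locales.values())
    result = {"*": sorted(star)} if star else {}  # "*" is inserted first
    for loc, tags in locales.items():
        result[loc] = sorted(tags - star)  # drop tags already shared via "*"
    return result


def sanitize(kaomoji: dict) -> dict:
    """Apply sanitize_entry to every Kaomoji in a face -> entry mapping."""
    return {face: sanitize_entry(entry) for face, entry in kaomoji.items()}


before = {"(◕‿◕)": {"*": ["SMILE"], "en": ["Happy!", "Cute", "cute"],
                    "da": ["happy", "GLAD"], "es": ["¿Sonrisa?", "happy"]}}
print(sanitize(before)["(◕‿◕)"])
# {'*': ['happy', 'smile'], 'en': ['cute'], 'da': ['glad'], 'es': ['sonrisa']}
```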
Before:

```json
{
  "(◕‿◕)": {
    "*": ["SMILE"],
    "en": ["Happy!", "Cute", "cute"],
    "da": ["happy", "GLAD"],
    "es": ["¿Sonrisa?", "happy"]
  },
  "¯╲_(ツ)_╱¯": ["SHRUG!", "shrug", "SHRUG"]
}
```

After:
```json
{
  "(◕‿◕)": {
    "*": ["happy", "smile"],
    "en": ["cute"],
    "da": ["glad"],
    "es": ["sonrisa"]
  },
  "¯╲_(ツ)_╱¯": ["shrug"]
}
```

All changes visible here:
- Lowercased: `"Happy!"` becomes `"happy"`, `"Cute"`/`"cute"` become `"cute"`,
  `"GLAD"` becomes `"glad"`, `"SMILE"` becomes `"smile"`, `"SHRUG!"` becomes
  `"shrug"`, `"¿Sonrisa?"` becomes `"sonrisa"`.
- Deduplicated: `["Happy!", "Cute", "cute"]` collapses to `{"cute", "happy"}`;
  since `"happy"` is promoted to `"*"`, only `"cute"` remains in `en`.
- Star promotion: `"happy"` appears in `en`, `da`, and `es`, so it is moved to `"*"`.
- Star preserved: the existing `"SMILE"` in `"*"` is kept as `"smile"`.
- Star moved first: the `"*"` locale is always placed first in the output.
- Flat entry handled: `["SHRUG!", "shrug", "SHRUG"]` is lowercased,
  punctuation-stripped, and deduplicated.
- Punctuation and whitespace stripped: `"Happy!"` becomes `"happy"` (exclamation
  mark removed), `"¿Sonrisa?"` becomes `"sonrisa"` (inverted `¿` and `?` removed),
  `"SHRUG!"` becomes `"shrug"` (exclamation mark removed).
To get Kaomoji suggestions alongside the official upstream emoji entries,
download the .combined wordlists for each locale:
```sh
wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_en.combined
wget https://codeberg.org/Helium314/aosp-dictionaries/raw/branch/main/emoji_cldr_signal_wordlists/emoji_da.combined
```

Then run the merge step manually using `--merge-combined` (short form `-m`):
```sh
./build_kaomoji_dict.py --locale en --merge-combined emoji_en.combined \
  --output kaomoji_en.dict
./build_kaomoji_dict.py --locale en --all-locales --merge-combined emoji_en.combined \
  --output kaomoji_en_combined.dict
```

Kaomoji entries are appended to the upstream wordlist, producing a single
`.dict` per locale that contains both emoji and Kaomoji. Both standalone and
merged dictionaries use `kaomoji:<locale>` as the dictionary type prefix.
Note that Kaomoji appear as text suggestions, not rendered emoji.
They consist of multiple Unicode code points (e.g., (╯°□°)╯︵┻━┻), so
HeliBoard displays them inline as text.
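Because a Kaomoji spans several code points, the `--word-joiner` option amounts to interleaving U+2060 between them. A minimal sketch, assuming the joiner goes between every pair of adjacent code points (the script may treat combining marks or other characters differently); `add_word_joiners` and `strip_word_joiners` are hypothetical names:

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: zero-width, prohibits a line break at its position


def add_word_joiners(kaomoji: str) -> str:
    """Insert U+2060 between adjacent code points (none at the start or end)."""
    return WORD_JOINER.join(kaomoji)


def strip_word_joiners(text: str) -> str:
    """Undo add_word_joiners, e.g. before comparing against the source JSON."""
    return text.replace(WORD_JOINER, "")


face = "(╯°□°)╯︵┻━┻"
joined = add_word_joiners(face)
print(len(face), len(joined))  # 11 21
```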
The merged description follows the format
`<kaomoji_desc> [<all locales>] (<orig_desc> v<orig_version>)`.
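The description format can be expressed as a small formatting helper. The name `merged_description` is hypothetical, rendering `<all locales>` as a comma-separated list is an assumption (the exact separator is not documented here), and `"Emoji for English"` is a placeholder, not the real upstream description.

```python
def merged_description(kaomoji_desc: str, locales: list[str], orig_desc: str, orig_version: int) -> str:
    """Build '<kaomoji_desc> [<all locales>] (<orig_desc> v<orig_version>)'."""
    return f"{kaomoji_desc} [{', '.join(locales)}] ({orig_desc} v{orig_version})"


print(merged_description("English Kaomoji dictionary", ["en", "da"], "Emoji for English", 3))
# English Kaomoji dictionary [en, da] (Emoji for English v3)
```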
Note: build_all.sh produces standalone dicts only (kaomoji_en.dict and
kaomoji_en_all_locales.dict). It does not merge with upstream combined
files. Use the commands above for that.
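For orientation, the command-line options used throughout this README could be declared with `argparse` roughly as below. This is a hypothetical reconstruction, not the script's actual parser: only the flag names and the `kaomoji.json` default come from this README; the help texts and the `nargs`/`action` choices are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="build_kaomoji_dict.py")
    # The input JSON is optional and falls back to kaomoji.json.
    parser.add_argument("input", nargs="?", default="kaomoji.json")
    parser.add_argument("--locale", help="locale to build, e.g. en or da")
    parser.add_argument("--all-locales", action="store_true",
                        help="merge tags from all locales into this dictionary")
    parser.add_argument("--no-star-locale", action="store_true",
                        help='leave out the shared "*" tags')
    parser.add_argument("--bump", action="store_true",
                        help="increment the version in the JSON file after building")
    parser.add_argument("--word-joiner", action="store_true",
                        help="insert U+2060 between Kaomoji characters")
    parser.add_argument("--sanitize-input", action="store_true",
                        help="clean up the input JSON in place")
    parser.add_argument("-m", "--merge-combined", metavar="COMBINED",
                        help="upstream .combined wordlist to append Kaomoji to")
    parser.add_argument("--output", help="path of the .dict file to write")
    return parser


args = build_parser().parse_args(["--locale", "en", "-m", "emoji_en.combined"])
print(args.input, args.locale, args.merge_combined)  # kaomoji.json en emoji_en.combined
```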
Run all unit tests:

```sh
python -m pytest tests/
```

Run `./check.sh` to run all linters (flake8, bandit, vulture, pylint,
mypy, vermin, shellcheck) and the unit tests (pytest).
Thanks to HeliBoard for making an
awesome keyboard app, and to remi0s for
aosp-dictionary-tools,
which this project uses to build .dict files.