refactor: extract language rules to hook-based injection + add hangeul mode by claudianus · Pull Request #299 · JuliusBrussee/caveman

claudianus · 2026-04-26T21:43:46Z

Problem

Caveman ultra compresses Korean poorly. Its English rules target articles, be-verbs, and short synonyms, none of which apply to Korean grammar. Users who write Korean prompts get more verbose Korean responses than necessary, even with caveman active.

Inline language rules in SKILL.md also cost English users tokens for rules they never use. With five or more languages, this pattern does not scale.

Solution

Hook-based dynamic rule injection. Language rules move from skills/caveman/SKILL.md to rules/<lang>-compression.md. The caveman-activate.js hook loads only the active mode's rules. Wenyan rules also extracted (cleanup).
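
To make the per-mode loading concrete, here is a minimal sketch of how a hook could pick the rules file for the active mode. It is not the PR's actual code: `languageForMode`, `loadLanguageRules`, and mode names such as `hangeul-ultra` are assumptions for illustration. Only the `rules/<lang>-compression.md` layout comes from the description above.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed language prefixes; the PR names hangeul and wenyan modes.
const LANGUAGE_PREFIXES = ['hangeul', 'wenyan'];

// Hypothetical helper: map a mode such as "hangeul-ultra" to its language.
function languageForMode(mode) {
  const match = LANGUAGE_PREFIXES.find(
    (lang) => mode === lang || mode.startsWith(lang + '-')
  );
  return match || null; // null means a plain (English) caveman mode
}

// Hypothetical helper: return the rules text to inject for this mode.
function loadLanguageRules(mode, rulesDir) {
  const lang = languageForMode(mode);
  if (!lang) return ''; // English modes inject nothing extra
  const file = path.join(rulesDir, `${lang}-compression.md`);
  try {
    return fs.readFileSync(file, 'utf8');
  } catch (err) {
    return ''; // missing file: degrade to no extra rules
  }
}
```

With this shape, an English mode never reads a language file, which is the property the "English user impact" section relies on.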

Hangeul mode adds Korean-specific compression:

Honorific drop (~합니다 → ~함)
Particle drop (은/는/이/가/을/를)
Noun endings (~됨, ~완료)
Connective → symbols (때문에 → →)
반말 (informal speech) default, fragments OK
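
The rules above are prompt instructions for the model, not a string transform. Purely as a toy illustration of what two of them do to a sentence, the sketch below applies the honorific drop and the connective-to-symbol mappings as literal replacements. `toyCompress` and its replacement table are hypothetical and not part of the PR. Particle dropping is left out because it cannot be done safely by plain substitution.

```javascript
// Toy illustration only: the real rules are prompt text interpreted by the model.
const TOY_REPLACEMENTS = [
  ['합니다', '함'], // honorific drop (~합니다 → ~함)
  ['때문에', '→'], // connective → symbol
  ['그리고', '+'], // connective → symbol
];

// Apply every literal replacement in order to the input text.
function toyCompress(text) {
  return TOY_REPLACEMENTS.reduce(
    (out, [from, to]) => out.split(from).join(to),
    text
  );
}

console.log(toyCompress('오류 때문에 재시작합니다')); // 오류 → 재시작함
```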

Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean)

Arm	Avg tokens	Savings
Baseline (normal Korean)	1,519	—
Caveman Ultra (English rules)	550	-64%
Hangeul Ultra (Korean rules)	408	-73%

Hangeul ultra uses a further 26% fewer tokens than Caveman ultra with English rules alone (550 → 408).
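
The percentages follow directly from the average token counts in the table. A short check, with `savingsPercent` as a hypothetical helper name:

```javascript
// Average tokens per arm, copied from the benchmark table.
const avgTokens = { baseline: 1519, cavemanUltra: 550, hangeulUltra: 408 };

// Percentage reduction from `before` to `after`, rounded to a whole percent.
function savingsPercent(before, after) {
  return Math.round((1 - after / before) * 100);
}

console.log(savingsPercent(avgTokens.baseline, avgTokens.cavemanUltra)); // 64
console.log(savingsPercent(avgTokens.baseline, avgTokens.hangeulUltra)); // 73
console.log(savingsPercent(avgTokens.cavemanUltra, avgTokens.hangeulUltra)); // 26
```

The first two figures are relative to the baseline. The 26% figure is relative to Caveman ultra, so it is not the difference between 73% and 64%.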

English user impact: ZERO

SKILL.md unchanged in behavior. Language rules load via hook only when that language mode is active. English users never see Korean rules in their system prompt.

Wenyan: no change

Wenyan rules extracted to rules/wenyan-compression.md, loaded by hook — identical user experience.

Files changed (11)

skills/caveman/SKILL.md — +hangeul intensity rows, +examples, +Language Rules section
rules/hangeul-compression.md — new Korean rules (hook-loaded)
rules/wenyan-compression.md — new (extracted from SKILL.md)
hooks/caveman-activate.js — +hangeul alias, +per-mode rule loading
hooks/caveman-config.js — +hangeul-* to VALID_MODES
hooks/caveman-mode-tracker.js — +hangeul parsing, +Korean per-turn reminder
rules/caveman-activate.md — +hangeul to switch line
README.md — +hangeul before/after, +benchmark, +honesty clause
evals/prompts/ko.txt — 10 Korean eval prompts
benchmarks/run_deepseek.py — DeepSeek API benchmark harness
docs/honesty-clause.md — phonetic hangul limitation documented
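
As a rough sketch of how the alias and `VALID_MODES` changes might fit together: a short alias is normalized to a canonical mode name and then validated. The mode list and alias table below are assumptions. Only `hangeul-full` and `wenyan-full` are named in this PR, and `canonicalMode` is a hypothetical helper.

```javascript
// Assumed subset of modes; the real VALID_MODES lives in hooks/caveman-config.js.
const VALID_MODES = new Set([
  'full',
  'ultra',
  'hangeul-full',
  'hangeul-ultra',
  'wenyan-full',
]);

// Assumed short aliases mapped to canonical names.
const ALIASES = new Map([
  ['hangeul', 'hangeul-full'],
  ['wenyan', 'wenyan-full'],
]);

// Hypothetical helper: return the canonical mode, or null if it is not valid.
function canonicalMode(input) {
  const key = String(input).trim().toLowerCase();
  const mode = ALIASES.get(key) || key;
  return VALID_MODES.has(mode) ? mode : null;
}
```

Storing only canonical names means any later reader of the mode flag has one spelling to handle.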

Builds on

This approach settles the open language PRs (#54 Korean, #85 Japanese) by establishing a scalable pattern: each new language = 3-line file additions. No SKILL.md bloat. No English user impact.

Benchmark data available at benchmarks/results/hangeul_proof.json.

…l mode

Problem: SKILL.md inline language rules burden English users with token cost. Caveman ultra alone cannot compress Korean (English rules target articles/be-verbs, useless for Korean grammar).

Solution: Hook-based dynamic rule injection. Language rules move from SKILL.md to rules/<lang>-compression.md. caveman-activate.js loads only the active mode's rules. Wenyan rules also extracted (cleanup).

Korean-specific rules:
- Honorific drop (~합니다→~함)
- Particle drop (은/는/이/가/을/를)
- Use noun endings (~함/~됨)
- Connective → symbols (때문에→→, 그리고→+)
- 반말 default, fragments OK

Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean):
baseline: 1,519 tokens
caveman ultra: 550 tokens (-64%)
hangeul ultra: 408 tokens (-73%)
Hangeul saves extra 26% vs English caveman rules alone.

English users: ZERO impact, SKILL.md unchanged in behavior.
Wenyan users: identical experience, rules loaded from hook.
Future languages: same pattern, 3-line file additions.

Changes:
- skills/caveman/SKILL.md: +hangeul intensity rows, +examples, +Language Rules section
- rules/hangeul-compression.md: new, Korean compression rules (hook-loaded)
- rules/wenyan-compression.md: new, wenyan rules extracted from SKILL.md
- hooks/caveman-activate.js: +hangeul alias, +per-mode rule loading
- hooks/caveman-config.js: +hangeul-* to VALID_MODES
- hooks/caveman-mode-tracker.js: +hangeul command parsing, +Korean per-turn reminder
- rules/caveman-activate.md: +hangeul to switch line
- README.md: +hangeul before/after, +benchmark table, +honesty clause
- evals/prompts/ko.txt: new, 10 Korean eval prompts
- benchmarks/run_deepseek.py: new, DeepSeek benchmark harness
- docs/honesty-clause.md: new, phonetic hangul limitation documented

🔴 Blocker fixes:
- Move language rule loading AFTER if/else block (standalone gets rules)
- Add resolveRulesDir() with multiple path tries for all install types
- Add inline fallback rules when hangeul/wenyan .md files not found

🟡 Logic fixes:
- mode-tracker writes canonical names (hangeul-full, wenyan-full) to flag file instead of short aliases
- Add wenyan per-turn reminder (previously got English reminder)
- SKILL.md Language Rules: 4 lines → 2 lines (less noise for EN users)

🟡 UX fix:
- Add 🇰🇷 Hangeul to README 'Pick your level' 5-column grid (was missing)
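
The "multiple path tries" and "inline fallback" fixes could look roughly like the sketch below. Only the name `resolveRulesDir` comes from the commit message. Its signature, the candidate list, the `rulesFor` helper, and the fallback text are assumptions for illustration. The fallback string paraphrases the Korean rules listed in this PR. A wenyan entry would be added the same way.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed inline fallback, condensed from the Korean rules in this PR.
const INLINE_FALLBACK = {
  hangeul:
    'Korean: drop honorifics (~합니다 → ~함), drop particles, 반말 default, fragments OK.',
};

// Return the first candidate that is an existing directory, or null.
function resolveRulesDir(candidates) {
  const found = candidates.find((dir) => {
    try {
      return fs.statSync(dir).isDirectory();
    } catch {
      return false; // path does not exist or is unreadable
    }
  });
  return found || null;
}

// Hypothetical helper: read rules from disk, else use the inline fallback.
function rulesFor(lang, candidates) {
  const dir = resolveRulesDir(candidates);
  if (dir) {
    try {
      return fs.readFileSync(path.join(dir, `${lang}-compression.md`), 'utf8');
    } catch {
      // rules file missing in this layout: fall through to the fallback
    }
  }
  return INLINE_FALLBACK[lang] || '';
}
```

A caller would pass one candidate directory per install layout, for example a path relative to the hook file and a path under the plugin root.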

claudianus added 5 commits May 3, 2026 08:07

fix: address hangeul hook review findings

e82656b

chore: sync hangeul skill surfaces

936eca5

fix: load language rules from plugin layouts

38fa624

claudianus force-pushed the feat/hangeul-hook-injection branch from 1d7b470 to 38fa624 on May 2, 2026 23:13

claudianus wants to merge 5 commits into JuliusBrussee:main from claudianus:feat/hangeul-hook-injection
