
refactor: extract language rules to hook-based injection + add hangeul mode #299

Open
claudianus wants to merge 5 commits into JuliusBrussee:main from claudianus:feat/hangeul-hook-injection

@claudianus

Problem

Caveman ultra cannot compress Korean. Its English rules target articles, be-verbs, and short synonyms, none of which apply to Korean grammar. Users who write Korean prompts get verbose Korean responses even with caveman active.

Also, language rules inlined in SKILL.md cost English users tokens for rules they never use. With 5+ languages, this pattern stops scaling.

Solution

Hook-based dynamic rule injection. Language rules move from skills/caveman/SKILL.md to rules/<lang>-compression.md, and the caveman-activate.js hook loads only the active mode's rules. The Wenyan rules are extracted the same way as a cleanup.
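A minimal sketch of per-mode loading, written in JavaScript because the hooks are .js files. The names `LANGUAGE_RULE_FILES` and `loadLanguageRules` are hypothetical and not taken from hooks/caveman-activate.js. The sketch assumes a mode's language is the prefix before the first hyphen (hangeul-ultra → hangeul), which matches mode names such as hangeul-full used in this PR.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical mapping from a language prefix to its rules file in rules/.
// English modes (lite, full, ultra) have no entry, so they load nothing extra.
const LANGUAGE_RULE_FILES = new Map([
  ["hangeul", "hangeul-compression.md"],
  ["wenyan", "wenyan-compression.md"],
]);

// Return the rules text for the active mode, or "" if none applies.
function loadLanguageRules(mode, rulesDir) {
  const language = mode.split("-")[0]; // "hangeul-ultra" -> "hangeul"
  const file = LANGUAGE_RULE_FILES.get(language);
  if (!file) return ""; // English mode: nothing is injected
  const fullPath = path.join(rulesDir, file);
  return fs.existsSync(fullPath) ? fs.readFileSync(fullPath, "utf8") : "";
}

module.exports = { loadLanguageRules };
```

Because English modes have no map entry, the function returns before touching the filesystem, so nothing is added to an English user's system prompt.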

Hangeul mode adds Korean-specific compression:

  • Honorific drop (~합니다 → ~함)
  • Particle drop (은/는/이/가/을/를)
  • Noun endings (~됨, ~완료)
  • Connectives replaced by symbols (때문에 becomes →)
  • 반말 (casual speech) by default, fragments OK
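A toy rewriter makes the ending and connective rules concrete. This is an illustration only: the PR delivers these rules as prompt text for the model, not as code, and `HANGEUL_RULES` and `compressSentence` are hypothetical names. Particle drop is deliberately left out because naively deleting 이/가/을/를 would corrupt words that merely contain those syllables.

```javascript
// Toy illustration only: the PR ships these as prompt rules for the model.
// Each entry is [pattern, replacement], applied in order to one sentence.
const HANGEUL_RULES = [
  [/합니다(?=[.!]?$)/u, "함"], // honorific drop: ~합니다 -> ~함
  [/됩니다(?=[.!]?$)/u, "됨"], // noun ending: ~됩니다 -> ~됨
  [/\s*때문에\s*/gu, " → "], // connective "because of" -> arrow
  [/\s*그리고\s*/gu, " + "], // connective "and" -> plus
];

// Apply every rule in order and tidy surrounding whitespace.
function compressSentence(sentence) {
  let out = sentence.trim();
  for (const [pattern, replacement] of HANGEUL_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out.trim();
}

console.log(compressSentence("설정 때문에 서버를 재시작합니다.")); // prints: 설정 → 서버를 재시작함.
```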

Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean)

  Arm                              Avg tokens   Savings
  Baseline (normal Korean)              1,519   n/a
  Caveman Ultra (English rules)           550   -64%
  Hangeul Ultra (Korean rules)            408   -73%

Hangeul Ultra uses about 26% fewer tokens than Caveman Ultra with English rules alone (408 vs 550).
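The Savings column is measured against the baseline, while the extra 26% is measured against Caveman Ultra. A quick check of both figures from the token counts in the table (the helper name `savingsPercent` is illustrative):

```javascript
// Percentage of tokens saved by `arm` relative to `reference`, rounded.
function savingsPercent(reference, arm) {
  return Math.round((1 - arm / reference) * 100);
}

const tokens = { baseline: 1519, cavemanUltra: 550, hangeulUltra: 408 };

// Relative to the baseline (the "Savings" column):
console.log(savingsPercent(tokens.baseline, tokens.cavemanUltra)); // 64
console.log(savingsPercent(tokens.baseline, tokens.hangeulUltra)); // 73
// Relative to Caveman Ultra (the "extra 26%"):
console.log(savingsPercent(tokens.cavemanUltra, tokens.hangeulUltra)); // 26
```

So the extra saving is 26% of Caveman Ultra's output, which corresponds to 9 percentage points of the baseline (73 minus 64).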

English user impact: ZERO

SKILL.md behavior is unchanged. Language rules load via the hook only when that language mode is active. English users never see Korean rules in their system prompt.

Wenyan: no change

Wenyan rules are extracted to rules/wenyan-compression.md and loaded by the hook; the user experience is identical.

Files changed (11)

  • skills/caveman/SKILL.md: +hangeul intensity rows, +examples, +Language Rules section
  • rules/hangeul-compression.md: new Korean rules (hook-loaded)
  • rules/wenyan-compression.md: new (extracted from SKILL.md)
  • hooks/caveman-activate.js: +hangeul alias, +per-mode rule loading
  • hooks/caveman-config.js: +hangeul-* to VALID_MODES
  • hooks/caveman-mode-tracker.js: +hangeul parsing, +Korean per-turn reminder
  • rules/caveman-activate.md: +hangeul to switch line
  • README.md: +hangeul before/after, +benchmark, +honesty clause
  • evals/prompts/ko.txt: 10 Korean eval prompts
  • benchmarks/run_deepseek.py: DeepSeek API benchmark harness
  • docs/honesty-clause.md: phonetic hangul limitation documented
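A sketch of what the new VALID_MODES entries and the hangeul alias could look like. The lite/full/ultra levels for hangeul and wenyan are an assumption (the PR only names hangeul-full, wenyan-full, and the Ultra benchmark arms), as are the helper `normalizeMode` and the default level of full for a bare alias.

```javascript
// Assumed levels; the real VALID_MODES in hooks/caveman-config.js may differ.
const LEVELS = ["lite", "full", "ultra"];
const VALID_MODES = [
  ...LEVELS,
  ...LEVELS.map((level) => `wenyan-${level}`),
  ...LEVELS.map((level) => `hangeul-${level}`),
];

// Hypothetical helper: resolve a user-typed mode to a canonical name,
// expanding bare language aliases to their (assumed) default level.
function normalizeMode(input) {
  const mode = input.trim().toLowerCase();
  const canonical =
    mode === "hangeul" || mode === "wenyan" ? `${mode}-full` : mode;
  return VALID_MODES.includes(canonical) ? canonical : null;
}
```

Returning `null` for unknown input lets the caller reject a typo instead of writing an invalid mode to the flag file.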

Builds on

This approach addresses the open language PRs (#54 Korean, #85 Japanese) by establishing a scalable pattern: each new language is a 3-line file addition, with no SKILL.md bloat and no impact on English users.

Benchmark data available at benchmarks/results/hangeul_proof.json.

claudianus added 5 commits May 3, 2026 08:07
…l mode

Problem:
SKILL.md inline language rules burden English users with token cost.
Caveman ultra alone cannot compress Korean (English rules target
articles/be-verbs β€” useless for Korean grammar).

Solution:
Hook-based dynamic rule injection. Language rules move from SKILL.md to
rules/<lang>-compression.md. caveman-activate.js loads only the active
mode's rules. Wenyan rules also extracted (cleanup).

Korean-specific rules:
- Honorific drop (~합니다→~함)
- Particle drop (은/는/이/가/을/를)
- Use noun endings (~함/~됨)
- Connective → symbols (때문에→→, 그리고→+)
- 반말 default, fragments OK

Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean):
  baseline:       1,519 tokens
  caveman ultra:    550 tokens (-64%)
  hangeul ultra:    408 tokens (-73%)
  Hangeul saves extra 26% vs English caveman rules alone.

English users: ZERO impact; SKILL.md unchanged in behavior.
Wenyan users: identical experience; rules loaded from hook.
Future languages: same pattern; 3-line file additions.

Changes:
- skills/caveman/SKILL.md: +hangeul intensity rows, +examples, +Language Rules section
- rules/hangeul-compression.md: new; Korean compression rules (hook-loaded)
- rules/wenyan-compression.md: new; wenyan rules extracted from SKILL.md
- hooks/caveman-activate.js: +hangeul alias, +per-mode rule loading
- hooks/caveman-config.js: +hangeul-* to VALID_MODES
- hooks/caveman-mode-tracker.js: +hangeul command parsing, +Korean per-turn reminder
- rules/caveman-activate.md: +hangeul to switch line
- README.md: +hangeul before/after, +benchmark table, +honesty clause
- evals/prompts/ko.txt: new; 10 Korean eval prompts
- benchmarks/run_deepseek.py: new; DeepSeek benchmark harness
- docs/honesty-clause.md: new; phonetic hangul limitation documented
πŸ”΄ Blocker fixes:
- Move language rule loading AFTER if/else block (standalone gets rules)
- Add resolveRulesDir() with multiple path tries for all install types
- Add inline fallback rules when hangeul/wenyan .md files not found
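The three blocker fixes can be sketched together. Everything here is hypothetical: the candidate directories are passed in by the caller because the PR does not list the install paths, and the inline fallback text is a placeholder rather than the PR's wording. The key ordering point is that `rulesForMode` runs after the standalone/plugin branch, so both paths receive language rules.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Placeholder fallback text used when a rules .md file cannot be found.
const INLINE_FALLBACK_RULES = new Map([
  ["hangeul", "Korean: drop honorifics and particles, use noun endings."],
  ["wenyan", "Answer in terse classical Chinese."],
]);

// Return the first candidate that is an existing directory, else null.
function resolveRulesDir(candidates) {
  for (const dir of candidates) {
    if (fs.existsSync(dir) && fs.statSync(dir).isDirectory()) return dir;
  }
  return null;
}

// Rules for a language mode: file contents if found, inline fallback otherwise.
function rulesForMode(mode, candidates) {
  const language = mode.split("-")[0];
  if (!INLINE_FALLBACK_RULES.has(language)) return ""; // English mode
  const dir = resolveRulesDir(candidates);
  const file = dir ? path.join(dir, `${language}-compression.md`) : null;
  if (file && fs.existsSync(file)) return fs.readFileSync(file, "utf8");
  return INLINE_FALLBACK_RULES.get(language);
}

// Hypothetical activation flow with placeholder base instructions.
function buildActivationContext(mode, isStandalone, candidates) {
  let context;
  if (isStandalone) {
    context = "[standalone caveman instructions]";
  } else {
    context = "[plugin caveman instructions]";
  }
  // Runs after the if/else, so standalone installs get language rules too.
  const extra = rulesForMode(mode, candidates);
  return extra ? `${context}\n\n${extra}` : context;
}
```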

🟡 Logic fixes:
- mode-tracker writes canonical names (hangeul-full, wenyan-full)
  to flag file instead of short aliases
- Add wenyan per-turn reminder (previously got English reminder)
- SKILL.md Language Rules: 4 lines → 2 lines (less noise for EN users)
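A sketch of per-turn reminder selection after these fixes, assuming the flag file holds a canonical mode name such as wenyan-full. The reminder strings and the function name `perTurnReminder` are placeholders; the real wording in hooks/caveman-mode-tracker.js is not quoted in this PR.

```javascript
// Placeholder reminder strings, not the PR's actual wording.
const ENGLISH_REMINDER = "[caveman] Terse. Drop articles and filler.";
const LANGUAGE_REMINDERS = new Map([
  ["hangeul", "[caveman hangeul] Terse Korean: 반말, no particles, ~함/~됨 endings."],
  ["wenyan", "[caveman wenyan] Terse classical Chinese."],
]);

// Pick the reminder from the canonical mode name stored in the flag file
// (e.g. "wenyan-full"); any mode without a language entry gets English.
function perTurnReminder(canonicalMode) {
  const language = canonicalMode.split("-")[0];
  return LANGUAGE_REMINDERS.get(language) ?? ENGLISH_REMINDER;
}
```

Giving wenyan its own map entry is what keeps it from falling through to the English reminder.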

🟡 UX fix:
- Add 🇰🇷 Hangeul to README 'Pick your level' 5-column grid (was missing)
claudianus force-pushed the feat/hangeul-hook-injection branch from 1d7b470 to 38fa624 on May 2, 2026 23:13