refactor: extract language rules to hook-based injection + add hangeul mode #299
Open
claudianus wants to merge 5 commits into JuliusBrussee:main from
…l mode

Problem: SKILL.md inline language rules burden English users with token cost. Caveman ultra alone cannot compress Korean (English rules target articles/be-verbs, useless for Korean grammar).

Solution: Hook-based dynamic rule injection. Language rules move from SKILL.md to rules/<lang>-compression.md. caveman-activate.js loads only the active mode's rules. Wenyan rules also extracted (cleanup).

Korean-specific rules:
- Honorific drop (~합니다 → ~함)
- Particle drop (은/는/이/가/을/를)
- Use noun endings (~함/~됨)
- Connective → symbols (때문에 → ∵, 그리고 → +)
- 반말 (casual register) default, fragments OK

Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean):
- baseline: 1,519 tokens
- caveman ultra: 550 tokens (-64%)
- hangeul ultra: 408 tokens (-73%)

Hangeul saves an extra 26% vs English caveman rules alone.

English users: ZERO impact, SKILL.md unchanged in behavior.
Wenyan users: identical experience, rules loaded from hook.
Future languages: same pattern, 3-line file additions.

Changes:
- skills/caveman/SKILL.md: +hangeul intensity rows, +examples, +Language Rules section
- rules/hangeul-compression.md: new, Korean compression rules (hook-loaded)
- rules/wenyan-compression.md: new, wenyan rules extracted from SKILL.md
- hooks/caveman-activate.js: +hangeul alias, +per-mode rule loading
- hooks/caveman-config.js: +hangeul-* to VALID_MODES
- hooks/caveman-mode-tracker.js: +hangeul command parsing, +Korean per-turn reminder
- rules/caveman-activate.md: +hangeul to switch line
- README.md: +hangeul before/after, +benchmark table, +honesty clause
- evals/prompts/ko.txt: new, 10 Korean eval prompts
- benchmarks/run_deepseek.py: new, DeepSeek benchmark harness
- docs/honesty-clause.md: new, phonetic hangul limitation documented
🔴 Blocker fixes:
- Move language rule loading AFTER the if/else block (standalone gets rules)
- Add resolveRulesDir() with multiple path tries for all install types
- Add inline fallback rules when hangeul/wenyan .md files are not found

🟡 Logic fixes:
- mode-tracker writes canonical names (hangeul-full, wenyan-full) to the flag file instead of short aliases
- Add wenyan per-turn reminder (previously got the English reminder)
- SKILL.md Language Rules: 4 lines → 2 lines (less noise for EN users)

🟡 UX fix:
- Add 🇰🇷 Hangeul to the README 'Pick your level' 5-column grid (was missing)
Problem
Caveman ultra cannot compress Korean. English rules target articles, be-verbs, and short synonyms, all of which are useless for Korean grammar. Users who write Korean prompts get verbose Korean responses even with caveman active.
Also, SKILL.md inline language rules burden English users with token cost. At 5+ languages, this pattern breaks.
Solution
Hook-based dynamic rule injection. Language rules move from skills/caveman/SKILL.md to rules/<lang>-compression.md. The caveman-activate.js hook loads only the active mode's rules. Wenyan rules also extracted (cleanup).

Hangeul mode adds Korean-specific compression:
- Honorific drop (~합니다 → ~함)
- Particle drop (은/는/이/가/을/를)
- Noun endings (~함/~됨)
- Connectives → symbols (때문에 → ∵, 그리고 → +)
- 반말 (casual register) default, fragments OK
Benchmark (DeepSeek v4-pro, 3 Korean prompts, forced Korean)

| Mode | Tokens | vs baseline |
| --- | --- | --- |
| baseline | 1,519 | n/a |
| caveman ultra | 550 | -64% |
| hangeul ultra | 408 | -73% |

Hangeul saves an extra 26% vs English caveman rules alone.
English user impact: ZERO
SKILL.md unchanged in behavior. Language rules load via hook only when that language mode is active. English users never see Korean rules in their system prompt.
Wenyan: no change
Wenyan rules extracted to rules/wenyan-compression.md, loaded by hook: identical user experience.

Files changed (11)

- skills/caveman/SKILL.md: +hangeul intensity rows, +examples, +Language Rules section
- rules/hangeul-compression.md: new Korean rules (hook-loaded)
- rules/wenyan-compression.md: new (extracted from SKILL.md)
- hooks/caveman-activate.js: +hangeul alias, +per-mode rule loading
- hooks/caveman-config.js: +hangeul-* to VALID_MODES
- hooks/caveman-mode-tracker.js: +hangeul parsing, +Korean per-turn reminder
- rules/caveman-activate.md: +hangeul to switch line
- README.md: +hangeul before/after, +benchmark, +honesty clause
- evals/prompts/ko.txt: 10 Korean eval prompts
- benchmarks/run_deepseek.py: DeepSeek API benchmark harness
- docs/honesty-clause.md: phonetic hangul limitation documented

Builds on
This approach settles the open language PRs (#54 Korean, #85 Japanese) by establishing a scalable pattern: each new language = 3-line file additions. No SKILL.md bloat. No English user impact.
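The "same pattern" claim can be illustrated with a sketch of the config shape. VALID_MODES echoes the name from caveman-config.js, but ALIASES and canonicalMode are hypothetical stand-ins for whatever the mode-tracker actually does; note that resolving the short alias to a canonical name matches the flag-file fix above:

```javascript
// Adding a language = one VALID_MODES row + one alias + one rules file.
const VALID_MODES = [
  'caveman-lite', 'caveman-full', 'caveman-ultra',
  'hangeul-lite', 'hangeul-full', 'hangeul-ultra', // new language: one row
];
const ALIASES = { hangeul: 'hangeul-full' }; // short switch-command alias

// Resolve a user command to the canonical mode name that gets written
// to the flag file (never the short alias).
function canonicalMode(input) {
  const mode = ALIASES[input] || input;
  return VALID_MODES.includes(mode) ? mode : null;
}
```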
Benchmark data available at benchmarks/results/hangeul_proof.json.