kv-cache-concept-page#206
BLOCKING: GitHub reports this PR as Quality checks:
Project acceptance criteria:
Behavioral assertion check for
Docs writing standards:
Docs quality standards:
General website standards:
Graphing standards:
Review rules outcome:
Once the branch is rebased cleanly and pushed, this is ready for a short follow-up review rather than a redesign.
… to nearby serving and attention docs]
…contract and KV-cache discovery]
0b040f2 to cccb60f
Addressed the blocking mergeability feedback from 2026-06-22 04:29 UTC. What changed:
Validation run on this updated head:
PR state after push:
Superseding prior blocking mergeability feedback: the branch is now Quality checks:
Project acceptance criteria:
Behavioral assertion check for
Docs writing standards:
General website standards:
Graphing standards:
Review rules outcome:
No blocking issues remain. Merging.
{
"project": "Model Atlas — KV Cache Canonical Concept Page",
"branchName": "kv-cache-concept-page",
"description": "Publish one canonical English `kv-cache` concept page, backed by the existing registry record and localized messages, so readers can understand why caching keys and values matters for serving speed and move cleanly between nearby serving and attention docs.",
"context": {
"customerAsk": "Add the canonical English concept page for `kv-cache`, because the site has the glossary term and related attention pages but still lacks the broad explainer that tells readers why caching keys and values matters for serving speed. Keep this as one small, mergeable page slice on top of current `main`. Scope: create the canonical concept page plus `messages/en.json` and any needed `assets.json`, backed by the existing published `concept.kv-cache` registry record; classify it correctly per the docs and data-model guidance; wire aliases, tags, and related-doc connections so it can be found from `prefill`, `decode`, `prefill-decode-split`, `autoregressive-generation`, `attention`, `multi-query-attention`, `grouped-query-attention`, and `sliding-window-attention`; and update only the focused validation/tests needed for touched content and discovery integrity. The explanation should define the KV cache in layperson terms, explain why it removes repeated work during generation, and clarify how it changes latency and memory tradeoffs without collapsing into kernel-level detail. Keep the slice English-only and avoid unrelated locale or tokenizer churn. Acceptance criteria: the `kv-cache` concept page renders on current `main`, is backed by the existing registry record and discovery metadata, and focused touched checks pass.",
"problem": "The repository already ships the published `concept.kv-cache` registry record, a glossary entry for KV cache, and nearby attention and serving pages, but it does not ship the canonical concept page that explains KV cache as the broad serving concept behind prompt reuse during generation. Readers therefore have a gap between the short glossary definition and the more specific pages for `prefill`, `decode`, `prefill-decode-split`, `attention`, `multi-query-attention`, `grouped-query-attention`, and `sliding-window-attention`, with no plain-language bridge that explains what is being saved, why repeated attention work can be skipped, and why the cache improves latency while increasing memory pressure.",
"solution": "Create a canonical concept page at `/docs/concepts/kv-cache` using the standard concept-page structure, localized English messages, and the existing `concept.kv-cache` registry record. Keep the copy simple and technical-layperson-friendly, connect the page to nearby glossary, generation-stage, and attention pages through focused discovery metadata and links, and add only the narrow validation needed to prove route, registry, message, and nearby discovery integrity."
},
"acceptanceCriteria": [
"A canonical docs page exists for `kv-cache` under the concepts docs tree, binds to `registryId: concept.kv-cache`, and renders in the standard docs shell.",
"The page uses colocated `messages/en.json` and any required `assets.json`, with reader-facing copy resolved through message keys rather than hard-coded prose in `page.mdx`.",
"The opening summary and primary sections explain, in simple language, what keys and values are being cached, why that removes repeated work during generation, and how the cache trades lower latency for higher memory usage.",
"Readers can move cleanly between the new KV-cache concept page and `prefill`, `decode`, `prefill-decode-split`, `autoregressive-generation`, `attention`, `multi-query-attention`, `grouped-query-attention`, and `sliding-window-attention` through the touched related-doc, alias, tag, or focused inline-link surfaces.",
"The implementation stays English-only and does not reopen unrelated locale, tokenizer, or broad taxonomy work beyond what is required for this page to land cleanly.",
"Focused validation proves the canonical KV-cache concept page resolves against the existing published registry record and that the touched discovery surfaces remain valid.",
"Quality gate: typecheck, lint, and targeted tests pass."
],
"userStories": [
{
"id": "kv-cache-concept-page-001",
"title": "Publish the canonical KV-cache concept page",
"description": "As a reader trying to understand why generation speed depends on cached attention state, I want one canonical KV-cache page so I can learn the broad idea before diving into serving stages or attention variants.",
"acceptanceCriteria": [
"A canonical concept page exists at `/docs/concepts/kv-cache` with frontmatter that binds to `concept.kv-cache`, plus colocated `messages/en.json` and any required local `assets.json`.",
"The page follows the concept template pattern and keeps `page.mdx` structural, with reader-facing text resolved through `messages/en.json`.",
"The page opens with one folded `openingSummary` and explains in plain language that earlier tokens produce keys and values that later decode steps can reuse instead of recomputing the whole prefix.",
"The page explains that the cache lowers repeated attention work during autoregressive generation while growing memory usage as sequence length and layer count increase.",
"The page avoids kernel-level implementation detail and remains understandable in isolation before narrowing into serving-stage or attention-variant examples.",
"The page renders in the standard docs shell.",
"Typecheck passes",
"Verify in browser using the Browser plugin"
],
"priority": 1,
"passes": true,
"notes": ""
},
{
"id": "kv-cache-concept-page-002",
"title": "Connect the KV-cache bridge page to nearby serving and attention docs",
"description": "As a reader exploring serving behavior, I want the canonical KV-cache page connected to nearby generation-stage and attention pages so I can move between the broad cache concept and concrete serving mechanisms without dead ends.",
"acceptanceCriteria": [
"The KV-cache concept page exposes clear navigation to `prefill`, `decode`, `prefill-decode-split`, `autoregressive-generation`, `attention`, `multi-query-attention`, `grouped-query-attention`, and `sliding-window-attention` through registry-backed related docs and any required focused inline links.",
"Touched discovery metadata keeps aliases and tags aligned with `concept.kv-cache` so representative queries such as `kv cache`, `key-value cache`, `attention cache`, and `cache for decoding` resolve to the canonical concept surface appropriately.",
"The page clearly explains the practical tradeoff that lower repeated compute and lower latency come at the cost of more live serving memory, without drifting into kernel-level or benchmark-heavy detail.",
"Any supporting copy or relationship changes outside the new page remain narrowly scoped to reader movement to or from the KV-cache concept page.",
"Typecheck passes",
"Tests pass",
"Verify in browser using the Browser plugin"
],
"priority": 2,
"passes": true,
"notes": ""
},
{
"id": "kv-cache-concept-page-003",
"title": "Add focused validation for page contract and KV-cache discovery",
"description": "As a maintainer, I want targeted automated proof for the KV-cache concept slice so registry, message, route, and nearby discovery regressions are caught without unrelated test churn.",
"acceptanceCriteria": [
"Automated validation or tests confirm the canonical KV-cache concept page resolves against the existing published `concept.kv-cache` registry record instead of creating a duplicate concept record.",
"Automated validation or tests cover the touched discovery surface for the page, such as route resolution, English message loading, and at least one related-doc or search-oriented expectation for the canonical KV-cache concept route.",
"Coverage remains focused on observable behavior for this page slice and does not expand into unrelated locale manifests, tokenizer pages, or broad inventory assertions.",
"Typecheck passes",
"Tests pass"
],
"priority": 3,
"passes": true,
"notes": ""
}
]
}
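
For reviewers checking the page copy against the tradeoff the criteria above describe (less repeated work during generation, more live memory as sequence length and layer count grow), here is a rough back-of-the-envelope sketch. It is not part of this PR or the repository: the function names, the counting model, and the example model shape (32 layers, 8 key/value heads, head dimension 128, 2-byte values) are all hypothetical and chosen only for illustration.

```python
def projections_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Per-layer key/value projections if every step recomputes the whole prefix."""
    # Step i re-processes the prompt plus the i tokens generated so far.
    return sum(prompt_len + i for i in range(new_tokens))


def projections_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Per-layer key/value projections if earlier keys and values are kept and reused."""
    if new_tokens <= 0:
        return 0
    # Prefill projects the prompt once; each later decode step projects only the newest token.
    return prompt_len + (new_tokens - 1)


def kv_cache_bytes(
    layers: int,
    kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,
    batch: int = 1,
) -> int:
    """Approximate cache size for a hypothetical model shape (assumed layout, no padding)."""
    # Factor of 2: one set of keys and one set of values per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


if __name__ == "__main__":
    # Hypothetical request: 100-token prompt, 50 generated tokens.
    print(projections_without_cache(100, 50))  # 6225
    print(projections_with_cache(100, 50))     # 149
    # Hypothetical model shape at a 4096-token sequence, in MiB.
    print(kv_cache_bytes(32, 8, 128, 4096) / 2**20)  # 512.0
```

Under these assumptions the cache cuts the per-layer projection count from thousands to roughly prompt length plus generated length, while the memory estimate grows linearly with both layer count and sequence length. The `kv_heads` factor is also why the linked `multi-query-attention` and `grouped-query-attention` pages are relevant: fewer key/value heads shrink the cache.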