wordpiece-module-page#128
|
Completed story What I verified:
What changed in this iteration:
CI is now running on this PR head. |
|
BLOCKING: the implementation itself looks correct from a code-review standpoint, but this PR is currently not mergeable. Live GitHub state:
Primary required fix:
Acceptance criteria:
Behavioral assertion check by story:
Docs writing standards checklist:
General website standards review checklist:
Review rules outcome:
Manual QA notes:
|
1c10740 to bd4e75d
|
Addressed the blocking mergeability feedback on the current PR head What changed:
Validation on the rebased head:
Current GitHub state:
|
|
Follow-up mergeability update on head |
bd4e75d to ce7188a
|
Addressed the new mergeability regression on the current PR head What changed:
Validation on the rebased head:
Current GitHub state:
|
|
Follow-up mergeability update on head
I will keep this unresolved until the rerun reaches a terminal required-check state. |
ce7188a to 22998f7
|
Addressed the latest mergeability regression on the current PR head What changed:
Validation on the rebased head:
Current GitHub state:
|
|
Follow-up mergeability update on head
I will keep this unresolved until the re-queued required checks reach a terminal state. |
22998f7 to b4876a5
|
Addressed the latest mergeability regression on the current PR head What changed:
Validation on the rebased head:
Current GitHub state:
|
|
Follow-up mergeability update on head
I am keeping this unresolved until the rerun reaches a terminal required-check state. |
|
Addressed the latest mergeability blocker on the current PR head What changed:
Validation on this head:
Current GitHub state:
|
…eaching aid for tokenizer-family understanding]
…gh existing registry-backed metadata]
…ordPiece page contract]
…ordPiece page contract]
…ordPiece page contract]
c810366 to 98c0df3
|
Addressed the latest mergeability blocker on the current PR head What changed:
Validation on this head:
Current GitHub state:
|
|
Addressed the latest mergeability blocker on the current PR head What changed:
Current GitHub state:
I am keeping this unresolved until the new required checks reach a terminal state. |
|
Addressed the latest mergeability blocker on the current PR head What changed:
Current GitHub state:
|
{
"project": "Model Atlas — WordPiece Module Page",
"branchName": "wordpiece-module-page",
"description": "Publish the missing canonical English `WordPiece` module page, backed by the existing registry record and localized messages, so readers can discover it through tokenizer-family search and related docs and understand how WordPiece differs from BPE and SentencePiece.",
"context": {
"customerAsk": "Add the missing canonical English module page for `WordPiece` so tokenizer-family search and related-doc paths cover the main tokenization approaches. Treat this as one mergeable page slice on current `main`, not a tokenizer-family refactor. Scope: create the module page, colocated `messages/en.json`, any required `assets.json`, and use the existing `module.wordpiece` registry record as the canonical backing record, correcting tags, aliases, and related ids only where needed for search and related-doc quality. The page should explain in layperson-friendly language how WordPiece builds subword units, how it differs from BPE and SentencePiece, and where readers will encounter it in model and tokenizer docs. Add only the focused graphs, tables, and tests needed for this page to teach clearly and pass registry and message validation. Keep the slice English-only.",
"problem": "The repository already has a `module.wordpiece` registry record and nearby tokenizer content, but it still lacks the canonical WordPiece page readers expect to find directly. That leaves a discovery gap in tokenizer-family search and related-doc paths and leaves technical lay readers without a clean explanation of how WordPiece forms reusable subword pieces or how it differs from nearby tokenizer algorithms.",
"solution": "Create a canonical `/docs/modules/wordpiece` page with English-only localized content and only the smallest local asset support needed to teach the topic clearly. Reuse `module.wordpiece` as the source of truth for discovery metadata and relationships, updating only the fields required for publication, better search aliases, and coherent related links to nearby tokenizer, glossary, and model pages that already exist on the branch."
},
"acceptanceCriteria": [
"A published canonical docs page exists at `/docs/modules/wordpiece` with matching frontmatter, English messages, and only the local assets required by the module-page template.",
"The page is backed by the existing `module.wordpiece` registry record, with only minimal metadata fixes needed for publication, discovery quality, and related-doc coherence.",
"The page explains in plain language how WordPiece builds subword units, why that approach was useful, and how it differs from both `bpe` and `sentencepiece`.",
"Search, tags, aliases, and related-doc surfaces can route readers to the canonical WordPiece page from representative tokenizer-family queries and nearby shipped docs.",
"The page gives readers clear onward paths to nearby tokenizer, glossary, and model pages where those canonical targets already exist in the branch.",
"Focused validation covers the page route, registry and message linkage, and at least one WordPiece-specific discovery or related-link behavior.",
"Quality gate: typecheck, lint, and focused touched tests pass."
],
"userStories": [
{
"id": "wordpiece-module-page-001",
"title": "Publish the canonical WordPiece module page",
"description": "As a reader who searches for WordPiece directly, I want a dedicated canonical page so I can understand what it is and why it is used without piecing the answer together from adjacent tokenizer pages.",
"acceptanceCriteria": [
"A canonical page exists at `/docs/modules/wordpiece` with module frontmatter, `messages/en.json`, and any required `assets.json`.",
"The page opens with one concise `openingSummary` and remains understandable in isolation for a technical layperson.",
"The narrative explains how WordPiece starts from smaller units and builds reusable subword pieces with at least one concrete token-building example.",
"The page explains where readers are likely to encounter WordPiece in practice, especially in older encoder-style model or tokenizer discussions, without depending on unshipped pages.",
"Typecheck passes",
"Tests pass",
"Verify in browser using the Browser plugin"
],
"priority": 1,
"passes": true,
"notes": ""
},
{
"id": "wordpiece-module-page-002",
"title": "Add the minimum comparison and teaching aid for tokenizer-family understanding",
"description": "As a reader comparing tokenizer algorithms, I want WordPiece to show the smallest useful comparison or visual aid so I can understand how it differs from BPE and SentencePiece without reading an implementation tutorial.",
"acceptanceCriteria": [
"The page includes the minimum focused visual, comparison table, or equivalent teaching aid needed to make WordPiece's piece-selection behavior clear.",
"The comparison explains how WordPiece differs from `bpe` and `sentencepiece` in reader-facing language rather than benchmark-only or implementation-only framing.",
"Any added graph or comparison follows the existing graphing and template standards and stays page-local or registry-backed without decorative churn.",
"The page remains responsive and readable on standard docs layouts without broken or overflowing comparison content.",
"Typecheck passes",
"Tests pass",
"Verify in browser using the Browser plugin"
],
"priority": 2,
"passes": true,
"notes": ""
},
{
"id": "wordpiece-module-page-003",
"title": "Make the page discoverable through existing registry-backed metadata",
"description": "As a reader navigating tokenizer topics, I want search, tags, and related docs to lead me into WordPiece so I can find the page even when I start from adjacent terms or nearby docs.",
"acceptanceCriteria": [
"The existing `module.wordpiece` registry record is reused as the canonical metadata source, with only the fields required for clean publication and discovery updated.",
"Representative queries such as `WordPiece`, `word piece`, `wordpiece tokenizer`, or `bert tokenizer` can return the canonical WordPiece page as a relevant result when those aliases are accurate and non-misleading.",
"The page renders tags and related-doc links that connect it to nearby shipped pages such as `token`, `tokenizers-overview`, `bpe`, `sentencepiece`, and relevant model or glossary pages where those routes already exist.",
"At least one neighboring shipped discovery surface or related-doc path can lead a reader into WordPiece without manually typing the slug.",
"Typecheck passes",
"Tests pass",
"Verify in browser using the Browser plugin"
],
"priority": 3,
"passes": true,
"notes": ""
},
{
"id": "wordpiece-module-page-004",
"title": "Add focused validation for the WordPiece page contract",
"description": "As a maintainer, I want narrow automated coverage for the WordPiece slice so future edits do not silently break the page, its metadata wiring, or its discovery behavior.",
"acceptanceCriteria": [
"Focused validation confirms the `wordpiece` route, `module.wordpiece` registry record, and default English messages resolve together.",
"Focused validation asserts at least one WordPiece-specific search, tag, or related-doc expectation.",
"Coverage stays scoped to observable behavior for this page slice and does not require unrelated locale, taxonomy, or inventory assertions.",
"Typecheck passes",
"Tests pass"
],
"priority": 4,
"passes": true,
"notes": ""
}
]
}
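
Story wordpiece-module-page-001 asks for at least one concrete token-building example. The following is a minimal sketch of what such an example might illustrate, not the page's actual content. It assumes a tiny hypothetical vocabulary (`VOCAB`) and the `##` continuation prefix familiar from BERT-style WordPiece vocabularies. The function name `wordpiece_tokenize` is illustrative only. It covers only how a word is split against a fixed vocabulary (greedy longest-match-first), not how the vocabulary itself is learned.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position, so the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"token", "##izer", "##s", "un", "##aff", "##able", "play", "##ing"}

print(wordpiece_tokenize("tokenizers", VOCAB))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("unaffable", VOCAB))   # ['un', '##aff', '##able']
print(wordpiece_tokenize("xyz", VOCAB))         # ['[UNK]']
```

The prefix on non-initial pieces is what lets a reader see where a word was split. This is one reader-facing contrast the comparison in story 002 could draw against `bpe` and `sentencepiece`.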