@@ -165,7 +165,11 @@ the temporary legacy-id bridge.
* `src/lib/content/architecture.ts`
Architecture browse/index classification should treat ontology architecture
membership, including descendant branches such as activation, as canonical
-  evidence before any legacy `conceptType` fallback.
+  evidence before any legacy `conceptType` fallback. When a concept-backed
+  glossary term also gains a published concept-section page, architecture
+  browse expectations should move to the canonical concept route and index
+  counts should be updated from the runtime-derived published entries rather
+  than preserved as glossary-era constants.
* `src/lib/governance/typed-taxonomy-consumer-audit.ts`
Machine-checkable contract for remaining typed-taxonomy consumer clusters,
ownership, compatibility status, the recommended next migration target, and
1 change: 1 addition & 0 deletions src/content/docs/concepts/decoder/assets.json
@@ -0,0 +1 @@
{}
36 changes: 36 additions & 0 deletions src/content/docs/concepts/decoder/messages/en.json
@@ -0,0 +1,36 @@
{
"title": "Decoder",
"description": "The part of a model that turns context into output predictions, especially the left-to-right stack used by GPT-style language models.",
"openingSummary": "A decoder turns the context a model already has into output predictions. In modern decoder-only language models, the same stack repeatedly reads the tokens so far and predicts the next token, which is why decoders sit at the center of GPT-style text generation.",
"sections": {
"whatItIs": {
"title": "What It Is",
"body": "A decoder is the part of a model that reads internal context and produces output-facing predictions. In language systems, that usually means turning token representations into the next-token distribution. In other settings, a decoder may turn a latent or encoder memory into pixels, audio, or another structured output. The shared idea is that the decoder is the readout side of the system: it consumes representations and pushes them toward a final answer."
},
"whyItMatters": {
"title": "Why It Matters",
"body": "The decoder is where generation actually happens. In a GPT-style model, there is no separate encoder that first builds a frozen memory for the prompt. The decoder stack itself both builds the running context and turns that context into next-token probabilities. That is why pages about causal attention, decode, autoregressive generation, and GPT-family models keep pointing back to decoders: they are the architecture layer that makes token-by-token output possible."
},
"decoderOnlyLoop": {
"title": "Why Decoder-Only Fits Next-Token Generation",
"body": "A decoder-only transformer uses left-to-right, or causal, attention. Each position can read the tokens that already exist in the prefix, but it cannot read future tokens that have not been generated yet. That rule matches the job exactly: predict the next token from the tokens already seen. After one token is chosen, the prefix grows by one position and the same decoder stack runs again. Because the attention mask and the task follow the same left-to-right constraint, decoder-only models are a natural fit for autoregressive generation."
},
"comparedWithOtherLayouts": {
"title": "Compared With Encoder And Encoder-Decoder Models",
"body": "An encoder-only model is usually built to read the whole input and produce representations, not to keep emitting new tokens step by step. It can use bidirectional attention because every input token is already known. An encoder-decoder model splits the work: the encoder reads the full source input first, then the decoder generates outputs while attending both to earlier output tokens and to encoder memory. A decoder-only model folds those jobs into one causal stack, which is simpler for plain next-token continuation but less specialized for tasks that benefit from a separate full-input reader."
},
"commonConfusions": {
"title": "Common Confusions",
"body": "A decoder is not just the final softmax or language-model head; the head sits on top of the decoder stack. Decoder-only also does not mean the model only writes and never represents context. The same stack still builds rich internal states while it reads the prefix. Finally, not every decoder is autoregressive. Some image or latent decoders reconstruct outputs in one shot. The left-to-right token loop is specifically the language-model decoder pattern."
},
"related": {
"title": "Related Concepts And Modules"
},
"tags": {
"title": "Tags"
},
"references": {
"title": "References"
}
}
}
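Reviewer sketch, not part of this change: the `decoderOnlyLoop` copy above describes two rules, a causal mask (each position reads only the prefix) and a loop that appends one chosen token and runs the same stack again. The TypeScript below is a minimal illustration of those two rules under stated assumptions. `causalMask`, `generateGreedy`, `NextTokenScorer`, and `toyScorer` are hypothetical names, the scorer is a toy stand-in for a real decoder stack, and greedy argmax is only one of several possible token-selection strategies.

```typescript
// Illustrative only: hypothetical names, toy scorer, no real model involved.

/** Scores for every vocabulary id, one number per id. */
type Logits = number[];

/** Stand-in for a decoder stack: maps the prefix so far to next-token scores. */
type NextTokenScorer = (prefix: readonly number[]) => Logits;

/**
 * Build a length x length causal mask.
 * mask[i][j] is true when position i may read position j (j <= i).
 */
function causalMask(length: number): boolean[][] {
  return Array.from({ length }, (_, i) =>
    Array.from({ length }, (_, j) => j <= i),
  );
}

/**
 * Greedy autoregressive loop: score the prefix, pick the highest-scoring id,
 * append it, and run the same scorer again on the longer prefix.
 */
function generateGreedy(
  scorer: NextTokenScorer,
  prompt: readonly number[],
  maxNewTokens: number,
  eosId?: number,
): number[] {
  const tokens = [...prompt];
  for (let step = 0; step < maxNewTokens; step++) {
    const logits = scorer(tokens);
    let best = 0;
    for (let id = 1; id < logits.length; id++) {
      if (logits[id] > logits[best]) best = id;
    }
    tokens.push(best);
    if (best === eosId) break; // stop early once the end-of-sequence id is chosen
  }
  return tokens;
}

// Toy scorer (assumption): always favors (last token + 1) modulo the vocabulary size.
const vocabSize = 5;
const toyScorer: NextTokenScorer = (prefix) => {
  const last = prefix[prefix.length - 1] ?? 0;
  return Array.from({ length: vocabSize }, (_, id) =>
    id === (last + 1) % vocabSize ? 1 : 0,
  );
};
```

Calling `generateGreedy(toyScorer, [3], 3)` walks the prefix forward one token per step, and `causalMask(3)` yields a lower-triangular pattern in which row `i` exposes only columns `0..i`. A bidirectional (encoder-style) mask would instead be all `true`, which is the contrast the `comparedWithOtherLayouts` copy draws.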
71 changes: 71 additions & 0 deletions src/content/docs/concepts/decoder/page.mdx
@@ -0,0 +1,71 @@
---
title: Decoder
description: The part of a model that turns context into output predictions, especially the left-to-right stack used by GPT-style language models.
kind: "concept"
registryId: "concept.decoder"
messageNamespace: "local"
assetNamespace: "local"
status: "published"
tags:
- foundations
- taxonomy
aliases:
- "Decoder"
- "decoder block"
- "decoder stack"
- "decoder-only stack"
- "decoding network"
updatedAt: "2026-06-22"
---

import { CitationList } from "@/features/docs/components/CitationList";
import { DerivedRelatedDocs } from "@/features/docs/components/DerivedRelatedDocs";
import { RelatedDocs } from "@/features/docs/components/RelatedDocs";
import { Section } from "@/features/docs/components/Section";
import { T } from "@/features/docs/components/T";
import { TagPillList } from "@/features/docs/components/TagPillList";

<Section id="what-it-is" titleKey="sections.whatItIs.title">
<T k="sections.whatItIs.body" />
</Section>

<Section id="why-it-matters" titleKey="sections.whyItMatters.title">
<T k="sections.whyItMatters.body" />
</Section>

<Section id="decoder-only-loop" titleKey="sections.decoderOnlyLoop.title">
<T k="sections.decoderOnlyLoop.body" />
</Section>

<Section
id="compared-with-other-layouts"
titleKey="sections.comparedWithOtherLayouts.title"
>
<T k="sections.comparedWithOtherLayouts.body" />
</Section>

<Section id="common-confusions" titleKey="sections.commonConfusions.title">
<T k="sections.commonConfusions.body" />
</Section>

<Section id="related" titleKey="sections.related.title">
<DerivedRelatedDocs
registryId="concept.decoder"
groups={[
"same-concept-type",
"shared-tags",
"used-by-models",
"curated-related",
]}
/>

<RelatedDocs registryId="concept.decoder" />
</Section>

<Section id="tags" titleKey="sections.tags.title">
<TagPillList registryId="concept.decoder" showDescriptions />
</Section>

<Section id="references" titleKey="sections.references.title">
<CitationList registryId="concept.decoder" />
</Section>
17 changes: 14 additions & 3 deletions src/content/registry/concepts/decoder.json
@@ -4,17 +4,28 @@
"kind": "concept",
"defaultTitleKey": "title",
"defaultSummaryKey": "description",
-  "aliases": ["Decoder", "decoder block", "decoder stack", "decoding network"],
+  "aliases": [
+    "Decoder",
+    "decoder block",
+    "decoder stack",
+    "decoder-only stack",
+    "decoding network"
+  ],
"tags": ["foundations", "taxonomy"],
"relatedIds": [
"concept.encoder",
"concept.encoder-decoder",
-    "concept.autoregressive-generation"
+    "concept.autoregressive-generation",
+    "concept.decode",
+    "concept.transformer",
+    "module.causal-attention",
+    "model.gpt-3",
+    "paper.gpt-2-report"
],
"citationIds": ["citation.attention-is-all-you-need", "citation.brown-gpt-3"],
"status": "published",
"createdAt": "2026-06-04T11:39:45.780Z",
-  "updatedAt": "2026-06-04T12:30:00.000Z",
+  "updatedAt": "2026-06-22T00:00:00.000Z",
"conceptType": "architecture",
"sidebarGrouping": {
"glossary": "sequence-and-attention"
1 change: 1 addition & 0 deletions src/content/registry/concepts/transformer.json
@@ -8,6 +8,7 @@
"tags": ["taxonomy", "model-family"],
"relatedIds": [
"concept.architecture",
+    "concept.decoder",
"concept.encoder-decoder",
"concept.autoregressive-generation",
"concept.kv-cache",
3 changes: 2 additions & 1 deletion src/content/registry/models/gpt-3.json
@@ -7,6 +7,7 @@
"aliases": ["GPT-3", "Generative Pre-trained Transformer 3"],
"tags": ["foundations", "model-family", "attention", "context-window"],
"relatedIds": [
+    "concept.decoder",
"concept.tokenizers-overview",
"concept.transformer-architecture",
"concept.autoregressive-generation",
@@ -19,7 +20,7 @@
"citationIds": ["citation.brown-gpt-3", "citation.kaplan-scaling-laws"],
"status": "published",
"createdAt": "2026-06-18T00:00:00.000Z",
-  "updatedAt": "2026-06-21T00:00:00.000Z",
+  "updatedAt": "2026-06-22T00:00:00.000Z",
"authors": [
"Tom B. Brown",
"Benjamin Mann",
3 changes: 2 additions & 1 deletion src/content/registry/papers/gpt-2-report.json
@@ -13,14 +13,15 @@
],
"tags": ["foundations", "model-family", "tokenization"],
"relatedIds": [
+    "concept.decoder",
"module.byte-level-tokenization",
"concept.transformer-architecture",
"concept.scaling-law"
],
"citationIds": ["citation.gpt-2-report"],
"status": "published",
"createdAt": "2026-06-20T00:00:00.000Z",
-  "updatedAt": "2026-06-20T00:00:00.000Z",
+  "updatedAt": "2026-06-22T00:00:00.000Z",
"authors": [
"Alec Radford",
"Jeffrey Wu",
2 changes: 1 addition & 1 deletion src/lib/content/causal-attention-module-page.test.ts
@@ -82,7 +82,7 @@ describe("loadModulePage causal-attention", () => {
expect(html).toContain('href="/docs/modules/attention"');
expect(html).toContain('href="/docs/modules/bidirectional-attention"');
expect(html).toContain('href="/docs/glossary/autoregressive-generation"');
-    expect(html).toContain('href="/docs/glossary/decoder"');
+    expect(html).toContain('href="/docs/concepts/decoder"');
expect(html).toContain('data-testid="curated-related-docs"');
expect((html.match(/data-testid="tag-pill-list"/g) ?? []).length).toBe(1);
expect(html).not.toContain("Reader Shortcut");
113 changes: 113 additions & 0 deletions src/lib/content/decoder-concept-validation.test.ts
@@ -0,0 +1,113 @@
import { describe, expect, test } from "bun:test";
import { createElement } from "react";
import { renderToStaticMarkup } from "react-dom/server";
import { ModulePageProviders } from "@/features/docs/components/ModulePageProviders";
import { validatePageAssetReferences } from "@/lib/content/assets";
import { loadConceptPage } from "@/lib/content/concept-page";
import { loadGlossaryPage } from "@/lib/content/glossary-page";
import { loadPublishedDocsPages } from "@/lib/content/pages";
import {
  getPublishedDocsEntriesBySlug,
  getPublishedDocsEntryByRegistryId,
} from "@/lib/content/published-docs-registry-ids";
import { getConceptById } from "@/lib/content/registry-runtime";
import { docsSearchApi } from "@/lib/search/search-server";

describe("decoder concept page focused validation (decoder-concept-page-004)", () => {
  test("published docs inventory resolves the canonical decoder route, registry id, and English messages together", async () => {
    const record = getConceptById("concept.decoder");
    const pages = await loadPublishedDocsPages("en");
    const conceptPage = pages.find(
      (entry) => entry.url === "/docs/concepts/decoder",
    );
    const glossaryPage = pages.find(
      (entry) => entry.url === "/docs/glossary/decoder",
    );

    expect(record?.status).toBe("published");
    expect(conceptPage).toBeDefined();
    expect(conceptPage?.docsSlug).toBe("concepts/decoder");
    expect(conceptPage?.frontmatter.kind).toBe("concept");
    expect(conceptPage?.frontmatter.registryId).toBe("concept.decoder");
    expect(conceptPage?.frontmatter.messageNamespace).toBe("local");
    expect(conceptPage?.frontmatter.assetNamespace).toBe("local");
    expect(conceptPage?.messages.title).toBe("Decoder");
    expect(conceptPage?.messages.openingSummary).toContain(
      "predicts the next token",
    );

    expect(glossaryPage?.frontmatter.kind).toBe("glossary");
    expect(glossaryPage?.frontmatter.registryId).toBe("concept.decoder");

    expect(getPublishedDocsEntryByRegistryId("concept.decoder")).toEqual(
      expect.objectContaining({
        docsSlug: "concepts/decoder",
        pageKind: "concept",
        section: "concepts",
      }),
    );
    expect(getPublishedDocsEntriesBySlug("decoder")).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          docsSlug: "concepts/decoder",
          pageKind: "concept",
        }),
        expect.objectContaining({
          docsSlug: "glossary/decoder",
          pageKind: "glossary",
        }),
      ]),
    );
  });

  test("canonical decoder bundle resolves registry-backed copy and valid local assets together", async () => {
    const record = getConceptById("concept.decoder");
    if (!record) {
      throw new Error("expected concept.decoder in registry");
    }

    const page = await loadConceptPage("decoder");

    expect(page.frontmatter.kind).toBe("concept");
    expect(page.frontmatter.registryId).toBe(record.id);
    expect(page.messages.title).toBe("Decoder");
    expect(page.messages.description).toContain(
      "turns context into output predictions",
    );
    expect(page.messages.sections?.whatItIs.body).toContain(
      "readout side of the system",
    );
    expect(page.messages.sections?.decoderOnlyLoop.body).toContain(
      "left-to-right, or causal, attention",
    );
    expect(page.messages.sections?.comparedWithOtherLayouts.body).toContain(
      "encoder-decoder model splits the work",
    );
    expect(validatePageAssetReferences(page.assets, page.messages)).toEqual([]);
  });

  test("discovery prefers the canonical concept route while the glossary bridge remains a visible nearby surface", async () => {
    const conceptResults = await docsSearchApi.search("decoder-only stack");
    expect(conceptResults[0]?.url).toBe("/docs/concepts/decoder");

    const bridgeResults = await docsSearchApi.search("decoding network");
    expect(
      bridgeResults.some((result) => result.url === "/docs/concepts/decoder"),
    ).toBe(true);
    expect(
      bridgeResults.some((result) => result.url === "/docs/glossary/decoder"),
    ).toBe(true);

    const glossaryPage = await loadGlossaryPage("decoder");
    const html = renderToStaticMarkup(
      createElement(ModulePageProviders, {
        messages: glossaryPage.messages,
        assets: glossaryPage.assets,
        // biome-ignore lint/correctness/noChildrenProp: third createElement arg conflicts with strict props typing
        children: glossaryPage.content,
      }),
    );

    expect(html).toContain('href="/docs/concepts/decoder"');
  });
});
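Reviewer sketch, not part of this change: the tests above expect a registry-id lookup to resolve to the concept route while a slug lookup still returns both the concept page and the glossary bridge. One way that contract could be satisfied is sketched below. The real logic lives in `@/lib/content/published-docs-registry-ids`, which this diff does not show, so everything here is an assumption: the `PublishedEntry` shape only mirrors the fields the assertions touch, and `pickCanonicalEntry`, `entriesBySlug`, `countByKind`, and the `KIND_PRIORITY` ordering are hypothetical.

```typescript
// Illustrative only: hypothetical helpers, not the module under test.

type PageKind = "concept" | "module" | "glossary";

interface PublishedEntry {
  registryId: string; // e.g. "concept.decoder"
  docsSlug: string; // e.g. "concepts/decoder"
  pageKind: PageKind;
}

// Assumed precedence: lower number wins, so a concept page outranks its glossary bridge.
const KIND_PRIORITY: Record<PageKind, number> = {
  concept: 0,
  module: 1,
  glossary: 2,
};

/** Return the single canonical entry for a registry id, or undefined if none is published. */
function pickCanonicalEntry(
  entries: readonly PublishedEntry[],
  registryId: string,
): PublishedEntry | undefined {
  return entries
    .filter((entry) => entry.registryId === registryId)
    .sort((a, b) => KIND_PRIORITY[a.pageKind] - KIND_PRIORITY[b.pageKind])[0];
}

/** Return every entry whose final slug segment matches, canonical or not. */
function entriesBySlug(
  entries: readonly PublishedEntry[],
  slug: string,
): PublishedEntry[] {
  return entries.filter((entry) => entry.docsSlug.split("/").pop() === slug);
}

/** Derive index counts from the published entries instead of hard-coded constants. */
function countByKind(entries: readonly PublishedEntry[]): Map<PageKind, number> {
  const counts = new Map<PageKind, number>();
  for (const entry of entries) {
    counts.set(entry.pageKind, (counts.get(entry.pageKind) ?? 0) + 1);
  }
  return counts;
}

// Sample data (assumption) shaped like the decoder case in this change.
const sampleEntries: PublishedEntry[] = [
  { registryId: "concept.decoder", docsSlug: "glossary/decoder", pageKind: "glossary" },
  { registryId: "concept.decoder", docsSlug: "concepts/decoder", pageKind: "concept" },
];
```

With `sampleEntries`, `pickCanonicalEntry(sampleEntries, "concept.decoder")` selects the `concepts/decoder` entry, while `entriesBySlug(sampleEntries, "decoder")` keeps both surfaces visible. `countByKind` shows the idea of deriving counts at runtime rather than keeping glossary-era constants, as the documentation hunk earlier in this diff recommends.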