Written once. Structured at write-time. Evolved continuously by agents.
A wiki where pages do not exist — only evolving memory.
╔════════════════════════════════════════════════════════════════════════════╗
║ ║
║ S E M A N T I C M A R K D O W N ║
║ ║
║ "The Document IS the Memory Substrate" ║
║ ║
║ ║
║ ║
║ W R I T E O N C E ║
║ C O N S U M E E V E R Y W H E R E ║
║ ║
║ H U M A N S • A G E N T S • A P P S ║
║ ║
╠════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ ║
║ │ HUMANS │ │ AGENTS │ │ APPS │ ║
║ │ │ │ │ │ │ ║
║ │ Readable │ │ Query by │ │ Filter by │ ║
║ │ Markdown │ │ tags, type, │ │ tags, topics, │ ║
║ │ │ │ segment │ │ sentiment │ ║
║ │ Familiar │ │ Structured │ │ Rich views & │ ║
║ │ Authoring │ │ Metadata │ │ navigation │ ║
║ └─────────────┘ └─────────────┘ └─────────────────┘ ║
║ ║
╠════════════════════════════════════════════════════════════════════════════╣
║ ║
║ @document ║
║ { ║
║ type: "research_note", ║
║ ticker: "A", ║
║ sectors: ["Life Sciences", "Diagnostics"] ║
║ } ║
║ ║
║ --- ║
║ ║
║ @block ║
║ { ║
║ id: 1, ║
║ type: "investment_thesis", ║
║ topic: "Margin Expansion", ║
║ tags: ["pricing", "margins"], ║
║ sentiment: 0.72 ║
║ } ║
║ ║
║ Agilent continues to benefit from pricing power and ║
║ operating leverage. Management highlighted a 50bp margin ║
║ improvement driven by pricing actions and manufacturing ║
║ efficiencies. ║
║ ║
║ --- ║
║ ║
║ @block ║
║ { ║
║ id: 2, ║
║ type: "segment_analysis", ║
║ segment: "Life Sciences & Diagnostics", ║
║ tags: ["pharma", "growth"], ║
║ highlights: ["Biologics demand remains strong"] ║
║ } ║
║ ║
║ Revenue contribution by end market: ║
║ ║
║ ```html ║
║ <table> ║
║ <tr><th>End Market</th><th>Growth</th></tr> ║
║ <tr><td>Pharma</td><td>+9%</td></tr> ║
║ <tr><td>Diagnostics</td><td>+6%</td></tr> ║
║ </table> ║
║ ``` ║
║ ║
╠════════════════════════════════════════════════════════════════════════════╣
║ ║
║ One file. Persistent memory for humans, agents, and applications. ║
║ ║
║ Write-time structure. No ETL. No sidecar metadata. ║
║ ║
╚════════════════════════════════════════════════════════════════════════════╝
SMD is the primitive layer of LLM-native knowledge systems.
LLM systems today simulate memory using external infrastructure — vector databases, context windows, retrieval pipelines. SMD removes the simulation. Memory becomes the document itself.
Meaning is not inferred after the fact — it is embedded at write time. Every block carries its own structured metadata — tags, sentiment, entities, enrichments, summaries, or arbitrary JSON — while the body stays human-readable markdown.
One artifact. Three surfaces:
- Humans — clean Markdown body
- Agents — parseable JSON headers with block-level enrichment
- Viewers — renderable semantic overlays (sentiment coloring, tag filters, highlights)
SMD makes the document the smallest unit of intelligence infrastructure.
SMD is what a wiki becomes when memory is the primitive.
- A document is a container of memory
- A block is an addressable wiki node
- A collection of documents is a single evolving knowledge graph
Traditional systems:
Pages → Links → Wiki
SMD:
Memory blocks → Documents → Emergent wiki graph
LLM systems do not need better retrieval. They need a native memory substrate.
SMD is that substrate.
SMD operates as a single format interpreted through three layers:
A readable, familiar Markdown document.
Authors write in plain Markdown. The JSON headers are lightweight and stay out of the way. No toolchain required.
A block-addressable structured graph.
Every block has a typed JSON header. Blocks are queried, filtered, and indexed by type, tags, sentiment, entities — without parsing the body.
Every block is a persistent memory unit that can be updated without rewriting the document.
Documents accumulate structured experience. Blocks become stable memory units that can be independently referenced. Metadata becomes traceable context attached to memory, not separate from it. Retrieval becomes context reconstruction, not search. Agents operate on shared persistent memory state via the document.
No decay models. No lifecycle states. No extra mechanics — just an interpretation shift.
LLM systems simulate memory through fragmented infrastructure:
- Content lives in Markdown
- Metadata lives in YAML
- Annotations live in a database
- Embeddings live in a vector store
- Application state lives somewhere else
Each layer is a simulation. Together they reconstruct what was lost at write time.
SMD collapses all of it into one artifact: a document that IS its own memory.
A single piece of content today passes through many systems:
- Humans write text
- Systems chunk it
- Databases store it as blobs
- AI systems reconstruct meaning from context windows
- Applications rebuild structure for display
This creates fragmentation: loss of structure, duplicated processing, ambiguous meaning, brittle pipelines.
A document is not stored. It is reconstructed differently by every system that touches it.
SMD removes translation layers. Structure is written at creation time.
Traditional:
write → parse → chunk → infer → reconstruct
Semantic Markdown:
write → structure → consume
Everything in SMD is explicit, ordered, and addressable.
The fundamental abstraction is not the document — it is the addressable knowledge node graph emerging from blocks across documents.
There is no hidden structure outside the document itself.
SMD turns documents into persistent memory objects that accumulate structure over time.
- Documents accumulate structured experience — each enrichment pass adds memory without altering original content
- Blocks become stable memory units — independently addressable, retrievable by `block_id` without chunking heuristics
- Metadata becomes traceable context — tags, sentiment, entities, and summaries are attached to memory, not stored in a separate system
- Retrieval becomes context reconstruction — fetch specific blocks by type, tag, or sentiment instead of keyword search
- Agents operate on shared persistent memory — multiple agents can read, annotate, and evolve the same document over time
- Cross-document memory views — aggregate blocks by tag across multiple `.smd` files
- Bottom-up knowledge emergence — discover topics from tag co-occurrence across a corpus
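The cross-document view and tag co-occurrence ideas in the list above can be sketched in a few lines of Python. This is an illustration only, not the project's `indexer.py`: the `corpus` dictionary, the file names, and the helper names `blocks_by_tag` and `tag_cooccurrence` are all hypothetical, and block headers are assumed to be already parsed into dictionaries.

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus: file name -> already-parsed block headers.
corpus = {
    "a.smd": [
        {"block_id": "a1", "tags": ["guidance", "margins"]},
        {"block_id": "a2", "tags": ["pricing", "margins"]},
    ],
    "b.smd": [
        {"block_id": "b1", "tags": ["margins", "guidance"]},
    ],
}

def blocks_by_tag(corpus):
    """Build an inverted index: tag -> list of (file, block_id) pairs."""
    index = {}
    for path, headers in corpus.items():
        for header in headers:
            for tag in header.get("tags", []):
                index.setdefault(tag, []).append((path, header["block_id"]))
    return index

def tag_cooccurrence(corpus):
    """Count how often each unordered pair of tags appears on the same block."""
    counts = Counter()
    for headers in corpus.values():
        for header in headers:
            unique_tags = sorted(set(header.get("tags", [])))
            for pair in combinations(unique_tags, 2):
                counts[pair] += 1
    return counts

index = blocks_by_tag(corpus)
pairs = tag_cooccurrence(corpus)
print(index["margins"])
print(pairs.most_common(1))
```

Pairs that co-occur often are candidate topics; a real indexer would likely add thresholds or clustering on top of raw counts.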
SMD documents are entry points into a shared, evolving knowledge graph.
- A block is a wiki node — independently addressable, typed, and enriched
- A document is a local projection of the wiki — one view into the graph
- The wiki exists above documents, not inside them — memory lives across files
- Agents continuously expand and reorganize this graph through enrichment
Unlike traditional systems where memory is reconstructed from logs or stored as compressed residues, SMD stores memory in its native form.
| System | Memory Representation |
|---|---|
| Logs / Databases | Memory reconstructed after the fact from event streams |
| Vector DBs | Memory stored as compressed semantic residues |
| SMD | Memory stored natively as structured, addressable objects |
This is the key distinction: SMD does not require reconstruction. The document is the memory, in the form it was written.
SMD is the minimal system required for a wiki that agents can co-author.
Traditional wikis are human-written knowledge graphs. SMD wikis are continuously computed memory graphs.
Traditional wikis assume:
- humans write pages
- links connect pages
- structure is manually curated
SMD inverts this:
- blocks are generated and enriched continuously — by humans and agents
- structure emerges from metadata + embeddings + agent interaction — not manual linking
- links are implicit in shared tags, entities, and semantic overlap
- agents are first-class participants in knowledge construction
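As a rough illustration of implicit links, the sketch below connects any two blocks that share at least one tag or entity. It covers only the metadata part of the list above (no embeddings or agent interaction), and the function name `implicit_links`, the `min_shared` parameter, and the sample headers are assumptions made for this example.

```python
from itertools import combinations

def implicit_links(headers, min_shared=1):
    """Return (block_id, block_id, shared_labels) for blocks sharing tags or entities."""
    links = []
    for left, right in combinations(headers, 2):
        # Tags and entities are pooled into one label set per block (a simplification).
        left_labels = set(left.get("tags", [])) | set(left.get("entities", []))
        right_labels = set(right.get("tags", [])) | set(right.get("entities", []))
        shared = left_labels & right_labels
        if len(shared) >= min_shared:
            links.append((left["block_id"], right["block_id"], sorted(shared)))
    return links

# Hypothetical headers in the shape used by the examples in this README.
headers = [
    {"block_id": "qa-0042", "tags": ["guidance", "margins"], "entities": ["AAPL"]},
    {"block_id": "qa-0043", "tags": ["margins"], "entities": ["AAPL"]},
    {"block_id": "note-001", "tags": ["benchmarks"]},
]

links = implicit_links(headers)
print(links)
```

The pairwise loop is quadratic in the number of blocks, which is fine for a sketch; a tag inverted index would be the natural way to scale it.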
In SMD, the wiki is not authored.
It is computed and continuously evolved over a shared memory substrate.
In SMD, there are no pages. There is only evolving state.
SMD is to wikis what ASTs are to code editors.
SMD is a writable knowledge graph disguised as markdown.
Find all high-confidence margin-related insights:
filter:
type = "investment_thesis"
tags contains "margins"
sentiment > 0.7
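A minimal Python sketch of the filter above, assuming block headers have already been parsed into dictionaries. The helper name `select_blocks` and the sample data are hypothetical and are not the harness's own tool.

```python
from typing import Any

# Hypothetical parsed block headers; field names follow the examples in this README.
blocks: list[dict[str, Any]] = [
    {"id": 1, "type": "investment_thesis", "tags": ["pricing", "margins"], "sentiment": 0.72},
    {"id": 2, "type": "segment_analysis", "tags": ["pharma", "growth"]},
    {"id": 3, "type": "investment_thesis", "tags": ["margins"], "sentiment": 0.41},
]

def select_blocks(headers, block_type, tag, min_sentiment):
    """Keep headers of the given type that carry the tag and exceed the sentiment threshold."""
    matches = []
    for header in headers:
        sentiment = header.get("sentiment")
        if (
            header.get("type") == block_type
            and tag in header.get("tags", [])
            and sentiment is not None  # blocks without a sentiment score never match
            and sentiment > min_sentiment
        ):
            matches.append(header)
    return matches

hits = select_blocks(blocks, "investment_thesis", "margins", 0.7)
print([h["id"] for h in hits])
```

Only the headers are inspected; the Markdown bodies are never parsed, which is the point of keeping metadata in the block header.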
SMD has two primitives: @document and @block.
`@document` is the top-level container, with a JSON header and an optional Markdown body.
@document
{
"document_id": "aapl-q3-2026",
"schema_version": "0.2",
"type": "transcript",
"created": "2026-01-01T00:00:00Z"
}
---
`@block` is the fundamental addressable unit. Every block has a JSON header and a Markdown body, separated by `---`.
@block
{
"block_id": "qa-0042",
"type": "qa",
"tags": ["guidance", "margins"],
"sentiment": 0.74,
"entities":["AAPL"],
"summary": "Positive guidance for Q4."
}
---
**Analyst:** Can you discuss guidance?
**CEO:** We expect steady growth next quarter.
Typed segments provide multiple structured interpretations of a single block body without losing human readability.
Block bodies are parsed into typed segments. Markdown text between fenced blocks is type: "markdown". Triple-backtick fenced blocks with a type label become typed segments:
@block { "block_id": "note-001" }
---
Some markdown here.
```thought
This is my internal thinking about this topic.
```
```action_item
Review benchmarks. Assigned to: Bob
Due: 2026-07-01
```
Parses to: [{type: "markdown", …}, {type: "thought", …}, {type: "markdown", …}, {type: "action_item", …}]
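One way to split a block body into typed segments is a single regular-expression pass, sketched below. It is not the reference parser: the name `split_segments`, the `content` key, and the choice to skip empty Markdown runs are assumptions. The fence string is built programmatically so the example can sit inside a fenced block itself.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so this example stays fence-safe

# Assumption: a typed fence opens with the fence plus a label and closes with a bare fence line.
FENCE_RE = re.compile(
    rf"^{FENCE}(\w+)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

def split_segments(body):
    """Return ordered segments: Markdown runs and labelled fenced blocks."""
    segments = []
    cursor = 0
    for match in FENCE_RE.finditer(body):
        text = body[cursor:match.start()].strip()
        if text:  # skip empty Markdown runs between adjacent fences
            segments.append({"type": "markdown", "content": text})
        segments.append({"type": match.group(1), "content": match.group(2).strip()})
        cursor = match.end()
    tail = body[cursor:].strip()
    if tail:
        segments.append({"type": "markdown", "content": tail})
    return segments

# Hypothetical body modelled on the note-001 example, with a Markdown line between fences.
body = "\n".join([
    "Some markdown here.",
    FENCE + "thought",
    "This is my internal thinking about this topic.",
    FENCE,
    "More markdown.",
    FENCE + "action_item",
    "Review benchmarks. Assigned to: Bob",
    "Due: 2026-07-01",
    FENCE,
])

segments = split_segments(body)
print([segment["type"] for segment in segments])
```

Unlabelled fences are left inside the surrounding Markdown segment in this sketch; whether the real format treats them differently is not specified here.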
@document
{
"document_id": "aapl-q3-2026",
"schema_version": "0.2",
"type": "transcript",
"created": "2026-01-01T00:00:00Z",
"meta": {
"ticker": "AAPL",
"fiscal_quarter": "Q3"
}
}
---
@block
{
"block_id": "qa-0042",
"type": "qa",
"position": 1,
"tags": ["guidance"],
"sentiment": 0.74,
"entities": ["AAPL"],
"summary": "Positive guidance."
}
---
**Analyst:** Can you discuss guidance?
**CEO:** We expect steady growth next quarter.
- A document is a container of blocks
- Every entity has a JSON header
- Header and body are separated by `---`
- Block bodies contain ordered typed segments (markdown + fenced blocks)
- Metadata is always structured JSON
- No required structural model exists outside the format
- Blocks are independently addressable — they can be read, enriched, and rewritten without affecting other blocks
- The block is the atomic unit of meaning, storage, retrieval, and enrichment
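The invariants above are enough to sketch a small single-pass reader. The code below is an illustration under stated assumptions, not the project's `parser.py`: headers are assumed to be strict JSON, a marker is assumed to start its own line, a body line that begins with `@block` would be misread as a new block, and the name `parse_smd` and the returned shape are hypothetical.

```python
import json

MARKERS = {"@document": "document", "@block": "block"}

def parse_smd(text):
    """Split SMD text into one document entry and a list of block entries."""
    entities = []
    current = None
    for line in text.splitlines():
        head, _, rest = line.strip().partition(" ")
        if head in MARKERS:
            # A marker starts a new entity; any JSON on the same line begins the header.
            current = {"kind": MARKERS[head], "header": [rest], "body": [], "in_header": True}
            entities.append(current)
        elif current is None:
            continue  # ignore anything before the first marker
        elif current["in_header"]:
            if line.strip() == "---":
                current["in_header"] = False  # first separator ends the header
            else:
                current["header"].append(line)
        else:
            current["body"].append(line)

    document, blocks = {}, []
    for entity in entities:
        parsed = {
            "header": json.loads("\n".join(entity["header"])),
            "body": "\n".join(entity["body"]).strip(),
        }
        if entity["kind"] == "document":
            document = parsed
        else:
            blocks.append(parsed)
    return {"document": document, "blocks": blocks}

sample = """@document
{ "document_id": "aapl-q3-2026", "schema_version": "0.2", "type": "transcript" }
---
@block
{ "block_id": "qa-0042", "type": "qa", "tags": ["guidance"], "sentiment": 0.74 }
---
**Analyst:** Can you discuss guidance?
**CEO:** We expect steady growth next quarter.
"""

parsed = parse_smd(sample)
print(parsed["document"]["header"]["document_id"], len(parsed["blocks"]))
```

Each line is visited once, so the pass is linear in the input length. The block body is kept as raw Markdown here; segment splitting would be a separate step.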
pip install semantic-markdown

| Component | Description |
|---|---|
| `parser.py` | Core SMD text → document parser. O(n), single pass. |
| `indexer.py` | SMDIndexer — chunking, sentiment-keyed storage, Q&A thread extraction, tag inverted index, topic clustering |
| `harness.py` | SMD Agent Harness — SMDAgent + MCP server; controlled LLM-document interface |
smd <file.smd> — parse and display SMD document structure
smd-mcp — start the MCP server for LLM integration
index.html — a standalone browser viewer with embedded JS SMD parser. Features:
- Template picker sidebar (transcript, notebook, corpus)
- Tag-based filtering (clickable tag chips)
- Sentiment-colored block cards (red → yellow → green)
- Highlight annotations with sentiment bars
- Key takeaways, summaries, fenced segment rendering
- Paste-any-SMD text area for quick testing
python -m http.server 8080
# Open http://localhost:8080
harness.py — the SMD Agent Harness: combines SMDAgent + MCP server into a single, auditable interface. The harness is the controlled boundary between an LLM and SMD documents — every read, write, and enrichment goes through typed tool calls. The LLM never touches the filesystem directly.
smd-mcp
12 tools: read, search, add_enrichment, del_enrichment, edit_enrichment, write_sentiment, write_tags, filter_blocks, read_next_qa, read_next_block, read_document, save
Agents do not process documents. They interact with persistent memory objects.
1. OBSERVE — Agent reads memory state via the harness
2. ENRICH — Agent attaches new memory: sentiment, tags, entities, summaries
Over time, SMD documents converge into a living wiki graph co-authored by humans and agents.
The harness is the controlled boundary — every memory operation goes through typed tool calls. The LLM never touches the filesystem directly. Every enrichment is auditable and traceable.
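The observe and enrich steps, together with the audit trail, might look roughly like the sketch below. It does not use the harness or its tool names: the `enrich` function, the log record fields, and the agent name are hypothetical, and the block is a plain in-memory dictionary.

```python
from datetime import datetime, timezone

def enrich(block, agent, field, value, log):
    """Set one header field on a block and append an audit record; the body is untouched."""
    previous = block["header"].get(field)
    block["header"][field] = value
    log.append({
        "block_id": block["header"]["block_id"],
        "agent": agent,
        "field": field,
        "previous": previous,  # None when the field did not exist before
        "value": value,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return block

# Hypothetical in-memory block, shaped like the qa-0042 example.
block = {
    "header": {"block_id": "qa-0042", "type": "qa", "tags": ["guidance"]},
    "body": "**Analyst:** Can you discuss guidance?",
}
audit_log = []

# OBSERVE: read the current memory state.
original_body = block["body"]

# ENRICH: attach new memory without rewriting the body.
enrich(block, "sentiment-agent", "sentiment", 0.74, audit_log)
print(block["header"]["sentiment"], len(audit_log))
```

Recording the previous value with each change is one simple way to make enrichments traceable; where such a log would be stored is left open here.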
- Agent knowledge bases — wikis that agents read, write, and evolve over time
- Earnings transcripts — Q&A blocks with sentiment, speaker attribution, topic tags
- Research notebooks — dated entries with thought/action_item fenced blocks; incremental knowledge accumulation
- RAG corpora — pre-enriched semantic memory units with summaries and entities
- Multi-agent collaboration — blocks attributed by agent/author with auditable enrichment history
- Financial dashboards — block-level embeds (charts, widgets) alongside structured memory state
- Personal knowledge graphs — documents that grow into interconnected memory networks
| System | What it gets right | What it lacks |
|---|---|---|
| Markdown | Human-readable simplicity | No native structure |
| JSON / ASTs | Fully structured data | Not human-writable as primary format |
| YAML+MD (Frontmatter) | File-level metadata | No block-level addressing |
| Notion | Block-based authoring | Not portable outside its ecosystem |
| ProseMirror | Structured document tree | No semantic node/data standard |
| MDX | Markdown + components | Requires JSX toolchain |
| Obsidian | Portable Markdown ecosystem | No semantic block schema |
| Jupyter Notebooks | Rich mixed-content documents | Not designed for semantic retrieval |
| SQLite | Structured storage | Not human-authorable |
| Git | Versioned plain text | No semantic structure |
SMD sits at the intersection: structured like ProseMirror, authored like Markdown, composable like JSON, modular like Notion blocks, portable like a plain text file.
But unlike each individually:
The document is the memory.
Frontmatter describes files. SMD describes blocks.
Frontmatter provides file-level metadata. SMD provides addressable semantic objects inside a document.
In SMD, every block carries its own metadata, enrichments, tags, summaries, sentiment, entities, and application-specific state.
The unit of structure is not the file. It is the block.
Experimental — active research project with a reference parser, viewer, and agent tooling.
- 📄 Full Specification
- 🐍 Parser
- 🔍 Indexer
- 🤖 Agent Harness
- 🖥️ Viewer
- 📚 Examples
Online: Enable GitHub Pages in repo settings → deploy from master branch, / (root) → visit finxsight.github.io/semantic-markdown
Local: python -m http.server 8080 → open http://localhost:8080
The viewer loads .smd files from examples/ and renders them with sentiment coloring, block-type filters, tag navigation, and topic highlighting.
@software{semantic_markdown,
author = {Sandeep Muthangi},
title = {Semantic Markdown: A Block-Addressable Memory Substrate
for Human-Agent Knowledge Systems},
year = {2026},
url = {https://github.com/finxsight/semantic-markdown}
}
Sandeep Muthangi
Semantic Markdown (SMD), 2026
Apache License 2.0 — see LICENSE for full text.
A document is not text.
It is the smallest unit of intelligence infrastructure.
The document is the memory substrate.