BubbleStream

Dual-Stream Memory Framework for LLM Infinite Context

BubbleStream is an inference-layer framework that gives any LLM effectively unlimited context by managing memory externally. Instead of relying on ever-growing context windows, it compresses conversation history into indexed memory blocks that "bubble up" on demand, reaching 94% memory precision over 32K+ tokens of cumulative dialogue with only a 16K-token effective context.

This project was the starting point for a broader research program into bounded-state sequence learning. The architectural limitations discovered here (embedding-based retrieval ceiling, inability to learn) led directly to the design of MathBrain, a trainable architecture that solves the memory problem from first principles.

How It Works

User message
     │
     ▼
┌─────────────────────────────────┐
│       Thinking Stream           │
│       (strong model, reasoning) │
│                                 │
│  [Memory zone] + [rolling       │
│   context] + [active zone]      │
└──────────────┬──────────────────┘
               │ completed segments
               ▼
┌─────────────────────────────────┐
│        Memory Stream            │
│     (weak model, compression)   │
│                                 │
│  compress → store → retrieve    │
│  → deduplicate → decay          │
└─────────────────────────────────┘

The core idea: separate reasoning from memory management.

Thinking Stream uses a strong model (DeepSeek V3, Claude, etc.) and focuses only on reasoning. It sees a rolling context window plus injected memory blocks.
Memory Stream uses a weaker/cheaper model and handles compression, storage, retrieval, and maintenance of memory "bubbles" asynchronously.
Memory blocks bubble up into the thinking context when relevant -- either passively (reranked after each segment) or actively (when the reasoning model calls query_memory()).
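
The hand-off between the two streams can be pictured with a small asyncio sketch. This is an illustration under stated assumptions, not the project's actual code: the function names (compress, memory_stream, thinking_stream, run) are hypothetical, the strong-model call is replaced by a formatted string, and the weak-model compression is replaced by naive truncation.

```python
import asyncio


async def compress(segment: str) -> str:
    """Hypothetical stand-in for the weak-model compression call."""
    await asyncio.sleep(0.01)   # simulate model latency
    return segment[:40]         # naive truncation instead of real summarization


async def memory_stream(queue: asyncio.Queue, bubbles: list) -> None:
    """Consume completed segments and store them as compressed bubbles."""
    while True:
        segment = await queue.get()
        if segment is None:     # sentinel: the conversation has ended
            break
        bubbles.append(await compress(segment))


async def thinking_stream(messages: list, queue: asyncio.Queue) -> list:
    """Answer each message and hand finished segments to the memory stream."""
    replies = []
    for msg in messages:
        reply = f"reply to: {msg}"              # stand-in for the strong-model call
        replies.append(reply)
        queue.put_nowait(f"{msg} -> {reply}")   # non-blocking hand-off
    queue.put_nowait(None)
    return replies


async def run(messages: list) -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    bubbles: list = []
    worker = asyncio.create_task(memory_stream(queue, bubbles))
    replies = await thinking_stream(messages, queue)
    await worker                # wait for the background compression to drain
    return replies, bubbles


if __name__ == "__main__":
    replies, bubbles = asyncio.run(run(["hello", "tell me more"]))
    print(replies)
    print(bubbles)
```

The reasoning loop only ever calls put_nowait, so it never waits on compression; the memory worker catches up in the background.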

Key Properties

Works with any LLM: Pure inference-layer solution, no model modification or fine-tuning
Effectively unlimited context: Rolling window + external memory, tested on 32K+ tokens of cumulative dialogue
94% memory precision: With only a 16K-token effective context window
Async dual-stream: Memory compression never blocks reasoning
Natural forgetting: Unused memories decay via heat-based scoring
Full web UI + CLI + API: Complete frontend and backend included
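
As a rough illustration of the "rolling window + external memory" property, the sketch below assembles a prompt from a memory zone, a rolling context, and the active message under a fixed token budget. Everything here is an assumption made for the example: the function names, the whitespace token counter, and the separate memory_budget parameter are hypothetical and are not taken from context_manager.py.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude whitespace count; a real system would use the model's tokenizer."""
    return len(text.split())


def build_context(memory_blocks, history, active, budget, memory_budget):
    """Assemble [memory zone] + [rolling context] + [active zone] within budget.

    memory_blocks is assumed to be pre-sorted, most relevant first.
    history is the list of past turns, oldest first.
    """
    remaining = budget - count_tokens(active)

    # Memory zone: take relevant blocks until the memory share is used up.
    memory_zone = []
    mem_left = min(memory_budget, remaining)
    for block in memory_blocks:
        cost = count_tokens(block)
        if cost > mem_left:
            break
        memory_zone.append(block)
        mem_left -= cost
        remaining -= cost

    # Rolling context: fill what is left with the newest turns.
    rolling = deque()
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > remaining:
            break
        rolling.appendleft(turn)   # keep chronological order
        remaining -= cost

    return memory_zone + list(rolling) + [active]


if __name__ == "__main__":
    context = build_context(
        memory_blocks=["user likes tea"],
        history=["one two three", "four five", "six"],
        active="what now",
        budget=8,
        memory_budget=3,
    )
    print(context)
```

Older turns that no longer fit simply fall out of the window; in the design described here they would already have been compressed into memory blocks by the other stream.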

Quick Start

Backend

# Clone and install
git clone https://github.com/Mr-Skeleton-Max/BubbleStream.git
cd BubbleStream

# Set up environment
cp .env.example .env
# Edit .env and add your API key

# Install dependencies
pip install -r requirements.txt

# Start API server
python run_api.py
# Server runs at http://localhost:8000

Frontend

cd Web
npm install
npm run dev
# Opens at http://localhost:5173

CLI

python cli.py chat "Hello, let's have a long conversation"

Architecture

BubbleStream/
├── run_api.py                    # API server entry point
├── cli.py                        # CLI client
├── src/
│   ├── orchestrator.py           # Dual-stream coordinator
│   ├── thinking/                 # Thinking Stream
│   │   ├── stream.py             #   Main reasoning loop
│   │   ├── segment_detector.py   #   Segment boundary detection
│   │   ├── context_manager.py    #   Rolling context window
│   │   └── memory_interface.py   #   Memory query interface
│   ├── memory/                   # Memory Stream
│   │   ├── bubble_generator.py   #   Compress segments → bubbles
│   │   ├── segment_queue.py      #   Async segment processing
│   │   ├── integration.py        #   Memory pipeline coordinator
│   │   └── graph/                #   Graph-based memory store
│   ├── storage/                  # Persistence layer (SQLite)
│   ├── shared/                   # Config, LLM client, embeddings
│   └── api/                      # FastAPI routes + WebSocket
├── Web/                          # React + Vite frontend
├── cli/                          # TypeScript CLI (alternative)
├── prompts/                      # Prompt templates for both streams
└── docs/                         # Design documents (Chinese)

Design Principles

Separation of concerns: Reasoning model never manages memory directly
Greedy compression: Better to over-store than to miss information
Async by default: Memory processing never blocks the reasoning flow
Natural decay: Unused memories lose heat and sink; accessed memories bubble up
Single interface: The reasoning model's only memory operation is query_memory(query) -> str
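
The decay and single-interface principles can be sketched together as follows. Only the query_memory(query) -> str signature comes from this README; the Bubble and MemoryStore classes, the decay, boost, and floor parameters, and the keyword-overlap matching (used in place of embedding retrieval) are hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    text: str
    heat: float = 1.0


class MemoryStore:
    """Toy heat-scored store; parameter names and values are illustrative only."""

    def __init__(self, decay: float = 0.5, boost: float = 1.0, floor: float = 0.2):
        self.decay = decay    # multiplicative cooling applied on every tick
        self.boost = boost    # heat added when a bubble is retrieved
        self.floor = floor    # bubbles colder than this are forgotten
        self.bubbles: list[Bubble] = []

    def add(self, text: str) -> None:
        self.bubbles.append(Bubble(text))

    def tick(self) -> None:
        """Cool every bubble and drop the ones that fall below the floor."""
        for bubble in self.bubbles:
            bubble.heat *= self.decay
        self.bubbles = [b for b in self.bubbles if b.heat >= self.floor]

    def query_memory(self, query: str) -> str:
        """Return matching bubbles, hottest first, and reheat them."""
        words = set(query.lower().split())
        hits = [b for b in self.bubbles if words & set(b.text.lower().split())]
        for bubble in hits:
            bubble.heat += self.boost
        hits.sort(key=lambda b: b.heat, reverse=True)
        return "\n".join(b.text for b in hits)


if __name__ == "__main__":
    store = MemoryStore()
    store.add("alice likes tea")
    store.add("bob likes coffee")
    print(store.query_memory("tea"))
    for _ in range(3):
        store.tick()
    print([b.text for b in store.bubbles])
```

In this toy run the bubble that was queried keeps enough heat to survive three ticks, while the untouched one cools below the floor and is dropped.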

Limitations and Lessons Learned

BubbleStream works well as an engineering solution, but it has a fundamental ceiling:

Retrieval quality is bounded by embedding similarity. As memory grows, embedding-based retrieval becomes increasingly unreliable for nuanced or compositional queries.
Not trainable. The system exploits Transformer attention properties but cannot learn or improve from experience.
Memory precision is determined by the management mechanism, not the LLM's capability.
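
A toy example shows the kind of compositional distinction that similarity scoring can blur. It uses a bag-of-words embedding, which is an assumption made for illustration and not the embedding model BubbleStream uses; dense embedding models are sensitive to word order to some degree, so the effect is less absolute in practice.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding: word counts, word order discarded."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two memories with opposite meanings but the same words.
memories = ["alice paid bob", "bob paid alice"]
query = "who paid bob"

scores = [cosine(embed(query), embed(m)) for m in memories]

if __name__ == "__main__":
    for memory, score in zip(memories, scores):
        print(f"{score:.3f}  {memory}")
```

Both memories receive the same score, so the ranking alone cannot tell which one answers the question.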

These limitations motivated the development of MathBrain -- a trainable architecture that replaces retrieval-based memory with categorical voting from bounded state.

Requirements

Python >= 3.11
Node.js >= 18 (for frontend)
An OpenAI-compatible API key (SiliconFlow, OpenAI, etc.)

License

MIT License. See LICENSE.
