The self-evolving outer loop for autonomous AI agents.
Where most agent frameworks stop when the task ends, Entity keeps going — accumulating knowledge, improving goals, and running indefinitely without losing context.
```
Entity
├── /inhale        ← knowledge absorption (external research → session context)
├── /exhale        ← knowledge evolution (insights → experiments + code)
├── /live          ← finite self-evolving outer loop
│   └── SCORE → EVOLVE → converge
├── /live-inf      ← infinite outer loop (no iteration cap)
│   └── context rotation, world model, no session boundary
├── CTX            ← context precision layer
│   └── LLM-free, 5.2% token budget, R@5=1.0 dependency recall
└── Safety Triad   ← EVOLVE gate (goal drift + reward hacking + CoT monitorability)
```
| Feature | Entity | Oh My Codex | LangGraph |
|---|---|---|---|
| Infinite context rotation | ✓ | — | — |
| Self-evolving goals | ✓ | — | — |
| Knowledge absorption loop | ✓ | — | — |
| Safety Triad gate | ✓ | — | — |
| Parallel agent execution | via Oh My Codex | ✓ | partial |
Oh My Codex solves: "How do I run multiple agents in parallel right now?" Entity solves: "How do I make agents improve themselves over time?"
They're not alternatives. They're layers.
Most agent frameworks are session-local. They run a task and stop.
Real autonomous work requires:
- Context that persists past the window limit without data loss
- Goals that evolve when the current one is achieved
- Knowledge that accumulates — each run deposits what it learned
- Failures that are classified, not just retried
- Execution that continues indefinitely — hours, not seconds
Entity is infrastructure for this.
```
/inhale (collect research) → /exhale (design experiments) → /live (execute + evolve)
     ↑                                                              │
     └──────────────────── knowledge feedback ──────────────────────┘
```
Each cycle: external insights become experiments, experiments become improvements, improvements raise the goal bar — until convergence.
Automated collection from research channels (arXiv, HN, newsletters, GeekNews). Scores items by relevance, injects actionable insights into session context.
- 57-keyword relevance scoring (max 10.0)
- Reflect-type classification: stuck_agent / hypothesis_validation / skill_selection / evaluation
- Source attribution: channel → paper → date → URL (full provenance)
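The keyword-based scoring above can be sketched as a weighted keyword match capped at the maximum score. This is a minimal illustration, not Entity's actual scorer: the keyword list, weights, and function name are assumptions (the real system uses 57 keywords).

```python
# Hypothetical sketch of /inhale-style relevance scoring.
# Keywords and weights are illustrative, NOT Entity's actual 57-keyword list.
KEYWORDS = {
    "agent": 1.5, "context": 1.0, "self-improvement": 2.0,
    "evaluation": 1.0, "reward": 1.5,
}
MAX_SCORE = 10.0

def relevance_score(text: str) -> float:
    """Sum the weights of keywords present in the item, capped at MAX_SCORE."""
    lowered = text.lower()
    raw = sum(w for kw, w in KEYWORDS.items() if kw in lowered)
    return min(raw, MAX_SCORE)

print(relevance_score("A new agent framework for context-aware evaluation"))
```

The cap keeps long keyword-dense items from dominating the ranking.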
Transforms absorbed insights into concrete project artifacts:
- `experiment` mode → paired comparison design docs
- `code` mode → PR-ready implementation changes
- `design` mode → architecture integration specs
- `hypothesis` mode → H0/H1 with measurement protocol
All artifacts start as proposed → verified via experiment/test → accepted.
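The lifecycle above can be sketched as a tiny state machine. The state names come from the text; the class, transition table, and `advance` helper are illustrative assumptions, not Entity's API.

```python
# Minimal sketch of the proposed → verified → accepted artifact lifecycle.
# State names are from the docs; the transition rules are assumptions.
from enum import Enum

class ArtifactState(Enum):
    PROPOSED = "proposed"
    VERIFIED = "verified"
    ACCEPTED = "accepted"

TRANSITIONS = {
    ArtifactState.PROPOSED: ArtifactState.VERIFIED,  # passed experiment/test
    ArtifactState.VERIFIED: ArtifactState.ACCEPTED,  # merged into the project
}

def advance(state: ArtifactState) -> ArtifactState:
    """Move an artifact one step forward; ACCEPTED is terminal."""
    if state not in TRANSITIONS:
        raise ValueError(f"{state.value} is terminal")
    return TRANSITIONS[state]
```

Making the lifecycle explicit means an artifact can never skip verification on its way to acceptance.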
iter N: autopilot → SCORE (5 dimensions) → EVOLVE goal → iter N+1
Stops when score improvement delta < epsilon (default 0.05) for k=3 consecutive iterations.
- score_ensemble_n=3 (multi-reviewer, reduces LLM scoring variance)
- goal_fidelity gate (min 0.70 per step, min 0.50 cumulative)
- Context budget check at 70% — triggers early handoff before exhaustion
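The stopping rule above (delta below epsilon for k consecutive iterations) can be sketched in a few lines. The function name and list-based interface are assumptions; only the epsilon/k semantics come from the docs.

```python
# Sketch of the /live convergence check: stop when the score-improvement
# delta stays below epsilon for k consecutive iterations.
def converged(scores, epsilon=0.05, k=3):
    """True if the last k score deltas are all smaller than epsilon."""
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    if len(deltas) < k:
        return False  # not enough history to judge convergence
    return all(abs(d) < epsilon for d in deltas[-k:])

print(converged([6.0, 7.0, 7.5, 7.52, 7.53, 7.54]))  # True
```

Requiring k consecutive small deltas (rather than one) avoids stopping on a single flat iteration in an otherwise improving run.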
Extends /live for indefinite execution:
- Context rotation: at 70% budget → safe state handoff → fresh session → resume
- World model: epistemic state layer persists across rotations
- Co-evolution feedback: strategy outcomes feed back into Wave 1
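The 70% rotation trigger can be sketched as follows. The function names and the shape of the handoff payload are illustrative assumptions; only the threshold comes from the docs.

```python
# Illustrative sketch of the /live-inf context-rotation trigger at 70% budget.
# Names and the handoff payload shape are assumptions, not Entity's API.
ROTATION_THRESHOLD = 0.70

def should_rotate(tokens_used: int, token_budget: int) -> bool:
    """Rotate once usage reaches 70% of the context budget."""
    return tokens_used / token_budget >= ROTATION_THRESHOLD

def handoff_state(goal: str, world_model: dict) -> dict:
    """Package what a fresh session needs to resume without losing context."""
    return {"goal": goal, "world_model": world_model, "resume": True}

print(should_rotate(tokens_used=145_000, token_budget=200_000))  # True (72.5%)
```

Rotating before exhaustion matters because, per the experiments below, context failure is a cliff rather than a gradual fade.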
LLM-free context loader that classifies query type and selects the matching retrieval strategy.
- 5.2% average token budget (vs 40-60% for naive loading)
- R@5 = 1.0 on dependency recall
- Zero LLM calls for retrieval — pure algorithmic
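An LLM-free query classifier like CTX's can be sketched with plain pattern rules mapping a query to a retrieval strategy, with zero model calls. The rules, strategy names, and fallback here are hypothetical, not CTX's actual implementation.

```python
# Hypothetical sketch of LLM-free query-type classification, CTX-style:
# regex rules pick a retrieval strategy; no model is ever called.
import re

RULES = [
    (re.compile(r"\b(import|depends?|requires?)\b", re.I), "dependency_graph"),
    (re.compile(r"\b(error|traceback|fail)\b", re.I), "error_trace"),
    (re.compile(r"\bdef |class |function\b", re.I), "symbol_lookup"),
]

def select_strategy(query: str) -> str:
    """Return the retrieval strategy for the first matching rule."""
    for pattern, strategy in RULES:
        if pattern.search(query):
            return strategy
    return "keyword_bm25"  # fallback: plain lexical retrieval

print(select_strategy("which modules does runner.py depend on?"))
```

Because classification is pure string matching, retrieval cost is constant and deterministic, which is what makes the zero-LLM-call budget possible.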
Three-detector gate that must pass before any goal evolution:
- Goal drift detector — alignment check against original_goal (cosine similarity)
- Reward hacking detector — divergence between score and task completion signal
- CoT monitorability checker — TF-IDF cosine between CoT intent and actual action
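The CoT monitorability check can be sketched as a cosine similarity between the stated intent and the action taken. This simplified version uses raw term-frequency vectors (with two documents, IDF weighting degenerates), and the 0.5 threshold is an assumption.

```python
# Sketch of the CoT monitorability check: cosine similarity between the
# chain-of-thought intent and the action actually taken. Term-frequency
# only (IDF is degenerate over two documents); threshold is an assumption.
import math
from collections import Counter

def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity over term-frequency vectors of two texts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

def cot_monitorable(intent: str, action: str, threshold: float = 0.5) -> bool:
    """Flag a mismatch when the action diverges from the stated intent."""
    return tf_cosine(intent, action) >= threshold
```

A low score means the agent did something other than what its reasoning trace claimed, which blocks the EVOLVE step.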
Every design decision is backed by controlled experiments:
| Question | Finding | Impact on design |
|---|---|---|
| How should agents reason? | Hypothesis-driven: -50% attempts on hard bugs, 100% first-hypothesis accuracy | Default reasoning in /live inner loop |
| Where are context limits? | Threshold-based cliff, not gradual fade — silent failure at specific lengths | Context rotation at 70% in /live-inf |
| Where do agents fail? | 3 predictable clusters: wrong decomposition, role non-compliance, boundary violation | omc-failure-router classification |
```bash
git clone https://github.com/jaytoone/HarnessOS   # repo rename to Entity pending
cd Entity

# Run the hypothesis-driven vs. engineering debugging experiment
python3 analyze.py --run

# Run all experiments
python3 runner.py --exp a   # context degradation (1K/10K/50K/100K tokens)
python3 runner.py --exp b   # autonomous agent failure classification

# Tests
python3 -m pytest           # 214 tests, 100% coverage
```

No `pip install` required. No API keys required for the base experiments.
| Component | Status |
|---|---|
| CTX | Stable |
| /live | Stable |
| /live-inf | Stable |
| Safety Triad | Stable |
| /inhale + /exhale | Stable |
| HalluMaze | In development |
| Evaluation Layer | Planned |
An entity persists. It accumulates state, learns from experience, and acts with continuity.
LLMs have enormous capability. Without control structure, that capability is context-unaware, goal-unstable, failure-opaque, and session-local.
Entity adds the control structure — and keeps it running.