User as Code: Executable Memory for Personalized Agents

Research artifact for the paper User as Code: Executable Memory for Personalized Agents (Bojie Li, Pine AI).

📄 Paper: arXiv:2606.16707 (LaTeX sources in this repo; build with make)
🌐 Interactive companion site: https://01.me/research/user-as-code — explore every graded test case across all four benchmarks
⚖️ License: Apache-2.0

The idea in one paragraph

Personalized agents need a user memory: a model of the user that accumulates across conversations. Today that memory is stored as unstructured text, knowledge graphs, or flat fact stores and consulted by retrieval (similarity search). Because storing a fact and acting on it are separate steps, such "bag-of-facts" memory recalls well but struggles to resolve contradictions, aggregate over many records, or enforce logical rules. User as Code (UaC) instead makes memory executable: a user's state is a directory of typed Python objects, and the rules over that state are ordinary Python functions, so representing the user and reasoning about the user happen in one medium an interpreter can run. The enabling mechanism is a two-phase pipeline — an append-only fact log, periodically checkpointed into structured typed code.
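A minimal sketch of the two-phase idea, assuming Python 3.9+. Every name here (`Fact`, `FactLog`, `Profile`, `checkpoint`, `check_meal`) and every fact key is hypothetical and is not taken from `user_as_code_v5.py` or the prototype. In particular, a hand-written deterministic fold stands in for whatever the repository's real checkpointing step does. The sketch shows three properties the paragraph names: a later fact overrides a contradictory earlier one at checkpoint time, the log itself is never rewritten, and a rule is an ordinary function over typed state.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Phase 1 (hypothetical): an append-only fact log. Facts are appended, never edited.
@dataclass(frozen=True)
class Fact:
    seq: int    # position in the log
    key: str    # hypothetical fact type, e.g. "home_city"
    value: Any


@dataclass
class FactLog:
    facts: list[Fact] = field(default_factory=list)

    def append(self, key: str, value: Any) -> None:
        self.facts.append(Fact(len(self.facts), key, value))


# Phase 2 (hypothetical): typed user state produced by checkpointing the log.
@dataclass
class Profile:
    home_city: str | None = None
    allergies: set[str] = field(default_factory=set)


def checkpoint(log: FactLog, since: int = 0, state: Profile | None = None) -> Profile:
    """Fold facts from position `since` onward into typed state.

    Facts are applied in log order, so a later fact overrides a
    contradictory earlier one (e.g. the user moved cities).
    Unrecognized keys are ignored in this sketch.
    """
    if state is None:
        state = Profile()
    for fact in log.facts[since:]:
        if fact.key == "home_city":
            state.home_city = fact.value
        elif fact.key == "allergy_add":
            state.allergies.add(fact.value)
        elif fact.key == "allergy_remove":
            state.allergies.discard(fact.value)
    return state


# A rule over the state is just a function the interpreter can run.
def check_meal(state: Profile, ingredients: set[str]) -> list[str]:
    return [f"contains allergen: {a}" for a in sorted(state.allergies & ingredients)]


log = FactLog()
log.append("home_city", "Boston")
log.append("allergy_add", "peanut")
log.append("home_city", "Seattle")  # contradicts the first fact
state = checkpoint(log)
print(state.home_city)                        # Seattle
print(check_meal(state, {"peanut", "rice"}))  # ['contains allergen: peanut']
```

Because the log is append-only, a checkpoint can always be rebuilt from scratch with `checkpoint(log)` or advanced incrementally from a saved position with `since=` and the previous `state`.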

What's in this repository

Path	What it is
`paper.tex`, `body*.tex`, `reference.bib`, `Makefile`	LaTeX sources for the paper (compiles with `arxiv.sty` + `plainnat`)
`figures/`	Paper figures (PDF) and the scripts that generate them
`prototype/`	Reference UaC implementation — a worked example user (`jessica_thompson`) as typed domains + executable constraints + tests
`experiments/`	Full experiment harness, the UaC pipeline (`user_as_code_v5.py`), baseline reimplementations, and committed `results/`. See `experiments/README.md`
`evaluation/`	The Active Service benchmark scenario definitions (60 scenarios, 5 categories). See `evaluation/README.md`
`benchmarks/`	Fetch script + instructions for the third-party datasets (LOCOMO, LongMemEval). Raw data is not redistributed. See `benchmarks/README.md`
`web/`	React companion site that visualizes every graded test case. See `web/README.md`
`scripts/`	`build_site_data.py` — turns `experiments/results/` into the site's data bundles
`user-as-code/`	Slidev slide deck (talk version of the paper)

Quick start

Building the paper

make            # -> paper.pdf  (pdflatex + bibtex; needs a TeX Live install)

Running the reference prototype

The fastest way to see "user as code" concretely — no API keys or datasets needed:

cd prototype
python runner.py                         # run every constraint, print the alerts
python -m pytest jessica_thompson/tests/ # validate constraint behavior

Each user is a self-contained Python project: manifest.py (compact always-loaded index), domains/ (typed dataclass schemas + state), constraints/ (executable invariants that return alerts), and tests/.
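To make that layout concrete, here is a single-file sketch of one domain, one constraint, a runner loop, and a pytest test, with comments marking which directory each piece would live in. The travel domain, `Alert`, `passport_valid_for_trips`, its 180-day margin, `CONSTRAINTS`, and `run_all` are all hypothetical illustrations; the actual schemas and constraints under `prototype/jessica_thompson/` may look quite different, and the always-loaded `manifest.py` index is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable


# domains/ (hypothetical travel domain): typed schema plus current state
@dataclass
class Passport:
    expires: date


@dataclass
class Trip:
    destination: str
    departs: date


@dataclass
class TravelDomain:
    passport: Passport
    trips: list[Trip] = field(default_factory=list)


# constraints/ (hypothetical): an invariant that returns alerts instead of raising
@dataclass(frozen=True)
class Alert:
    constraint: str
    message: str


def passport_valid_for_trips(travel: TravelDomain, margin_days: int = 180) -> list[Alert]:
    """Alert when a trip departs fewer than `margin_days` before the passport expires."""
    alerts = []
    for trip in travel.trips:
        if travel.passport.expires - trip.departs < timedelta(days=margin_days):
            alerts.append(Alert(
                "passport_valid_for_trips",
                f"passport expires {travel.passport.expires}, "
                f"trip to {trip.destination} departs {trip.departs}",
            ))
    return alerts


# runner (hypothetical): evaluate every registered constraint and collect alerts
CONSTRAINTS: list[Callable[[TravelDomain], list[Alert]]] = [passport_valid_for_trips]


def run_all(travel: TravelDomain) -> list[Alert]:
    return [alert for check in CONSTRAINTS for alert in check(travel)]


# tests/ (hypothetical): pytest collects functions named test_*
def test_flags_trip_near_expiry():
    travel = TravelDomain(Passport(date(2026, 9, 1)), [Trip("Tokyo", date(2026, 6, 1))])
    assert [a.constraint for a in run_all(travel)] == ["passport_valid_for_trips"]
```

Returning alerts rather than raising lets a runner evaluate every constraint in one pass and report all violations together.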

Reproducing the experiments

# 1. install deps
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. fetch the benchmark datasets (LOCOMO downloads directly; LongMemEval is author-distributed)
./benchmarks/fetch_benchmarks.sh

# 3. set API keys
export GEMINI_API_KEY=...        # main pipeline + judge (Gemini 3 Flash)
export OPENROUTER_API_KEY=...    # cross-family judge; Mem0/A-MEM write path

# 4. run an experiment (see experiments/README.md for the full script -> result map)
cd experiments
python run_locomo_10conv.py      # LOCOMO 600-QA comparison

Every per-run output we report is committed under experiments/results/, so you can inspect the paper's numbers without re-running anything. The experiments/README.md maps each script to the paper table/figure it produces.

Running the companion website

cd web
npm install
npm run dev      # http://localhost:5173
# data bundles are regenerated with:  python3 ../scripts/build_site_data.py

Headline results

Capability	UaC	Best retrieval baseline	Why
Factual recall (LOCOMO, 600 QA)	78.8%	within 1pt of a full-context upper bound	competitive with the strongest prior systems
Analytical inference (aggregate queries)	99%	6–43%	answer is a one-line computation over typed state, not a search over text
Active Service (unsolicited alerts)	100% standard / 85% hard	n/a	constraints execute deterministically on state change — retrieval cannot initiate
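The analytical and Active Service rows both reduce to plain operations on typed state. The sketch below uses a hypothetical expense ledger (`Expense`, `Ledger`, the categories, and the budget are illustrative assumptions, not the benchmark's schema): an aggregate question becomes a single `sum` over records, and a constraint is re-checked inside the same call that changes state, so the alert is raised without any user query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Expense:
    day: date
    category: str
    amount: Decimal


@dataclass
class Ledger:
    expenses: list[Expense] = field(default_factory=list)
    monthly_budget: dict[str, Decimal] = field(default_factory=dict)

    def total(self, category: str, year: int, month: int) -> Decimal:
        # Analytical query: one aggregate over typed records, no text search.
        return sum(
            (e.amount for e in self.expenses
             if e.category == category and (e.day.year, e.day.month) == (year, month)),
            Decimal(0),
        )

    def add(self, expense: Expense) -> list[str]:
        # State change: append, then immediately re-check the affected invariant.
        self.expenses.append(expense)
        limit = self.monthly_budget.get(expense.category)
        if limit is None:
            return []
        spent = self.total(expense.category, expense.day.year, expense.day.month)
        if spent > limit:
            return [f"{expense.category} spending for {expense.day:%Y-%m} "
                    f"is {spent}, over the {limit} budget"]
        return []


ledger = Ledger(monthly_budget={"dining": Decimal("60")})
ledger.add(Expense(date(2026, 3, 2), "dining", Decimal("42.50")))
ledger.add(Expense(date(2026, 3, 9), "groceries", Decimal("88.10")))
print(ledger.add(Expense(date(2026, 3, 21), "dining", Decimal("19.00"))))
print(ledger.total("dining", 2026, 3))  # 61.50
```

Using `Decimal` rather than `float` keeps currency totals exact, so the monthly sum compares against the budget without rounding error.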

See the paper for the full tables, ablations, cost analysis, and cross-judge/cross-LLM robustness checks.

Reproducibility notes

Committed: all experiment scripts, the per-run result JSONs, the synthetic analytical benchmark, the Active Service scenarios, and the reference prototype.
Not committed (regenerable / third-party): the vector-index cache (experiments/chroma_db/), the raw benchmark datasets (benchmarks/*/data/, fetched via the script), and a few large LongMemEval-derived dumps (rebuilt by the pipeline). See the .gitignore for the exact list and the per-directory READMEs for how each is regenerated.

Cite this work

If you use this work, please cite the paper (arXiv:2606.16707):

@article{li2026userascode,
  title         = {User as Code: Executable Memory for Personalized Agents},
  author        = {Li, Bojie},
  journal       = {arXiv preprint arXiv:2606.16707},
  year          = {2026},
  eprint        = {2606.16707},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2606.16707}
}

License

Code and documentation are released under the Apache License 2.0 (see also NOTICE). Third-party datasets (LOCOMO, LongMemEval) and memory libraries (Mem0, A-MEM, MemMachine, EverMemOS, Hindsight) are governed by their own licenses and are not included here.
