Concordia v2 Multi-Agent Simulation Framework

A flexible multi-agent simulation framework built on Concordia v2 with Hydra-based configuration management and multi-model support.

Features

Hydra Configuration: Composable YAML configurations for experiments, models, scenarios
Multi-Model Support: GPT-4o, GPT-4o-mini, Claude, Ollama with per-agent model assignment
Dynamic Scenarios: Switch scenarios via config (marketplace, election, misinformation, ai_conference, debate)
Social Media Environment: In-memory social media platform with posts, replies, likes, boosts, follows
Evaluation Probes: Query agents at checkpoints without affecting their memory (categorical, numeric, boolean, judged_numeric)
ValueFlow Scenario: Replication of arxiv 2602.08567 — measures how value perturbations propagate through multi-agent networks via Schwartz Value Survey probes and System Susceptibility metrics
Style Diversity Evaluation: Reference-free metrics (self-BLEU, lexical diversity, content evolution, etc.) for diagnosing repetitive agent behavior
Experiment Organization: Structured study/hypothesis/condition tree for reproducible experiments (see experiments/study_schema.md)
Modular Architecture: Engines, simulators, and components can be mixed and matched
Extensible Design: Easy to add new scenarios, agents, game masters, and components
Full Dev Infrastructure: Pre-commit hooks, CI/CD, type checking, ~320 tests

Quick Start

Installation

# Clone and install
git clone git@github.com:mptouzel/scensim.git
cd scensim/simulator

# Using uv (recommended)
uv sync

# Or using pip
pip install -e ".[dev]"

# Setup pre-commit hooks
pre-commit install

Configuration

Copy the environment template:

cp .env.example .env

Add your API keys to .env:

OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here

Running Simulations

# Run with default configuration (marketplace)
uv run python run_experiment.py

# Social media scenarios
uv run python run_experiment.py scenario=misinformation
uv run python run_experiment.py scenario=ai_conference

# Traditional scenarios
uv run python run_experiment.py scenario=election
uv run python run_experiment.py scenario=debate

# Override parameters
uv run python run_experiment.py simulation.execution.max_steps=50 model=claude

# Switch model
uv run python run_experiment.py scenario=ai_conference model=gpt4o

# Multi-model simulation
uv run python run_experiment.py model=multi_model

# Quick test with mock models (no API calls)
uv run python run_experiment.py --quick-test

# View configuration without running
uv run python run_experiment.py --cfg job

Scenarios

| Scenario | Engine | Description |
|---|---|---|
| `marketplace` | sequential | Buyers, sellers, and an auctioneer negotiate trades |
| `election` | sequential | Voters, candidates, and media interact during a campaign |
| `misinformation` | social_media | Information spread and manipulation on a social media platform |
| `ai_conference` | social_media | Two echo chambers (conference attendees vs. protesters) collide on social media; studies groupthink dynamics |
| `debate` | sequential | Formal debate with debaters, moderator, and judges |
| `valueflow` | valueflow | Value perturbation propagation across agent networks (arxiv 2602.08567); measures β-susceptibility and System Susceptibility via Schwartz Value Survey probes |

Social media scenarios auto-select the social_media environment config via a Hydra defaults override, which activates the SocialMediaEngine for parallel agent execution with feed-based interaction.

Project Structure

simulator/
├── run_experiment.py              # Main entry point (Hydra-decorated)
├── config/                        # Hydra configuration
│   ├── experiment.yaml            # Main config with defaults
│   ├── simulation/                # Simulation mode configs
│   │   ├── sequential.yaml
│   │   └── parallel.yaml
│   ├── model/                     # Model configurations
│   │   ├── gpt4.yaml             # GPT-4o-mini (default)
│   │   ├── gpt4o.yaml            # GPT-4o
│   │   ├── claude.yaml
│   │   ├── mock.yaml
│   │   └── multi_model.yaml
│   ├── environment/               # Environment settings
│   │   ├── generic_world.yaml
│   │   ├── game_theoretic.yaml
│   │   ├── social_media.yaml
│   │   └── valueflow.yaml         # ValueFlowEngine settings
│   ├── scenario/                  # Scenario definitions
│   │   ├── marketplace.yaml
│   │   ├── election.yaml
│   │   ├── misinformation.yaml
│   │   ├── ai_conference.yaml
│   │   ├── debate.yaml
│   │   └── valueflow.yaml         # Topology, perturbation, interaction config
│   └── evaluation/                # Evaluation metrics
│       ├── basic_metrics.yaml
│       ├── election.yaml
│       ├── marketplace.yaml
│       └── valueflow.yaml         # 56 Schwartz value probes (JudgedNumericProbe)
├── scenarios/                     # Scenario implementations
│   ├── marketplace/               # Marketplace scenario
│   │   ├── agents.py              # BuyerAgent, SellerAgent, AuctioneerAgent
│   │   ├── game_masters.py        # MarketGameMaster
│   │   ├── knowledge.py           # Knowledge builders
│   │   ├── events.py              # Event generators
│   │   └── data/knowledge.yaml
│   ├── election/                  # Election scenario
│   │   ├── agents.py, game_masters.py, knowledge.py, events.py
│   │   └── data/knowledge.yaml
│   ├── misinformation/            # Misinformation scenario (social media)
│   │   ├── agents.py              # SocialMediaUserAgent prefab
│   │   └── game_masters.py        # MisinformationGameMaster
│   ├── ai_conference/             # AI Conference groupthink scenario (social media)
│   │   ├── agents.py              # AIConferenceAgent prefab
│   │   └── game_masters.py        # AIConferenceGameMaster
│   └── valueflow/                 # ValueFlow scenario (arxiv 2602.08567)
│       ├── README.md              # Modification guide and experiment status
│       ├── agents.py              # ValueFlowAgent prefab (neutral persona)
│       ├── game_masters.py        # ValueFlowGameMaster + build_topology_graph()
│       ├── engine.py              # ValueFlowEngine (DAG-filtered, 3 rounds)
│       ├── simulator.py           # ValueFlowSimulator (perturbation + judge wiring)
│       ├── metrics.py             # β-susceptibility, SS, JSON export
│       ├── plotting.py            # All visualization functions
│       └── data/schwartz_values.yaml  # 56-value Schwartz dataset
├── src/                           # Core library
│   ├── simulation/                # Simulation infrastructure
│   │   ├── simulation.py          # Core Simulation class
│   │   ├── simulators/
│   │   │   ├── base.py            # BaseSimulator (abstract)
│   │   │   └── multi_model.py     # MultiModelSimulator
│   │   └── engines/
│   │       ├── base.py            # BaseEngine (abstract)
│   │       ├── sequential.py      # SequentialEngine
│   │       └── engine_utils.py    # Action spec parser fix
│   ├── entities/                  # Generic entity prefabs
│   │   ├── agents/
│   │   │   ├── basic_entity.py    # BasicEntity prefab
│   │   │   └── planning_agent.py  # PlanningAgent prefab
│   │   ├── game_masters/
│   │   │   └── basic_gm.py        # BasicGameMaster prefab
│   │   └── components/
│   │       └── base.py            # BaseComponent (abstract)
│   ├── environments/              # Environment implementations
│   │   └── social_media/
│   │       ├── app.py             # Post dataclass + SocialMediaApp
│   │       ├── engine.py          # SocialMediaEngine
│   │       ├── game_master.py     # SocialMediaGameMaster prefab
│   │       └── analysis.py        # Transmission chain analysis
│   ├── evaluation/                # Evaluation system
│   │   ├── probes.py              # CategoricalProbe, NumericProbe, BooleanProbe
│   │   └── probe_runner.py        # ProbeRunner (orchestrates probe execution)
│   ├── models/                    # Model implementations
│   │   ├── openai_model.py        # OpenAI GPT models
│   │   ├── anthropic_model.py     # Anthropic Claude models
│   │   └── local_model.py         # LocalModel for Ollama/local LLMs
│   └── utils/                     # Utilities
│       ├── config_helpers.py      # Config helper functions
│       ├── validation.py          # Config validation
│       ├── event_logger.py        # Simulation event logging
│       ├── logging_setup.py       # Logging configuration
│       └── testing.py             # Test utilities and mocks
├── scripts/                       # Utility scripts
│   ├── run_social_media_sim.py    # Standalone social media runner
│   ├── analyze_social_media.py    # CLI analysis tool
│   ├── explore_dashboard.py       # Interactive Dash explorer
│   ├── run_valueflow.py           # ValueFlow sweep runner (baseline + perturbed + metrics)
│   └── judge_probe_results.py     # Backfill null probe values via LLM judge
├── experiments/                   # Study definitions, tooling, and organized results
│   ├── study_schema.md            # Canonical study structure and pipeline
│   ├── scripts/
│   │   ├── study_io.py            # Shared I/O: load/validate study.yaml, extract run metadata
│   │   └── organize_experiments.py  # Organizer — builds studies/ tree from study.yaml
│   └── studies/
│       └── {study_name}/          # Per-study: study.yaml + eval.py + notebook.ipynb + generated results tree
├── notebooks/                     # Empty — notebooks live in experiments/studies/{study_name}/notebook.ipynb
└── tests/                         # Test suite (241 tests)
    ├── conftest.py                # Shared fixtures
    ├── environments/              # Social media environment tests
    ├── test_agents/
    ├── test_simulators/
    ├── test_evaluation/
    ├── test_utils/
    ├── test_scenarios/
    └── test_integration/

Configuration System

Composable Configs

Configs are composed using Hydra's defaults list:

# config/experiment.yaml
defaults:
  - simulation: sequential
  - model: gpt4
  - environment: generic_world
  - scenario: marketplace
  - evaluation: basic_metrics
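
As a rough mental model (not Hydra's actual implementation), each entry in the defaults list picks one option file from a config group, and a command-line argument such as model=claude swaps that choice before the files are merged. The sketch below mimics only the selection step with plain dictionaries; `select_options` and `option_paths` are hypothetical names introduced for illustration.

```python
# Defaults mirrored from config/experiment.yaml above.
DEFAULTS = {
    "simulation": "sequential",
    "model": "gpt4",
    "environment": "generic_world",
    "scenario": "marketplace",
    "evaluation": "basic_metrics",
}


def select_options(defaults: dict[str, str], cli_args: list[str]) -> dict[str, str]:
    """Apply `group=option` arguments on top of the default selections."""
    selected = dict(defaults)  # Copy so the defaults stay untouched.
    for arg in cli_args:
        group, _, option = arg.partition("=")
        if group not in selected:
            raise KeyError(f"unknown config group: {group!r}")
        selected[group] = option
    return selected


def option_paths(selected: dict[str, str]) -> list[str]:
    """List the YAML file each selection would point to under config/."""
    return [f"config/{group}/{option}.yaml" for group, option in selected.items()]


if __name__ == "__main__":
    chosen = select_options(DEFAULTS, ["model=claude", "scenario=election"])
    for path in option_paths(chosen):
        print(path)
```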

Override Examples

# Use Claude instead of the default GPT-4o-mini
uv run python run_experiment.py model=claude

# Use GPT-4o for higher quality
uv run python run_experiment.py model=gpt4o

# Parallel simulation with multi-model
uv run python run_experiment.py simulation=parallel model=multi_model

# Custom parameters
uv run python run_experiment.py \
  scenario.agents.buyer.budget=1000 \
  scenario.agents.seller.pricing_strategy=competitive
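
Dotted keys such as scenario.agents.buyer.budget address nested values in the composed config. The sketch below shows the idea on a plain nested dictionary. It is a simplified illustration rather than Hydra's override parser: values stay strings, `apply_override` is a hypothetical helper, and the starting budget is a made-up placeholder.

```python
def apply_override(config: dict, override: str) -> None:
    """Set a nested value in `config` from a `dotted.key=value` string."""
    dotted_key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate levels on demand.
        node = node.setdefault(part, {})
    node[leaf] = value  # Kept as a string; no type conversion in this sketch.


if __name__ == "__main__":
    # Placeholder starting values, for illustration only.
    config = {"scenario": {"agents": {"buyer": {"budget": "500"}}}}
    apply_override(config, "scenario.agents.buyer.budget=1000")
    apply_override(config, "scenario.agents.seller.pricing_strategy=competitive")
    print(config)
```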

Multi-Model Support

Assign different models to different agents:

# config/model/multi_model.yaml
model_registry:
  gpt4:
    provider: openai
    model_name: gpt-4-turbo
  claude:
    provider: anthropic
    model_name: claude-3-5-sonnet

entity_model_mapping:
  Alice: gpt4
  Bob: claude
  narrator: gpt4
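
The two sections of this file work as a two-step lookup: entity name to registry key, then registry key to provider settings. The sketch below walks through that lookup with the values from the example above. `resolve_model` is a hypothetical function, and the optional fallback for unmapped entities is an assumption of this sketch rather than documented framework behavior.

```python
from typing import Optional

# Values mirrored from the multi_model.yaml example above.
MODEL_REGISTRY = {
    "gpt4": {"provider": "openai", "model_name": "gpt-4-turbo"},
    "claude": {"provider": "anthropic", "model_name": "claude-3-5-sonnet"},
}
ENTITY_MODEL_MAPPING = {"Alice": "gpt4", "Bob": "claude", "narrator": "gpt4"}


def resolve_model(
    entity_name: str,
    registry: dict = MODEL_REGISTRY,
    mapping: dict = ENTITY_MODEL_MAPPING,
    default_key: Optional[str] = None,
) -> dict:
    """Look up the model settings assigned to an entity.

    Falling back to `default_key` for unmapped entities is an assumption of
    this sketch.
    """
    key = mapping.get(entity_name, default_key)
    if key is None:
        raise KeyError(f"no model assigned to entity {entity_name!r}")
    if key not in registry:
        raise KeyError(f"model {key!r} is not in the registry")
    return registry[key]


if __name__ == "__main__":
    for name in ENTITY_MODEL_MAPPING:
        settings = resolve_model(name)
        print(name, settings["provider"], settings["model_name"])
```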

Social Media Environment

The social media environment provides an in-memory platform for simulating information spread:

Actions: post, reply, like, unlike, boost, follow, unfollow
Feed: Chronological feed from followed users
Engine: SocialMediaEngine runs parallel agent actions each step
Analysis: Transmission chain extraction, keyword overlap, network analysis

Activated by the social_media environment config (config/environment/social_media.yaml). Social media scenarios auto-select this environment via a Hydra defaults override.
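
To make the feed behavior concrete, here is a minimal in-memory sketch covering only post, follow, unfollow, and a feed built from followed users. It is not the repository's SocialMediaApp: the class `TinyPlatform`, its method names, and the newest-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Post:
    post_id: int
    author: str
    text: str


@dataclass
class TinyPlatform:
    """Hypothetical stand-in for an in-memory social media platform."""

    posts: list[Post] = field(default_factory=list)
    following: dict[str, set[str]] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def follow(self, user: str, target: str) -> None:
        self.following.setdefault(user, set()).add(target)

    def unfollow(self, user: str, target: str) -> None:
        self.following.get(user, set()).discard(target)

    def post(self, author: str, text: str) -> Post:
        new_post = Post(next(self._ids), author, text)
        self.posts.append(new_post)  # Append order doubles as time order.
        return new_post

    def feed(self, user: str, limit: int = 10) -> list[Post]:
        """Posts from followed users, newest first (ordering is an assumption)."""
        followed = self.following.get(user, set())
        visible = [p for p in self.posts if p.author in followed]
        return list(reversed(visible))[:limit]


if __name__ == "__main__":
    platform = TinyPlatform()
    platform.follow("alice", "bob")
    platform.post("bob", "first post")
    platform.post("carol", "not in alice's feed")
    platform.post("bob", "second post")
    for item in platform.feed("alice"):
        print(item.post_id, item.author, item.text)
```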

# Run social media scenarios
uv run python run_experiment.py scenario=misinformation
uv run python run_experiment.py scenario=ai_conference

# Standalone runner (mock mode)
uv run python scripts/run_social_media_sim.py

# Analyze results
uv run python scripts/analyze_social_media.py path/to/checkpoint.json

# Interactive dashboard
uv run python scripts/explore_dashboard.py path/to/checkpoint.json

ValueFlow Scenario

Replication of arXiv 2602.08567. Measures how an amplified value injected into one agent propagates through a multi-agent network, using the Schwartz Value Survey as probes.

# Run a single experiment (chain topology, social_power perturbed at agent 0)
uv run python run_experiment.py scenario=valueflow evaluation=valueflow environment=valueflow

# Sweep topologies (H1), value types (H3), and perturbation locations (H4)
uv run python scripts/run_valueflow.py \
  --topologies chain ring star fully_connected \
  --values social_power helpful equality \
  --locations 0 2 4

# Compare models — reuse existing baseline, write to separate results dir
uv run python scripts/run_valueflow.py \
  --model gpt4 \
  --topologies chain ring star fully_connected \
  --baseline-dir outputs/valueflow_experiment/<timestamp> \
  --output-dir experiments/valueflow/results_gpt4mini

# Analyze results
# Open the study notebook (experiments/studies/{study_name}/notebook.ipynb; the top-level notebooks/ directory is empty)
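
For readers unfamiliar with the topology names passed to --topologies, the sketch below generates one plausible edge list for each. The edge directions, the choice of agent 0 as the star hub, and the function name `build_edges` are assumptions for illustration; the repository's build_topology_graph() may define these graphs differently.

```python
def build_edges(topology: str, n_agents: int) -> list[tuple[int, int]]:
    """Return directed (sender, receiver) pairs for a named topology."""
    if n_agents < 2:
        raise ValueError("need at least two agents")
    if topology == "chain":
        # 0 -> 1 -> 2 -> ... -> n-1
        return [(i, i + 1) for i in range(n_agents - 1)]
    if topology == "ring":
        # A chain whose last agent links back to agent 0.
        return [(i, (i + 1) % n_agents) for i in range(n_agents)]
    if topology == "star":
        # Agent 0 is the hub (an assumption of this sketch).
        return [(0, i) for i in range(1, n_agents)]
    if topology == "fully_connected":
        return [(i, j) for i in range(n_agents) for j in range(n_agents) if i != j]
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    for name in ("chain", "ring", "star", "fully_connected"):
        print(name, build_edges(name, 4))
```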

Key metrics saved to experiments/valueflow/results/{condition}/valueflow_metrics.json:

target_value_ss — System Susceptibility (headline scalar)
beta_susceptibility — per-agent, per-value shift (perturbed minus baseline)
beta_timeseries — β at each interaction round

See scenarios/valueflow/README.md for the full modification guide (add values, topologies, models, re-run from scratch).
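
The beta_susceptibility entry is described above as a per-agent, per-value shift, perturbed minus baseline. The sketch below computes that difference on made-up scores; `beta_shift` is a hypothetical function, not the code in metrics.py. The aggregation into System Susceptibility is not specified in this README, so it is left out.

```python
Scores = dict[str, dict[str, float]]  # agent name -> value name -> probe score


def beta_shift(baseline: Scores, perturbed: Scores) -> Scores:
    """Per-agent, per-value shift: perturbed score minus baseline score."""
    return {
        agent: {
            value: perturbed[agent][value] - base_score
            for value, base_score in values.items()
        }
        for agent, values in baseline.items()
    }


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    baseline = {"agent_0": {"social_power": 2, "equality": 5}}
    perturbed = {"agent_0": {"social_power": 6, "equality": 5}}
    print(beta_shift(baseline, perturbed))
```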

Evaluation

Probes

Query agents at checkpoints without affecting their memory:

# config/evaluation/election.yaml
metrics:
  vote_preference:
    type: categorical
    categories: [conservative, progressive, undecided]
    prompt_template: |
      Based on {agent_name}'s views, which candidate do they prefer?
    applies_to: [voter]

Probe types: categorical, numeric (min/max range), boolean (yes/no), judged_numeric (free-form agent response scored by a judge LLM).
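
To illustrate how the first three probe types differ, the sketch below turns a free-text agent reply into a typed value. These parsers are hypothetical and are not taken from src/evaluation/probes.py; in particular, clamping numeric answers to the configured range is an assumption. The judged_numeric type is omitted because it needs a judge LLM.

```python
import re
from typing import Optional


def parse_categorical(response: str, categories: list[str]) -> Optional[str]:
    """Return the category mentioned earliest in the response, else None."""
    lowered = response.lower()
    hits = [(lowered.find(c.lower()), c) for c in categories if c.lower() in lowered]
    return min(hits)[1] if hits else None


def parse_numeric(response: str, min_value: float, max_value: float) -> Optional[float]:
    """Return the first number in the response, clamped to [min_value, max_value]."""
    match = re.search(r"-?\d+(?:\.\d+)?", response)
    if match is None:
        return None
    return float(max(min_value, min(float(match.group()), max_value)))


def parse_boolean(response: str) -> Optional[bool]:
    """Return True for a standalone 'yes', False for 'no', else None."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


if __name__ == "__main__":
    categories = ["conservative", "progressive", "undecided"]
    print(parse_categorical("I lean progressive this year.", categories))
    print(parse_numeric("Roughly 7 out of 10.", 0, 10))
    print(parse_boolean("Yes, I would."))
```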

Style Diversity Metrics

Reference-free evaluation of agent linguistic diversity:

# Evaluate a single run
uv run python experiments/studies/style_diversity/eval.py path/to/checkpoint.json

# Compare two runs
uv run python experiments/studies/style_diversity/eval.py ckpt1.json ckpt2.json --compare

# Export to file
uv run python experiments/studies/style_diversity/eval.py checkpoint.json -o results/

Computes 10 metrics per agent: self-BLEU, lexical diversity, content evolution, opener variety, action entropy, near-duplicate rate, target fixation, action diversity, new post rate, and inter-agent distinctiveness.
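
Two of these metrics have simple textbook forms, sketched below: lexical diversity as a type-token ratio and action entropy as the Shannon entropy of action types. These are common definitions chosen for illustration; the exact formulas used by eval.py are not given in this README and may differ (for example in tokenization or normalization).

```python
import math
from collections import Counter


def lexical_diversity(texts: list[str]) -> float:
    """Type-token ratio: unique words divided by total words."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def action_entropy(actions: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution over action types."""
    if not actions:
        return 0.0
    total = len(actions)
    counts = Counter(actions)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Made-up agent output, for illustration only.
    posts = ["the market is open", "the market is closed"]
    actions = ["post", "reply", "post", "reply"]
    print(lexical_diversity(posts))
    print(action_entropy(actions))
```

Under these definitions, an agent that only ever takes one action type has an entropy of zero, and an agent that repeats the same words has a low type-token ratio.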

Experiment Organization

Organize simulation runs into a browsable study/hypothesis/condition hierarchy:

uv run python experiments/scripts/organize_experiments.py experiments/studies/style_diversity/study.yaml

See experiments/study_schema.md for the full schema covering directory layout, file formats, and the standard results notebook structure.

Creating Custom Scenarios

Create config file config/scenario/my_scenario.yaml:

name: my_scenario
premise: |
  Description of your scenario...

roles:
  - name: player
    description: "A player in the game"

agents:
  entities:
    - name: Alice
      role: player
      prefab: basic_entity
      params:
        goal: "Win the game"

game_master:
  prefab: basic_game_master
  name: narrator

prefabs:
  basic_entity: src.entities.agents.basic_entity.BasicEntity
  basic_game_master: src.entities.game_masters.basic_gm.BasicGameMaster

(Optional) Create custom prefabs in scenarios/my_scenario/agents.py
Run: uv run python run_experiment.py scenario=my_scenario

For social media scenarios, add - override /environment: social_media to defaults and include initial_graph and seed_posts sections (see config/scenario/ai_conference.yaml for a complete example).
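
The prefabs section of the scenario config pairs short names with dotted import paths. One common way to turn such a path into a class is sketched below with importlib. `resolve_prefab` is a hypothetical helper and may not match how the framework loads prefabs; the demo uses a standard-library path so it runs outside the project.

```python
import importlib
from typing import Any


def resolve_prefab(dotted_path: str) -> Any:
    """Import `package.module.ClassName` and return the named attribute."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # Inside the project this could be a path from the prefabs mapping, e.g.
    # "src.entities.agents.basic_entity.BasicEntity".
    ordered_dict_cls = resolve_prefab("collections.OrderedDict")
    print(ordered_dict_cls.__name__)
```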

Development

Running Tests

# All tests
uv run pytest

# With coverage
uv run pytest --cov=src --cov=scenarios

# Specific test directory
uv run pytest tests/test_evaluation/ -v

# Skip integration tests
uv run pytest -m "not integration"

Code Quality

# Run all pre-commit hooks
uv run pre-commit run --all-files

# Type checking
uv run mypy src/ scenarios/

# Linting
uv run ruff check src/ scenarios/

Commit Convention

We use Conventional Commits:

# Using commitizen
uv run cz c

# Manual format
git commit -m "feat(scenario): add new auction mechanism"
git commit -m "fix(agent): resolve memory retrieval bug"
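
The manual format follows the type(scope): description shape. The sketch below checks a message against that shape with a regular expression. It is an illustrative check only, not the validation commitizen or the pre-commit hooks perform, and `parse_commit` is a hypothetical helper.

```python
import re
from typing import Optional

COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"            # commit type, e.g. feat or fix
    r"(?:\((?P<scope>[^()]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<description>\S.*)$"     # colon, space, then the description
)


def parse_commit(message: str) -> Optional[dict]:
    """Split the first line of a commit message into its parts, or return None."""
    lines = message.splitlines()
    match = COMMIT_PATTERN.match(lines[0]) if lines else None
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


if __name__ == "__main__":
    print(parse_commit("feat(scenario): add new auction mechanism"))
    print(parse_commit("fixed a bug"))
```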

License

Apache License 2.0 - See LICENSE for details.

Acknowledgments

Concordia by Google DeepMind
Hydra by Facebook Research
