Concordia v2 Multi-Agent Simulation Framework

A flexible multi-agent simulation framework built on [Concordia v2](https://github.com/google-deepmind/concordia) with Hydra-based configuration management and multi-model support.

Features

  • Hydra Configuration: Composable YAML configurations for experiments, models, and scenarios
  • Multi-Model Support: GPT-4o, GPT-4o-mini, Claude, Ollama with per-agent model assignment
  • Dynamic Scenarios: Switch scenarios via config (marketplace, election, misinformation, ai_conference, debate)
  • Social Media Environment: In-memory social media platform with posts, replies, likes, boosts, follows
  • Evaluation Probes: Query agents at checkpoints without affecting their memory (categorical, numeric, boolean, judged_numeric)
  • ValueFlow Scenario: Replication of arxiv 2602.08567 — measures how value perturbations propagate through multi-agent networks via Schwartz Value Survey probes and System Susceptibility metrics
  • Style Diversity Evaluation: Reference-free metrics (self-BLEU, lexical diversity, content evolution, etc.) for diagnosing repetitive agent behavior
  • Experiment Organization: Structured study/hypothesis/condition tree for reproducible experiments (see experiments/study_schema.md)
  • Modular Architecture: Engines, simulators, and components can be mixed and matched
  • Extensible Design: Easy to add new scenarios, agents, game masters, and components
  • Full Dev Infrastructure: Pre-commit hooks, CI/CD, type checking, ~320 tests

Quick Start

Installation

# Clone and install
git clone git@github.com:mptouzel/scensim.git
cd scensim/simulator

# Using uv (recommended)
uv sync

# Or using pip
pip install -e ".[dev]"

# Setup pre-commit hooks
pre-commit install

Configuration

  1. Copy the environment template:
cp .env.example .env
  2. Add your API keys to .env:
OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here
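
Before a long run it can help to confirm that the template placeholders were actually replaced. The following is a minimal sketch, not part of the repository: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parser only handles plain `KEY=value` lines like the ones shown above.

```python
# Hypothetical helper: check that the keys from .env.example were filled in.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
PLACEHOLDER = "your-key-here"


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [k for k in required if values.get(k, "") in ("", PLACEHOLDER)]


sample = "OPENAI_API_KEY=sk-test\nANTHROPIC_API_KEY=your-key-here\n"
print(missing_keys(parse_env_text(sample)))  # ['ANTHROPIC_API_KEY']
```

To check a real file, pass the contents of `.env` (for example via `pathlib.Path(".env").read_text()`) instead of `sample`.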

Running Simulations

# Run with default configuration (marketplace)
uv run python run_experiment.py

# Social media scenarios
uv run python run_experiment.py scenario=misinformation
uv run python run_experiment.py scenario=ai_conference

# Traditional scenarios
uv run python run_experiment.py scenario=election
uv run python run_experiment.py scenario=debate

# Override parameters
uv run python run_experiment.py simulation.execution.max_steps=50 model=claude

# Switch model
uv run python run_experiment.py scenario=ai_conference model=gpt4o

# Multi-model simulation
uv run python run_experiment.py model=multi_model

# Quick test with mock models (no API calls)
uv run python run_experiment.py --quick-test

# View configuration without running
uv run python run_experiment.py --cfg job

Scenarios

| Scenario | Engine | Description |
| --- | --- | --- |
| marketplace | sequential | Buyers, sellers, and an auctioneer negotiate trades |
| election | sequential | Voters, candidates, and media interact during a campaign |
| misinformation | social_media | Information spread and manipulation on a social media platform |
| ai_conference | social_media | Two echo chambers (conference attendees vs. protesters) collide on social media; studies groupthink dynamics |
| debate | sequential | Formal debate with debaters, moderator, and judges |
| valueflow | valueflow | Value perturbation propagation across agent networks (arxiv 2602.08567); measures β-susceptibility and System Susceptibility via Schwartz Value Survey probes |

Social media scenarios auto-select the social_media environment config via a Hydra defaults override, which activates the SocialMediaEngine for parallel agent execution with feed-based interaction.

Project Structure

simulator/
├── run_experiment.py              # Main entry point (Hydra-decorated)
├── config/                        # Hydra configuration
│   ├── experiment.yaml            # Main config with defaults
│   ├── simulation/                # Simulation mode configs
│   │   ├── sequential.yaml
│   │   └── parallel.yaml
│   ├── model/                     # Model configurations
│   │   ├── gpt4.yaml             # GPT-4o-mini (default)
│   │   ├── gpt4o.yaml            # GPT-4o
│   │   ├── claude.yaml
│   │   ├── mock.yaml
│   │   └── multi_model.yaml
│   ├── environment/               # Environment settings
│   │   ├── generic_world.yaml
│   │   ├── game_theoretic.yaml
│   │   ├── social_media.yaml
│   │   └── valueflow.yaml         # ValueFlowEngine settings
│   ├── scenario/                  # Scenario definitions
│   │   ├── marketplace.yaml
│   │   ├── election.yaml
│   │   ├── misinformation.yaml
│   │   ├── ai_conference.yaml
│   │   ├── debate.yaml
│   │   └── valueflow.yaml         # Topology, perturbation, interaction config
│   └── evaluation/                # Evaluation metrics
│       ├── basic_metrics.yaml
│       ├── election.yaml
│       ├── marketplace.yaml
│       └── valueflow.yaml         # 56 Schwartz value probes (JudgedNumericProbe)
├── scenarios/                     # Scenario implementations
│   ├── marketplace/               # Marketplace scenario
│   │   ├── agents.py              # BuyerAgent, SellerAgent, AuctioneerAgent
│   │   ├── game_masters.py        # MarketGameMaster
│   │   ├── knowledge.py           # Knowledge builders
│   │   ├── events.py              # Event generators
│   │   └── data/knowledge.yaml
│   ├── election/                  # Election scenario
│   │   ├── agents.py, game_masters.py, knowledge.py, events.py
│   │   └── data/knowledge.yaml
│   ├── misinformation/            # Misinformation scenario (social media)
│   │   ├── agents.py              # SocialMediaUserAgent prefab
│   │   └── game_masters.py        # MisinformationGameMaster
│   ├── ai_conference/             # AI Conference groupthink scenario (social media)
│   │   ├── agents.py              # AIConferenceAgent prefab
│   │   └── game_masters.py        # AIConferenceGameMaster
│   └── valueflow/                 # ValueFlow scenario (arxiv 2602.08567)
│       ├── README.md              # Modification guide and experiment status
│       ├── agents.py              # ValueFlowAgent prefab (neutral persona)
│       ├── game_masters.py        # ValueFlowGameMaster + build_topology_graph()
│       ├── engine.py              # ValueFlowEngine (DAG-filtered, 3 rounds)
│       ├── simulator.py           # ValueFlowSimulator (perturbation + judge wiring)
│       ├── metrics.py             # β-susceptibility, SS, JSON export
│       ├── plotting.py            # All visualization functions
│       └── data/schwartz_values.yaml  # 56-value Schwartz dataset
├── src/                           # Core library
│   ├── simulation/                # Simulation infrastructure
│   │   ├── simulation.py          # Core Simulation class
│   │   ├── simulators/
│   │   │   ├── base.py            # BaseSimulator (abstract)
│   │   │   └── multi_model.py     # MultiModelSimulator
│   │   └── engines/
│   │       ├── base.py            # BaseEngine (abstract)
│   │       ├── sequential.py      # SequentialEngine
│   │       └── engine_utils.py    # Action spec parser fix
│   ├── entities/                  # Generic entity prefabs
│   │   ├── agents/
│   │   │   ├── basic_entity.py    # BasicEntity prefab
│   │   │   └── planning_agent.py  # PlanningAgent prefab
│   │   ├── game_masters/
│   │   │   └── basic_gm.py        # BasicGameMaster prefab
│   │   └── components/
│   │       └── base.py            # BaseComponent (abstract)
│   ├── environments/              # Environment implementations
│   │   └── social_media/
│   │       ├── app.py             # Post dataclass + SocialMediaApp
│   │       ├── engine.py          # SocialMediaEngine
│   │       ├── game_master.py     # SocialMediaGameMaster prefab
│   │       └── analysis.py        # Transmission chain analysis
│   ├── evaluation/                # Evaluation system
│   │   ├── probes.py              # CategoricalProbe, NumericProbe, BooleanProbe
│   │   └── probe_runner.py        # ProbeRunner (orchestrates probe execution)
│   ├── models/                    # Model implementations
│   │   ├── openai_model.py        # OpenAI GPT models
│   │   ├── anthropic_model.py     # Anthropic Claude models
│   │   └── local_model.py         # LocalModel for Ollama/local LLMs
│   └── utils/                     # Utilities
│       ├── config_helpers.py      # Config helper functions
│       ├── validation.py          # Config validation
│       ├── event_logger.py        # Simulation event logging
│       ├── logging_setup.py       # Logging configuration
│       └── testing.py             # Test utilities and mocks
├── scripts/                       # Utility scripts
│   ├── run_social_media_sim.py    # Standalone social media runner
│   ├── analyze_social_media.py    # CLI analysis tool
│   ├── explore_dashboard.py       # Interactive Dash explorer
│   ├── run_valueflow.py           # ValueFlow sweep runner (baseline + perturbed + metrics)
│   └── judge_probe_results.py     # Backfill null probe values via LLM judge
├── experiments/                   # Study definitions, tooling, and organized results
│   ├── study_schema.md            # Canonical study structure and pipeline
│   ├── scripts/
│   │   ├── study_io.py            # Shared I/O: load/validate study.yaml, extract run metadata
│   │   └── organize_experiments.py  # Organizer — builds studies/ tree from study.yaml
│   └── studies/
│       └── {study_name}/          # Per-study: study.yaml + eval.py + notebook.ipynb + generated results tree
├── notebooks/                     # Empty — notebooks live in experiments/studies/{study_name}/notebook.ipynb
└── tests/                         # Test suite (241 tests)
    ├── conftest.py                # Shared fixtures
    ├── environments/              # Social media environment tests
    ├── test_agents/
    ├── test_simulators/
    ├── test_evaluation/
    ├── test_utils/
    ├── test_scenarios/
    └── test_integration/

Configuration System

Composable Configs

Configs are composed using Hydra's defaults list:

# config/experiment.yaml
defaults:
  - simulation: sequential
  - model: gpt4
  - environment: generic_world
  - scenario: marketplace
  - evaluation: basic_metrics
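
To make the composition idea concrete, here is a simplified pure-Python sketch of what a defaults list does: pick one option per config group, then merge the chosen options into one config. It is an illustration only, not Hydra's implementation. `CONFIG_GROUPS`, `DEFAULTS`, and `compose` are hypothetical names, and the option contents are placeholder values rather than the repository's actual YAML.

```python
from copy import deepcopy

# Hypothetical stand-in for files such as config/model/gpt4.yaml.
CONFIG_GROUPS = {
    "simulation": {
        "sequential": {"mode": "sequential"},
        "parallel": {"mode": "parallel"},
    },
    "model": {
        "gpt4": {"provider": "openai"},
        "claude": {"provider": "anthropic"},
    },
}

# Mirrors the shape of a defaults list: one {group: option} entry per group.
DEFAULTS = [{"simulation": "sequential"}, {"model": "gpt4"}]


def compose(defaults, group_overrides=None):
    """Select one option per group, apply overrides, and merge the result."""
    selected = {}
    for entry in defaults:
        selected.update(entry)
    for group, option in (group_overrides or {}).items():
        if group not in selected:
            raise KeyError(f"unknown config group: {group}")
        selected[group] = option
    return {g: deepcopy(CONFIG_GROUPS[g][o]) for g, o in selected.items()}


print(compose(DEFAULTS))
print(compose(DEFAULTS, {"model": "claude"}))  # analogous to model=claude
```

The second call corresponds to passing `model=claude` on the command line: the group stays the same and only the selected option changes.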

Override Examples

# Use Claude instead of the default model (GPT-4o-mini)
uv run python run_experiment.py model=claude

# Use GPT-4o for higher quality
uv run python run_experiment.py model=gpt4o

# Parallel simulation with multi-model
uv run python run_experiment.py simulation=parallel model=multi_model

# Custom parameters
uv run python run_experiment.py \
  scenario.agents.buyer.budget=1000 \
  scenario.agents.seller.pricing_strategy=competitive

Multi-Model Support

Assign different models to different agents:

# config/model/multi_model.yaml
model_registry:
  gpt4:
    provider: openai
    model_name: gpt-4-turbo
  claude:
    provider: anthropic
    model_name: claude-3-5-sonnet

entity_model_mapping:
  Alice: gpt4
  Bob: claude
  narrator: gpt4
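
The two sections work together as a lookup: an entity name maps to a registry key, and the key maps to a model definition. A minimal sketch of that resolution follows. `resolve_model` and the `default_key` fallback are assumptions made for illustration; how the repository's MultiModelSimulator handles unmapped entities is not stated here.

```python
# Values copied from the YAML above.
MODEL_REGISTRY = {
    "gpt4": {"provider": "openai", "model_name": "gpt-4-turbo"},
    "claude": {"provider": "anthropic", "model_name": "claude-3-5-sonnet"},
}
ENTITY_MODEL_MAPPING = {"Alice": "gpt4", "Bob": "claude", "narrator": "gpt4"}


def resolve_model(entity_name, registry, mapping, default_key=None):
    """Return the model definition assigned to an entity.

    default_key is a hypothetical fallback for entities without a mapping.
    """
    key = mapping.get(entity_name, default_key)
    if key is None:
        raise KeyError(f"no model mapped for entity {entity_name!r}")
    if key not in registry:
        raise KeyError(f"entity {entity_name!r} refers to unknown model {key!r}")
    return registry[key]


print(resolve_model("Bob", MODEL_REGISTRY, ENTITY_MODEL_MAPPING))
print(resolve_model("Carol", MODEL_REGISTRY, ENTITY_MODEL_MAPPING, default_key="gpt4"))
```

Failing loudly on an unknown key catches typos in `entity_model_mapping` before any API call is made.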

Social Media Environment

The social media environment provides an in-memory platform for simulating information spread:

  • Actions: post, reply, like, unlike, boost, follow, unfollow
  • Feed: Chronological feed from followed users
  • Engine: SocialMediaEngine runs parallel agent actions each step
  • Analysis: Transmission chain extraction, keyword overlap, network analysis

Activated by the social_media environment config (config/environment/social_media.yaml). Social media scenarios auto-select this environment via a Hydra defaults override.
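
For readers new to this kind of environment, the sketch below shows a tiny in-memory platform with follow, post, reply, like, and a feed restricted to followed users. It is a stand-in written for this README, not the repository's SocialMediaApp: the class `TinyPlatform`, its method names, and the newest-first feed ordering are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Post:
    post_id: int
    author: str
    text: str
    reply_to: int | None = None  # id of the parent post, if this is a reply
    likes: set[str] = field(default_factory=set)


class TinyPlatform:
    """Hypothetical minimal platform; state lives entirely in memory."""

    def __init__(self):
        self._ids = count(1)
        self.posts = {}      # post_id -> Post, in insertion order
        self.following = {}  # user -> set of followed users

    def follow(self, user, target):
        self.following.setdefault(user, set()).add(target)

    def unfollow(self, user, target):
        self.following.get(user, set()).discard(target)

    def post(self, author, text, reply_to=None):
        new_post = Post(next(self._ids), author, text, reply_to)
        self.posts[new_post.post_id] = new_post
        return new_post

    def like(self, user, post_id):
        self.posts[post_id].likes.add(user)

    def feed(self, user, limit=10):
        """Posts by followed users, newest first (ordering is an assumption)."""
        followed = self.following.get(user, set())
        visible = [p for p in self.posts.values() if p.author in followed]
        return visible[::-1][:limit]


platform = TinyPlatform()
platform.follow("bob", "alice")
first = platform.post("alice", "a1")
platform.post("carol", "c1")  # bob does not follow carol
platform.post("alice", "a2", reply_to=first.post_id)
platform.like("bob", first.post_id)
print([p.text for p in platform.feed("bob")])  # ['a2', 'a1']
```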

# Run social media scenarios
uv run python run_experiment.py scenario=misinformation
uv run python run_experiment.py scenario=ai_conference

# Standalone runner (mock mode)
uv run python scripts/run_social_media_sim.py

# Analyze results
uv run python scripts/analyze_social_media.py path/to/checkpoint.json

# Interactive dashboard
uv run python scripts/explore_dashboard.py path/to/checkpoint.json

ValueFlow Scenario

Replication of arxiv 2602.08567. Measures how injecting an amplified value into one agent propagates through a multi-agent network, using the Schwartz Value Survey as probes.

# Run a single experiment (chain topology, social_power perturbed at agent 0)
uv run python run_experiment.py scenario=valueflow evaluation=valueflow environment=valueflow

# Sweep topologies (H1), value types (H3), and perturbation locations (H4)
uv run python scripts/run_valueflow.py \
  --topologies chain ring star fully_connected \
  --values social_power helpful equality \
  --locations 0 2 4

# Compare models — reuse existing baseline, write to separate results dir
uv run python scripts/run_valueflow.py \
  --model gpt4 \
  --topologies chain ring star fully_connected \
  --baseline-dir outputs/valueflow_experiment/<timestamp> \
  --output-dir experiments/valueflow/results_gpt4mini

# Analyze results
# Open notebooks/study_valueflow.ipynb
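
The four topology names in the sweep can be pictured as directed edge lists over agent indices. The sketch below is one plausible reading, not the repository's build_topology_graph(): the function `build_edges`, the edge directions, and the choice of agent 0 as the star hub are assumptions.

```python
def build_edges(topology, n_agents):
    """Return directed (sender, receiver) pairs for a named topology."""
    if n_agents < 2:
        raise ValueError("need at least two agents")
    agents = range(n_agents)
    if topology == "chain":
        # 0 -> 1 -> 2 -> ... with no wrap-around
        return [(i, i + 1) for i in range(n_agents - 1)]
    if topology == "ring":
        # chain plus an edge from the last agent back to the first
        return [(i, (i + 1) % n_agents) for i in agents]
    if topology == "star":
        # agent 0 is assumed to be the hub
        return [(0, i) for i in range(1, n_agents)]
    if topology == "fully_connected":
        return [(i, j) for i in agents for j in agents if i != j]
    raise ValueError(f"unknown topology: {topology}")


for name in ("chain", "ring", "star", "fully_connected"):
    print(name, build_edges(name, 4))
```

Under this reading, perturbing agent 0 in a chain can reach every other agent, while perturbing the last agent in the chain reaches none, which is why the perturbation location is swept separately.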

Key metrics saved to experiments/valueflow/results/{condition}/valueflow_metrics.json:

  • target_value_ss — System Susceptibility (headline scalar)
  • beta_susceptibility — per-agent, per-value shift (perturbed minus baseline)
  • beta_timeseries — β at each interaction round
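
The per-agent, per-value shift described above is a plain difference between two score tables. A minimal sketch follows, assuming scores are stored as nested dictionaries keyed by agent and then by value name. The function name and data layout are hypothetical, and the aggregation behind target_value_ss is deliberately not reproduced here (see scenarios/valueflow/metrics.py for the actual definitions).

```python
def beta_susceptibility(baseline, perturbed):
    """Return perturbed minus baseline for every (agent, value) pair.

    Both arguments are assumed to have the shape {agent: {value_name: score}}.
    """
    return {
        agent: {
            value: perturbed[agent][value] - score
            for value, score in values.items()
        }
        for agent, values in baseline.items()
    }


# Made-up scores, for illustration only.
baseline = {"agent_1": {"social_power": 3.0, "equality": 5.0}}
perturbed = {"agent_1": {"social_power": 5.0, "equality": 4.5}}
print(beta_susceptibility(baseline, perturbed))
# {'agent_1': {'social_power': 2.0, 'equality': -0.5}}
```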

See scenarios/valueflow/README.md for the full modification guide (add values, topologies, models, re-run from scratch).

Evaluation

Probes

Query agents at checkpoints without affecting their memory:

# config/evaluation/election.yaml
metrics:
  vote_preference:
    type: categorical
    categories: [conservative, progressive, undecided]
    prompt_template: |
      Based on {agent_name}'s views, which candidate do they prefer?
    applies_to: [voter]

Probe types: categorical, numeric (min/max range), boolean (yes/no), judged_numeric (free-form agent response scored by a judge LLM).
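
As an illustration of how the three directly parsed probe types differ, the sketch below turns a free-text agent response into a typed value, returning None when the response cannot be parsed. These parsers are hypothetical and are not the classes in src/evaluation/probes.py; judged_numeric is omitted because it depends on a second model call.

```python
import re


def parse_categorical(response, categories):
    """Return the single category mentioned in the response, else None."""
    lowered = response.lower()
    hits = [
        c for c in categories
        if re.search(rf"\b{re.escape(c.lower())}\b", lowered)
    ]
    return hits[0] if len(hits) == 1 else None


def parse_numeric(response, minimum, maximum):
    """Return the first number in the response if it lies within range."""
    match = re.search(r"-?\d+(?:\.\d+)?", response)
    if match is None:
        return None
    value = float(match.group())
    return value if minimum <= value <= maximum else None


def parse_boolean(response):
    """Return True/False for a response that starts with yes/no, else None."""
    match = re.match(r"\s*(yes|no)\b", response, re.IGNORECASE)
    return None if match is None else match.group(1).lower() == "yes"


categories = ["conservative", "progressive", "undecided"]
print(parse_categorical("She leans Progressive.", categories))  # progressive
print(parse_numeric("About 7 out of 10", 0, 10))                # 7.0
print(parse_boolean("Maybe"))                                   # None
```

Returning None for ambiguous answers, rather than guessing, keeps unparsed responses visible so they can be handled in a later pass.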

Style Diversity Metrics

Reference-free evaluation of agent linguistic diversity:

# Evaluate a single run
uv run python experiments/studies/style_diversity/eval.py path/to/checkpoint.json

# Compare two runs
uv run python experiments/studies/style_diversity/eval.py ckpt1.json ckpt2.json --compare

# Export to file
uv run python experiments/studies/style_diversity/eval.py checkpoint.json -o results/

Computes 10 metrics per agent: self-BLEU, lexical diversity, content evolution, opener variety, action entropy, near-duplicate rate, target fixation, action diversity, new post rate, and inter-agent distinctiveness.
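
Two of these metrics have simple textbook forms that convey the flavor of the evaluation: lexical diversity as a type-token ratio, and action entropy as Shannon entropy over action counts. The sketch below uses those standard definitions with whitespace tokenization. The exact formulas in eval.py may differ, and the function names here are hypothetical.

```python
import math
from collections import Counter


def lexical_diversity(texts):
    """Type-token ratio: unique tokens divided by total tokens."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def action_entropy(actions):
    """Shannon entropy (in bits) of the distribution over action types."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


posts = ["the cat sat", "the cat ran"]
actions = ["post", "post", "reply", "like"]
print(round(lexical_diversity(posts), 3))  # 0.667
print(action_entropy(actions))             # 1.5
```

An agent that only ever posts has an action entropy of 0, and an agent that repeats the same sentence drives the type-token ratio down, so low values on either metric point to repetitive behavior.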

Experiment Organization

Organize simulation runs into a browsable study/hypothesis/condition hierarchy:

uv run python experiments/scripts/organize_experiments.py experiments/studies/style_diversity/study.yaml

See experiments/study_schema.md for the full schema covering directory layout, file formats, and the standard results notebook structure.

Creating Custom Scenarios

  1. Create config file config/scenario/my_scenario.yaml:
name: my_scenario
premise: |
  Description of your scenario...

roles:
  - name: player
    description: "A player in the game"

agents:
  entities:
    - name: Alice
      role: player
      prefab: basic_entity
      params:
        goal: "Win the game"

game_master:
  prefab: basic_game_master
  name: narrator

prefabs:
  basic_entity: src.entities.agents.basic_entity.BasicEntity
  basic_game_master: src.entities.game_masters.basic_gm.BasicGameMaster
  2. (Optional) Create custom prefabs in scenarios/my_scenario/agents.py
  3. Run: uv run python run_experiment.py scenario=my_scenario
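
The prefabs section maps short names to dotted import paths. A common way to turn such a path into a class is importlib, sketched below. This is an assumption about the general technique, not a description of the framework's loader; `resolve_dotted_path` and `load_prefabs` are hypothetical names, and the demo uses standard-library classes so that it runs outside the repository.

```python
from importlib import import_module


def resolve_dotted_path(path):
    """Import 'package.module.ClassName' and return the named attribute."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    return getattr(import_module(module_name), attr_name)


def load_prefabs(mapping):
    """Resolve every {short_name: dotted_path} entry to its class."""
    return {name: resolve_dotted_path(path) for name, path in mapping.items()}


# Standard-library stand-ins; in a scenario config the paths would look like
# src.entities.agents.basic_entity.BasicEntity.
demo = {"ordered": "collections.OrderedDict", "fraction": "fractions.Fraction"}
prefabs = load_prefabs(demo)
print(prefabs["fraction"](1, 3))  # 1/3
```

A misspelled module raises ModuleNotFoundError and a misspelled class raises AttributeError, so a typo in the prefabs mapping surfaces at load time.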

For social media scenarios, add - override /environment: social_media to defaults and include initial_graph and seed_posts sections (see config/scenario/ai_conference.yaml for a complete example).

Development

Running Tests

# All tests
uv run pytest

# With coverage
uv run pytest --cov=src --cov=scenarios

# Specific test directory
uv run pytest tests/test_evaluation/ -v

# Skip integration tests
uv run pytest -m "not integration"

Code Quality

# Run all pre-commit hooks
uv run pre-commit run --all-files

# Type checking
uv run mypy src/ scenarios/

# Linting
uv run ruff check src/ scenarios/

Commit Convention

We use Conventional Commits:

# Using commitizen
uv run cz c

# Manual format
git commit -m "feat(scenario): add new auction mechanism"
git commit -m "fix(agent): resolve memory retrieval bug"
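
A Conventional Commits header has the form `type(scope): description`, where the scope and a `!` breaking-change marker are optional. The sketch below checks a header against that shape with a regular expression. It is a rough illustration and not a replacement for commitizen: `parse_header` is a hypothetical helper, and it does not restrict which type names are allowed.

```python
import re

# type, optional (scope), optional !, then ": " and a non-empty description
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>.+)$"
)


def parse_header(header):
    """Return the header's parts as a dict, or None if it does not match."""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None


print(parse_header("feat(scenario): add new auction mechanism"))
print(parse_header("fix(agent): resolve memory retrieval bug"))
print(parse_header("Added some stuff"))  # None
```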

License

Apache License 2.0 - See LICENSE for details.
