81 changes: 54 additions & 27 deletions .pm/tracker.md
@@ -1,11 +1,17 @@
# Project Task Tracker

**Last Updated:** 2025-12-02T06:23:51Z
**Last Updated:** 2025-12-02T18:55:00Z

## Status Summary

**Recent Progress (since last update):**

- 🎉 **Task 8.4.1 (Content Pipeline Tooling & CI) COMPLETED** - GitHub Issue [#23](https://github.com/TheWizardsCode/GEngine/issues/23)
- Content build script (`scripts/build_content.py`) validates worlds, configs, and sweeps
- CI workflow (`.github/workflows/content-validation.yml`) runs on content file changes
- Designer workflow documented in `docs/gengine/content_designer_workflow.md`
- 17 tests covering all validation paths, all passing
- Clear error messages with entity reference validation
- 🎉 **Task 10.1.2 (Strengthen AgentSystem Tests) COMPLETED**
- Refactored `AgentSystem` to extract scoring logic for testability.
- Added unit tests verifying trait influence (empathy, cunning, resolve) on decision scoring.
@@ -92,9 +98,9 @@

**Current Priorities:**

1. 🚀 **Phase 8 Deployment** - Core complete (8.1.1, 8.2.1, 8.3.1), need CI automation (8.3.2) and content pipeline (8.4.1)
1. 🚀 **Phase 8 Deployment** - Core complete (8.1.1, 8.2.1, 8.3.1, 8.4.1), need CI automation (8.3.2) to finish
2. 🤖 **Phase 9 AI Testing** - Observer (9.1.1) and action layer (9.2.1) complete, LLM-enhanced (9.3.1) ready to start
3. 🔧 **CI/CD Gap** - No automated workflows exist; high risk of regressions
3. 🔧 **CI/CD Gap** - K8s validation workflow (8.3.2) still needed for deployment protection

**Recommended Next 3 Parallel Tasks:**

@@ -105,37 +111,30 @@
- Impact: Protects all environments from manifest errors
- Estimated time: 1-2 days

2. **8.3.3 - K8s Resource Tuning** (Priority: MEDIUM, Effort: Low)
- Why: Complete 8.3.1 resource sizing acceptance criteria
- Owner needed: DevOps/SRE-focused agent
- Parallelizable: Configuration work, independent of code
- Impact: Prevents resource exhaustion in production
- Estimated time: 4-6 hours
- Prerequisites: Smoke test data from 8.3.1

3. **9.3.1 - LLM-Enhanced AI Decisions** (Priority: MEDIUM, Effort: High)
2. **9.3.1 - LLM-Enhanced AI Decisions** (Priority: MEDIUM, Effort: High)
- Why: Builds on completed AI foundation (9.1.1, 9.2.1)
- Owner needed: AI/ML-focused agent with LLM experience
- Parallelizable: AI/ML work, independent of infrastructure
- Impact: Enables advanced AI testing capabilities
- Estimated time: 3-5 days

**Alternative (if no AI/ML owner available):**
- **8.4.1 - Content Pipeline Tooling** instead of 9.3.1
- Priority: MEDIUM, Effort: Medium
- Unblocks content designers
- Estimated time: 2-3 days
3. **10.1.3 - Expand SimEngine API Tests** (Priority: HIGH, Effort: Medium)
- Why: Improve core system test coverage
- Owner needed: Test-focused agent
- Parallelizable: Test work, independent of infrastructure
- Impact: Better regression detection for core engine
- Estimated time: 2-3 days

**Key Risks:**

- 🔴 **K8s CI validation missing** - Bad manifests can break deployment (8.3.2) - HIGH IMPACT
- ⚠️ **Phase 8 content pipeline needs ownership** - Task 8.4.1 requires assignment
- ⚠️ **Phase 9 LLM enhancement ready** - Rule-based AI complete, LLM-enhanced (9.3.1) unblocked but needs owner
- ✅ **Phase 8 content pipeline complete** - Task 8.4.1 finished with build script, CI workflow, and documentation (2025-12-02)
- ✅ **Phase 8 observability complete** - Task 8.3.1 Prometheus annotations and smoke tests added (2025-12-01)
- ✅ **Phase 7 delivery risk eliminated** - All core player features complete and tested, per-agent modifiers enabled by default
- ✅ **Containerization complete** - Docker/Compose and K8s manifests tested and documented
- ✅ **AI player foundation complete** - Observer and action layer shipped with 112 tests
- ✅ **Clean repository state** - Issues #21, #24, #25 closed (verified 2025-12-01)
- ✅ **Clean repository state** - Issues #21, #23, #24, #25 closed (verified 2025-12-02)

| ID | Task | Status | Priority | Responsible | Updated |
| ----: | ----------------------------------------------- | ----------- | -------- | ------------------ | ---------- |
@@ -168,7 +167,7 @@
| 8.3.3 | K8s Resource Sizing & Tuning (M8.3.y) | completed | Medium | devops-agent | 2025-12-02 |
| 8.3.3 | Gateway/LLM Prometheus Metrics (M8.3.x) | not-started | Medium | TBD (ask Ross) | 2025-12-01 |
| 8.3.4 | Integrate K8s Smoke Test into CI (M8.3.x) | not-started | Medium | TBD (ask Ross) | 2025-12-01 |
| 8.4.1 | Content pipeline tooling & CI (M8.4) | not-started | Medium | TBD (ask Ross) | 2025-11-30 |
| 8.4.1 | Content pipeline tooling & CI (M8.4) | completed | Medium | devops-agent | 2025-12-02 |
| 9.1.1 | AI Observer foundation acceptance (M9.1) | completed | Medium | gamedev-agent | 2025-11-30 |
| 9.2.1 | Rule-based AI action layer (M9.2) | completed | Medium | gamedev-agent | 2025-12-01 |
| 9.3.1 | LLM-enhanced AI decisions (M9.3) | not-started | Medium | TBD (ask Ross) | 2025-11-30 |
@@ -717,16 +716,44 @@
### 8.4.1 — Content Pipeline Tooling & CI (M8.4)
- **GitHub Issue:** [#23](https://github.com/TheWizardsCode/GEngine/issues/23)
- **Description:** Implement content build tooling (`scripts/build_content.py`), CI validation hooks, and documentation so designers can author/test YAML and story seeds efficiently.
- **Acceptance Criteria:** Content build step produces artifacts consumed by simulation; CI validates content on change; designer workflow documented.
- **Acceptance Criteria:**
- ✅ Content build step produces artifacts consumed by simulation
- ✅ CI validates content on change (schema, references, integrity)
- ✅ Designer workflow documented
- ✅ Clear error messages for content validation failures
- **Priority:** Medium
- **Responsible:** TBD (ask Ross)
- **Dependencies:** Stable content schema and directory structure.
- **Responsible:** devops-agent
- **Status:** ✅ COMPLETED
- **Dependencies:** Stable content schema and directory structure (✅ complete).
- **Risks & Mitigations:**
- Risk: Pipeline friction slows content iteration. Mitigation: Optimize for designer ergonomics, provide quick local commands.
- **Next Steps:**
1. Implement build script.
2. Wire into CI.
3. Document designer workflow.
- **Completion Notes:**
- **Build Script** (`scripts/build_content.py`):
- Validates world definitions (`world.yml` and `story_seeds.yml`) with entity reference checking
- Validates simulation configuration (`simulation.yml`) against Pydantic schema
- Validates difficulty sweep configurations (`content/config/sweeps/*/`)
- Outputs JSON manifest with validation results and file lists
- Clear error messages with icons (❌/✓) and bullet-point formatting
- Exit codes: 0 (success), 1 (validation errors), 2 (file/config errors)
- **CI Workflow** (`.github/workflows/content-validation.yml`):
- Triggers on push to main and PRs that modify content files
- Monitors: `content/**/*.yml`, `content/**/*.yaml`, `scripts/build_content.py`, `.github/workflows/content-*.yml`
- Runs validation via `uv run python scripts/build_content.py --verbose --output content-manifest.json`
- Uploads content manifest artifact for debugging
- Blocks PR merge on validation failures
- **Designer Documentation** (`docs/gengine/content_designer_workflow.md`):
- Content types and structure (worlds, configs, sweeps)
- YAML schema examples with annotations
- Local validation instructions with exit codes
- CI/CD validation details and artifact retrieval
- Troubleshooting section with common validation errors
- Best practices for content authors
- **Test Coverage** (`tests/scripts/test_build_content.py`):
- 17 tests covering all validation paths
- Tests for valid content, missing files, invalid schemas, bad entity references
- Integration tests validating real repository content
- All tests passing
- **Last Updated:** 2025-12-02
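
The exit-code convention and entity reference check listed in the completion notes can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/build_content.py`: the function names, the `entities` key, the seed layout, and the rule that file errors take precedence over validation errors are all assumptions.

```python
# Hypothetical sketch: names and data layout are assumptions, not the real script.
EXIT_OK = 0                # all content valid
EXIT_VALIDATION_ERROR = 1  # content loaded but failed validation
EXIT_FILE_ERROR = 2        # required file or config missing/unreadable


def find_bad_references(entity_ids: set[str], seeds: list[dict]) -> list[str]:
    """Return one message per story-seed reference to an unknown entity."""
    errors = []
    for seed in seeds:
        for ref in seed.get("entities", []):
            if ref not in entity_ids:
                seed_id = seed.get("id", "?")
                errors.append(
                    f"seed '{seed_id}' references unknown entity '{ref}'"
                )
    return errors


def exit_code(missing_files: list[str], errors: list[str]) -> int:
    """Map results onto the 0/1/2 convention (file errors win, by assumption)."""
    if missing_files:
        return EXIT_FILE_ERROR
    return EXIT_VALIDATION_ERROR if errors else EXIT_OK
```

A caller would typically pass the result of `exit_code(...)` to `sys.exit`, so that CI can distinguish "content is wrong" from "content could not be read".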

### 9.1.1 — AI Observer Foundation Acceptance (M9.1)
- **GitHub Issue:** [#19](https://github.com/TheWizardsCode/GEngine/issues/19)
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -47,6 +47,8 @@ build-backend = "setuptools.build_meta"

[tool.ruff]
line-length = 88

[tool.ruff.lint]
select = ["E", "F", "B", "I"]

[tool.pytest.ini_options]
15 changes: 8 additions & 7 deletions scripts/analyze_difficulty_profiles.py
@@ -51,7 +51,9 @@ def from_telemetry(cls, preset: str, data: dict[str, Any]) -> "DifficultyProfile
# Calculate faction balance as delta between faction legitimacies
faction_leg = data.get("faction_legitimacy", {})
leg_values = list(faction_leg.values())
faction_balance = max(leg_values) - min(leg_values) if len(leg_values) >= 2 else 0.0
faction_balance = (
max(leg_values) - min(leg_values) if len(leg_values) >= 2 else 0.0
)

# Economic pressure from price volatility
economy = data.get("last_economy", {})
@@ -161,17 +163,16 @@ def compare_profiles(profiles: dict[str, DifficultyProfile]) -> dict[str, Any]:
"✓ Unrest correctly increases with difficulty (harder = more unrest)"
)
else:
findings.append(
"⚠ Unrest does not consistently increase with difficulty"
)
findings.append("⚠ Unrest does not consistently increase with difficulty")

# Check for extreme values
for preset, profile in profiles.items():
if profile.stability_end <= 0.0:
findings.append(f"⚠ {preset}: Stability collapsed to 0 (may be too harsh)")
if profile.anomalies > 100:
findings.append(
f"⚠ {preset}: High anomaly count ({profile.anomalies}) indicates system stress"
f"⚠ {preset}: High anomaly count ({profile.anomalies}) "
"indicates system stress"
)

# Check differentiation between adjacent difficulties
@@ -181,8 +182,8 @@ def compare_profiles(profiles: dict[str, DifficultyProfile]) -> dict[str, Any]:
stability_diff = abs(prof1.stability_end - prof2.stability_end)
if stability_diff < 0.05:
findings.append(
f"⚠ {p1} vs {p2}: Stability difference is minimal ({stability_diff:.3f}), "
"consider widening gap"
f"⚠ {p1} vs {p2}: Stability difference is minimal "
f"({stability_diff:.3f}), consider widening gap"
)

comparison["findings"] = findings
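
The two calculations touched in this file's hunks (the faction balance spread and the adjacent-preset stability check) can be reproduced in isolation. The sketch below assumes plain dictionaries in place of `DifficultyProfile` objects; the function names and the assumption that presets arrive ordered from easiest to hardest are illustrative only.

```python
def faction_balance(faction_legitimacy: dict[str, float]) -> float:
    """Spread between the strongest and weakest faction legitimacy.

    Mirrors the expression in the hunk above: fewer than two factions
    yields 0.0 because no spread can be measured.
    """
    values = list(faction_legitimacy.values())
    return max(values) - min(values) if len(values) >= 2 else 0.0


def minimal_gaps(
    stability_end: dict[str, float], threshold: float = 0.05
) -> list[str]:
    """Flag adjacent presets whose final stability differs by < threshold.

    Assumes the dict is ordered from easiest to hardest preset.
    """
    presets = list(stability_end)
    findings = []
    for p1, p2 in zip(presets, presets[1:]):
        diff = abs(stability_end[p1] - stability_end[p2])
        if diff < threshold:
            findings.append(
                f"{p1} vs {p2}: Stability difference is minimal ({diff:.3f})"
            )
    return findings
```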
1 change: 0 additions & 1 deletion scripts/eoe_dump_state.py
@@ -4,7 +4,6 @@

import argparse
from pathlib import Path
from typing import Optional

from gengine.echoes.content import load_world_bundle
from gengine.echoes.persistence import save_snapshot
6 changes: 4 additions & 2 deletions scripts/plot_environment_trajectories.py
@@ -50,15 +50,17 @@ def main(argv: Sequence[str] | None = None) -> int:
runs = _collect_runs(args.run)
if not runs:
raise SystemExit(
"No telemetry files found. Provide --run LABEL=PATH or rerun the sweeps to generate JSON."
"No telemetry files found. Provide --run LABEL=PATH "
"or rerun the sweeps to generate JSON."
)

fig, (ax_pollution, ax_unrest) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
for label, path in runs.items():
ticks, pollution, unrest = _extract_series(path)
if len(ticks) < 2:
print(
f"Warning: {label} only provided {len(ticks)} sample(s); increase focus.history_length before capturing telemetry."
f"Warning: {label} only provided {len(ticks)} sample(s); "
"increase focus.history_length before capturing telemetry."
)
ax_pollution.plot(ticks, pollution, label=label)
ax_unrest.plot(ticks, unrest, label=label)
12 changes: 9 additions & 3 deletions scripts/run_difficulty_sweeps.py
@@ -69,10 +69,14 @@ def run_difficulty_sweeps(
sys.stderr.write(f"[SKIP] Config not found: {config_root}\n")
continue

output_path = output_dir / f"difficulty-{preset}-sweep.json" if output_dir else None
output_path = (
output_dir / f"difficulty-{preset}-sweep.json" if output_dir else None
)

if verbose:
sys.stderr.write(f"\n[START] {preset.upper()} difficulty ({ticks} ticks, seed={seed})\n")
sys.stderr.write(
f"\n[START] {preset.upper()} difficulty ({ticks} ticks, seed={seed})\n"
)

start = perf_counter()
summary = run_headless_sim(
@@ -106,7 +110,9 @@ def run_difficulty_sweeps(

total_elapsed = perf_counter() - start_total
if verbose:
sys.stderr.write(f"\n[COMPLETE] {len(results)} presets in {total_elapsed:.1f}s\n")
sys.stderr.write(
f"\n[COMPLETE] {len(results)} presets in {total_elapsed:.1f}s\n"
)

return results

36 changes: 27 additions & 9 deletions scripts/run_headless_sim.py
@@ -58,14 +58,26 @@ def run_headless_sim(
"faction_actions": sum(len(report.faction_actions) for report in reports),
"faction_action_breakdown": _faction_breakdown(reports),
}
summary["suppressed_events"] = sum(len(report.suppressed_events) for report in reports)
summary["suppressed_events"] = sum(
len(report.suppressed_events) for report in reports
)
summary["director_feed"] = dict(engine.state.metadata.get("director_feed", {}))
summary["director_history"] = list(engine.state.metadata.get("director_history") or [])
summary["director_analysis"] = dict(engine.state.metadata.get("director_analysis") or {})
summary["director_events"] = list(engine.state.metadata.get("director_events") or [])
summary["director_pacing"] = dict(engine.state.metadata.get("director_pacing") or {})
summary["director_history"] = list(
engine.state.metadata.get("director_history") or []
)
summary["director_analysis"] = dict(
engine.state.metadata.get("director_analysis") or {}
)
summary["director_events"] = list(
engine.state.metadata.get("director_events") or []
)
summary["director_pacing"] = dict(
engine.state.metadata.get("director_pacing") or {}
)
summary["story_seeds"] = list(engine.state.metadata.get("story_seeds_active") or [])
summary["story_seed_lifecycle"] = dict(engine.state.metadata.get("story_seed_lifecycle") or {})
summary["story_seed_lifecycle"] = dict(
engine.state.metadata.get("story_seed_lifecycle") or {}
)
summary["story_seed_lifecycle_history"] = list(
engine.state.metadata.get("story_seed_lifecycle_history") or []
)
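
The summary assignments in this hunk mix two fallback styles: `.get(key, {})` and `.get(key) or {}`. The sketch below, using a made-up `metadata` dictionary and hypothetical helper names, shows why the `or` form is the more defensive one when a key may be present with a `None` value.

```python
# Hypothetical metadata: one populated key, one key explicitly set to None.
metadata = {"director_feed": {"tick": 3}, "director_history": None}


def safe_list(source: dict, key: str) -> list:
    """Copy a list-valued entry, treating a missing or None value as empty."""
    return list(source.get(key) or [])


def safe_dict(source: dict, key: str) -> dict:
    """Copy a dict-valued entry, treating a missing or None value as empty."""
    return dict(source.get(key) or {})


history = safe_list(metadata, "director_history")  # None falls back to []
feed = safe_dict(metadata, "director_feed")        # shallow copy of the dict
feed["tick"] = 99                                  # does not touch metadata

# By contrast, dict(metadata.get("director_history", {})) would raise
# TypeError here: the default only applies when the key is absent, and
# the stored None cannot be converted to a dict.
```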
@@ -131,7 +143,9 @@ def _advance_in_batches(
"ticks": len(step_reports),
"ending_tick": last_report.tick if last_report else engine.state.tick,
"agent_actions": sum(len(report.agent_actions) for report in step_reports),
"faction_actions": sum(len(report.faction_actions) for report in step_reports),
"faction_actions": sum(
len(report.faction_actions) for report in step_reports
),
}
if last_report is not None:
batch_payload["tick_ms"] = round(
@@ -213,8 +227,12 @@ def main(argv: Sequence[str] | None = None) -> int:
default=None,
help="Optional snapshot file to load instead of content",
)
parser.add_argument("--ticks", "-t", type=int, default=200, help="Number of ticks to advance")
parser.add_argument("--seed", type=int, default=None, help="RNG seed override for determinism")
parser.add_argument(
"--ticks", "-t", type=int, default=200, help="Number of ticks to advance"
)
parser.add_argument(
"--seed", type=int, default=None, help="RNG seed override for determinism"
)
parser.add_argument(
"--lod",
choices=["detailed", "balanced", "coarse"],
16 changes: 7 additions & 9 deletions src/gengine/ai_player/actor.py
@@ -419,9 +419,9 @@ def _create_observation_summary(
start_value=1.0, # Assumed start
end_value=stability,
delta=stability - 1.0,
trend="stable" if abs(stability - 1.0) < 0.01 else (
"increasing" if stability > 1.0 else "decreasing"
),
trend="stable"
if abs(stability - 1.0) < 0.01
else ("increasing" if stability > 1.0 else "decreasing"),
)

# Extract faction swings
@@ -432,9 +432,9 @@ def _create_observation_summary(
start_value=0.5, # Assumed start
end_value=leg,
delta=leg - 0.5,
trend="stable" if abs(leg - 0.5) < 0.05 else (
"increasing" if leg > 0.5 else "decreasing"
),
trend="stable"
if abs(leg - 0.5) < 0.05
else ("increasing" if leg > 0.5 else "decreasing"),
)

return ObservationReport(
@@ -472,9 +472,7 @@ def _build_telemetry(self, final_state: dict[str, Any]) -> dict[str, Any]:

return {
"action_counts": action_counts,
"priority_stats": {
k: round(v, 4) for k, v in priority_stats.items()
},
"priority_stats": {k: round(v, 4) for k, v in priority_stats.items()},
"strategy_type": self._strategy.strategy_type.value,
"final_state": {
"stability": final_state.get("stability", 1.0),
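
The two `trend=` expressions reformatted in this file follow the same pattern: compare a value to an assumed baseline and call it stable inside a tolerance band. A possible way to express that once is sketched below; `classify_trend` is a hypothetical helper, not a function in `actor.py`.

```python
def classify_trend(value: float, baseline: float, tolerance: float) -> str:
    """Label a value relative to a baseline.

    Returns "stable" when the value is within `tolerance` of the baseline,
    otherwise "increasing" or "decreasing" depending on direction.
    """
    if abs(value - baseline) < tolerance:
        return "stable"
    return "increasing" if value > baseline else "decreasing"


# The hunks above use baseline 1.0 / tolerance 0.01 for stability and
# baseline 0.5 / tolerance 0.05 for faction legitimacy.
stability_trend = classify_trend(0.8, baseline=1.0, tolerance=0.01)
legitimacy_trend = classify_trend(0.52, baseline=0.5, tolerance=0.05)
```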
11 changes: 6 additions & 5 deletions src/gengine/ai_player/llm_strategy.py
@@ -83,9 +83,7 @@ def __post_init__(self) -> None:
if self.llm_timeout_seconds <= 0:
raise ValueError("llm_timeout_seconds must be positive")
if not 0.0 <= self.rule_priority_scaling <= 1.0:
raise ValueError(
"rule_priority_scaling must be between 0.0 and 1.0"
)
raise ValueError("rule_priority_scaling must be between 0.0 and 1.0")


@dataclass
@@ -270,6 +268,7 @@ def request_decision(
if loop is not None and loop.is_running():
# Already in async context - use thread to avoid nested loops
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
# Create a new event loop in the thread
future = executor.submit(self._run_in_new_loop, request)
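
The hunk above shows only a fragment of the "already in an async context" branch. A self-contained sketch of that general technique, running a coroutine from synchronous code by handing it to a worker thread with its own event loop, might look like the following. `_fake_llm_call`, `request_decision_sync`, and the timeout value are stand-ins invented for this example; the real `request_decision` may differ.

```python
import asyncio
import concurrent.futures


async def _fake_llm_call(prompt: str) -> str:
    """Stand-in for an async LLM request (hypothetical)."""
    await asyncio.sleep(0)
    return f"decision for {prompt}"


def _run_in_new_loop(prompt: str) -> str:
    """Run the coroutine on a fresh event loop owned by the calling thread."""
    return asyncio.run(_fake_llm_call(prompt))


def request_decision_sync(prompt: str, timeout: float = 5.0) -> str:
    """Synchronous wrapper that works inside or outside a running loop."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None  # no loop running in this thread

    if loop is not None and loop.is_running():
        # Nested asyncio.run() is not allowed, so use a worker thread.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(_run_in_new_loop, prompt)
            return future.result(timeout=timeout)
    return asyncio.run(_fake_llm_call(prompt))
```

Note that `future.result(...)` blocks the calling thread, so when this runs inside an event loop that loop is stalled until the worker finishes. That trade-off is acceptable for short calls but worth keeping in mind.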
@@ -376,7 +375,8 @@ def _build_command_from_context(
if "multiple_stressed_factions" in factors:
factions = request.state.get("faction_legitimacy", {})
low_factions = [
f for f, leg in factions.items()
f
for f, leg in factions.items()
if leg < self._config.complexity_threshold_legitimacy
]
return (
@@ -496,7 +496,8 @@ def evaluate_complexity(
# Check faction stress
faction_legitimacy = state.get("faction_legitimacy", {})
stressed_factions = sum(
1 for leg in faction_legitimacy.values()
1
for leg in faction_legitimacy.values()
if leg < config.complexity_threshold_legitimacy
)
if stressed_factions >= config.complexity_threshold_factions:
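
The faction stress check in this hunk counts factions below a legitimacy threshold and compares the count to a second threshold. A standalone sketch of that logic follows; the function names and the default threshold values (0.4 and 2) are assumptions, since the configured defaults are not visible in this diff.

```python
def count_stressed_factions(
    faction_legitimacy: dict[str, float], legitimacy_threshold: float
) -> int:
    """Count factions whose legitimacy is below the threshold."""
    return sum(
        1
        for leg in faction_legitimacy.values()
        if leg < legitimacy_threshold
    )


def has_multiple_stressed_factions(
    faction_legitimacy: dict[str, float],
    legitimacy_threshold: float = 0.4,  # assumed default
    faction_threshold: int = 2,         # assumed default
) -> bool:
    """True when enough factions are stressed to count as a complex state."""
    stressed = count_stressed_factions(faction_legitimacy, legitimacy_threshold)
    return stressed >= faction_threshold
```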