From b59e13f5387b77bdbc5ea8daf2c9302568f2ae0d Mon Sep 17 00:00:00 2001
From: crusaderky <crusaderky@gmail.com>
Date: Tue, 26 May 2026 22:25:52 +0100
Subject: [PATCH 1/2] Add AGENTS.md / CLAUDE.md

---
 AGENTS.md               | 102 ++++++++++++++++++++++++++++++++++++++++
 CLAUDE.md               |   1 +
 docs/source/develop.rst |  28 +++++++++++
 3 files changed, 131 insertions(+)
 create mode 100644 AGENTS.md
 create mode 120000 CLAUDE.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000000..4d9d000dc79
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,102 @@
+# AGENTS.md
+
+This file provides guidance to AI coding agents when working with code in this repository.
+
+## Overview
+
+Dask Distributed is the distributed scheduler for the Dask framework, enabling parallel computing across multiple machines. It implements multi-machine task scheduling, fault tolerance, work stealing, memory management, and network communication.
+
+## Environment & Commands
+
+The project uses **Pixi** for environment management.
+
+```bash
+# Run tests
+pixi run test
+
+# Run tests (CI mode with coverage, leak detection, slow tests)
+pixi run test-ci
+
+# Lint
+pixi run lint
+
+# Run a single test file
+pixi run test distributed/tests/test_client.py
+
+# Run a single test
+pixi run test distributed/tests/test_client.py::test_client_submit
+
+# Run tests matching a pattern
+pixi run test distributed/tests/test_client.py -k "submit"
+
+# Run tests in a specific environment
+pixi run -e py312 test
+```
+
+Key pytest options:
+- `--runslow` — include slow tests (omitted by default)
+- `-m ci1` / `-m "not ci1"` — run first/second CI partition (tests split for parallelism)
+- `--leaks=fds,processes,threads` — enable resource leak detection
+
+## Architecture
+
+### Core modules (all in `distributed/`)
+
+| File | Purpose |
+|------|---------|
+| `scheduler.py` | Main scheduler — task graph, work stealing, fault tolerance |
+| `client.py` | User-facing API — submit tasks, gather futures |
+| `worker.py` | Worker process — executes tasks, manages memory |
+| `worker_state_machine.py` | Worker state transitions (separate from I/O logic) |
+| `core.py` | RPC infrastructure, connection handling |
+| `utils_test.py` | Test fixtures and helpers used across all tests |
+
+### Subdirectories
+
+- `comm/` — Communication backends (TCP, UCX, compression)
+- `deploy/` — Cluster types: `LocalCluster`, `SSHCluster`, `SpecCluster`, adaptive scaling
+- `dashboard/` — Bokeh-based web UI for monitoring
+- `diagnostics/` — Task streams, memory sampling, profiling
+- `shuffle/` — Distributed shuffle for large data movement
+- `protocol/` — Message serialization
+- `cli/` — Entry points: `dask scheduler`, `dask worker`, `dask ssh`, `dask spec`
+
+### Key classes
+
+- `Client` — entry point for submitting work to a cluster
+- `Scheduler` — coordinates all workers and task execution
+- `Worker` — executes tasks; state tracked separately in `WorkerState`/`worker_state_machine.py`
+- `LocalCluster` — single-machine cluster for testing/development
+- `TaskState` — tracks task lifecycle on both scheduler and worker sides
+
+## Testing
+
+Tests live in `distributed/tests/` (67 files) and each submodule has its own `tests/` subdirectory. The global `conftest.py` at the repo root and `distributed/utils_test.py` provide shared fixtures.
+
+Tests are partitioned by the `ci1` marker for parallel CI execution. Resource leak detection (fds, processes, threads) runs in CI via `distributed/pytest_resourceleaks.py`.
+
+Timeout: 300 seconds per test (signal-based on Unix, thread-based on Windows).
+
+## Code Style
+
+- Line length: **120 characters**
+- Formatter: Black
+- Linter: Ruff (rules: B, TID, I, UP, RUF)
+- Type checking: MyPy — strict on newer modules (`scheduler`, `worker`, `active_memory_manager`, `config`, `shuffle`), loose on older ones
+
+## Type Checking Notes
+
+MyPy is configured per-module in `pyproject.toml`. Newer modules have strict settings; older modules like `client.py` use `allow_incomplete_defs = true`. Platform target is Linux.
+
+## Contributing
+
+You must never think or speak on behalf of the user in discussions, code reviews, or any
+other interactions with other humans.
+
+Before you open or update a PR, you must ask the user to explicitly confirm that they
+have fully reviewed, understood, and approved everything that you wrote. You must clarify
+that the project maintainers consider this non-negotiable.
+
+## Releasing
+
+A coding agent must NEVER create a new release.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 00000000000..47dc3e3d863
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/docs/source/develop.rst b/docs/source/develop.rst
index ea729e6180a..be9bf0c69b0 100644
--- a/docs/source/develop.rst
+++ b/docs/source/develop.rst
@@ -185,3 +185,31 @@ run automatically when you make a git commit. This can be done by running::
 from the root of the distributed repository. Now the code linters will be run each time
 you commit changes. You can skip these checks with ``git commit --no-verify`` or with
 the short version ``git commit -n``.
+
+Making Pull Requests
+--------------------
+
+Pull Request Etiquette
+~~~~~~~~~~~~~~~~~~~~~~
+
+When opening a Pull Request you are beginning a dialog with maintainers. This is a bidirectional
+relationship where you are asking for the reviewer's time to look at your contribution, and
+the reviewer will likely ask for your input and engage you in discussion around the changes.
+
+Please do not propose code that you are not willing to stand behind and discuss.
+Be prepared to respond to review feedback, apply critical thinking, and iterate on your contributions.
+
+We ask that you fill out all sections of PR templates and provide reasoning behind your changes,
+ideally with a linked issue that has been discussed by the community.
+
+Automated Contributions and AI Policy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We encourage the use of AI and automated tools to assist in code development,
+documentation, and testing. However, we ask that contributors disclose these tools and
+use them in a way that aligns with Dask's community guidelines. In particular:
+
+- Do not use tools to think or speak for you in discussions, code reviews, or any other
+  interactions within the Dask community.
+- Before you open a PR, you (the human) must fully review, understand, and approve
+  everything that the AI agent wrote.

From 802f1b59662bfa696ade218b1e42390feee90490 Mon Sep 17 00:00:00 2001
From: crusaderky <crusaderky@gmail.com>
Date: Wed, 27 May 2026 16:11:48 +0100
Subject: [PATCH 2/2] AGENTS.md

---
 AGENTS.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 4d9d000dc79..e8f2faf57bb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -11,6 +11,9 @@ Dask Distributed is the distributed scheduler for the Dask framework, enabling p
 The project uses **Pixi** for environment management.
 
 ```bash
+# Run arbitrary Python commands
+pixi run -- python -c 'print("Hello world!")'
+
 # Run tests
 pixi run test
 
@@ -77,6 +80,14 @@ Tests are partitioned by the `ci1` marker for parallel CI execution. Resource le
 
 Timeout: 300 seconds per test (signal-based on Unix, thread-based on Windows).
 
+## Key Patterns for Contributors
+
+**IMPORTANT**: never call `.compute()` or `.persist()` in the middle of graph definition
+(e.g. inside any method of Array, Series, DataFrame, Bag, or Delayed). The only place where
+the graph is materialized should be where the end user explicitly calls `.compute()` or
+`.persist()`. When you are defining the graph, you must work with the available metadata to
+infer the outputs.
+
 ## Code Style
 
 - Line length: **120 characters**