Stream-and-free chunk writes in the multi-chunk worker (don't accumulate all K)

🤖 *from Claude*

## Problem
The multi-chunk-per-worker writer (#30 item 3, landed in #84) **accumulates all K chunk outputs in memory before any are written.** `process_shard` (`src/zagg/processing/worker.py`) reads the shard's granules once, then loops `iter_chunks` appending each chunk's `(block_index, carrier, ragged)` to the `chunk_results` sink (`worker.py:344`) and returns; the **write happens afterward** in a separate loop (runner `_process_and_write`; `deployment/aws/lambda_handler.py:280-283`). So peak memory holds **all K carriers + all K ragged payloads at once**, on top of the pooled shard reads. For K>1 the worker could instead write each chunk and free it, capping the output-side footprint at one chunk.

## Scope — what this does NOT fix
This only helps **K>1**. At **K=1** (shard == chunk; e.g. `parent_order: 13` with `chunk_inner` dropped) there is exactly one chunk, so there's nothing to accumulate — the change is a **no-op at K=1**. It also does **not** touch the **read-phase** memory (pooled photons + the **#66** h5coro per-granule cache leak), which is the dominant OOM driver for dense ATL03 AOIs regardless of K. So this is an efficiency win for the multi-chunk path, **complementary to #66, not a substitute** — the current order-13 OOMs are #66, not this.

## Proposed fix
Invert the compute/write boundary so the worker streams: compute a chunk → write it → drop its refs → next, instead of materializing all K.
- Add a per-chunk write-callback seam to `process_shard`: `process_shard(..., write_chunk: Callable | None = None)`. When provided, after computing each chunk's `(block_index, carrier, ragged)`, call `write_chunk(...)` immediately and do **not** append to `chunk_results` (drop the locals so they're collectible). When `None`, keep today's `chunk_results`-append behavior for back-compat (the 2-tuple return + existing tests).
- Rewire the consumers to pass the callback (their existing per-chunk write-loop body): runner `_process_and_write`, and `deployment/aws/lambda_handler.py` (re-touches the handler → needs the §1 "named" authorization, as in #84 phase 7).
- Free each `carrier`/`ragged` after its write.

## Phases
1. `write_chunk` callback seam in `process_shard` + free-after-write; `chunk_results` path preserved when callback is `None`. Tests: with a callback, `chunk_results` stays empty and each chunk is written-then-dropped; K=1 byte-identical; multi-chunk output identical to today.
2. Rewire runner `_process_and_write` to the callback.
3. Rewire `lambda_handler.py` to the callback (authorized edit).

## Acceptance
- For K>1, peak output-side memory holds ~1 chunk instead of K (assert `chunk_results` is not accumulated when streaming).
- Output byte-identical (K=1 and K>1).
- No behavior change at K=1.

## Relation / notes
- **Complementary to #66** (h5coro per-granule cache eviction) — #66 is the actual driver of the current ATL03 OOMs and the priority; this is a separate K>1 efficiency win.
- Builds on #30 item 3 / #84 (the multi-chunk writer).
- Worker-side code → requires a Lambda **function redeploy** to take effect on Lambda.

_Filed as a tracking issue (`enhancement`); add `implement` to queue it. For the current OOM, #66 is the fix._

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Stream-and-free chunk writes in the multi-chunk worker (don't accumulate all K) #91

Problem

Scope — what this does NOT fix

Proposed fix

Phases

Acceptance

Relation / notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Stream-and-free chunk writes in the multi-chunk worker (don't accumulate all K) #91

Description

Problem

Scope — what this does NOT fix

Proposed fix

Phases

Acceptance

Relation / notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions