diff --git a/crates/beava-server/tests/phase12_6_mio_only_dataplane.rs b/crates/beava-server/tests/phase12_6_mio_only_dataplane.rs index 07a3346b..20f9ba40 100644 --- a/crates/beava-server/tests/phase12_6_mio_only_dataplane.rs +++ b/crates/beava-server/tests/phase12_6_mio_only_dataplane.rs @@ -11,7 +11,8 @@ //! `crates/beava-server/src/recovery.rs::replay_handrolled_wal_dir` plus //! `replay_wal_from_lsn` (cold-path WAL replay on boot). Any third caller is //! an architectural regression per `project_phase18_no_dual_runtime` + -//! `project_redis_shaped_no_event_time_ever`. +//! `project_redis_shaped_no_event_time_ever` +//! (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing). //! //! **2. axum is restricted to the admin sidecar.** //! `axum::Router`, `axum::Json`, `axum::Extension`, `axum::extract::*`, @@ -145,7 +146,8 @@ fn only_apply_shard_and_recovery_call_apply_event_to_aggregations() { violations.is_empty(), "Architectural regression: only `apply_shard.rs::dispatch_push_sync` (mio data plane) \ and `recovery.rs::replay_*` (WAL replay) may call `apply_event_to_aggregations`. \ - Per `project_phase18_no_dual_runtime` + `project_redis_shaped_no_event_time_ever`, \ + Per `project_phase18_no_dual_runtime` + `project_redis_shaped_no_event_time_ever` \ + (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing), \ the mio event loop is the SOLE data-plane runtime. Any third caller is a regression.\n\ Violations:\n{}", violations.join("\n") diff --git a/docs/operators/buffer-geo/histogram.md b/docs/operators/buffer-geo/histogram.md new file mode 100644 index 00000000..c93718e0 --- /dev/null +++ b/docs/operators/buffer-geo/histogram.md @@ -0,0 +1,158 @@ +# bv.histogram + +> Fixed-bucket count histogram of a numeric field. `buckets` is a required register-time kwarg per [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md). + +## Signature + +```python +bv.histogram( + field: str, + *, + buckets: list[float], # REQUIRED — register-time kwarg + where: bv.Col | None = None, +) -> AggDescriptor +``` + +## Description + +`bv.histogram` returns a count per fixed numeric bucket of `field` across +events that match the optional `where=` predicate. `buckets` is a strictly +increasing list of split points; the cells are `(-inf, b[0])`, `[b[0], b[1])`, +…, `[b[n-1], +inf)`. For `buckets=[10, 20, 50]` you get four cells with +labels `"<10"`, `"10-20"`, `"20-50"`, `">=50"`. Use it for "transaction +amount distribution by tier" or "request size in p50 / p90 / p99 buckets" — +features where you want a coarse shape, not a quantile estimate. + +`buckets` is a **required keyword argument** per +[V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md): the lifetime-aggregation +memory contract requires every unbounded-by-default operator to declare a +finite per-entity ceiling at register time. `bv.histogram`'s ceiling is +exactly `len(buckets) + 1` `u64` counters per entity. The register-time +JSON-prelude shim (`pre_check_unbounded_op_in_lifetime_mode`) rejects any +`histogram` payload missing `buckets` (or with an empty `buckets` array) +with the structured error code `unbounded_op_in_lifetime_mode`. There is +no fallback default — picking the bucket edges is a deliberate +capacity-planning + signal-design step. + +`bv.histogram` belongs to the **bounded-buffer** family. Per-event update +is a linear scan over `buckets` (≤ ~20 in practice) plus a saturating +counter increment — Tier 1 floor (~10 ns / ~30 ns measured). The query +path materializes a `BTreeMap` of labelled counts and is therefore listed +under Tier 3 in [cost-class.md](../cost-class.md); the apply-path cost +remains Tier 1. There is no `window=` kwarg in v0 — `bv.histogram` is +**lifetime-only**. For "amount histogram for the last 24 h", compose with +`@bv.event(cold_after=...)` or use [`bv.quantile`](../sketch/quantile.md) +for a quantile sketch instead. + +## Parameters + +| Name | Type | Required | Default | Description | +|------|------|----------|---------|-------------| +| `field` | `str` | Yes | — | Name of the numeric field to bucket (`f64` / `i64`). | +| `buckets` | `list[float]` | **Yes** | — | Strictly increasing list of split points. `n` values produce `n + 1` cells. Caps per-entity memory at `n + 1` `u64` counters per [V0-MEM-GOV-02 BoundedByRequiredKwarg("buckets")](../../../.planning/REQUIREMENTS.md). | +| `where` | `bv.Col` | No | `None` | Boolean expression on event fields; only matching events update the counters. | + +## Returns + +A `dict[str, int]` keyed by bucket label (e.g. `"<10"`, `"10-20"`, +`">=50"`) with `i64` count values. Wire form is `Value::Map` with +`BTreeMap`-sorted iteration. When the entity has seen zero matching +events, the result is the dict with all counters at `0`. + +## Complexity + +| Resource | Bound | +|----------|-------| +| CPU per event | **Tier 1** (~10 ns floor / ~30 ns measured — linear scan over ≤ ~20 buckets + saturating add) — see [cost-class.md](../cost-class.md#tier-1-fast-40-nscall--38-ops) | +| Query | **Tier 3** allocates a `BTreeMap` of `len(buckets) + 1` entries on each `app.get(...)` — see [cost-class.md](../cost-class.md#tier-3-algorithmic-floor-100-300-nscall--9-ops) — apply-thread cost is Tier 1; query-time cost is the asymmetry to flag when profiling | +| Memory per entity | **`BoundedByRequiredKwarg("buckets")`** — `(len(buckets) + 1) × 8` bytes per [Phase 12.8 V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md) | +| Lifetime mode | **Required** — `bv.histogram` has no `window=` kwarg in v0; lifetime is the only mode | + +## Examples + +### Example 1: Transaction-amount histogram per user + +```python +import beava as bv + +@bv.event +class Txn: + user_id: str + amount: float + +@bv.table(key="user_id") +def UserAmountHistogram(txn) -> bv.Table: + return ( + txn.group_by("user_id") + .agg(amount_hist=bv.histogram("amount", + buckets=[10.0, 50.0, 100.0, 500.0])) + ) + +# Push events +for amt in [5.0, 12.0, 25.0, 80.0, 200.0, 750.0]: + app.push("Txn", {"user_id": "alice", "amount": amt}) + +# Query +result = app.get("UserAmountHistogram", "alice") +# result == {"amount_hist": {"<10": 1, "10-50": 2, "50-100": 1, "100-500": 1, ">=500": 1}} +``` + +### Example 2: Successful-only request size histogram + +```python +@bv.table(key="endpoint") +def EndpointReqSizeHist(reqs) -> bv.Table: + return ( + reqs.group_by("endpoint") + .agg(req_size_hist=bv.histogram("size_bytes", + buckets=[1024, 65536, 1048576], + where=bv.col("status") == 200)) + ) +``` + +## Wire + +JSON wire form in a register payload: + +```json +{ + "kind": "derivation", + "name": "UserAmountHistogram", + "output_kind": "table", + "key": ["user_id"], + "agg": { + "amount_hist": { + "op": "histogram", + "params": { + "field": "amount", + "buckets": [10.0, 50.0, 100.0, 500.0] + } + } + } +} +``` + +See [examples/wire/register-fraud-team.request.json](../../../examples/wire/register-fraud-team.request.json) for a full payload example. + +## Edge cases + +- **`buckets` missing or empty at register time:** rejected with structured error code `unbounded_op_in_lifetime_mode` per [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md). The JSON-prelude shim catches this before any state is allocated. +- **`buckets` not strictly increasing:** rejected with `aggregation_invalid_param` at register time. +- **Empty stream / cold-start:** all counters are `0`; the result dict is the full label set with zero values. +- **`window=` kwarg attempted:** raises `TypeError` at SDK-helper-call time. v0 has no windowed histogram; use `@bv.event(cold_after=...)` to scope the lifetime via cold-entity eviction, or [`bv.quantile`](../sketch/quantile.md) for a windowed shape estimate. +- **Non-numeric source field (`Value::Str`, `Value::Bool`):** event silently dropped (not bucketed). For categorical histograms use [`bv.event_type_mix`](./event_type_mix.md). +- **NaN inputs:** dropped — `numeric_from_row` returns `None` on non-`F64`/`I64` payloads. For cleaner semantics filter with `where=~bv.col(field).isnull()`. +- **Bucket-label format:** integer-valued edges render without trailing `.0` (`"<10"` not `"<10.0"`); fractional edges keep their decimal form. Stable across versions — Python SDKs can dict-key on the labels. +- **Counter overflow:** the per-bucket `u64` saturates at `2^64 - 1` (impossible in practice for a single entity). +- **Out-of-order event-time:** **does not matter.** beava is processing-time-only per [`project_redis_shaped_no_event_time_ever`](../../../.planning/PROJECT.md) (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing); buckets are populated in arrival order. +- **Lifetime mode:** **the only mode.** Per-entity ceiling is `(len(buckets) + 1) × 8` bytes per [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md) BoundedByRequiredKwarg("buckets"). + +## See also + +- [cost-class.md](../cost-class.md) — performance tier (Tier 1 update / Tier 3 query) +- [bv.quantile](../sketch/quantile.md) — quantile-estimate companion (when you want p50/p90/p99 instead of fixed buckets) +- [bv.event_type_mix](./event_type_mix.md) — categorical-distribution companion (proportions per category, not counts per bucket) +- [bv.hour_of_day_histogram](./hour_of_day_histogram.md) — fixed 24-bin time-of-day histogram (no `buckets=` needed) +- [bv.dow_hour_histogram](./dow_hour_histogram.md) — fixed 168-bin day-of-week × hour histogram +- [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md) — BoundedByRequiredKwarg memory governance contract +- [pipeline-dsl/compilation-rules.md](../../pipeline-dsl/compilation-rules.md) — chain compilation rules diff --git a/docs/operators/decay/index.md b/docs/operators/decay/index.md new file mode 100644 index 00000000..4d91c674 --- /dev/null +++ b/docs/operators/decay/index.md @@ -0,0 +1,43 @@ +# Decay-Family Aggregation Operators + +The 6 decay-family ops cover **exponentially-weighted statistics** (EWMA / EWVar / EW Z-Score), **forward-decay accumulators** (Cormode 2009), and **time-weighted averaging**. Five of the six use a `half_life` parameter to set an exponential decay rate; `bv.twa` uses `window` instead because it integrates held-time exactly rather than fading. + +| Op | Required kwarg | Returns | CPU tier | Notes | +|----|----------------|---------|----------|-------| +| [`bv.ewma`](./ewma.md) (alias `bv.ema`) | `half_life` | `f64` or `null` | Tier 1 | Exponentially-weighted mean. | +| [`bv.ewvar`](./ewvar.md) | `half_life` | `f64` or `null` | Tier 1 | Exponentially-weighted variance — companion second-moment to EWMA. | +| [`bv.ew_zscore`](./ew_zscore.md) | `half_life` | `f64` or `null` | Tier 1 | Current-event z-score against EWMA / EWVar baseline; the standard drift-aware anomaly primitive. | +| [`bv.decayed_sum`](./decayed_sum.md) | `half_life` | `f64` or `null` | Tier 1 | Cormode forward-decay sum — recency-weighted total that converges to a stable steady-state. | +| [`bv.decayed_count`](./decayed_count.md) | `half_life` | `f64` or `null` | Tier 1 | Same primitive without `field` — answers "how active recently?". The cheapest decay op. | +| [`bv.twa`](./twa.md) | `window` | `f64` or `null` | Tier 1 | Time-weighted average for irregularly-sampled gauge fields. | + +All six are `O(1)` memory per entity and Tier 1 CPU per [cost-class.md](../cost-class.md). The five EW-decay ops share a state shape of `(value, last_now_ms, initialized)` plus per-op extras; `bv.twa` carries `(sum_v_dt, sum_dt, last_v, last_t, initialized)`. + +## Key invariants + +- **Server processing-time only.** Decay coefficients use `Δt = now_ms() at this matching event - now_ms() at the previous matching event` per [`project_redis_shaped_no_event_time_ever`](../../../.planning/PROJECT.md) (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing). Producers cannot influence decay via payload fields. Late events (`Δt ≤ 0`) fall back to an unweighted blend and do **not** advance `last_now_ms`. +- **`half_life` is mandatory and finite for the EW family.** `bv.ewma`, `bv.ewvar`, `bv.ew_zscore`, `bv.decayed_sum`, `bv.decayed_count` all reject `"forever"` at SDK-helper-call time (regex `[1-9]\d*(?:ms|s|m|h|d)$`); use the corresponding lifetime ops ([`bv.first`](../point-ordinal/first.md), [`bv.var`](../core/var.md), [`bv.z_score`](../velocity/z_score.md), [`bv.sum`](../core/sum.md) `window="forever"`, [`bv.count`](../core/count.md) `window="forever"`) when an undecayed lifetime reading is what you want. +- **`bv.twa` accepts `window="forever"`.** Time-weighted average integrates held-time exactly, so the lifetime form is well-defined; per-op classification at `crates/beava-core/src/register_validate.rs` (~line 436) classifies all six ops as `O(1)` lifetime-bound (`OpLifetimeBound::O1`). +- **Reads do not decay forward.** `app.get(...)` returns the running statistic as of the **last matching event** — the helper does not re-decay the value to query time. (Re-decaying on read would mutate state on every `get`, breaking idempotence.) +- **Cold-start returns `null`** for all six ops. `bv.ewvar` and `bv.ew_zscore` additionally return `null` after only one matching event (variance is `0`; no spread to normalize against). +- **Cold-entity eviction (`@bv.event(cold_after=...)`)** drops the underlying state per the Redis-TTL pattern (V0-MEM-GOV-01); decay ops rebuild fresh on the next post-eviction matching event. + +## When to use which + +- **Smoothed running mean** that adapts to drift → [`bv.ewma`](./ewma.md). Pick `half_life` ≈ the timescale of the behaviour you care about. +- **Smoothed running variance** for anomaly scoring or volatility tracking → [`bv.ewvar`](./ewvar.md), usually paired with [`bv.ew_zscore`](./ew_zscore.md). +- **Recency-weighted total** (e.g. "spend in roughly the last hour") → [`bv.decayed_sum`](./decayed_sum.md). Steady-state ≈ `rate * value * half_life / ln(2)`. +- **Recency-weighted activity rate** → [`bv.decayed_count`](./decayed_count.md). Steady-state ≈ `rate * half_life / ln(2)`. +- **True time-weighted average** for gauges sampled at irregular intervals → [`bv.twa`](./twa.md). + +> Note: `bv.rate_of_change` is **not** in the decay family — it lives under [velocity/rate_of_change.md](../velocity/rate_of_change.md) per the Phase 9 op classification (it computes a slope across two adjacent windows, not an exponentially-weighted statistic). Polished by [Plan 13.0-09](../../../.planning/phases/13.0-design-contract-spec-docs/). + +## See also + +- [Operator catalog index](../index.md) — full 53-op catalogue (decay is the 6-op family) +- [cost-class.md](../cost-class.md) — per-op CPU tier metadata (all six decay ops are Tier 1) +- [Velocity family](../velocity/) — sibling family for slope-style and inter-arrival statistics, including `rate_of_change` +- [Core family](../core/) — fixed-window arithmetic mean / variance / sum / count counterparts +- [shared.md window grammar](../../sdk-api/shared.md) — duration-string format (`\d+(ms\|s\|m\|h\|d)` and the `"forever"` literal) +- Per-operator memory governance: [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md) — every lifetime aggregation operator declares a finite per-entity memory ceiling at register-time +- [Pipeline DSL compilation rules](../../pipeline-dsl/compilation-rules.md) — how `bv.(...)` calls compile to JSON wire form diff --git a/docs/operators/recency/index.md b/docs/operators/recency/index.md new file mode 100644 index 00000000..2e1c12b0 --- /dev/null +++ b/docs/operators/recency/index.md @@ -0,0 +1,34 @@ +# Recency Aggregation Operators + +The 10 recency ops cover **time-since semantics** (how long since first / most-recent match), **windowed-recency booleans** (was the last match within window N?), and **streak counters** (consecutive matches and non-matches). All recency ops use **server processing-time** (`now_ms()` at the apply path) per [`project_redis_shaped_no_event_time_ever`](../../../.planning/PROJECT.md) — beava intentionally has no event-time concept (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing). + +| Op | Returns | Time source | Notes | +|----|---------|-------------|-------| +| [`bv.first_seen`](./first_seen.md) | `Datetime` (i64 ms) | server `now_ms()` at apply | First match's server arrival timestamp | +| [`bv.last_seen`](./last_seen.md) | `Datetime` (i64 ms) | server `now_ms()` at apply | Most recent match's server arrival timestamp | +| [`bv.age`](./age.md) | `i64` ms | computed at **read time** | `now_ms() - first_seen` | +| [`bv.has_seen`](./has_seen.md) | `bool` | n/a | Cumulative ever-matched flag | +| [`bv.time_since`](./time_since.md) | `i64` ms or `null` | computed at **read time** | `now_ms() - last_seen` | +| [`bv.time_since_last_n`](./time_since_last_n.md) | `i64` ms or `null` | computed at **read time**; `n` required | Generalization: ms since the kth most recent match | +| [`bv.streak`](./streak.md) | `i64` | event arrival order | Live consecutive-match counter | +| [`bv.max_streak`](./max_streak.md) | `i64` | event arrival order | All-time max of `streak`; never decreases | +| [`bv.negative_streak`](./negative_streak.md) | `i64` | event arrival order | Live consecutive-non-match counter (mirror of `streak`) | +| [`bv.first_seen_in_window`](./first_seen_in_window.md) | `bool` | `now_ms() - last_ms < window` at read time; `window=` required | "Was the last match within the last N ms?" | + +## Key invariants + +- **Server processing-time only.** Per [`project_redis_shaped_no_event_time_ever`](../../../.planning/PROJECT.md) (locked 2026-04-30) (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing), beava records server `now_ms()` at apply for `first_seen` / `last_seen`, and computes elapsed-ms using server `now_ms()` at read for `age` / `time_since` / `time_since_last_n` / `first_seen_in_window`. Producers cannot influence captured timestamps via payload fields for recency operators, though v0.1 supports event-time bucketing via a reserved `_ts` field. The recency operators themselves continue to use server processing-time for their timestamp semantics. +- **Read-time computation.** `age`, `time_since`, `time_since_last_n`, and `first_seen_in_window` change between reads **without any new events** — the right-hand side of the elapsed calculation is captured at query time, not apply time. This makes them useful staleness/recency features. +- **`bv.time_since_last_n` requires `n`.** Per [V0-MEM-GOV-02 BoundedByRequiredKwarg("n")](../../../.planning/REQUIREMENTS.md), the deque of timestamps must have a register-time ceiling. Missing `n` is rejected by the JSON-prelude shim with code `unbounded_op_in_lifetime_mode`. +- **`bv.first_seen_in_window` requires `window`.** The windowed-recency boolean is meaningless without a horizon length. The `window` parameter is enforced at register time; `"forever"` is rejected (use [`bv.has_seen`](./has_seen.md) for that semantic). +- **Cold-start behavior is per-op.** Streaks return `0`. Booleans (`has_seen`, `first_seen_in_window`) return `false`. Datetime/duration ops (`first_seen`, `last_seen`, `age`, `time_since`, `time_since_last_n`) return `null`. +- **Cold-entity eviction (`@bv.event(cold_after=...)`)** drops the underlying state per the Redis-TTL pattern (V0-MEM-GOV-01); recency state rebuilds from the next post-eviction event. +- **9 of 10 ops share `SeenState`** (`first_seen`, `last_seen`, `age`, `has_seen`, `time_since`) or related `StreakState` (`streak`, `max_streak`) / `NegativeStreakState` / `FirstSeenInWindowState` — registering several siblings on the same `where=` predicate costs roughly the same as registering one. + +## See also + +- [Operator catalog index](../index.md) — full 53-op catalogue +- [cost-class.md](../cost-class.md) — per-op CPU tier metadata (all recency ops are Tier 1) +- [Point/ordinal family](../point-ordinal/) — value-based first/last/N counterparts to the timestamp ops here +- Per-operator memory governance: [V0-MEM-GOV-02](../../../.planning/REQUIREMENTS.md) — every lifetime aggregation operator declares a finite per-entity memory ceiling at register-time +- [Pipeline DSL compilation rules](../../pipeline-dsl/compilation-rules.md) — how `bv.(...)` calls compile to JSON wire form diff --git a/docs/pipeline-dsl/compilation-rules.md b/docs/pipeline-dsl/compilation-rules.md index 3163e101..5733dd74 100644 --- a/docs/pipeline-dsl/compilation-rules.md +++ b/docs/pipeline-dsl/compilation-rules.md @@ -715,9 +715,9 @@ fixture (ALLOWED) or a structured error code (FORBIDDEN). | `e.with_columns(...) AFTER e.group_by(...)` | FORBIDDEN | `group_by` returns `GroupBy`; `with_columns` is not on the `GroupBy` interface. | `AttributeError` (Python); compile-time `TypeError` (TS); compile error (Go) | | `e.group_by("a").group_by("b")` | FORBIDDEN | `GroupBy` does not expose `.group_by()`; nested grouping is unsupported. | `AttributeError` (Python); compile-time `TypeError` (TS) | | `e.group_by("a").filter(...)` | FORBIDDEN | `GroupBy` does not expose stateless ops. Filter BEFORE the `group_by`. | `AttributeError` (Python); compile-time `TypeError` (TS) | -| Cross-event aggregation (`bv.sum(other_event.col("x"))` etc.) | FORBIDDEN per `project_redis_shaped_no_event_time_ever` | beava is Redis-shaped, processing-time only — no cross-stream joins ever. | `RegistrationError(code="joins_not_supported")` | -| `event_time` field on `@bv.event` | FORBIDDEN per `project_redis_shaped_no_event_time_ever` | Server-side `now_ms()` is the only time source; client-supplied event time is killed permanently. | `TypeError` at decoration time; `RegistrationError(code="event_time_not_supported_in_v0")` if it reaches the wire | -| `event_time_field=` / `tolerate_delay=` kwargs on `@bv.event` | FORBIDDEN per same lock | Same as above. | `TypeError` at decoration time | +| Cross-event aggregation (`bv.sum(other_event.col("x"))` etc.) | FORBIDDEN per `project_redis_shaped_no_event_time_ever` (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing) | beava is Redis-shaped, processing-time only — no cross-stream joins ever. ADR-004 introduces pragmatic event-time *bucketing* via reserved `_ts` field for v0.1, but does NOT change the FORBIDDEN status of cross-event joins. Watermarks, out-of-order buffering, and stream-stream joins remain OUT of scope. | `RegistrationError(code="joins_not_supported")` | +| `event_time` field on `@bv.event` | FORBIDDEN per `project_redis_shaped_no_event_time_ever` (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing) | Server-side `now_ms()` is the only time source for v0. ADR-004 introduces pragmatic event-time *bucketing* via reserved `_ts` field for v0.1, but does NOT change the FORBIDDEN status of user-defined `event_time` fields on `@bv.event`. Watermarks and out-of-order buffering remain OUT of scope. | `TypeError` at decoration time; `RegistrationError(code="event_time_not_supported_in_v0")` if it reaches the wire | +| `event_time_field=` / `tolerate_delay=` kwargs on `@bv.event` | FORBIDDEN per same lock (post-2026-05-19: partially overturned by ADR-004 for v0.1 bucketing) | Same as above. | `TypeError` at decoration time | | `bv.col("x") + 5` arithmetic in `where=` predicates | ALLOWED | Compiles to expr-string via `_BinOp.to_expr_string()`. | (no fixture; standard expression) | | Predicates nested deeper than 512 levels (e.g., 600-deep `&` chain) | FORBIDDEN — enforced at register time | Parse-time depth guard (`MAX_PARSE_DEPTH = 512`) mirrors the runtime evaluator's cap to prevent silent Null returns. | `RegistrationError(code="invalid_expression")` or `code="aggregation_invalid_where"` with reason mentioning "depth" | | `bv.col("x").isnull()` | ALLOWED | Compiles to `(x == null)` per `_col.py::isnull()`. | (no fixture; standard expression) | diff --git a/docs/pipeline-dsl/overview.md b/docs/pipeline-dsl/overview.md index d8697ae3..d22bbe3f 100644 --- a/docs/pipeline-dsl/overview.md +++ b/docs/pipeline-dsl/overview.md @@ -101,7 +101,8 @@ class Txn: Field types come from the [shared.md § Field types](../sdk-api/shared.md#field-types) vocabulary: `str`, `i64` (Python `int`), `f64` (Python `float`), `bool`, `bytes`, `datetime`. `event_time` fields are **rejected at decoration time** -per `project_redis_shaped_no_event_time_ever` — beava is processing-time only; +per `project_redis_shaped_no_event_time_ever` (post-2026-05-19: partially +overturned by ADR-004 for v0.1 bucketing) — beava is processing-time only; the server stamps wall-clock arrival time on every push. ### Function form (derived event) @@ -242,14 +243,20 @@ surfaces are out of scope: state at the table layer (e.g., `AStats` reads `A`, `BStats` reads `B`, and `CStats` reads the shared source `Txn` to act as a union). - **Joins** (`event ↔ event`, `event ↔ table`, `table ↔ table`) — permanently - killed per `project_redis_shaped_no_event_time_ever`. Compose via push/get + killed per `project_redis_shaped_no_event_time_ever` (post-2026-05-19: + partially overturned by ADR-004 for v0.1 bucketing). Compose via push/get patterns + entity-key sharding instead. Returns alongside tables in v0.1+ if/when justified by demand. - **`bv.union`** — deferred with joins. - **Event-time / watermarks / `event_time_field` / `tolerate_delay`** — - permanently killed per the same architectural lock. The server stamps - wall-clock arrival time on every push; `agg_windowed` operators bucket on - that. + permanently killed per the same architectural lock (post-2026-05-19: + partially overturned by ADR-004 for v0.1 bucketing). ADR-004 introduces + pragmatic event-time bucketing via a reserved `_ts` field for v0.1, meaning + event-time is no longer "permanently killed" but is supported in a limited, + pragmatic form. However, the broader event-time features like watermarks, + out-of-order buffering, and stream-stream joins remain OUT of scope. The + server stamps wall-clock arrival time on every push; `agg_windowed` operators + bucket on that. - **Session windows** (`bv.session(gap_ms=..., inner=...)`) — out of v0 + v0.1 per `.planning/ideas/session-windows-v0.1.md`. - **Table mutation surface** (`app.upsert / app.delete / app.retract`) —