
Treat trace/span ids as bytes; implement typed & comparison matchers (align with policy spec #49) #69

Description

@jaronoff97

Summary

Trace/span ids are currently handled as untyped []byte whose encoding (raw vs hex) is inferred by length at sampling time. This produces a real correctness bug in hash_seed sampling and forces upstream callers (the Vector codec) to hex-encode ids so string matching works. The policy spec PR usetero/policy#49 (feat: handle typing better) now defines a typed contract for bytes/identifier fields (plus typed equals and numeric comparison matchers). This issue tracks bringing policy-go in line with that contract, which fixes the trace-id problem at the root.

Background: what the spec PR defines

PR #49 adds, in spec.md and proto/tero/policy/v1:

  • Declared field types. *_TRACE_ID, *_SPAN_ID, *_PARENT_SPAN_ID are bytes; everything else is string (see the new Type column on the LogField/TraceField/MetricField enum tables and the Bytes and Identifier Fields section).
  • A typed match value. New Value message (bool_value/int_value/double_value/bytes_value/hex_value) for equals, and NumericValue (int_value/double_value) for gt/gte/lt/lte. Value has no string variant; NumericValue deliberately can't hold bytes/bool.
  • Hex as the canonical human encoding for bytes, decoded to raw bytes once at policy-compile time:
    • well-known bytes field → a plain exact: "<hex>" is auto-coerced to bytes (the field's type is known);
    • arbitrary bytes attribute → equals: { hex_value: "<hex>" } or equals: { bytes_value: "<base64>" }.
  • Matching semantics on a bytes field: exact/equals decode the literal once and compare bytes == bytes; regex/starts_with/ends_with/contains operate on the canonical lowercase-hex rendering; numeric comparators don't apply.
  • Always coerce between a field's declared type and the literal. For performance, policies should still include a string matcher so the multi-pattern engine can pre-filter before a typed comparison.

Current behavior in policy-go and why it's wrong

The accessor contract is a single untyped []byte for every field — field.go:393 (trace), :330 (log), :367 (metric) — so a trace id and a string body are indistinguishable to the engine. Getters just return the stored bytes (matchable.go:410-415).

Two subsystems then disagree about the encoding:

  • Matching (Hyperscan). engine.go:141 reads value := c.Value(record, key.Ref) and scans it (engine.go:155 db.Scan(value)) against patterns compiled from the policy literal. exact: "<hex>" compiles to a hex regex, so a match only fires if the subject bytes are the hex text — i.e. callers must hex-encode ids.
  • Sampling. Needs raw bytes: extractRandomnessFromTraceID (probabilistic_sampler.go:270-295) wants the low 56 bits of the 16-byte id, and getTraceRandomnessWithSeed (:253-266) FNV-hashes the id.

The conflict is papered over by guessing the encoding from length (probabilistic_sampler.go:270-285, if len(traceID) == 32 { /* hex */ } else if len(traceID) >= 7 { /* binary */ }).

Concrete bug: hash_seed mode hashes whatever bytes it's given with no normalization (probabilistic_sampler.go:260-265, h.Write(traceID)). Because matching forces ids to be hex-encoded, the FNV hash is computed over the hex ASCII, not the raw 16 bytes — so policy-go reaches a different sampling decision for the same trace than any implementation that hashes raw bytes, defeating the cross-collector determinism that hash_seed exists to provide.

Required changes (roughly sequenced)

  • 1. Regenerate proto bindings. After usetero/policy#49 is pushed to buf.build/tero/policy, run task proto:generate to pull the new Value, NumericValue, hex_value, and the equals/gt/gte/lt/lte oneof arms.
  • 2. Implement the new matchers in the compiler. extractMatchPattern / extractMetricMatchPattern / extractTraceMatchPattern (internal/engine/compiled.go) currently handle only exact/regex/exists/starts_with/ends_with/contains. Add equals (typed equality; int/double share one numeric domain) and gt/gte/lt/lte (numeric comparison; non-numeric field → no match). Per usetero/policy#49, an unset Value/NumericValue is a compile error and an invalid hex_value is a compile error.
  • 3. Compile bytes literals to raw bytes. For a bytes-typed field, decode the exact hex string and equals hex_value/bytes_value to raw bytes at compile time and compare bytes == bytes. For string operators on a bytes field, compile against the lowercase-hex rendering and hex-encode the subject at scan time.
  • 4. Fix the accessor contract to raw bytes. Identifier getters (matchable.go:410-415, log at :35-38) must return raw OTLP bytes (16/8), and this must be documented as the contract for WithTraceValue/WithLogValue (field.go:421, :463). This is the coordinated, behavior-changing step — see downstream below.
  • 5. Fix the sampler. Once ids arrive as raw bytes, delete the length-guess in extractRandomnessFromTraceID (probabilistic_sampler.go:270-285) — take the last 7 raw bytes — and let getTraceRandomnessWithSeed (:253-266) FNV-hash the raw bytes. This is what actually fixes the hash_seed inconsistency, and it is only correct after step 4.
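
As a rough sketch of step 5, the randomness extraction could look like the following once ids are guaranteed to arrive as raw bytes. randomnessFromTraceID is a hypothetical name, and rejecting any input that is not exactly 16 bytes is an assumption of this sketch rather than something the steps above prescribe.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errBadTraceID = errors.New("trace id must be 16 raw bytes")

// randomnessFromTraceID returns the low 56 bits of a raw 16-byte trace id,
// i.e. its last 7 bytes read as a big-endian integer. There is no length-based
// guess about the encoding: anything that is not 16 bytes is rejected.
func randomnessFromTraceID(id []byte) (uint64, error) {
	if len(id) != 16 {
		return 0, errBadTraceID
	}
	var buf [8]byte
	copy(buf[1:], id[9:]) // bytes 9..15 fill the low 7 bytes; buf[0] stays zero
	return binary.BigEndian.Uint64(buf[:]), nil
}

func main() {
	id := []byte{
		0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6,
		0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36,
	}
	r, err := randomnessFromTraceID(id)
	fmt.Printf("%#x %v\n", r, err) // prints: 0xce929d0e0e4736 <nil>
}
```

With this shape, a hex-encoded 32-byte id becomes a visible error at the call site instead of being silently reinterpreted.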

Coordinated downstream change

Today the Vector OTLP codec hex-encodes ids so string matching lines up. Once matching compiles to raw bytes (steps 3–4), that codec must stop hex-encoding and pass raw bytes — the two sides flip together. After this, the codec's hex-walk can be deleted and policy-go becomes self-contained: it accepts raw Matchable bytes and owns the human-facing hex encoding entirely.

Acceptance criteria

  • A policy with trace_field: TRACE_FIELD_TRACE_ID, exact: "<hex>" matches a span whose accessor returns the raw 16-byte trace id (case-insensitive on the hex literal).
  • equals: { hex_value } and equals: { bytes_value } for the same bytes are interchangeable and both match raw bytes.
  • hash_seed sampling over a given raw trace id produces the same decision as the OTel reference / collector for the same id and seed (regression test against a known vector).
  • No length-based encoding inference remains in probabilistic_sampler.go.
  • Conformance suite (usetero/policy-conformance) passes, including any new bytes/identifier cases.

Interim safety

Until steps 2–5 land, regenerating bindings alone is safe: unknown oneof arms (equals/hex_value/…) compile to "no match condition set", which under the existing error-handling rules makes the policy inert and reports a compile error (fail-open). Existing exact: "<hex-id>" policies keep working only because callers still hex-encode; the underlying ambiguity is unchanged until the steps above are complete.


Spec reference: usetero/policy#49 — see spec.md sections "Typed and Comparison Matching" and "Bytes and Identifier Fields", and proto/tero/policy/v1/shared.proto (Value, NumericValue).
