workload-replay: add tests for the workload anonymizer by jasonhernandez · Pull Request #36749 · MaterializeInc/materialize

jasonhernandez · 2026-05-27T16:30:29Z

Third in the stack — base workload-anonymize-parser (#36746), which is itself stacked on #36745. Merge those first.

Motivation

The anonymizer (mz_workload_anonymize.py) had zero automated tests despite being a privacy tool with a heuristic core — the worst combination. Everything in #36745 and #36746 was verified by hand. This codifies those checks as regression tests.

What's covered

A new misc/python/materialize/cli/mz_workload_anonymize_test.py (15 tests), colocated so the existing pytest --doctest-modules misc/python CI step picks it up with no pipeline changes:

- End-to-end anonymization of a structurally complete workload: identifiers scrubbed, anonymized names present.
- Leak regressions (the fixes from #36745, "workload-replay: harden workload anonymization"): connection host/user, sink topic, and column-default literals are scrubbed.
- Query string-literal redaction.
- Cluster SIZE preserved (non-sensitive config that replay needs).
- The no-output-target error and --in-place overwrite.
- --no-literals keeps literals while still anonymizing identifiers.
- verify_anonymized: catches surviving identifiers and literals, accepts both '<REDACTED>' and 'literal_N' placeholders, and exempts cluster literals.
- redact_literals_via_parser returns None (fallback signal) without the binary; the regex fallback warns.
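
To make the verification bullet concrete, here is a minimal sketch of the kind of check it describes. It is not the real `verify_anonymized`: the helper names `surviving_literals` and `surviving_identifiers`, the regexes, and the inputs are all hypothetical, and the cluster-literal exemption is left out.

```python
import re

# Assumption: the two placeholder shapes named in the PR description.
PLACEHOLDER = re.compile(r"^(<REDACTED>|literal_\d+)$")
# Simplified: single-quoted strings with no escaped quotes inside.
STRING_LITERAL = re.compile(r"'([^']*)'")


def surviving_literals(sql):
    """Return string literals in `sql` that are not recognized placeholders."""
    return [lit for lit in STRING_LITERAL.findall(sql) if not PLACEHOLDER.match(lit)]


def surviving_identifiers(sql, original_names):
    """Return the original identifiers that still appear as whole words in `sql`."""
    return [
        name
        for name in original_names
        if re.search(rf"\b{re.escape(name)}\b", sql)
    ]


# Hypothetical usage: both placeholder forms are accepted, a raw literal is not.
leaked = surviving_literals("SELECT 'literal_3', '<REDACTED>', 'alice'")
still_named = surviving_identifiers(
    "SELECT * FROM table_1 JOIN customers ON true", ["customers", "orders"]
)
```

A test built on helpers like these would assert that both lists are empty for anonymized output and non-empty for a deliberately leaky input.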

Most tests force the regex fallback (monkeypatch _locate_redactor) so they're deterministic regardless of whether the mz-sql-anonymize helper is built. One test exercises the parser path (numeric-literal redaction) and is skipif'd when the binary is absent — so CI (no binary) runs 14 and skips 1; locally all 15 run.

Testing

All 15 pass locally (pytest); bin/fmt and ruff check clean.
Confirmed the file is collected by pytest's default *_test.py discovery.

🤖 Generated with Claude Code

The anonymizer had no automated coverage despite being a privacy tool with a heuristic core. Add a pytest module, colocated as `mz_workload_anonymize_test.py` so the existing `pytest --doctest-modules misc/python` CI step collects it.

Coverage:
- end-to-end anonymization of a structurally complete workload: identifiers scrubbed, anonymized names present;
- regression tests for the connection/sink/source DDL literal leak (hosts, users, topics) and column defaults;
- query string-literal redaction;
- cluster SIZE preserved (non-sensitive config);
- the no-output-target error and --in-place overwrite;
- --no-literals keeps literals while still anonymizing identifiers;
- verify_anonymized catches surviving identifiers and literals, accepts both the '<REDACTED>' and 'literal_N' placeholders, and exempts cluster literals;
- redact_literals_via_parser returns None (fallback signal) without the binary, and the regex fallback warns.

Most tests force the regex fallback so they are deterministic regardless of whether the mz-sql-anonymize helper is built. One test exercises the parser-backed path (numeric literal redaction) and is skipped when the binary is absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


jasonhernandez force-pushed the workload-anonymize-tests branch from 24cff59 to 6bde0d6 Compare May 28, 2026 05:58

workload-replay tests: apply black formatting

80446b8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jasonhernandez wants to merge 2 commits into workload-anonymize-parser from workload-anonymize-tests
