workload-replay: add tests for the workload anonymizer #36749
Draft
jasonhernandez wants to merge 2 commits into
The anonymizer had no automated coverage despite being a privacy tool with a heuristic core. Add a pytest module, colocated as `mz_workload_anonymize_test.py` so the existing `pytest --doctest-modules misc/python` CI step collects it.

Coverage:
- end-to-end anonymization of a structurally complete workload: identifiers scrubbed, anonymized names present;
- regression tests for the connection/sink/source DDL literal leak (hosts, users, topics) and column defaults;
- query string-literal redaction;
- cluster SIZE preserved (non-sensitive config);
- the no-output-target error and `--in-place` overwrite;
- `--no-literals` keeps literals while still anonymizing identifiers;
- `verify_anonymized` catches surviving identifiers and literals, accepts both the `'<REDACTED>'` and `'literal_N'` placeholders, and exempts cluster literals;
- `redact_literals_via_parser` returns `None` (fallback signal) without the binary, and the regex fallback warns.

Most tests force the regex fallback so they are deterministic regardless of whether the `mz-sql-anonymize` helper is built. One test exercises the parser-backed path (numeric literal redaction) and is skipped when the binary is absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
24cff59 to
6bde0d6
Third in the stack. Base: `workload-anonymize-parser` (#36746), which is itself stacked on #36745. Merge those first.

Motivation

The anonymizer (`mz_workload_anonymize.py`) had zero automated tests despite being a privacy tool with a heuristic core, which is the worst combination. Everything in #36745 and #36746 was verified by hand. This codifies those checks as regression tests.

What's covered

A new `misc/python/materialize/cli/mz_workload_anonymize_test.py` (15 tests), colocated so the existing `pytest --doctest-modules misc/python` CI step picks it up with no pipeline changes:

- `--in-place` overwrite.
- `--no-literals` keeps literals while still anonymizing identifiers.
- `verify_anonymized`: catches surviving identifiers and literals, accepts both `'<REDACTED>'` and `'literal_N'` placeholders, and exempts cluster literals.
- `redact_literals_via_parser` returns `None` (fallback signal) without the binary; the regex fallback warns.

Most tests force the regex fallback (monkeypatch `_locate_redactor`) so they're deterministic regardless of whether the `mz-sql-anonymize` helper is built. One test exercises the parser path (numeric-literal redaction) and is `skipif`'d when the binary is absent, so CI (no binary) runs 14 and skips 1; locally all 15 run.

Testing

- pytest); `bin/fmt` and `ruff check` clean.
- `*_test.py` discovery.

🤖 Generated with Claude Code