M8 conformance baseline: 4 advisory failures — HARDENING.md fix work #1

@trendvidia

ROADMAP M8 just landed cross-port. The new spec at docs/HARDENING.md defines the decoder-safety contract every port must enforce against untrusted input; the adversarial corpus tests it; the 9-port matrix workflow runs it on every protowire PR.

Companion check_decode reference for this port: scripts/check_decode.py (commit 51df4a6).

The workflow ships in advisory mode (continue-on-error: true) so failures do not block PRs. This issue tracks driving this port's regressions to zero so we can flip the gate to required for python.

Current baseline — 4 advisory failures (1 crash)

| Corpus | Verdict | Reason |
| --- | --- | --- |
| pxf/deep-nesting-200.pxf | FAIL_VERDICT | accepted; missing MaxNestingDepth cap (heavy lifting in C++ FFI) |
| pxf/deep-nesting-1000.pxf | FAIL_VERDICT | same |
| pxf/deep-nesting-100000.pxf | FAIL_CRASH | C++ FFI stack overflow leaks through to Python |
| pxf/invalid-utf8-string.pxf | FAIL_VERDICT | C++ FFI's PXF decoder doesn't enforce UTF-8 on proto3 string |

The Python port passes most corpus inputs because:

  • pb/deep-submessage-200.binpb rejected by google.protobuf (which has built-in recursion limits).
  • pb/length-prefix-truncated.binpb rejected by google.protobuf.
  • pxf/long-numeric.pxf rejected by Python's int(str) raising ValueError on the int64 parser path.
  • pxf/lone-surrogate.pxf rejected at the lexer (already validates ValidRune).

What to fix

The 4 PXF failures are all on the C++ FFI side (the _protowire/module.cc extension and its protowire-cpp dependencies). Enforcing two HARDENING.md invariants in protowire-cpp will fix them upstream:

  • §Recursion: MaxNestingDepth = 100 cap on the C++ PXF parser/decoder. Tracked in protowire-cpp's M8 issue; once that lands, this port inherits the fix automatically. The 100k-deep crash then becomes a clean error returned through the FFI.
  • §UTF-8: strict UTF-8 enforcement when populating proto3 string fields in protowire-cpp's PXF decoder. Same upstream fix.
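
As a rough illustration of what "strict UTF-8" means for the second invariant, the sketch below uses Python's built-in strict decoder as a reference for which byte sequences a conforming decoder should refuse. The helper name and sample set are hypothetical; this is not protowire-cpp code, only a way to see the expected accept/reject behavior.

```python
def is_strict_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative byte strings; names and values are chosen for this sketch only.
samples = {
    "ascii": b"hello",
    "two-byte": "\u00e9".encode("utf-8"),
    "stray 0xFF": b"\xff",
    "overlong NUL": b"\xc0\x80",
    "encoded surrogate": b"\xed\xa0\x80",
    "truncated sequence": b"\xe2\x82",
}
results = {name: is_strict_utf8(raw) for name, raw in samples.items()}
```

Only the first two samples are well-formed; the remaining four are the kinds of input a strict decoder is expected to reject.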

If the maintainers want a Python-side defense-in-depth in addition (e.g. setting sys.setrecursionlimit lower, or wrapping the FFI call in a stack-budget check), that's optional but not required to clear the corpus — the upstream C++ fixes are sufficient.
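
One possible shape for such a Python-side guard is a cheap, non-recursive pre-scan that runs before the input reaches the extension. The sketch below assumes that `{` and `}` delimit nested messages in PXF and that string literals are double-quoted with backslash escapes; neither assumption is confirmed here, and the function name is hypothetical. Note that sys.setrecursionlimit bounds the interpreter's own frame depth, so a pre-scan like this may be the more direct way to keep deep input away from native recursion.

```python
MAX_NESTING_DEPTH = 100  # cap named in HARDENING.md; applying it here is an assumption


def check_nesting_depth(text: str, limit: int = MAX_NESTING_DEPTH) -> None:
    """Raise ValueError if brace nesting in text exceeds limit.

    Assumes '{' / '}' delimit nested messages and that string literals are
    double-quoted with backslash escapes (hypothetical PXF syntax details).
    """
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Braces inside string literals do not count toward nesting.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            if depth > limit:
                raise ValueError(f"nesting depth exceeds {limit}")
        elif ch == "}":
            depth = max(depth - 1, 0)
```

The scan is a single loop with constant memory, so it cannot itself overflow the stack on the 100k-deep input. Whether a depth of exactly 100 should be accepted or rejected is for HARDENING.md to decide; the sketch accepts it.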

The pure-Python envelope.py decoder also has a couple of IndexError-vs-ValueError exception-type leaks the cross-port review flagged (envelope.py:131-132, 85-93); those aren't tested by the current corpus but are good hygiene fixes.
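
To make the exception-type leak concrete: indexing past the end of a bytes object raises IndexError, while slicing past the end silently returns a shorter result. A decoder that wants to surface only ValueError on malformed input can check bounds explicitly. The sketch below uses a hypothetical one-byte length prefix and is not the actual envelope.py layout.

```python
def read_length_prefixed(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Read a one-byte length, then that many payload bytes.

    Hypothetical wire layout for illustration only. Returns the payload and
    the offset just past it; raises ValueError on any truncation.
    """
    if offset < 0 or offset >= len(buf):
        # Without this check, buf[offset] would raise IndexError.
        raise ValueError("truncated input: missing length byte")
    length = buf[offset]
    start = offset + 1
    end = start + length
    if end > len(buf):
        # Without this check, the slice would silently come back short.
        raise ValueError("truncated input: payload shorter than declared length")
    return buf[start:end], end
```

Callers then only need to catch ValueError to treat every malformed input as a clean reject.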

Reproduce locally

# In the venv with `pip install -e .` already done:
WALLCLOCK_SECONDS=10 bash ../protowire/scripts/cross_security_check.sh

# Or run a single corpus input:
python3 scripts/check_decode.py --format pxf \
  --schema adversarial.v1.Tree \
  --proto ../protowire/testdata/adversarial/adversarial.proto \
  --input ../protowire/testdata/adversarial/pxf/deep-nesting-100000.pxf
echo $?  # 0 = accepted, 1 = clean reject, 134/139 = crash via FFI
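
When scripting this from Python instead of the shell, keep in mind that subprocess reports a signal death as a negative return code (for example -11), whereas the shell reports 128 plus the signal number (139). The sketch below maps either form to a verdict for an input that is expected to be rejected. The function name and the "PASS" and "UNKNOWN" labels are assumptions; only FAIL_VERDICT and FAIL_CRASH appear in the baseline table.

```python
def classify_exit(returncode: int) -> str:
    """Map a check_decode exit status to a verdict (hypothetical helper).

    Assumes the input is adversarial, so a clean reject is the passing outcome.
    """
    if returncode == 1:
        return "PASS"          # clean reject
    if returncode == 0:
        return "FAIL_VERDICT"  # decoder accepted input it should refuse
    if returncode < 0 or returncode > 128:
        # Negative: subprocess-style signal death. Above 128: shell-style 128 + signal.
        return "FAIL_CRASH"
    return "UNKNOWN"           # any other status is not described in this issue
```

For example, `classify_exit(subprocess.run(cmd).returncode)` would yield "FAIL_CRASH" for a segfault whether the status arrives as -11 or as 139.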

Convergence

When all 4 corpus inputs pass on this port, comment here. We'll flip continue-on-error: false for the python matrix entry in protowire/.github/workflows/security.yml. When all 9 ports converge, the workflow becomes a required check.

Cross-port context: 8 sibling per-port issues track the same convergence. The full cross-port matrix lives in protowire's ROADMAP § M8. Most of this port's findings originate upstream in protowire-cpp; see that repo's M8 issue.
