M8 conformance baseline: 4 advisory failures — HARDENING.md fix work

ROADMAP M8 just landed cross-port. The new spec at [`docs/HARDENING.md`](https://github.com/trendvidia/protowire/blob/main/docs/HARDENING.md) defines the decoder-safety contract every port must enforce against untrusted input; the [adversarial corpus](https://github.com/trendvidia/protowire/tree/main/testdata/adversarial) tests it; the [9-port matrix workflow](https://github.com/trendvidia/protowire/blob/main/.github/workflows/security.yml) runs it on every `protowire` PR.

Companion `check_decode` reference for this port: [`scripts/check_decode.py`](https://github.com/trendvidia/protowire-python/blob/main/scripts/check_decode.py) (commit `51df4a6`).

The workflow ships in **advisory mode** (`continue-on-error: true`) so failures do not block PRs. This issue tracks driving this port's regressions to zero so we can flip the gate to required for `python`.

## Current baseline — 4 advisory failures (1 crash)

| Corpus | Verdict | Reason |
|---|---|---|
| `pxf/deep-nesting-200.pxf` | `FAIL_VERDICT` | accepted; missing `MaxNestingDepth` cap (heavy lifting in C++ FFI) |
| `pxf/deep-nesting-1000.pxf` | `FAIL_VERDICT` | same |
| `pxf/deep-nesting-100000.pxf` | **`FAIL_CRASH`** | C++ FFI stack overflow leaks through to Python |
| `pxf/invalid-utf8-string.pxf` | `FAIL_VERDICT` | C++ FFI's PXF decoder doesn't enforce UTF-8 on proto3 `string` |

The Python port passes most corpus inputs because:
- `pb/deep-submessage-200.binpb` rejected by `google.protobuf` (which has built-in recursion limits).
- `pb/length-prefix-truncated.binpb` rejected by `google.protobuf`.
- `pxf/long-numeric.pxf` rejected by Python's `int(str)` raising `ValueError` on the int64 parser path.
- `pxf/lone-surrogate.pxf` rejected at the lexer (already validates `ValidRune`).

## What to fix

The 4 PXF failures are all on the C++ FFI side (the `_protowire/module.cc` extension and its `protowire-cpp` dependencies). Two HARDENING.md invariants in protowire-cpp will fix them upstream:

- **§Recursion** — `MaxNestingDepth = 100` cap on the C++ PXF parser/decoder. Tracked in [protowire-cpp's M8 issue]; once that lands, this port inherits the fix automatically. The 100k-deep crash converts to a clean error returned through the FFI.
- **§UTF-8** — strict UTF-8 enforcement on proto3 `string` populating in protowire-cpp's PXF decoder. Same upstream fix.

If the maintainers want a Python-side defense-in-depth in addition (e.g. setting `sys.setrecursionlimit` lower, or wrapping the FFI call in a stack-budget check), that's optional but not required to clear the corpus — the upstream C++ fixes are sufficient.

The pure-Python `envelope.py` decoder also has a couple of `IndexError`-vs-`ValueError` exception-type leaks the cross-port review flagged (`envelope.py:131-132`, `85-93`); those aren't tested by the current corpus but are good hygiene fixes.

## Reproduce locally

```sh
# In the venv with `pip install -e .` already done:
WALLCLOCK_SECONDS=10 bash ../protowire/scripts/cross_security_check.sh

# Or run a single corpus input:
python3 scripts/check_decode.py --format pxf \
  --schema adversarial.v1.Tree \
  --proto ../protowire/testdata/adversarial/adversarial.proto \
  --input ../protowire/testdata/adversarial/pxf/deep-nesting-100000.pxf
echo $?  # 0 = accepted, 1 = clean reject, 134/139 = crash via FFI
```

## Convergence

When all 4 corpus inputs pass on this port, comment here. We'll flip `continue-on-error: false` for the `python` matrix entry in `protowire/.github/workflows/security.yml`. When all 9 ports converge, the workflow becomes a required check.

Cross-port context: 8 sibling per-port issues track the same convergence. The full cross-port matrix lives in `protowire`'s ROADMAP § M8. Most of this port's findings are upstream of `protowire-cpp` — see that repo's M8 issue.


Corpus	Verdict	Reason
`pxf/deep-nesting-200.pxf`	`FAIL_VERDICT`	accepted; missing `MaxNestingDepth` cap (heavy lifting in C++ FFI)
`pxf/deep-nesting-1000.pxf`	`FAIL_VERDICT`	same
`pxf/deep-nesting-100000.pxf`	`FAIL_CRASH`	C++ FFI stack overflow leaks through to Python
`pxf/invalid-utf8-string.pxf`	`FAIL_VERDICT`	C++ FFI's PXF decoder doesn't enforce UTF-8 on proto3 `string`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

M8 conformance baseline: 4 advisory failures — HARDENING.md fix work #1

Current baseline — 4 advisory failures (1 crash)

What to fix

Reproduce locally

Convergence

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

M8 conformance baseline: 4 advisory failures — HARDENING.md fix work #1

Description

Current baseline — 4 advisory failures (1 crash)

What to fix

Reproduce locally

Convergence

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions