H264Packet should handle Annex B buffers with multiple NAL units #2086

@IcarusL

Related issues:

Problem

On RK3588, the Android H264 encoder may output one encoded frame as an Annex B access unit containing multiple NAL units.

For example:

SPS + PPS + SEI + IDR

or:

SEI + IDR

The current H264Packet logic appears to assume that one input ByteBuffer contains only one NAL unit. It removes only one Annex B start code, then treats the remaining data as a single NAL unit.

For example, this input:

00 00 00 01 06 ... 00 00 00 01 65 ...

is actually:

SEI + IDR

After removing only the first start code, the remaining data becomes:

06 ... 00 00 00 01 65 ...

The current logic may then package this as one FLV AVC NAL unit:

[length][SEI + startCode + IDR]

This is not a valid AVC-in-FLV payload. An FLV AVC payload must consist of length-prefixed NAL units and must not contain Annex B start codes.

The correct output should be either:

[SEI length][SEI data][IDR length][IDR data]

or, if SEI is intentionally ignored:

[IDR length][IDR data]
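
The conversion described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `AnnexBSplitter`, `splitNalUnits`, and `toLengthPrefixed` are hypothetical names, and the input is assumed to be a plain `byte[]` holding one Annex B access unit. The scan looks only for the 3-byte pattern `00 00 01`; a 4-byte start code is then handled by trimming trailing zero bytes, which is safe because a NAL unit never ends in `0x00`.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library.
final class AnnexBSplitter {
    static final int NAL_TYPE_SEI = 6;

    /** Splits an Annex B buffer into NAL units with all start codes removed. */
    static List<byte[]> splitNalUnits(byte[] data) {
        List<byte[]> nals = new ArrayList<>();
        int nalStart = -1; // first byte of the current NAL unit, -1 before any start code
        int i = 0;
        while (i + 3 <= data.length) {
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
                if (nalStart >= 0) addTrimmed(nals, data, nalStart, i);
                nalStart = i + 3;
                i += 3;
            } else {
                i++;
            }
        }
        if (nalStart >= 0) addTrimmed(nals, data, nalStart, data.length);
        return nals;
    }

    // Drops trailing zero bytes: they belong to the next 4-byte start code or are padding.
    private static void addTrimmed(List<byte[]> out, byte[] data, int start, int end) {
        while (end > start && data[end - 1] == 0) end--;
        if (end > start) out.add(Arrays.copyOfRange(data, start, end));
    }

    /** Writes each NAL unit with its own 4-byte big-endian length prefix. */
    static byte[] toLengthPrefixed(List<byte[]> nals, boolean dropSei) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] nal : nals) {
            int type = nal[0] & 0x1F; // nal_unit_type is the low 5 bits of the header byte
            if (dropSei && type == NAL_TYPE_SEI) continue;
            int n = nal.length;
            out.write((n >>> 24) & 0xFF);
            out.write((n >>> 16) & 0xFF);
            out.write((n >>> 8) & 0xFF);
            out.write(n & 0xFF);
            out.write(nal, 0, n);
        }
        return out.toByteArray();
    }
}
```

Passing `dropSei = false` yields the `[SEI length][SEI data][IDR length][IDR data]` layout, and `dropSei = true` yields `[IDR length][IDR data]`.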

Impact

This can produce malformed RTMP video packets.

In our case, the RTMP server closes the connection after receiving the invalid video data. After the socket is closed, Ktor may throw from its internal cio-to-nio-writer coroutine, which is tracked separately in #2085.

We also tried to work around this at the upper layer by slicing the input ByteBuffer or adjusting MediaFrame.Info to remove SEI before passing the frame to H264Packet.

However, that workaround is currently unreliable because of the ByteBuffer copy/range behavior described in #2083. A sliced heap ByteBuffer still shares the original backing array, and calling array() on it returns that whole backing array rather than only the visible sliced range.

Because of that, the only safe workaround right now is to create a new clean ByteBuffer that contains only the NAL units we want to send.
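
The slice behavior and the clean-copy workaround can be illustrated with standard `java.nio.ByteBuffer` calls. This is a sketch only; `ByteBufferRange` and `copyVisible` are hypothetical names introduced here, not existing API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the library.
final class ByteBufferRange {

    /** Copies only the visible range [position, limit) into a fresh array. */
    static byte[] copyVisible(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leaves the caller's position untouched
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // SEI + IDR in one heap buffer (13 bytes in total).
        byte[] raw = {0, 0, 0, 1, 0x06, 0x05, (byte) 0x80, 0, 0, 0, 1, 0x65, (byte) 0x88};
        ByteBuffer full = ByteBuffer.wrap(raw);
        full.position(7); // skip the SEI NAL unit
        ByteBuffer idrOnly = full.slice();

        // The slice exposes 6 bytes, but array() still returns all 13.
        System.out.println(idrOnly.remaining());
        System.out.println(idrOnly.array().length);
        System.out.println(idrOnly.arrayOffset());

        // A clean buffer that contains only the bytes we want to send.
        ByteBuffer clean = ByteBuffer.wrap(copyVisible(idrOnly));
        System.out.println(clean.array().length);
    }
}
```

Because `copyVisible` reads through `get` rather than `array()`, the same approach also works for direct buffers, which have no accessible backing array.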

Expected behavior

H264Packet should not assume that one encoded input buffer contains exactly one NAL unit.

It should first parse the Annex B input buffer into individual NAL units, then process each NAL unit according to its type.

Expected handling:

  • split input by Annex B start codes: 00 00 01 and 00 00 00 01
  • remove all start codes before writing FLV AVC payload
  • write each NAL unit with its own 4-byte big-endian length prefix
  • handle SPS/PPS through AVC sequence header
  • handle SEI explicitly, either by keeping it as a separate NAL unit or dropping it intentionally
  • detect keyframe status by scanning all NAL units in the access unit, not only the first NAL unit
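
The last point can be sketched as a scan over every NAL unit header in the access unit. This is an illustrative sketch under the assumption that the input is a `byte[]` in Annex B format; `AccessUnitScan`, `containsNalType`, and `isKeyframe` are hypothetical names. The type constants are the standard H.264 `nal_unit_type` values.

```java
// Hypothetical helper, not part of the library.
final class AccessUnitScan {
    static final int NAL_IDR = 5;
    static final int NAL_SEI = 6;
    static final int NAL_SPS = 7;
    static final int NAL_PPS = 8;

    /** Returns true if any NAL unit in the Annex B buffer has the given type. */
    static boolean containsNalType(byte[] annexB, int wantedType) {
        // Matching 00 00 01 also covers 00 00 00 01, whose last three bytes are the same.
        for (int i = 0; i + 3 < annexB.length; i++) {
            if (annexB[i] == 0 && annexB[i + 1] == 0 && annexB[i + 2] == 1) {
                int type = annexB[i + 3] & 0x1F; // low 5 bits of the NAL header byte
                if (type == wantedType) return true;
                i += 2; // continue right after the start code
            }
        }
        return false;
    }

    /** Treats the access unit as a keyframe if any NAL unit is an IDR slice. */
    static boolean isKeyframe(byte[] annexB) {
        return containsNalType(annexB, NAL_IDR);
    }

    /** True if the buffer carries parameter sets for the AVC sequence header. */
    static boolean hasParameterSets(byte[] annexB) {
        return containsNalType(annexB, NAL_SPS) && containsNalType(annexB, NAL_PPS);
    }
}
```

With this approach an `SEI + IDR` buffer is reported as a keyframe even though its first NAL unit is an SEI, which a check of only the first header byte would miss.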

Root cause chain

RK3588 encoder outputs H264 buffer with SEI + IDR
        ↓
H264Packet removes only one start code
        ↓
SEI + startCode + IDR is packed as one FLV NAL unit
        ↓
RTMP server receives malformed AVC payload
        ↓
server closes socket
        ↓
Ktor internal writer coroutine throws

So #2085 is about safely handling the socket failure after the server closes the connection.

#2083 is about the ByteBuffer issue that blocks a simple upper-layer workaround.

This issue is about the actual RTMP/H264 packaging root cause: H264Packet should correctly parse Annex B access units containing multiple NAL units.
