Skip to content

[+] Fix: Validate frame types in packets to prevent protocol violations#525

Open
Sy0307 wants to merge 9 commits into
mainfrom
fix/validate_frame_type
Open

[+] Fix: Validate frame types in packets to prevent protocol violations#525
Sy0307 wants to merge 9 commits into
mainfrom
fix/validate_frame_type

Conversation

@Sy0307

@Sy0307 Sy0307 commented Dec 15, 2025

Copy link
Copy Markdown
Collaborator

Thanks for @NErinola.

Following the QUIC RFC, we validate frame types in packets to prevent protocol violations, as the Table 3 shows:

  Table 3 lists and summarizes information about each frame type that
   is defined in this specification.  A description of this summary is
   included after the table.

    +============+======================+===============+======+======+
    | Type Value | Frame Type Name      | Definition    | Pkts | Spec |
    +============+======================+===============+======+======+
    | 0x00       | PADDING              | Section 19.1  | IH01 | NP   |
    +------------+----------------------+---------------+------+------+
    | 0x01       | PING                 | Section 19.2  | IH01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x02-0x03  | ACK                  | Section 19.3  | IH_1 | NC   |
    +------------+----------------------+---------------+------+------+
    | 0x04       | RESET_STREAM         | Section 19.4  | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x05       | STOP_SENDING         | Section 19.5  | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x06       | CRYPTO               | Section 19.6  | IH_1 |      |
    +------------+----------------------+---------------+------+------+
    | 0x07       | NEW_TOKEN            | Section 19.7  | ___1 |      |
    +------------+----------------------+---------------+------+------+
    | 0x08-0x0f  | STREAM               | Section 19.8  | __01 | F    |
    +------------+----------------------+---------------+------+------+
    | 0x10       | MAX_DATA             | Section 19.9  | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x11       | MAX_STREAM_DATA      | Section 19.10 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x12-0x13  | MAX_STREAMS          | Section 19.11 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x14       | DATA_BLOCKED         | Section 19.12 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x15       | STREAM_DATA_BLOCKED  | Section 19.13 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x16-0x17  | STREAMS_BLOCKED      | Section 19.14 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x18       | NEW_CONNECTION_ID    | Section 19.15 | __01 | P    |
    +------------+----------------------+---------------+------+------+
    | 0x19       | RETIRE_CONNECTION_ID | Section 19.16 | __01 |      |
    +------------+----------------------+---------------+------+------+
    | 0x1a       | PATH_CHALLENGE       | Section 19.17 | __01 | P    |
    +------------+----------------------+---------------+------+------+
    | 0x1b       | PATH_RESPONSE        | Section 19.18 | ___1 | P    |
    +------------+----------------------+---------------+------+------+
    | 0x1c-0x1d  | CONNECTION_CLOSE     | Section 19.19 | ih01 | N    |
    +------------+----------------------+---------------+------+------+
    | 0x1e       | HANDSHAKE_DONE       | Section 19.20 | ___1 |      |
    +------------+----------------------+---------------+------+------+

                            Table 3: Frame Types

Comment thread src/transport/xqc_frame.c
case 0x01: /* PING */
case 0x1c: /* CONNECTION_CLOSE (transport) */
case 0x1d: /* CONNECTION_CLOSE (application) */
allowed = XQC_TRUE;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not true. Actually in RFC 9000 which says:

Sending a CONNECTION_CLOSE of type 0x1d in an Initial or Handshake packet could expose application state or be used to alter application state. A CONNECTION_CLOSE of type 0x1d MUST be replaced by a CONNECTION_CLOSE of type 0x1c when sending the frame in Initial or Handshake packets. Otherwise, information about the application state might be revealed. Endpoints MUST clear the value of the Reason Phrase field and SHOULD use the APPLICATION_ERROR code when converting to a CONNECTION_CLOSE of type 0x1c.

And also in Section 12.5 "Frames and Number Spaces" which says:

CONNECTION_CLOSE frames signaling errors at the QUIC layer (type
0x1c) MAY appear in any packet number space. CONNECTION_CLOSE
frames signaling application errors (type 0x1d) MUST only appear
in the application data packet number space.

@Yanmei-Liu Yanmei-Liu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — solid structure (RFC 9000 §12.4 Table 3 enforcement on receive + the matching send-side fix in xqc_packet_out.c). The XQC_PIF_FEC_RECOVERED exemption is also the right call to avoid regressing the FEC fix from #547.
Two must-fix items before merging:
1. 0x1d (CONNECTION_CLOSE with application reason) is misclassified as IH01.
It must be __01 (0-RTT and 1-RTT only). RFC 9000 §10.2.3 (lines 3322-3329):

A CONNECTION_CLOSE of type 0x1d MUST be replaced by a
CONNECTION_CLOSE of type 0x1c when sending the frame in Initial or
Handshake packets.

Reinforced by §12.4 line 3951 ("ih: Only a CONNECTION_CLOSE frame of type 0x1c can appear in Initial or Handshake packets") and §12.5 lines 4011-4014. Suggested split:

case 0x1c: /* CONNECTION_CLOSE (transport) — IH01 */
    allowed = XQC_TRUE;
    break;
case 0x1d: /* CONNECTION_CLOSE (application) — __01 */
    allowed = (pkt_flag_0rtt || pkt_flag_1rtt);
    break;

I cross-checked the rest of Table 3 (all 31 frame types) and this is the only deviation.
2. Initialize the locals in xqc_validate_frame_type_in_pkt (diff line 182).

xqc_bool_t allowed = XQC_FALSE;
xqc_bool_t pkt_flag_init = XQC_FALSE, pkt_flag_hsk = XQC_FALSE,
           pkt_flag_0rtt = XQC_FALSE, pkt_flag_1rtt = XQC_FALSE;

Fail-closed default for allowed is the right posture for a security-decision variable, and it protects against future early-returns being inserted before the assignments.
Non-blocking: a couple of unit tests (INIT+STREAM → PROTOCOL_VIOLATION; handshake-time xqc_write_streams_blocked_to_packet → buffered as 1-RTT) would round this out nicely.
Once the two items are in, this is good to merge. Nice work!

@Yanmei-Liu

Copy link
Copy Markdown
Collaborator

A few more code-quality nits worth folding in while you're iterating:

3. Hidden side effect in xqc_get_pkt_type_for_app_data_frame. The helper sets conn->conn_flag |= XQC_CONN_FLAG_HAS_0RTT; internally, but the get_* name reads as a pure query. Either rename to xqc_select_pkt_type_for_app_data_frame, or add a /* SIDE EFFECT: ... */ comment at the top.
4. Duplicated CAN_SEND_1RTT check in xqc_write_max_stream_data_to_packet. The trailing if (pkt_type == XQC_PTYPE_SHORT_HEADER && !CAN_SEND_1RTT) buff_pkt = XQC_TRUE; re-implements logic the helper already owns. They agree today but will drift next time someone touches the helper. Suggest absorbing the "caller already picked an app-data slot — verify readiness" path into the helper itself, so all three call sites collapse to:

xqc_pkt_type_t pkt_type = xqc_select_pkt_type_for_app_data_frame(conn, requested, &buff_pkt);

5. PROTOCOL_VIOLATION log line is missing pkt_num. When this fires in production there's no way to grep back to the offending packet. Add packet_in->pi_pkt.pkt_num to the format. (Existing PROTOCOL_VIOLATION log lines elsewhere in the codebase don't include it either — good chance to set the better pattern.)
6. Flatten the switch. Combined with the 0x1d split from must-fix #1, replacing the default: if/else range check with explicit cases makes the table-vs-code mapping 1:1 with RFC 9000 §12.4 Table 3:

case 0x00: case 0x01: case 0x1c:                     /* IH01 */
    allowed = XQC_TRUE; break;
case 0x02: case 0x03: case 0x06:                     /* IH_1 */
    allowed = (pkt_flag_init || pkt_flag_hsk || pkt_flag_1rtt); break;
case 0x07: case 0x1b: case 0x1e:                     /* ___1 */
    allowed = pkt_flag_1rtt; break;
case 0x04: case 0x05:                                /* __01 */
case 0x08 ... 0x1a:                                  /* (or list explicitly if GCC range syntax is off-limits) */
case 0x1d:
    allowed = (pkt_flag_0rtt || pkt_flag_1rtt); break;
default:
    return XQC_OK;                                   /* unknown → upstream switch handles */
}

3 & 4 are structural; 5 & 6 are polish. None block direction — together with the two must-fix items, they'll round this into a really clean contribution.

@Sy0307 Sy0307 force-pushed the fix/validate_frame_type branch from 4408503 to e8469ec Compare May 7, 2026 13:05

@Yanmei-Liu Yanmei-Liu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢修复,3 个修复方向都是对的,特别是 fix 3 顺手把"INIT/HSK 包里出现 STREAMS_BLOCKED"这个预存的 RFC 9000 §12.4 违反也修了。但有 2 个 Critical 问题不能放过,建议补全后再合并。

必须修改

1. handshake 期 conn_close 错误码判定边界冲突(修复 2 引入 wire 级 RFC 违反)

src/transport/xqc_packet_out.c:792

xqc_bool_t is_app = err_code >= H3_NO_ERROR;

H3_NO_ERROR = 0x100,但 RFC 9000 §20.1 把 0x0100-0x01FF 划给 transport-level CRYPTO_ERROR。xquic 自身在 src/transport/xqc_conn.c:6311tls_err | TRA_CRYPTO_ERROR_BASE 写入这个区间,include/xquic/xqc_errno.h:27-28 也定义 TRA_HS_CERTIFICATE_VERIFY_FAIL=0x1FE / TRA_CRYPTO_ERROR=0x1FF

handshake 期 TLS 失败时,err_code 落在 0x100-0x1FF,新代码 is_app && (INIT||HSK) 进入分支,把真实的 CRYPTO_ERROR 覆写为 TRA_APPLICATION_ERROR=0x0C,发到 wire。结果:

  • 对端只看到 "application error 0xC",丢失了 TLS alert 类型 / cert verify failure 等关键诊断
  • 这本身违反 RFC §10.2.3 "不要泄露 application 状态"的精神——但反过来把真正的 transport 错误也吞掉了

修复:

/* H3 application error codes are >= 0x200 (xqc_errno.h);
   [0x100, 0x1FF] is reserved for transport CRYPTO_ERROR per RFC 9000 §20.1 */
xqc_bool_t is_app = err_code >= 0x200;

或者更彻底:在 xqc_write_conn_close_to_packet 接口加 err_kind 入参(TRANSPORT/APPLICATION),从源头区分,避免依赖数值范围判定。

2. 扩展 frame 在 INIT/HSK 中静默放行(修复 1 漏 RFC 9221)

src/transport/xqc_frame.c:201-203

if (frame_type > 0x1e) {
    return XQC_OK;
}

这一刀切放行了 xquic 已经接收处理的所有扩展帧。最关键的:

RFC 9221 §4: "An endpoint that receives a DATAGRAM frame in an Initial or Handshake packet MUST treat the receipt of such a frame as a connection error of type PROTOCOL_VIOLATION."

DATAGRAM (0x30/0x31) 在 INIT/HSK 中现在不会被拒。其他应纳入校验的:

Frame 数值 应限制
DATAGRAM 0x30 / 0x31 __01 (RFC 9221 §4 MUST)
ACK_EXT 0xB1 IH_1
MP_* 多路径帧 0x15228c00..0c0c __01
FEC 0xfec5 / 0xfec6 __01

参见 src/transport/xqc_frame_parser.h:20-30xqc_frame.c:387-484,xquic 实际处理这些帧但校验白名单里没列。

修复:在 switch 之后、> 0x1e 兜底之前,显式加扩展帧分支,至少把 DATAGRAM 显式纳入 __01 校验。

3. 补关键测试

当前覆盖矩阵约 2/120:

  • 修复 1 只测 (INIT, 0x1d) 一个组合
  • 修复 2 只测 INIT 路径,没测 HSK,也没测 err_code 边界值
  • 修复 3 测了 3/6 个 frame(漏 STREAMS_BLOCKED / MAX_STREAM_DATA / MAX_STREAMS)

至少补:

/* DATAGRAM in INIT 必须被拒(RFC 9221 §4) */
char dgram[] = {0x30, 0x00};
pi.pi_pkt.pkt_type = XQC_PTYPE_INIT;
pi.pos = dgram; pi.last = dgram + sizeof(dgram);
CU_ASSERT(xqc_process_frames(conn, &pi) == -XQC_EPROTO);

/* err_code 边界四点:0xFF / 0x100 / 0x1FF / 0x200 */
/* STREAMS_BLOCKED / MAX_STREAM_DATA / MAX_STREAMS 三个对称 case */

4. 测试名误导

xqc_test_handshake_app_conn_close_is_converted 实际跑的是 fresh client INIT 状态(xqc_state_to_pkt_type 返回 INIT),HSK 分支根本没覆盖。要么改名为 ..._initial_app_conn_close_is_converted,要么补一个真正进入 HSK 状态的测试。

建议合并后跟进

  • 拆 PR:3 个修复(frame validation / conn_close err 转换 / packet type selector + buffering)是 3 个不相关问题。混打使 bisect 和 revert 困难。
  • 修复 3 双操作冗余:6 个 callsite 都是 xqc_send_queue_move_to_high_pri() + if (buff_pkt) xqc_conn_buff_1rtt_packet()。后者内部会先 remove_send 再 insert_buff,high_pri 入队是空操作。改为 if (buff_pkt) buff_1rtt; else move_to_high_pri;
  • selector 副作用xqc_select_pkt_type_for_app_data_frame 选 0-RTT 时同时置 XQC_CONN_FLAG_HAS_0RTT,违反"select"单一职责。OR 写入幂等不会出错,但语义混乱,建议外移到调用方。
  • 死代码注释pkt_type >= XQC_PTYPE_NUM(!init && !hsk && !0rtt && !1rtt) 是防御性死代码(enum 0-5,VN/RETRY 走 xqc_packet_decrypt_single 上游隔离),加注释说明意图。
  • 端到端测试:从 xqc_engine_packet_process 喂伪造 INIT(0x30) 验证整条 stack 真的产生 PROTOCOL_VIOLATION CONNECTION_CLOSE。
  • 表驱动测试(frame_type, pkt_type) → expected_ret 跑全 30×4,至少覆盖 NEW_TOKEN / HANDSHAKE_DONE / STREAM / ACK 关键交叉点。

小结

逐帧矩阵实现在标准 0x00-0x1e 区间是干净的,handshake 期 0x1d→0x1c 转换的 wire format 也对。但错误码空间冲突会把真实 CRYPTO_ERROR 抹掉,扩展帧兜底会让 RFC 9221 MUST 被静默绕过。fix 3 顺手解决的预存 RFC 违反是 PR 隐藏价值,值得肯定。补全 1-4 项后重新提交。

@Sy0307 Sy0307 force-pushed the fix/validate_frame_type branch from 18906c3 to d8a9978 Compare May 29, 2026 09:07

@Yanmei-Liu Yanmei-Liu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢迭代修复,上轮两个 Critical 都处理干净了:DATAGRAM 已纳入校验,crypto error 范围已单独判断。Table 3 映射逐帧比对全部正确。

但有两个问题需要处理一下:

必须修改

1. CI 编译失败

xqc_test_parse_stream_frame_inner 被删了但 xqc_test_stream_frame_offset_overflow (267行) 还在调它,Ubuntu 上 -Werror 直接挂掉。

恢复这个函数就行,注意给 pi.pi_pkt.pkt_type 设成 XQC_PTYPE_SHORT_HEADER,否则新的验证器会因为 pkt_type 默认值 0 (INIT) 拒掉 STREAM 帧。

2. 测试覆盖补一补

当前帧类型拒绝只测了 5 个组合,Handshake 包类型一个都没测过。至少补几个关键路径:

  • ACK (0x02) 在 0-RTT 中被拒 -- 这是安全敏感路径,恶意 0-RTT 里塞 ACK 能干扰对端状态
  • HANDSHAKE_DONE (0x1e) 在 Handshake 中被拒 -- 覆盖 HSK 作为拒绝目标
  • NEW_TOKEN (0x07) 在 0-RTT 中被拒 -- 覆盖 ___1 类别
  • 流控帧缓冲补全 STREAMS_BLOCKED / MAX_STREAM_DATA / MAX_STREAMS,跟已测的 3 个是同源模板但没保险就是没保险

建议后续跟进

  • is_app 那段加个注释说明 H3_NO_ERROR == TRA_CRYPTO_ERROR_BASE == 0x100 的重合,以及为什么在 INIT/HSK 阶段不会有 H3 错误所以当前逻辑是安全的
  • xqc_process_stream_frame 里原来的 INIT/HSK 检查现在是 dead code 了(集中验证器已覆盖),保留没问题但标个注释说是 defense-in-depth
  • buff/high_pri 那 6 个 callsite 可以改成 if (buff_pkt) buff_1rtt; else move_to_high_pri; 去掉空操作
  • 1-RTT 路径里 crypto error (0x100-0x1FF) 还是会被 is_app 判成 application error,走 0x1d 帧类型,这是预存问题,可以后续 PR 统一修

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants