Summary
The ptrace and eBPF tracers don't track splice(2) syscalls. Files read via splice — most notably by modern cat(1) — show read=0 in the tracer report even though the bytes did flow through the file descriptor in the kernel.
The preload tracer is not affected: it interposes at the libc level (open / openat) and credits open(O_RDONLY) as evidence of a read via the schema's OpenRead event, independent of how the bytes ultimately moved.
Reproducer
Running the bare ptrace tracer binary against cat:
$ echo "hello" > /tmp/test.txt
$ /home/ubuntu/roar/rust/target/release/roar-tracer report.msgpack cat /tmp/test.txt
hello
Decoded report (filtered to the file of interest):
read=False written=False path=/tmp/test.txt
Same result via roar run:
$ roar run --tracer ptrace bash -c "echo hello > test.txt ; cat test.txt"
$ roar dag
# shows out:1, in:0 — the cat read is missing
Confirmed reproducible on eBPF too (per user testing).
Root cause
Strace of cat /tmp/test.txt:
openat(AT_FDCWD, "/tmp/test.txt", O_RDONLY|O_CLOEXEC) = 3
splice(3, NULL, 5, NULL, 1048576, 0) = 6 ← splice, not read()
splice(4, NULL, 1, NULL, 6, 0) = 6
splice(3, NULL, 5, NULL, 1048576, 0) = 0 ← EOF
Modern coreutils cat uses splice(2) to move bytes directly from the input fd to stdout's pipe, never calling read(2). The tracers track:
- ptrace (rust/tracers/ptrace/src/main.rs): SYS_READ / SYS_PREAD64 / SYS_READV / SYS_PREADV / SYS_PREADV2, plus SYS_WRITE family, SYS_SENDFILE, SYS_COPY_FILE_RANGE.
- eBPF (rust/tracers/ebpf/userspace/src/events.rs): equivalent EventType::Read / Write / Sendfile / CopyFileRange variants.
Neither lists SYS_SPLICE. A splice() from a regular-file fd to a pipe is semantically a read of the regular file, but the tracer never sees an event tied to that fd, so the file ends up read=0.
sendfile and copy_file_range are already handled — splice is the same kind of syscall and should mirror that shape.
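To make the gap concrete, here is a minimal Rust sketch (Linux only, 64-bit target assumed) that moves a small file's contents into a pipe with a single splice(2) call. The helper name splice_file_to_pipe is hypothetical, the raw FFI declarations stand in for what the libc crate would normally provide, and this is not a claim about how cat itself is implemented.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::ptr;

// Raw declarations of the two libc functions used below. A real project
// would more likely take these from the `libc` crate. The signatures assume
// a 64-bit Linux target (off_t = i64, ssize_t = isize).
unsafe extern "C" {
    fn pipe(fds: *mut i32) -> i32;
    fn splice(
        fd_in: i32,
        off_in: *mut i64,
        fd_out: i32,
        off_out: *mut i64,
        len: usize,
        flags: u32,
    ) -> isize;
}

/// Hypothetical helper: moves up to 64 KiB of `path` into a fresh pipe with
/// one splice(2) call, then drains the pipe. Intended for small files only.
pub fn splice_file_to_pipe(path: &str) -> io::Result<Vec<u8>> {
    let input = File::open(path)?;

    let mut fds = [0i32; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Wrap both pipe ends in File so they are closed automatically.
    let mut pipe_read = unsafe { File::from_raw_fd(fds[0]) };
    let pipe_write = unsafe { File::from_raw_fd(fds[1]) };

    // The only data-moving syscall issued on `input`. No read(2) is ever
    // called on the regular file's descriptor.
    let moved = unsafe {
        splice(
            input.as_raw_fd(),
            ptr::null_mut(),
            pipe_write.as_raw_fd(),
            ptr::null_mut(),
            65536,
            0,
        )
    };
    if moved < 0 {
        return Err(io::Error::last_os_error());
    }

    // Close the write end so draining the pipe terminates at EOF.
    drop(pipe_write);
    let mut out = Vec::new();
    // This read(2) targets the pipe, not the regular file.
    pipe_read.read_to_end(&mut out)?;
    Ok(out)
}
```

Running this under strace should show openat followed by splice on the file's fd, with read appearing only on the pipe, which matches the shape of the cat trace above.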
Scope of impact
Narrower than it sounds. The tools that use splice() for their primary I/O are mostly viewers and copiers:
- cat <file>: splices file → pipe (above).
- cp --reflink=auto on filesystems that support it.
- Sometimes dd with certain options.
Data-processing tools (python, awk, jq, grep, sed) overwhelmingly use read(2), so real pipelines (e.g. clean.sh → aggregate.sh → report.sh) capture reads correctly today.
The bug surfaces when a user includes a cat step purely to inspect / debug an intermediate output mid-pipeline, or when the lineage step deliberately uses cat/cp as the action.
This is a pre-existing gap — not a regression from any recent change — and was caught while investigating P1-run3 chained-pipeline write capture (the write side was fixed in commit 5936e4e for preload before 0.3.0 and works correctly on ptrace + eBPF; only the read side via splice is missing).
Proposed fix
ptrace (small, ~15 lines)
In rust/tracers/ptrace/src/main.rs:
- Entry handler for SYS_SPLICE: capture fd_in (arg0) and fd_out (arg2) into pending_writes-style structures so the exit handler can confirm that bytes flowed before crediting.
- Exit handler: if ret_val > 0, mirror the SENDFILE logic: mark_read_with_thread(pid, fd_in) and mark_path_written_with_thread(fd_out_path) (if fd_out is a tracked file fd, not a pipe).
Pattern is already in the file for SYS_SENDFILE and SYS_COPY_FILE_RANGE.
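The entry/exit split described above could look roughly like the following self-contained model. It is a sketch, not the code in main.rs: TracerState, its fields, on_syscall_entry, on_syscall_exit, and replay_cat_trace are all hypothetical stand-ins for the real tracer state and the mark_* helpers, and the syscall number is the x86_64 value only.

```rust
use std::collections::{HashMap, HashSet};

/// splice(2) syscall number on x86_64 Linux (other architectures differ).
pub const SYS_SPLICE: u64 = 275;

/// Stand-in for the tracer state. All names here are hypothetical.
#[derive(Default)]
pub struct TracerState {
    /// tid -> (fd_in, fd_out), recorded at syscall entry.
    pending_splices: HashMap<i32, (i32, i32)>,
    /// (pid, fd) -> path, only for fds that refer to tracked regular files.
    pub fd_paths: HashMap<(i32, i32), String>,
    pub read_paths: HashSet<String>,
    pub written_paths: HashSet<String>,
}

impl TracerState {
    /// Entry: remember which fds the call names. Nothing is credited yet.
    pub fn on_syscall_entry(&mut self, tid: i32, nr: u64, args: [u64; 6]) {
        if nr == SYS_SPLICE {
            // splice(fd_in, off_in, fd_out, off_out, len, flags)
            self.pending_splices
                .insert(tid, (args[0] as i32, args[2] as i32));
        }
    }

    /// Exit: credit the fds only if bytes actually moved.
    pub fn on_syscall_exit(&mut self, pid: i32, tid: i32, nr: u64, ret_val: i64) {
        if nr != SYS_SPLICE {
            return;
        }
        let (fd_in, fd_out) = match self.pending_splices.remove(&tid) {
            Some(fds) => fds,
            None => return,
        };
        if ret_val <= 0 {
            return; // error or EOF: no bytes flowed
        }
        // Pipes and other untracked fds have no entry in fd_paths, so they
        // are skipped without special-casing.
        if let Some(path) = self.fd_paths.get(&(pid, fd_in)) {
            self.read_paths.insert(path.clone());
        }
        if let Some(path) = self.fd_paths.get(&(pid, fd_out)) {
            self.written_paths.insert(path.clone());
        }
    }
}

/// Replays the three splice calls from the strace in this report.
pub fn replay_cat_trace() -> TracerState {
    let (pid, tid) = (100, 100);
    let mut state = TracerState::default();
    // fd 3 is the openat() result; fds 4, 5 and 1 are pipes / stdout.
    state.fd_paths.insert((pid, 3), "/tmp/test.txt".to_string());

    let calls: [([u64; 6], i64); 3] = [
        ([3, 0, 5, 0, 1048576, 0], 6),
        ([4, 0, 1, 0, 6, 0], 6),
        ([3, 0, 5, 0, 1048576, 0], 0),
    ];
    for (args, ret) in calls {
        state.on_syscall_entry(tid, SYS_SPLICE, args);
        state.on_syscall_exit(pid, tid, SYS_SPLICE, ret);
    }
    state
}
```

In this model, replaying the cat trace credits /tmp/test.txt as read and credits nothing as written, because neither pipe end resolves to a tracked path. Keying the pending map by thread id is an assumption that follows the *_with_thread naming in the existing helpers.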
eBPF (bigger lift)
- Add EventType::Splice to the schema (rust/tracers/ebpf/userspace/src/events.rs and the kernel-side BPF).
- Kernel-side: attach to the sys_enter_splice / sys_exit_splice tracepoints, emit an event with (fd_in, fd_out, bytes).
- Userspace handler: mirror the existing Sendfile / CopyFileRange cases.
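On the userspace side, the new variant could slot in next to the existing two-fd events along these lines. This is a sketch under assumptions: the real EventType in events.rs may carry different fields or a different layout, and FdEvent and credited_fds are hypothetical names used only for illustration.

```rust
/// Hypothetical mirror of the userspace event kinds. Only the variant names
/// come from the report; Splice is the proposed addition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventType {
    Read,
    Write,
    Sendfile,
    CopyFileRange,
    Splice,
}

/// Hypothetical decoded event carrying the (fd_in, fd_out, bytes) triple
/// emitted at syscall exit. Single-fd events set the unused fd to -1.
#[derive(Debug, Clone, Copy)]
pub struct FdEvent {
    pub kind: EventType,
    pub fd_in: i32,
    pub fd_out: i32,
    pub bytes: i64,
}

/// Returns (fd to credit as read, fd to credit as written).
/// Nothing is credited when no bytes moved (error or EOF).
pub fn credited_fds(ev: &FdEvent) -> (Option<i32>, Option<i32>) {
    if ev.bytes <= 0 {
        return (None, None);
    }
    match ev.kind {
        EventType::Read => (Some(ev.fd_in), None),
        EventType::Write => (None, Some(ev.fd_out)),
        // Splice takes the same path as the existing two-fd transfers.
        EventType::Sendfile | EventType::CopyFileRange | EventType::Splice => {
            (Some(ev.fd_in), Some(ev.fd_out))
        }
    }
}
```

The caller would still need to resolve each returned fd to a path and drop the ones that are pipes or otherwise untracked, as the existing Sendfile / CopyFileRange handling presumably does.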
Test
Add coverage analogous to test_preload_shell_pipelines.py::test_awk_redirect_then_cat, but for ptrace and eBPF — assert that cat <file> produces a read=True entry in the tracer report.
Priority
P2 — narrow user-visible impact (cat-style steps, not data-processing pipelines), pre-existing, and the workaround (use python -c 'print(open("x").read())' or similar) exists. Not blocking the next release.