DavidLee18/pagurus

pagurus

A Clang/LLVM plugin for C memory safety, with two paths:

  1. Sound path — annotate your code with the pg(...) dialect and pagurus proves memory + thread safety, rejecting anything not provably safe. Activated automatically on any file that uses the dialect.
  2. Confirmed-bug path — on plain, un-annotated C, pagurus reports only high-confidence, confirmed memory bugs — never a heuristic or a false positive. This is the default for un-annotated code.

You pick the path by how much you annotate: nothing → a precise bug-finder on legacy C; full pg(...) annotations → a sound checker. A file is routed to the sound engine the moment it uses any pg(...) annotation (override with -Xclang -plugin-arg-pagurus -Xclang heuristic / … sound).

What does it check?

Confirmed-bug path (plain C, no annotations)

Only high-confidence, confirmed memory bugs — no lints, no heuristics, no false positives.

AST level (-fplugin=)

| Rule | Name | Description |
|------|------|-------------|
| E001 | use-after-free | Dereference or call after free() |
| E002 | double-free | free() called twice on the same pointer |
| E004 | return-of-local | return &local, return &s.f, return arr, or return p where p = &local (dangling reference) |
| E005 | null-deref | Dereference without null check after malloc |
| E006 | uninit-use | Variable read before initialisation |
| E011 | array-oob | Constant array index out of declared bounds |
| E022 | invalid-free | free() of a non-heap target: &x, p + n, a local array, or a pointer holding a stack address |
| E023 | use-after-realloc | Using the old pointer after q = realloc(p, …) may have freed/moved it |
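
The shapes named in the table are easiest to see next to their safe counterparts. The following is a minimal sketch in plain C; the helper names (make_counter, grow, release) are hypothetical and not part of pagurus. The comments describe which rule each pattern relates to, without claiming exactly what the plugin reports on any given input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* E005 shape: check malloc's result before dereferencing it. */
static int *make_counter(int start) {
    int *p = malloc(sizeof *p);
    if (p == NULL)          /* dropping this check leaves *p unguarded */
        return NULL;
    *p = start;
    return p;               /* heap storage outlives the call, unlike
                               `int local; return &local;` (E004 shape) */
}

/* E023 shape: after realloc, only the returned pointer may be used. */
static char *grow(char *buf, size_t new_size) {
    char *tmp = realloc(buf, new_size);
    if (tmp == NULL)
        return NULL;        /* buf is untouched and still owned by the caller */
    return tmp;             /* buf may have been freed or moved; use tmp only */
}

/* E001/E002 shape: free exactly once, then drop the dangling value. */
static void release(int **pp) {
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;             /* a second release() becomes free(NULL), a no-op */
}
```

A caller would write `t = grow(s, 64)` and, on success, continue with t rather than s; on failure s is still valid and must still be freed.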

LLVM IR level (-fpass-plugin=)

| Rule | Name | Analysis method |
|------|------|-----------------|
| IR-E001 | use-after-free | AliasAnalysis::isMustAlias + DominatorTree |
| IR-E001b | use-after-free (GEP) | GEP element of freed object |
| IR-E002 | double-free | Two free() calls that MustAlias |
| IR-E011 | array-oob | ScalarEvolution proves a loop index leaves a constant-size or symbolic (malloc) object's bounds; see IR_SCEV_OOB.md |
| IR-E022 | invalid-free | free() of a non-heap object (getUnderlyingObject → GlobalVariable/Alloca); module pass, runs at -O0 |
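
As an illustration of the loop-index case that IR-E011 describes, here is a small sketch with a constant-size array. The names (squares, fill_squares, sum_squares) are made up for this example. The off-by-one variant is kept in a comment so the file itself stays well-defined; whether a particular build reports it depends on the optimisation level and on what ScalarEvolution can prove.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 8 };

static int squares[N];      /* constant-size object: valid indices are 0..N-1 */

/* In-bounds loop: the index stays strictly below N. */
static void fill_squares(void) {
    for (size_t i = 0; i < N; ++i)
        squares[i] = (int)(i * i);
}

static int sum_squares(void) {
    int total = 0;
    for (size_t i = 0; i < N; ++i)
        total += squares[i];
    return total;
}

/* Off-by-one variant (not compiled): the last iteration reads squares[N],
 * one element past the end of the object.
 *
 *     for (size_t i = 0; i <= N; ++i)
 *         total += squares[i];
 */
```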

The IR-level checks route through Clang's DiagnosticsEngine (via the LLVMContext diagnostic handler), so they appear as real error: diagnostics with a source caret, are visible to IDEs, and are tested with -verify exactly like the AST checks (tests/run_ir_tests.sh, the pagurus_ir CTest). They need codegen + SSA, so the function-pass witnesses run at -O1; IR-E022 is a module pass and runs even at -O0.

The leak, lifetime, borrow-conflict, drop, lint (strcpy/format-string), and data-race checks that the heuristic engine used to emit are not in this path. Those that are inherently heuristic (and misfire on idiomatic C) were retired; the rest moved to the sound engine.

Sound path (pg(...)-annotated C)

The confirmed-bug path above is an unsound bug-finder (high precision, accepts false negatives). The sound engine is the opposite: over pg(...)-annotated code it rejects all in-scope UB — what is not provably safe is an error. It is a separate analysis (src/pagurus_sound.cpp); sound-dump prints its ownership IR.

The dialect adds capability, bounds, and lock annotations through a single pg(...) macro: pg(owned), pg(mut, a), pg(ref, a), pg(count, n), pg(guarded_by, m), pg(requires, m), pg(send)/pg(sync), and pg(drop, free_fn). Each use expands to __attribute__((annotate("pagurus::…"))). The result is real C, ignored by plain compilers, and (since only pg is a macro) it hijacks no common identifiers. Example: pg(owned) char *dup(pg(ref, a) const char *src pg(count, n), size_t n). See DIALECT.md for the spec and SOUND_REDESIGN.md for the engine and the staged build.
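
To make the example signature concrete, here is a sketch of what a complete annotated function might look like. Several things are assumptions: the function is renamed dup_n (hypothetical) to avoid clashing with POSIX dup, the body is ours, and the fallback definition of pg exists only so the sketch compiles when the real pagurus header is not on the include path. The exact meaning of each annotation is defined in DIALECT.md, not here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the real pg macro comes from a pagurus header. This no-op
   fallback only keeps the sketch compilable without it. */
#ifndef pg
#define pg(...)
#endif

/* Returns a newly allocated, NUL-terminated copy of the n bytes at src.
   Annotations follow the README's example: the result is owned by the
   caller, src is a shared reference tagged a, and src has n elements. */
pg(owned) char *dup_n(pg(ref, a) const char *src pg(count, n), size_t n) {
    assert(src != NULL);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, n);    /* reads exactly the n elements the count promises */
    out[n] = '\0';
    return out;             /* ownership passes to the caller, who must free it */
}
```

With the fallback macro in place the file is ordinary C, which is the property the paragraph above describes: the annotations carry information for the checker without changing what a plain compiler builds.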

| Rule | Name | Domain |
|------|------|--------|
| E500–E505 | use-after-free / double-free / invalid-free / use-after-move / uninit-use / leak (incl. pg(drop) types) | temporal |
| E520–E525 | alias-violation / mutate-through-shared / use-while-borrowed / dangling-borrow / lifetime-mismatch / missing-capability | borrow & lifetime (mut-XOR-ref) |
| E540–E543 | out-of-bounds / null-deref / bad-bounds / ptr-past-end | spatial (static-only) |
| E560–E564 | unguarded-access / missing-capability-at-call / non-send / non-sync / lock-order | concurrency (lockset + threads) |

To force the sound engine explicitly, pass the sound plugin argument:

clang-18 -fsyntax-only -I std -fplugin=./build/pagurus_plugin.so \
  -Xclang -plugin-arg-pagurus -Xclang sound  my_dialect_code.c

The sound engine uses a CFG ownership IR, a monotone fixpoint state-dataflow (sound "unsafe-if-any-predecessor" join), location-sensitive borrow loans, a static bounds check, and a flow-sensitive lockset. Its oracle suite lives in tests/sound/.
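
The "unsafe-if-any-predecessor" join can be illustrated with a toy dataflow over a single pointer. This is not the code in src/pagurus_sound.cpp; it is a minimal sketch under our own assumptions, and every name in it (State, Op, Block, has_use_after_free) is hypothetical. States are ordered so that a larger value is less safe, the join takes the maximum, and the analysis iterates until nothing changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Abstract state of one pointer; larger means "less safe". */
typedef enum { ST_UNREACHED = 0, ST_OWNED = 1, ST_FREED = 2 } State;
typedef enum { OP_NONE, OP_ALLOC, OP_FREE, OP_USE } Op;

enum { MAXBLOCKS = 16, MAXPRED = 2 };

typedef struct {
    Op  op;                  /* the single effect this block has on the pointer */
    int preds[MAXPRED];      /* predecessor block indices, -1 when unused */
} Block;

/* Join: FREED on any incoming edge makes the merged state FREED. */
static State join(State a, State b) { return a > b ? a : b; }

/* Monotone transfer function for one block. */
static State transfer(Op op, State in) {
    if (op == OP_ALLOC) return ST_OWNED;
    if (op == OP_FREE && in != ST_UNREACHED) return ST_FREED;
    return in;               /* OP_NONE and OP_USE leave the state unchanged */
}

/* True when some OP_USE block can be entered with the pointer FREED.
   Requires n <= MAXBLOCKS. */
static bool has_use_after_free(const Block cfg[], int n) {
    State in[MAXBLOCKS] = { ST_UNREACHED }, out[MAXBLOCKS] = { ST_UNREACHED };
    bool changed = true;
    while (changed) {        /* states only move upward, so this terminates */
        changed = false;
        for (int b = 0; b < n; ++b) {
            State s = ST_UNREACHED;
            for (int k = 0; k < MAXPRED; ++k)
                if (cfg[b].preds[k] >= 0)
                    s = join(s, out[cfg[b].preds[k]]);
            in[b] = s;
            State o = transfer(cfg[b].op, s);
            if (o != out[b]) { out[b] = o; changed = true; }
        }
    }
    for (int b = 0; b < n; ++b)
        if (cfg[b].op == OP_USE && in[b] == ST_FREED)
            return true;
    return false;
}
```

On a diamond-shaped CFG where only one branch frees the pointer, the merge block still sees FREED, so the later use is rejected even though the other path is fine. That is the conservative behaviour the join is meant to provide.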

Routing: a translation unit that uses the dialect — any pg(...) annotation — is checked by the sound engine automatically, no flag needed; un-annotated C goes to the confirmed-bug engine (so existing code is unaffected). sound forces the sound engine; heuristic forces the confirmed-bug engine even on annotated code.

Quick start

Build

# Ubuntu 24.04
sudo apt install clang-18 llvm-18-dev libclang-18-dev cmake

mkdir build && cd build
cmake .. \
  -DLLVM_DIR=$(llvm-config-18 --cmakedir) \
  -DClang_DIR=/usr/lib/llvm-18/lib/cmake/clang
make -j$(nproc)

See BUILDING.md for detailed build instructions and runtime dependencies.

Usage

# AST checks only (E001–E023):
clang -fplugin=./build/pagurus_plugin.so -c your_file.c

# AST + LLVM IR analysis (adds GEP/bitcast/loop-carried checks + drop injection).
# The function-pass IR checks need -O1; only IR-E022 also runs at -O0.
clang -fplugin=./build/pagurus_plugin.so \
      -fpass-plugin=./build/pagurus_plugin.so \
      -g -O1 -c your_file.c

Note: #pragma pagurus is retired. Annotations are written with the pg(...) dialect macro (sound path); plain C needs no annotations (confirmed-bug path). See DIALECT.md.

Multi-file projects

Run pagurus across an entire codebase using the included tools:

# Check all files under src/ with 4 parallel jobs
./pagurus-check --plugin=./build/pagurus_plugin.so \
                --cflags="-Iinclude" \
                --jobs=4 --dir=src

# Or use a compilation database
bear make
./pagurus-check --plugin=./build/pagurus_plugin.so \
                --compile-db=compile_commands.json

Integrate into Makefiles:

# myproject/Makefile
PAGURUS_PLUGIN = /path/to/build/pagurus_plugin.so
include /path/to/pagurus.mk

# Then run:
# make pagurus-check

See INTEGRATION.md for the complete integration guide.

Key features

  • Non-lexical lifetimes (NLL): Precise borrow tracking with loan release at last use
  • Control flow analysis: Conditional and loop-aware borrow propagation
  • Inter-procedural: Function summaries for return-alias and parameter effects
  • Move semantics: Rust-style ownership transfer for drop-annotated types
  • Drop injection: Automatic RAII-style cleanup at IR level with -fpass-plugin=
  • Source transformation: Produces plain C code without pagurus annotations
  • Two-tier analysis: AST for precision + IR for patterns invisible at source level

Documentation

License

MIT

About

🚧 a rust-style borrow-checking clang-plugin for C, inspired by CORAL!
