Transformer inference in C++. A header-only library providing tensor operations and transformer building blocks.
The library has zero external dependencies: it relies only on the C++ standard library. OpenMP `#pragma` directives are used in a few hot loops for optional parallelism; compilers without OpenMP support (or with it disabled) ignore them, so the code still builds and runs serially.
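To illustrate the optional-parallelism point, here is a minimal sketch of a loop annotated with an OpenMP directive. The function name `sum_of_squares` is hypothetical and not part of the library; it only shows that the same source compiles with or without OpenMP enabled. Depending on warning flags, some compilers may report the unrecognized pragma.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): sums the squares of v.
inline double sum_of_squares(const std::vector<double>& v) {
    double sum = 0.0;
    // A signed loop index keeps the loop valid for older OpenMP implementations.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());

    // With OpenMP enabled (e.g. -fopenmp), iterations are split across threads
    // and the per-thread partial sums are combined by the reduction clause.
    // Without OpenMP, the pragma is ignored and the loop runs serially.
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const double x = v[static_cast<std::size_t>(i)];
        sum += x * x;
    }
    return sum;
}

// Small self-check: 1 + 4 + 9 + 16 = 30 in both serial and parallel builds.
inline void sum_of_squares_example() {
    assert(sum_of_squares({1.0, 2.0, 3.0, 4.0}) == 30.0);
}
```

Either build gives the same result here because the inputs are small integers; with general floating-point data, a parallel reduction may differ from the serial sum in the last bits.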
- CMake 3.16+
- C++20 compiler (GCC 11+, Clang 14+)
```bash
cmake -S . -B build
cmake --build build
```

Tests use GoogleTest, which is fetched automatically via CMake's FetchContent; no manual install is needed.
```bash
cmake -S . -B build -DBUILD_TESTS=ON
cmake --build build
cd build && ctest --output-on-failure
```

The library is header-only. Link against the `transformers_lite` CMake target:
```cmake
add_subdirectory(transformers-lite)
target_link_libraries(your_target PRIVATE transformers_lite)
```

Then include the headers you need:
```cpp
#include <transformers-lite/tensor.hpp>
#include <transformers-lite/ops.hpp>

using namespace transformers;

int main() {
    // Create a 1-D float tensor with 4 elements on the CPU backend.
    Tensor<CPU, float> x(Shape(4));
    x(0) = 1.0f;
    x(1) = 2.0f;
    x(2) = 3.0f;
    x(3) = 4.0f;
}
```

Most custom ops only require a new expression struct, with no changes to the library internals.
An expression must satisfy the `TensorExpr<E, COMPUTE, T>` concept:

| Method | Description |
|---|---|
| `Shape outputShape() const` | Returns the shape of the result |
| `void evalInto(TensorView<T>& out) const` | Writes the computed result into `out` |

`Tensor::operator=` detects this interface automatically, so the assignment syntax `result = myExpr(...)` just works.
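As a rough illustration of how an assignment operator can detect and use such an interface, the sketch below uses toy stand-ins rather than the library's own types. `ToyTensor`, `ScaleExpr`, `scale`, `is_tensor_expr`, and the simplified 1-D `Shape` and `TensorView` are all hypothetical. The library's concept definition is not reproduced here; this sketch swaps in a pre-C++20 detection trait that checks for the same two methods, so it also compiles under C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Toy stand-ins (hypothetical, NOT the library's types); 1-D only.
struct Shape { std::size_t n; };

template <class T>
struct TensorView {
    T* data;
    std::size_t count;
    T& operator[](std::size_t i) const { return data[i]; }
};

// Detection trait: true when E provides outputShape() and evalInto(TensorView<T>&).
template <class E, class T, class = void>
struct is_tensor_expr : std::false_type {};

template <class E, class T>
struct is_tensor_expr<E, T, std::void_t<
    decltype(std::declval<const E&>().outputShape()),
    decltype(std::declval<const E&>().evalInto(std::declval<TensorView<T>&>()))>>
    : std::true_type {};

template <class T>
class ToyTensor {
public:
    ToyTensor() = default;
    explicit ToyTensor(std::vector<T> v) : storage_(std::move(v)) {}

    // Participates in overload resolution only for expression types.
    // Note: this toy version does not handle an expression that reads
    // from the tensor being assigned to.
    template <class E, std::enable_if_t<is_tensor_expr<E, T>::value, int> = 0>
    ToyTensor& operator=(const E& expr) {
        storage_.resize(expr.outputShape().n);  // size the destination first
        TensorView<T> out{storage_.data(), storage_.size()};
        expr.evalInto(out);                     // let the expression fill it
        return *this;
    }

    const T* data() const { return storage_.data(); }
    std::size_t size() const { return storage_.size(); }
    const T& operator[](std::size_t i) const { return storage_[i]; }

private:
    std::vector<T> storage_;
};

// A minimal expression: multiplies every element by a constant factor.
template <class T>
struct ScaleExpr {
    const T* x;
    std::size_t n;
    T factor;
    Shape outputShape() const { return Shape{n}; }
    void evalInto(TensorView<T>& out) const {
        for (std::size_t i = 0; i < n; ++i) out[i] = x[i] * factor;
    }
};

template <class T>
ScaleExpr<T> scale(const ToyTensor<T>& t, T factor) {
    return {t.data(), t.size(), factor};
}

static_assert(is_tensor_expr<ScaleExpr<float>, float>::value, "has both methods");
static_assert(!is_tensor_expr<int, float>::value, "int is not an expression");

inline void toy_assignment_example() {
    ToyTensor<float> x(std::vector<float>{1.f, 2.f, 3.f});
    ToyTensor<float> y;
    y = scale(x, 2.f);  // picks the templated operator=
    assert(y.size() == 3 && y[0] == 2.f && y[1] == 4.f && y[2] == 6.f);
}
```

The example that follows applies the same two-method interface to the library's own `Tensor` and `TensorView` types.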
```cpp
#include <algorithm>  // std::clamp
#include <cstddef>    // std::size_t

#include <transformers-lite/core/ops/exprs.hpp>

namespace transformers_lite {

// Expression that clamps every element of x to the range [low, high].
template <template <class> class COMPUTE, typename T>
struct ClampExpr {
    using value_type = T;

    TensorView<T> x;
    T low, high;

    Shape outputShape() const { return x.shape(); }

    void evalInto(TensorView<T>& out) const {
        for (std::size_t i = 0; i < x.size(); ++i)
            out[i] = std::clamp(x[i], low, high);
    }
};

// Factory function so callers can write `y = clamp(x, low, high)`.
template <template <class> class COMPUTE, typename T>
auto clamp(const Tensor<COMPUTE, T>& x, T low, T high) -> ClampExpr<COMPUTE, T> {
    return {TensorView<T>{x}, low, high};
}

}  // namespace transformers_lite
```

Usage:
```cpp
Tensor<CPU, float> x(Shape(4), {-2.f, 0.5f, 1.5f, 3.f});
Tensor<CPU, float> y;
y = clamp(x, 0.f, 1.f);  // y == [0, 0.5, 1, 1]
```

If the op is compute-intensive and needs an AVX-512 (or future CUDA) fast path, add the dispatch inside `evalInto` using the existing `Ops<COMPUTE, T>` pattern:
- Add the method to `Ops<CPU, T>` in `include/transformers-lite/core/ops/cpu_ops.hpp`.
- Add an AVX-512 implementation in `include/transformers-lite/core/ops/avx512_ops.hpp` under `#ifdef __AVX512F__`, then dispatch with `if constexpr` inside the `Ops<CPU, T>` method.
- Call `Ops<COMPUTE, T>::yourMethod(...)` from `evalInto`.
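The steps above might look roughly like the following sketch for a clamp op. Everything here is an assumption made for illustration: the `CPU` tag definition, the raw-pointer signature of `Ops<CPU, T>::clamp`, and the `avx512::clampKernel` name are hypothetical and may not match the library's actual headers. The sketch is self-contained and falls back to the scalar loop when AVX-512 is unavailable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

// Hypothetical backend tag and primary template (stand-ins for the library's).
template <class T> struct CPU {};
template <template <class> class COMPUTE, class T> struct Ops;

#ifdef __AVX512F__
namespace avx512 {
// Step 2: vectorized kernel, compiled only when AVX-512F is available.
// NaN inputs may be handled differently from std::clamp.
inline void clampKernel(const float* x, float* out, std::size_t n,
                        float low, float high) {
    const __m512 vlow = _mm512_set1_ps(low);
    const __m512 vhigh = _mm512_set1_ps(high);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {  // 16 floats per 512-bit register
        __m512 v = _mm512_loadu_ps(x + i);
        v = _mm512_min_ps(_mm512_max_ps(v, vlow), vhigh);
        _mm512_storeu_ps(out + i, v);
    }
    for (; i < n; ++i) out[i] = std::clamp(x[i], low, high);  // scalar tail
}
}  // namespace avx512
#endif

// Step 1: the method on Ops<CPU, T>, with compile-time dispatch.
template <class T>
struct Ops<CPU, T> {
    static void clamp(const T* x, T* out, std::size_t n, T low, T high) {
#ifdef __AVX512F__
        if constexpr (std::is_same_v<T, float>) {
            avx512::clampKernel(x, out, n, low, high);
            return;
        }
#endif
        // Portable fallback for other element types or non-AVX-512 builds.
        for (std::size_t i = 0; i < n; ++i) out[i] = std::clamp(x[i], low, high);
    }
};

// Step 3 would call Ops<COMPUTE, T>::clamp(...) from evalInto; here it is
// called directly on raw buffers to keep the sketch standalone.
inline void ops_clamp_example() {
    const float in[4] = {-2.f, 0.5f, 1.5f, 3.f};
    float out[4] = {};
    Ops<CPU, float>::clamp(in, out, 4, 0.f, 1.f);
    assert(out[0] == 0.f && out[1] == 0.5f && out[2] == 1.f && out[3] == 1.f);
}
```

Keeping the `#ifdef` and `if constexpr` inside the `Ops` method means the expression struct stays backend-agnostic: `evalInto` makes one call and the choice of kernel is resolved at compile time.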