This component provides Python bindings via pybind11 to build SDFGs with Python.
Additionally, this component provides the @native decorator, which automatically
converts Python functions into SDFGs and generates code for the available target.
- Bindings: Build SDFGs programmatically with Python
- AST Parser: Automatically compile Python/NumPy code to optimized native code
- Targets: Support for CPU (sequential, OpenMP), CUDA GPUs, and other accelerators
To build the Python component from source, run pip on the component's directory:
pip install -e python/

Requirements:
- Python >= 3.11
- NumPy >= 1.19.0
The following system dependencies must be installed (same as core components):
sudo apt-get install -y libgmp-dev libzstd-dev
sudo apt-get install -y nlohmann-json3-dev
sudo apt-get install -y libboost-graph-dev libboost-graph1.74.0
sudo apt-get install -y libisl-dev

Clang/LLVM 19 is required for code generation. Install it with:
# Ubuntu/Debian
sudo apt-get install -y clang-19 llvm-19

For CUDA support, you also need the NVIDIA CUDA Toolkit installed.
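Before building, it can help to confirm that the tools named above are visible on PATH. The following is a minimal sketch using only the standard library. The helper name `find_tools` is hypothetical, and the executable names `clang-19` and `nvcc` are assumptions about how the packages install their binaries; this is not how docc itself locates its toolchain.

```python
import shutil


def find_tools(names=("clang-19", "nvcc")):
    """Map each executable name to its resolved path, or None if it is not on PATH.

    The default names are assumptions; adjust them to match your installation.
    """
    return {name: shutil.which(name) for name in names}


# Report which of the assumed tool names can be found on this machine.
for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found'}")
```

A missing `nvcc` only matters if you intend to use the CUDA target.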
The @native decorator is the primary way to use the Python frontend. It automatically:
- Parses the Python function's AST
- Converts it to an SDFG representation
- Applies optimizations based on the target
- Compiles to native code
- Executes and returns results
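To make the first step above more concrete, here is a small sketch of what inspecting a function's AST can look like, using only Python's standard `ast` module. It does not reproduce docc's parser or its SDFG conversion; the helper `summarize_loops` is hypothetical and only extracts the function name, its parameters, and the loops it contains.

```python
import ast
import textwrap

# Source text of a kernel, kept as a string so the sketch is self-contained.
SOURCE = textwrap.dedent("""
    def vector_add(A, B, C, N):
        for i in range(N):
            C[i] = A[i] + B[i]
""")


def summarize_loops(source):
    """Return (function name, parameter names, [(loop variable, iterable source)])."""
    tree = ast.parse(source)
    func = tree.body[0]  # the first top-level statement is the function definition
    loops = []
    for node in ast.walk(func):
        # Only simple loops with a plain name as target are handled here.
        if isinstance(node, ast.For) and isinstance(node.target, ast.Name):
            loops.append((node.target.id, ast.unparse(node.iter)))
    return func.name, [arg.arg for arg in func.args.args], loops


name, params, loops = summarize_loops(SOURCE)
print(name, params, loops)
```

A frontend needs this kind of information (loop variables, bounds, array accesses) before it can build a graph representation. In practice the decorator performs all steps for you, as in the following example.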
from docc.python import native
import numpy as np
@native
def vector_add(A, B, C, N):
    for i in range(N):
        C[i] = A[i] + B[i]
# Usage
N = 1024
A = np.random.rand(N).astype(np.float64)
B = np.random.rand(N).astype(np.float64)
C = np.zeros(N, dtype=np.float64)
vector_add(A, B, C, N)  # JIT compiles and executes

The @native decorator accepts a target parameter to specify the code generation backend:
With the "none" target, no scheduling or optimization is applied. The SDFG is compiled as-is without parallelization. This is useful for debugging or when you want to manually control the generated code.
@native(target="none")
def simple_loop(A, B, N):
    for i in range(N):
        B[i] = A[i] * 2.0

The "sequential" target generates optimized sequential code with SIMD vectorization.
import math
@native(target="sequential")
def vectorized_sin(A, B):
    for i in range(A.shape[0]):
        B[i] = math.sin(A[i])
N = 128
A = np.random.rand(N).astype(np.float64)
B = np.zeros(N, dtype=np.float64)
vectorized_sin(A, B)

The "openmp" target generates parallel code using OpenMP and is suitable for multi-core CPUs. Loops are automatically parallelized with appropriate scheduling.
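As an illustration of what scheduling a loop across cores involves, the sketch below splits an iteration range into contiguous, near-equal chunks, one per worker, similar in spirit to a static schedule. The helper `static_chunks` is hypothetical, and the schedule docc actually selects is not specified here and may differ.

```python
def static_chunks(n, workers):
    """Split range(n) into at most `workers` contiguous, near-equal chunks."""
    base, extra = divmod(n, workers)
    chunks = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional iteration.
        size = base + (1 if w < extra else 0)
        if size == 0:
            break  # fewer iterations than workers: remaining workers stay idle
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Ten iterations over four workers give chunk sizes 3, 3, 2, 2.
for worker_id, chunk in enumerate(static_chunks(10, 4)):
    print(worker_id, list(chunk))
```

Because each iteration of an element-wise loop writes a different index, the chunks can run independently. With the decorator, this partitioning is handled for you and no manual chunking appears in user code.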
@native(target="openmp", category="desktop")
def parallel_add(A, B, C, N):
    for i in range(N):
        C[i] = A[i] + B[i]
# Executes in parallel across CPU cores
N = 1000000
A = np.random.rand(N).astype(np.float64)
B = np.random.rand(N).astype(np.float64)
C = np.zeros(N, dtype=np.float64)
parallel_add(A, B, C, N)

The "cuda" target generates CUDA code for NVIDIA GPUs. Loops are mapped to GPU thread blocks and threads.
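The usual way a one-dimensional loop is mapped to blocks and threads can be emulated in plain Python. This is a sketch of the common CUDA indexing pattern, not docc's generated code: the function names are hypothetical and the block size of 256 is an assumed default.

```python
def launch_config(n, threads_per_block=256):
    """Return (blocks, threads_per_block) so that blocks * threads_per_block >= n."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def emulate_gpu_add(A, B, threads_per_block=256):
    """Serially emulate a grid of threads, each handling one loop iteration."""
    n = len(A)
    C = [0.0] * n
    blocks, tpb = launch_config(n, threads_per_block)
    for block_idx in range(blocks):
        for thread_idx in range(tpb):
            i = block_idx * tpb + thread_idx  # global index of this thread
            if i < n:  # guard: the last block may extend past the array
                C[i] = A[i] + B[i]
    return C


print(launch_config(1000))
print(emulate_gpu_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], threads_per_block=2))
```

On a real GPU the two loops run concurrently rather than in sequence. With the decorator, the kernel is written as an ordinary Python loop and the mapping is generated for you.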
@native(target="cuda", category="server")
def gpu_add(A, B, C, N):
    for i in range(N):
        C[i] = A[i] + B[i]
# Executes on GPU (data is automatically transferred)
N = 1024
A = np.random.rand(N).astype(np.float64)
B = np.random.rand(N).astype(np.float64)
C = np.zeros(N, dtype=np.float64)
gpu_add(A, B, C, N)

The category parameter provides hints to the scheduler about the target hardware:
- "edge"
- "desktop"
- "server"
@native(target="openmp", category="desktop")
def cpu_kernel(A, B):
    ...
@native(target="cuda", category="server")
def gpu_kernel(A, B):
    ...

For more control, you can manually compile and cache the SDFG:
@native(target="openmp")
def my_kernel(A, B, C, N):
    for i in range(N):
        C[i] = A[i] + B[i]
# Pre-compile with sample arguments
compiled = my_kernel.compile(A, B, C, N)
# Reuse compiled version
result = compiled(A, B, C, N)
# Access the underlying SDFG
sdfg = my_kernel.last_sdfg

This component is part of docc and is published under the BSD-3-Clause license. See LICENSE for details.