Embind / pybind11 binding codegen

This tool parses annotated C++ headers via libclang and emits backend-specific binding source. It generates Embind bindings (bindings/wasm/<module>_bindings.cpp) and pybind11 bindings (bindings/python/<module>_bindings.cpp) from a single backend-neutral IR, so only the emitter differs per backend.

Quick start

# Embind (default backend) -> bindings/wasm/<module>_bindings.cpp
python -m tools.codegen.codegen mdx
python -m tools.codegen.codegen m2
python -m tools.codegen.codegen m3

# pybind11 -> bindings/python/<module>_bindings.cpp
python -m tools.codegen.codegen mdx --backend pybind11
python -m tools.codegen.codegen m2  --backend pybind11
python -m tools.codegen.codegen m3  --backend pybind11

scripts/build-wasm.ps1 runs the Embind codegen automatically before cmake --build. scripts/build-python.ps1 does the equivalent for the pybind11 backend, builds the .pyd, and stages it into bindings/python/.

Workflow when changing bindings

To add, remove, or alter a binding, edit the C++ header, not the generated .cpp:

  1. Add or modify an @bind annotation in include/whiteout/....
  2. Run python -m tools.codegen.codegen mdx.
  3. Build: scripts\build-wasm.ps1.
  4. Test: node --test tests\wasm\smoke.test.js.

mdx_bindings.cpp carries a // AUTOGENERATED banner; never edit it directly.

Annotation reference

Annotations live in C++ doc comments (///, /**…*/, or trailing ///< on a field). The codegen reads them via libclang's cursor.raw_comment.

/// @bind                                  // include this declaration
/// @bind value_object                     // bind as Embind value_object
/// @bind rename=isHd                      // rename the field/JS member
/// @bind skip                             // exclude this field
/// @bind array_with_view                  // vector<u8>: also emit *View()
/// @bind js_name=MdxNoParent              // override JS-side name
/// @bind cpp_expr=Track<f32>::kFoo        // override C++ source expression
/// @bind fields=x;y;z                     // explicit field list (anon unions)
/// @bind track_template, instantiate=...  // template marker (Track<T>)

Multiple modifiers per line are comma-separated:

/// @bind value_object, fields=x;y;z
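
The modifier syntax above is regular enough to parse with plain string splitting. The following is a minimal sketch, not the actual annotations.py: parse_bind_modifiers is a hypothetical helper, and it assumes that no modifier value contains a comma, which may not hold for instantiate=... values.

```python
def parse_bind_modifiers(annotation: str) -> dict:
    """Parse the text that follows '@bind' into a dict of modifiers.

    Hypothetical helper for illustration; the real parser may differ.
    """
    modifiers = {}
    for part in annotation.split(","):
        part = part.strip()
        if not part:
            continue  # bare '@bind' or a stray comma
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep:
            # Bare flag such as value_object, skip, array_with_view.
            modifiers[key] = True
        elif key == "fields":
            # fields=x;y;z is a semicolon-separated list.
            modifiers[key] = [f.strip() for f in value.split(";") if f.strip()]
        else:
            # key=value such as rename=isHd or cpp_expr=Track<f32>::kFoo.
            modifiers[key] = value.strip()
    return modifiers


print(parse_bind_modifiers("value_object, fields=x;y;z"))
# {'value_object': True, 'fields': ['x', 'y', 'z']}
```

A bare /// @bind yields an empty dict, which a caller could treat as "include this declaration with default settings".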

Where to put the annotation

libclang's rules for associating raw_comment with a declaration vary. Both the leading (/// @bind) and trailing (///< @bind) forms work, but when a field has both a leading /// and a trailing ///< comment, the codegen sees only the trailing one. So when a field already has a trailing description, put the annotation in the trailing comment:

std::vector<u8> vertexGroups;  ///< @bind array_with_view — Bone groups per vertex

An em-dash (—) or -- separates the annotation from the human-readable text.
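
To make the separator rule concrete, here is a rough sketch of how a single comment line could be split into its annotation and its description. split_bind_comment and the regular expressions are assumptions made for illustration; they are not taken from the codegen source.

```python
import re

# '@bind', then the modifiers, then optionally an em-dash (U+2014) or '--'
# followed by free-form description text.
_BIND_RE = re.compile(
    r"@bind\b(?P<mods>.*?)(?:\s+(?:\u2014|--)\s+(?P<text>.*))?\s*$"
)


def split_bind_comment(comment: str):
    """Return (modifier_text, description), or None if there is no @bind.

    Hypothetical helper; handles one-line ///, ///< and /** ... */ comments.
    """
    body = comment.strip()
    body = re.sub(r"\s*\*/$", "", body)                  # drop a closing */
    body = re.sub(r"^(///<|///|/\*\*|\*)\s*", "", body)  # drop the leader
    match = _BIND_RE.search(body)
    if match is None:
        return None
    return match.group("mods").strip(), (match.group("text") or "").strip()


line = "///< @bind array_with_view \u2014 Bone groups per vertex"
print(split_bind_comment(line))
# ('array_with_view', 'Bone groups per vertex')
```

The modifier text returned here is what a modifier parser would then break into individual flags and key=value pairs.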

Module configuration

Per-format settings live in tools/codegen/modules/<name>.py:

CONFIG = ModuleConfig(
    name='mdx',
    cpp_namespace='whiteout::mdx',
    js_prefix='Mdx',                   # MdxBone, MdxLayer, …
    embind_block='mdx',                # EMSCRIPTEN_BINDINGS(mdx) { … }
    headers=[
        'include/whiteout/vector_types.h',
        'include/whiteout/models/mdx/types.h',
        'include/whiteout/models/mdx/structures.h',
    ],
    output_path='bindings/wasm/mdx_bindings.cpp',
    include_dirs=['include'],
    skip_vector_js_names=['VectorU8', 'VectorString'],  # registered in bindings.cpp
)

When you add a sibling module (M2, M3, WEM), copy modules/mdx.py and update the module-specific fields: name, cpp_namespace, js_prefix, embind_block, headers, and output_path.
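
As an illustration of which fields change between siblings, the sketch below derives an M2 config from the MDX one. The ModuleConfig dataclass is a stand-in that only mirrors the fields shown above (the real class may have more), sibling is a hypothetical helper, and the whiteout::m2 namespace and the M2 header paths are assumptions.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ModuleConfig:
    # Stand-in mirroring the fields used in modules/mdx.py above.
    name: str
    cpp_namespace: str
    js_prefix: str
    embind_block: str
    headers: list
    output_path: str
    include_dirs: list = field(default_factory=lambda: ["include"])
    skip_vector_js_names: list = field(default_factory=list)


MDX = ModuleConfig(
    name="mdx",
    cpp_namespace="whiteout::mdx",
    js_prefix="Mdx",
    embind_block="mdx",
    headers=[
        "include/whiteout/vector_types.h",
        "include/whiteout/models/mdx/types.h",
        "include/whiteout/models/mdx/structures.h",
    ],
    output_path="bindings/wasm/mdx_bindings.cpp",
    skip_vector_js_names=["VectorU8", "VectorString"],
)


def sibling(base: ModuleConfig, name: str, js_prefix: str, headers: list) -> ModuleConfig:
    """Copy `base`, replacing only the module-specific fields (hypothetical)."""
    return replace(
        base,
        name=name,
        cpp_namespace=f"whiteout::{name}",  # assumed namespace pattern
        js_prefix=js_prefix,
        embind_block=name,
        headers=headers,
        output_path=f"bindings/wasm/{name}_bindings.cpp",
    )


# The M2 header paths below are placeholders, not real project paths.
M2 = sibling(
    MDX,
    name="m2",
    js_prefix="M2",
    headers=[
        "include/whiteout/vector_types.h",
        "include/whiteout/models/m2/types.h",
    ],
)
print(M2.output_path)  # bindings/wasm/m2_bindings.cpp
```

Shared settings such as include_dirs and skip_vector_js_names carry over unchanged, which is the point of copying the existing module file rather than starting from scratch.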

Architecture

tools/codegen/
├── annotations.py     # @bind comment parser
├── ir.py              # backend-neutral data classes (BindModule, BindClass…)
├── parser.py          # libclang -> IR
├── emit_embind.py     # IR -> Embind C++
├── emit_pybind.py     # IR -> pybind11 C++
├── codegen.py         # CLI: python -m tools.codegen.codegen <module>
└── modules/
    └── mdx.py         # per-module config

The IR is intentionally minimal — every binding is a class, an enum, a constant, a Track instantiation, or a vector container. The Embind emitter resolves naming (Mdx prefix, value_object vs class, vector container names) from the IR plus a small set of conventions that match the project's hand-written code.
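
A minimal sketch of what such an IR could look like, using one data class per binding kind. Only BindModule and BindClass are names taken from ir.py as listed above; BindField, BindEnum, BindConstant, BindTrack, BindVector, and every attribute name are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BindField:
    cpp_name: str    # member name in the C++ struct
    bound_name: str  # name exposed to JS / Python (after rename=...)


@dataclass
class BindClass:
    cpp_name: str
    value_object: bool = False  # set by '@bind value_object'
    fields: list = field(default_factory=list)


@dataclass
class BindEnum:
    cpp_name: str
    values: list = field(default_factory=list)


@dataclass
class BindConstant:
    name: str
    cpp_expr: str  # e.g. from '@bind cpp_expr=...'


@dataclass
class BindTrack:
    element_type: str  # T in Track<T>


@dataclass
class BindVector:
    element_type: str  # T in std::vector<T>


@dataclass
class BindModule:
    name: str
    classes: list = field(default_factory=list)
    enums: list = field(default_factory=list)
    constants: list = field(default_factory=list)
    tracks: list = field(default_factory=list)
    vectors: list = field(default_factory=list)


# Illustrative contents; the struct and member names are made up.
module = BindModule(
    name="mdx",
    classes=[BindClass("Layer", fields=[BindField("is_hd", "isHd")])],
    constants=[BindConstant("kFoo", "Track<f32>::kFoo")],
    tracks=[BindTrack("f32")],
    vectors=[BindVector("u8")],
)
print(len(module.classes), len(module.enums))  # 1 0
```

Nothing in these classes mentions Embind or pybind11, so each emitter is free to decide naming and output syntax on its own.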

Backends

Both Embind and pybind11 are implemented. The IR is shared; only the emitter differs:

| Concept | Embind | pybind11 |
| --- | --- | --- |
| Class | class_<C>("C").constructor<>() | py::class_<C>(m, "C").def(py::init<>()) |
| Read/write field | .property(name, &C::f) | .def_readwrite(name, &C::f) |
| Value object | value_object<V>("V").field(...) | regular py::class_ (no separate concept) |
| Enum | enum_<E>("E").value(...) | py::enum_<E>(m, "E").value(...) |
| Vector container | register_vector<T>("VecT") | PYBIND11_MAKE_OPAQUE(...) + py::bind_vector(...) |
| Constant | constant("Name", v) | m.attr("Name") = v |
| Bytes view | typed_memory_view(size, ptr) | py::memoryview::from_memory(...) |
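
To show how the Class and Read/write field rows translate into emitter code, here is a small sketch of two functions that render the same class description for each backend. The function names, the argument shapes, and the Layer/is_hd example are assumptions; the real emit_embind.py and emit_pybind.py work from the IR and may be structured differently.

```python
def emit_embind_class(cpp_name: str, bound_name: str, fields: list) -> str:
    """Render an Embind class_ binding. `fields` holds (bound, member) pairs."""
    lines = [f'class_<{cpp_name}>("{bound_name}")', "    .constructor<>()"]
    for bound, member in fields:
        lines.append(f'    .property("{bound}", &{cpp_name}::{member})')
    return "\n".join(lines) + ";"


def emit_pybind_class(cpp_name: str, bound_name: str, fields: list) -> str:
    """Render the equivalent pybind11 py::class_ binding."""
    lines = [f'py::class_<{cpp_name}>(m, "{bound_name}")', "    .def(py::init<>())"]
    for bound, member in fields:
        lines.append(f'    .def_readwrite("{bound}", &{cpp_name}::{member})')
    return "\n".join(lines) + ";"


# Hypothetical struct Layer with a member is_hd exposed as isHd.
fields = [("isHd", "is_hd")]
print(emit_embind_class("Layer", "MdxLayer", fields))
print(emit_pybind_class("Layer", "Layer", fields))
```

The two functions differ only in the strings they produce, which is why a new backend needs a new emitter but no parser or IR changes.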

Naming: Embind keeps the JS-style Mdx/M2/M3 prefix on every class name (MdxBone, M2Bone). pybind11 strips the prefix because every type already lives in a submodule (whiteout.mdx.Bone, whiteout.m2.Bone). Names that collide with Python keywords (None, True, False) are renamed to upper case (NONE, TRUE, FALSE) on the Python side.
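
The naming rule can be sketched as one small function. python_name is a hypothetical helper, and using keyword.iskeyword extends the three listed examples to every Python keyword, which is an assumption about how the emitter behaves.

```python
import keyword


def python_name(js_name: str, js_prefix: str) -> str:
    """Map an Embind-style name to its pybind11-side name (hypothetical).

    Strips the module prefix, then upper-cases names that Python reserves.
    """
    name = js_name[len(js_prefix):] if js_name.startswith(js_prefix) else js_name
    # Assumption: any keyword collision is resolved by upper-casing.
    return name.upper() if keyword.iskeyword(name) else name


print(python_name("MdxBone", "Mdx"))  # Bone
print(python_name("M2Bone", "M2"))    # Bone
print(python_name("None", "Mdx"))     # NONE
```

Both MdxBone and M2Bone map to Bone without clashing because they end up in different submodules (whiteout.mdx and whiteout.m2).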

The same @bind annotations drive both backends. There are no backend-specific overrides yet; if you need one (e.g. @bind pybind11_skip), add a check in the relevant emitter.
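
If a backend-specific override is ever needed, the check could be as small as the sketch below. is_skipped is a hypothetical helper, the modifiers dict is assumed to map modifier names to values, and the <backend>_skip naming simply generalizes the pybind11_skip example above; none of this exists in the codegen today.

```python
def is_skipped(modifiers: dict, backend: str) -> bool:
    """True if a declaration should be left out of `backend`'s output.

    Honors the existing 'skip' modifier plus a hypothetical
    '<backend>_skip' modifier such as 'pybind11_skip'.
    """
    return bool(modifiers.get("skip") or modifiers.get(f"{backend}_skip"))


mods = {"pybind11_skip": True}
print(is_skipped(mods, "pybind11"))  # True
print(is_skipped(mods, "embind"))    # False
```

Each emitter would call this with its own backend name before emitting a class or field, leaving the parser and IR untouched.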