Python SDK for writing endpoints that run on Cozy's worker pool. You write a decorated function; the SDK handles discovery, scheduling, model loading, cancellation, file I/O, streaming, and reporting back to the control plane.
Three endpoint kinds:
- Inference — request/response, optionally streaming.
- Training — long-running, stateful, periodic checkpoints.
- Conversion — produces weight artifacts on a destination repo.
pip install gen-worker[torch] # for PyTorch inference/training
pip install gen-worker[vision] # add torchvision for image/video models
pip install gen-worker # plain Python (e.g. API-proxy endpoints)

Optional extras: [images] for gw.io.read_image / write_image,
[audio] for gw.io.read_audio, [trainer] for trainer-class endpoints.
You need two files when deploying through Tensorhub's generated-Dockerfile path.
Tensorhub generates the Dockerfile when endpoint.toml has build hints,
installs your dependencies, runs discovery, and wires the runtime entrypoint.
endpoint.toml:
schema_version = 1
main = "myendpoint.main"
[[build.profiles]]
name = "default"
accelerator = "none"
python = "3.12"
dependencies = ["gen-worker>=0.7.5", "msgspec"]

main.py:
import msgspec
from gen_worker import RequestContext, inference_function

class Input(msgspec.Struct):
    prompt: str

class Output(msgspec.Struct):
    text: str

@inference_function
def run(ctx: RequestContext, payload: Input) -> Output:
    # Echo the prompt back; the SDK handles decoding the payload and encoding the result.
    return Output(text=f"got: {payload.prompt}")

That's it. cozyctl endpoint deploy (or the platform UI) takes it from here.
For custom base images, multi-stage builds, or non-pip setup, add a Dockerfile;
Tensorhub will use it instead of generating one.
Declare model dependencies on the decorator's models={...} kwarg. The worker
loads and caches each binding; your function receives the live instance.
from diffusers import StableDiffusionXLPipeline
from gen_worker import Repo, Resources, inference_function
from gen_worker import io as gw_io  # I/O codecs; import path assumed from the gen_worker.io export

sdxl = Repo("base_model", "stabilityai/stable-diffusion-xl-base-1.0")

@inference_function(
    resources=Resources(requires_gpu=True, min_vram_gb=12.0),
    models={"pipe": sdxl.flavor("bf16")},
)
def generate(ctx, pipe: StableDiffusionXLPipeline, payload: Input) -> Output:
    # Input and Output are the endpoint's own msgspec.Struct types; Output has an image field here.
    images = pipe(payload.prompt).images
    return Output(image=gw_io.write_image(ctx, "out", images[0]))

Resources is the per-function hardware envelope plus dynamic cost shape (used
by the orchestrator for placement and admission). Repo(name, default_ref) is
the binding. The name is the stable model-slot config key Tensorhub can update
after publish; default_ref is only the initial/default repo ref. The old
Repo(ref) / HFRepo(ref) / CivitaiRepo(ref) shape still works for existing
endpoints and uses the model parameter name as the slot key when discovered.
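The split between the slot name and default_ref can be pictured with a plain-Python stand-in. This is a sketch only: SlotBinding, resolve_ref, and the slot_config mapping are hypothetical names that model the behaviour described above, not gen-worker or Tensorhub APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotBinding:
    """Hypothetical stand-in for Repo(name, default_ref)."""

    name: str         # stable model-slot config key
    default_ref: str  # initial/default repo ref only


def resolve_ref(binding: SlotBinding, slot_config: dict[str, str]) -> str:
    """Return the configured ref for the slot, else fall back to default_ref."""
    return slot_config.get(binding.name, binding.default_ref)


sdxl = SlotBinding("base_model", "stabilityai/stable-diffusion-xl-base-1.0")

# No post-publish configuration: the default ref applies.
print(resolve_ref(sdxl, {}))
# The slot was repointed after publish: the key stays "base_model", only the ref changes.
print(resolve_ref(sdxl, {"base_model": "acme/my-finetune"}))
```

The point of the stable key is that endpoint code keeps referring to the same slot while the ref behind it changes.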
Fixed pick, where the function pins one specific (repo, flavor?, tag?):
models={"pipe": Repo("base_model", "acme/flux").flavor("bf16")}

Dispatch pick, payload-driven and keyed by a Literal[...]-typed field:
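from typing import Literal

import msgspec
from gen_worker import Repo, Resources, dispatch, inference_function

flux = Repo("base_model", "acme/flux")  # assumed to be the repo from the fixed-pick example

class Input(msgspec.Struct):
    variant: Literal["nf4", "int8"]
    prompt: str

@inference_function(
    resources=Resources(requires_gpu=True, min_vram_gb=14.0),
    models={"pipe": dispatch(
        field="variant",  # payload field that selects the table entry
        table={
            "nf4": flux.flavor("nf4"),
            "int8": flux.flavor("int8"),
        },
    )},
)
def generate(ctx, pipe, payload: Input) -> Output: ...

Override-allowed, where the caller may substitute the default, subject to a pipeline-class allowlist the tenant declares: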
models={"pipe": flux.flavor("bf16").allow_override(StableDiffusionXLPipeline)}

To substitute, the caller sends {"prompt": "...", "_models": {"pipe": "acme/my-finetune:prod#bf16"}}.
If the pipeline class does not match the allowlist, the request is rejected before dispatch.
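As a rough illustration of the override envelope, the sketch below splits a ref such as acme/my-finetune:prod#bf16 into its parts and applies an allowlist check. The repo[:tag][#flavor] grammar is an assumption inferred from the single example and the (repo, flavor?, tag?) tuple; ModelRef, parse_model_ref, and check_override are hypothetical names, and the worker's real validation may differ.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    repo: str
    tag: Optional[str]
    flavor: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'repo[:tag][#flavor]' into parts (assumed grammar)."""
    rest, _, flavor = ref.partition("#")
    repo, _, tag = rest.partition(":")
    if not repo:
        raise ValueError(f"model ref has no repo part: {ref!r}")
    return ModelRef(repo=repo, tag=tag or None, flavor=flavor or None)


def check_override(pipeline_class: str, allowlist: set[str]) -> None:
    """Reject an override whose pipeline class is not on the allowlist."""
    if pipeline_class not in allowlist:
        raise ValueError(f"pipeline class {pipeline_class!r} is not allowed")


request = {"prompt": "...", "_models": {"pipe": "acme/my-finetune:prod#bf16"}}
ref = parse_model_ref(request["_models"]["pipe"])
print(ref)
```

Here the allowlist holds class names as strings purely for illustration; allow_override in the example above takes the pipeline class itself.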
Top-level gen_worker exports only what endpoint authors need:
- Decorators + bindings: inference_function, Resources, Repo, Dispatch, dispatch
- Context types: RequestContext, ConversionContext, DatasetContext, TrainingContext
- Value types: Asset, ImageAsset, VideoAsset, AudioAsset, MediaAsset, Tensors, Compute
- Errors: ValidationError, RetryableError, FatalError, ResourceError, AuthError, CanceledError, OutputTooLargeError, InputTooLargeError, WorkerError
- Helpers: Clamp, iter_transformers_text_deltas, load_loras, apply_low_vram_config, with_oom_retry
- I/O codecs: gen_worker.io (read_image, read_audio, write_image, read_bytes, open, exists)
Training and conversion live in their own submodules: gen_worker.trainer,
gen_worker.conversion, gen_worker.clone.
gen-worker run executes one endpoint method in the local Python
interpreter against a JSON payload — no docker-compose, no orchestrator.
pip install -e .
gen-worker run --payload '{"prompt": "hello"}'

Results go to stdout and events to stderr. Exit codes are 0 / 1 / 2 / 3 / 130 for
success / user-exception / usage / model-resolution / SIGINT. The full
two-input model, the three CLI shapes (run / serve + invoke /
repl), ergonomic field=value args, the --offline story, SIGINT
semantics, and worked examples are in docs/local-dev.md.
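For scripting around the CLI, the exit codes listed above can be turned into a small lookup. The following is a sketch under stated assumptions: describe_exit and run_endpoint are hypothetical helpers, and only the run subcommand and the --payload flag shown above are used.

```python
import json
import subprocess

# Exit codes as listed above for `gen-worker run`.
EXIT_MEANINGS = {
    0: "success",
    1: "user-exception",
    2: "usage",
    3: "model-resolution",
    130: "SIGINT",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def run_endpoint(payload: dict) -> tuple[str, str, str]:
    """Hypothetical wrapper: run the endpoint once, return (status, stdout, stderr)."""
    proc = subprocess.run(
        ["gen-worker", "run", "--payload", json.dumps(payload)],
        capture_output=True,
        text=True,
    )
    # Results arrive on stdout and events on stderr, so they are kept separate.
    return describe_exit(proc.returncode), proc.stdout, proc.stderr


print(describe_exit(0), describe_exit(3), describe_exit(130))
```

Calling run_endpoint({"prompt": "hello"}) requires gen-worker to be installed in the active environment; the lookup itself has no such dependency.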
The machine-readable host-integration contract (versioning, describe --json, the NDJSON protocol, the serve sidecar) lives in
docs/host-integration.md.
pytest lives in the dev optional-dependency extra, so the supported
command is:
uv run --extra dev pytest

Plain uv run pytest would fall through to a global launcher, so always pass
--extra dev. Never pip install gen-worker globally: a stale
~/.local install silently shadows the working tree (tests/conftest.py
hard-fails if gen_worker resolves outside src/).
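A guard of the kind described above could look roughly like this. It is a sketch, not the project's actual tests/conftest.py: resolves_inside and assert_working_tree are hypothetical names, and the demo uses the standard-library json package so that it runs without gen-worker installed.

```python
import importlib.util
import json
from pathlib import Path


def resolves_inside(module_name: str, root: Path) -> bool:
    """True if the importable module's file lives under `root`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    return origin.is_relative_to(root.resolve())


def assert_working_tree(module_name: str, root: Path) -> None:
    """Hard-fail when the module would be imported from somewhere else."""
    if not resolves_inside(module_name, root):
        raise RuntimeError(f"{module_name} does not resolve inside {root}")


# Demonstrated with a standard-library package so the sketch runs anywhere.
stdlib_dir = Path(json.__file__).resolve().parent.parent
print(resolves_inside("json", stdlib_dir))
```

In a conftest, the equivalent call would be something like assert_working_tree("gen_worker", repo_root / "src"), where repo_root is whatever the test suite treats as the checkout root.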
- docs/endpoint-authoring.md: full reference covering the three layers, Resources, bindings, dispatch, allow_override, multi-param injection, the _models envelope, and atomic substitution.
- docs/local-dev.md: the gen-worker run CLI, including the two-input invocation model, the --offline story, SIGINT semantics, exit codes, and worked examples.
- docs/endpoint-toml.md: endpoint.toml reference, including build modes, placement fields, build hints, and BASE_IMAGE injection.
- docs/dockerfile.md: when to provide your own Dockerfile, the three Dockerfile contract points, when ARG BASE_IMAGE matters, and multi-profile builds.
- docs/scaling-hints.md: Resources cost-shape fields used by the orchestrator for admission and scheduling.
- docs/endpoint-envs.md: tenant-defined envs/secrets attached to a deployed endpoint at runtime.
Working endpoints to copy from in examples/:
- marco-polo/: minimal inference endpoint
- training-smoke/: minimal trainer
- from-scratch/: boilerplate template