Incipit

"here begins" — a BMAD-style mega-prompt wizard. It takes a rough software idea through a compressed elicitation flow and assembles one structured spec (a "mega-prompt") you paste into a coding agent. Instead of hoping you remembered to specify everything, Incipit interrogates the idea for you, QA's the result, and helps you converge it before you ship it to the agent.

Bring your own model. Talks to any OpenAI-compatible /v1 endpoint — Ollama, LM Studio, llama.cpp llama-server, vLLM, or OpenAI itself.
UI: FastAPI + HTMX, server-rendered, no build step. Set your endpoint and model from the in-app ⚙ Model settings panel (or via env).
Tested with qwen3.6:35b served over an OpenAI-compatible endpoint (llama-swap); any reasonably capable instruct model works.

Quickstart (bring your own model)

git clone <this-repo> promptgen && cd promptgen
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8911

Open http://localhost:8911, click ⚙ Model settings, set your endpoint and model, then start dumping your idea. Defaults assume a local Ollama at http://localhost:11434/v1 — click Test / list models in the panel to pull the list of models your endpoint exposes.

Example endpoints (set the base URL in the settings panel):

| Runtime | Base URL | API key |
| --- | --- | --- |
| Ollama | `http://localhost:11434/v1` | none |
| LM Studio | `http://localhost:1234/v1` | none |
| llama.cpp `llama-server` | `http://localhost:8080/v1` | none |
| llama-swap (tested: `qwen3.6:35b`) | `http://<host>:<port>/v1` | bearer token |
| OpenAI | `https://api.openai.com/v1` | required |
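
For reference, listing the models an endpoint exposes can be sketched as below. This assumes the standard OpenAI-compatible `GET <base URL>/models` route and its `{"data": [{"id": ...}]}` response shape; the function names are illustrative and not taken from Incipit's code.

```python
import json
import urllib.request
from typing import Optional


def models_url(base_url: str) -> str:
    """Join a base URL such as http://localhost:11434/v1 with the models route."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style list response."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_models(base_url: str, api_key: Optional[str] = None, timeout: float = 10.0) -> list[str]:
    """Fetch the model ids exposed by the endpoint (performs a network call)."""
    request = urllib.request.Request(models_url(base_url))
    if api_key:
        # Bearer auth, as used by OpenAI and by proxies that require a key.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Calling `list_models("http://localhost:11434/v1")` against a running local Ollama should return the ids of the models it serves.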

Runtime endpoint changes are restricted to localhost, api.openai.com, and the host from PROMPTGEN_OPENAI_BASE_URL by default. For another trusted host, set PROMPTGEN_ALLOWED_BASE_URL_HOSTS=host.example.com before starting the app.
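
A minimal sketch of such a host allowlist check, not the app's actual implementation. It assumes that `PROMPTGEN_ALLOWED_BASE_URL_HOSTS` may hold several comma-separated hosts and that `127.0.0.1` and `::1` count as localhost; both are assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: these names count as "localhost"; the app's real list may differ.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def allowed_hosts(env: dict[str, str]) -> set[str]:
    """Build the allowlist from the defaults plus the two environment variables."""
    hosts = set(LOCAL_HOSTS) | {"api.openai.com"}
    configured = env.get("PROMPTGEN_OPENAI_BASE_URL", "")
    if configured:
        host = urlparse(configured).hostname
        if host:
            hosts.add(host.lower())
    # Assumption: multiple extra hosts are separated by commas.
    for extra in env.get("PROMPTGEN_ALLOWED_BASE_URL_HOSTS", "").split(","):
        if extra.strip():
            hosts.add(extra.strip().lower())
    return hosts


def is_allowed_base_url(url: str, env: dict[str, str]) -> bool:
    """Return True when the URL is http(s) and its host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    return parsed.hostname.lower() in allowed_hosts(env)
```

Matching on the parsed hostname rather than on a string prefix avoids accepting look-alike URLs such as `http://localhost.evil.example/v1`.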

"Disable thinking" toggle: local reasoning models (Qwen, etc.) can burn the whole token budget on a hidden think channel and return empty content. Turning this on sends chat_template_kwargs.enable_thinking=false. Leave it off for OpenAI and most hosted APIs — they reject the parameter.

The flow

The wizard is fully async — each step kicks off a background generation and streams progress over SSE while you keep interacting. The final mega-prompt is assembled deterministically (string concat, no LLM call) and is downloadable as Markdown.
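
For readers unfamiliar with SSE, each progress update travels as a small text frame. The sketch below shows the standard wire format only; the event name `progress` in the test and the helper name are hypothetical, not Incipit's actual event vocabulary.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the frame, as the SSE format requires.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

A server-rendered UI can attach such a stream to a page element and swap in each frame as it arrives, which is what lets the page stay interactive during generation.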

1. Brain dump: describe what you want to build and why. Pick only whether it's a new project or an existing codebase (paste a repo link and Incipit folds the README + structure into drafting). The form factor / platform is inferred from your idea, not asked, so a stale dropdown choice can't contradict the spec.
2. Clarify: the model asks the handful of questions that actually matter, each with an [ASSUMPTION] default. Answer what you care about; blanks fall back to the assumption.
3. Draft: six spec sections are drafted sequentially (each sees the sections already completed):
   1. Goals & Background
   2. Functional Requirements
   3. Non-Functional Requirements
   4. Tech Constraints & Stack
   5. Acceptance Criteria
   6. Out of Scope

   Click any section to hand-edit it, or use the per-section Refine menu to have the model redo it (critique, identify risks, expand, simplify).
4. 🎉 Party review (optional): convene a BMAD-style round table of personas plus a facilitator that debate the spec (or, at step 2, your clarifying questions) to consensus and propose changes you approve or deny.
5. Finish: copy or download the assembled mega-prompt.
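
The sequential drafting and the deterministic final assembly can be sketched as below. The section titles come from the list above; the `draft_one` callback stands in for the model call, and the Markdown layout of the output is an assumption rather than Incipit's actual format.

```python
from typing import Callable

SECTION_TITLES = [
    "Goals & Background",
    "Functional Requirements",
    "Non-Functional Requirements",
    "Tech Constraints & Stack",
    "Acceptance Criteria",
    "Out of Scope",
]


def draft_sections(
    idea: str,
    draft_one: Callable[[str, str, dict[str, str]], str],
) -> dict[str, str]:
    """Draft sections in order; each call receives the sections finished so far."""
    done: dict[str, str] = {}
    for title in SECTION_TITLES:
        # Pass a copy so the callback cannot mutate the accumulated state.
        done[title] = draft_one(idea, title, dict(done))
    return done


def assemble_mega_prompt(idea: str, sections: dict[str, str]) -> str:
    """Concatenate the drafted sections into Markdown without any model call."""
    parts = [f"# Spec\n\n{idea.strip()}"]
    for number, title in enumerate(SECTION_TITLES, start=1):
        parts.append(f"## {number}. {title}\n\n{sections[title].strip()}")
    return "\n\n".join(parts) + "\n"
```

Because assembly is plain string concatenation, the same sections always produce the same mega-prompt, and hand edits survive untouched.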

An automated QA / fix / verify pass is being reworked on the qa-flow branch and is intentionally not part of this flow right now.

🌙 Shoot the Moon runs the whole thing hands-off from just the idea: it infers the platform and details, takes the generated assumptions as answers, drafts the full spec, convenes the round table, and applies the consensus automatically. You can still edit and refine afterward.
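
Taking the generated assumptions as answers amounts to a simple fallback rule, sketched here. The `Question` shape and its field names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Question:
    key: str         # hypothetical identifier for the question
    text: str        # the clarifying question shown to the user
    assumption: str  # the [ASSUMPTION] default proposed by the model


def resolve_answers(questions: list[Question], answers: dict[str, str]) -> dict[str, str]:
    """Use the user's answer when given; otherwise fall back to the assumption."""
    resolved: dict[str, str] = {}
    for question in questions:
        given = answers.get(question.key, "").strip()
        resolved[question.key] = given if given else question.assumption
    return resolved
```

Under this sketch, the hands-off mode corresponds to calling `resolve_answers(questions, {})`, while the interactive flow passes whatever the user typed.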

To change what the spec contains, edit app/wizard/sections.yaml (section list and order); the per-section refine menu is app/wizard/elicitation.yaml; prompt wording lives in app/wizard/prompts/*.md.j2.

Configuration

All config is environment variables (PROMPTGEN_*) — see .env.example. A local .env is auto-loaded if present. Anything you save in the ⚙ Model settings panel is written to .promptgen.json (gitignored) and takes precedence on the next run, so you configure your endpoint once.
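
The precedence described here (environment first, then saved panel settings on top) can be sketched as follows. The JSON key names `base_url` and `model` are hypothetical, since the real schema of .promptgen.json is not documented in this README; the default base URL is the local Ollama address from the Quickstart.

```python
import json
from pathlib import Path

DEFAULT_BASE_URL = "http://localhost:11434/v1"


def load_settings(env: dict[str, str], saved_path: str) -> dict[str, str]:
    """Merge settings: environment values first, saved panel values on top."""
    settings = {
        "base_url": env.get("PROMPTGEN_OPENAI_BASE_URL", DEFAULT_BASE_URL),
    }
    path = Path(saved_path)
    if path.exists():
        saved = json.loads(path.read_text(encoding="utf-8"))
        # Hypothetical keys; empty values do not override the environment.
        settings.update({k: v for k, v in saved.items() if v not in (None, "")})
    return settings
```

In practice this would be called as `load_settings(dict(os.environ), ".promptgen.json")` at startup.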

There is no authentication — run it on localhost or a trusted network only.

There's no test suite or build step for the app itself. For a fast dev loop, point it at any running endpoint and run uvicorn as above.

Advanced: DiffusionGemma backend (GPU)

Incipit was originally built around DiffusionGemma 26B-A4B-it run through llama-diffusion-cli (llama.cpp PR #24423, which has no HTTP server yet — the app drives a persistent -cnv subprocess over stdin/stdout). This path requires building llama.cpp from a pinned PR and a GPU, and is selected with PROMPTGEN_BACKEND=diffusion-cnv (or diffusion-oneshot). It is not needed for the OpenAI-compatible path above.

# model (one-time, ~16G):
hf download unsloth/diffusiongemma-26B-A4B-it-GGUF diffusiongemma-26B-A4B-it-Q4_K_M.gguf \
  --local-dir /path/to/models/diffusiongemma-26B-A4B-it-GGUF

# build the image (compiles llama-diffusion-cli from the pinned PR; CUDA, sm_120):
podman build -t localhost/promptgen:v3 .

Backends (PROMPTGEN_BACKEND):

| Value | Description |
| --- | --- |
| `openai` (default) | any OpenAI-compatible endpoint |
| `diffusion-cnv` | persistent `llama-diffusion-cli -cnv` subprocess |
| `diffusion-oneshot` | one CLI process per call (model reload each call) |
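
Selecting a backend from `PROMPTGEN_BACKEND` might look like the sketch below. The three values and the `openai` default come from the table; raising an error on an unknown value is an assumption about how the app behaves.

```python
BACKENDS = {
    "openai": "any OpenAI-compatible endpoint",
    "diffusion-cnv": "persistent llama-diffusion-cli -cnv subprocess",
    "diffusion-oneshot": "one CLI process per call (model reload each call)",
}


def select_backend(env: dict[str, str]) -> str:
    """Return the configured backend name, defaulting to "openai"."""
    value = env.get("PROMPTGEN_BACKEND", "").strip() or "openai"
    if value not in BACKENDS:
        # Assumption: fail fast rather than silently falling back.
        known = ", ".join(sorted(BACKENDS))
        raise ValueError(f"unknown PROMPTGEN_BACKEND {value!r}; expected one of: {known}")
    return value
```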

See docs/multi-gpu-diffusiongemma.md for the multi-GPU writeup, and the Containerfile header / patches/ for the pinned-PR build and carried patches. The diffusion CLI protocol the -cnv backend depends on (turn marker, one line per turn, /clear between calls) is documented in app/llm/diffusion_cnv.py; re-pin the PR SHA deliberately.
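
To illustrate the general pattern of driving a persistent subprocess one line per turn, here is a minimal sketch. It uses a stand-in child process that upper-cases its input; it is not `llama-diffusion-cli`, and it does not model the turn marker or `/clear` handling, which are documented in app/llm/diffusion_cnv.py. The class and constant names are hypothetical.

```python
import subprocess
import sys

# Stand-in child: echoes each input line in upper case. It is NOT
# llama-diffusion-cli and knows nothing about its turn marker or /clear.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)


class LineProcess:
    """Keep one child process alive and exchange one line per turn."""

    def __init__(self, argv: list[str]) -> None:
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes in text mode
        )

    def ask(self, prompt: str) -> str:
        """Send a single-line prompt and block until one reply line arrives."""
        if "\n" in prompt:
            raise ValueError("one line per turn: escape newlines before sending")
        self.proc.stdin.write(prompt + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        """Close stdin so the child sees EOF, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait(timeout=10)
        self.proc.stdout.close()
```

Usage with the stand-in child: `LineProcess([sys.executable, "-c", CHILD_CODE]).ask("hello")`. Keeping the process alive is what lets the loaded model be reused between calls, whereas a one-shot backend pays the reload cost every time.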
