Problem
The Python API LightRAG.ainsert() supports split_by_character and split_by_character_only parameters for custom chunk splitting. However, the HTTP API endpoints /documents/text and /documents/texts in lightrag/api/document_routes.py do not expose these parameters — they are not included in the Pydantic request models and not passed through to ainsert().
This forces HTTP API users to rely solely on the built-in token-based chunker, even when they have pre-chunked content with a known separator.
Use Case
We pre-chunk documents with a semantic chunker (heading-aware, with breadcrumbs and atomic blocks) before sending to LightRAG. We join chunks with a unique separator and want LightRAG to split on it, preserving our chunk boundaries as-is.
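The round-trip we rely on can be sketched in plain Python (the separator value below is illustrative, not the one we use in production); LightRAG's `split_by_character` would perform the equivalent split server-side:

```python
# Join pre-chunked content with a unique separator, then split it back.
SEP = "\n<<<CHUNK>>>\n"  # illustrative separator, chosen so it cannot occur in content

chunks = [
    "# Setup\nInstall the package with pip.",
    "# Usage\nCall the client with your API key.",
]

payload = SEP.join(chunks)      # what we send as a single document
recovered = payload.split(SEP)  # what split_by_character-style splitting yields

assert recovered == chunks      # chunk boundaries are preserved exactly
```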
Without split_by_character in the HTTP API, the only options are:
- Send each chunk as a separate document (/documents/texts with N items) — creates N doc_ids per file, breaks deletion, deduplication, and doc_status tracking.
- Use the Python API directly — not possible when LightRAG runs as a separate service.
Proposed Change
Add split_by_character and split_by_character_only fields to InsertTextRequest and InsertTextsRequest in document_routes.py, and pass them through to rag.ainsert().
InsertTextRequest
from typing import Optional

from pydantic import BaseModel, Field

class InsertTextRequest(BaseModel):
    text: str
    # ... existing fields ...
    split_by_character: Optional[str] = Field(
        default=None,
        description="Character(s) to split the text on instead of token-based chunking",
    )
    split_by_character_only: bool = Field(
        default=False,
        description="If True, split only on split_by_character without token-based fallback",
    )
Route handler
await rag.ainsert(
    request.text,
    split_by_character=request.split_by_character,
    split_by_character_only=request.split_by_character_only,
)
Same for InsertTextsRequest / /documents/texts.
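For completeness, a sketch of the corresponding InsertTextsRequest change (existing fields elided; assumes the model's existing `texts: List[str]` field, with the same defaults as the singular model):

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class InsertTextsRequest(BaseModel):
    texts: List[str]
    # ... existing fields ...
    split_by_character: Optional[str] = Field(
        default=None,
        description="Character(s) to split each text on instead of token-based chunking",
    )
    split_by_character_only: bool = Field(
        default=False,
        description="If True, split only on split_by_character without token-based fallback",
    )
```

Omitting both fields yields the current behavior, so existing clients are unaffected.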
Notes
- Fully backward compatible — both fields are optional with defaults matching current behavior.
- We have a working patch in production (LightRAG v1.4.14) and can submit a PR.
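For illustration, an example request body for POST /documents/text under this change (the separator value is hypothetical):

```json
{
  "text": "# Setup\nInstall the package.\n<<<CHUNK>>>\n# Usage\nCall the client.",
  "split_by_character": "\n<<<CHUNK>>>\n",
  "split_by_character_only": true
}
```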