Provider-agnostic TypeScript client for Anthropic, OpenAI, and Google Gemini with shared message types, streaming, conversations, cost tracking, and pluggable session storage.
- One LLMClient surface for Anthropic, OpenAI, and Gemini
- Canonical request/response types, including tools and multimodal parts
- OpenAI uses the stateless Responses API under the hood while library-owned conversation state remains the source of truth
- defineTool() helper for typed tool definitions
- Non-streaming and streaming completions with explicit stream.cancel()
- Conversation state with running token and cost totals
- Automatic tool execution in conversations, including streaming pause/execute/resume
- Context trimming via sliding window or summarisation strategies
- Session persistence with InMemorySessionStore, PostgresSessionStore, and RedisSessionStore
- Automatic Postgres session persistence when DATABASE_URL is present
- Built-in framework-agnostic Session API handler with Request/Response endpoints
- Model routing, fallback chains, weighted A/B routing, and usage logging
- Live provider model discovery via client.models.listRemote({ provider })
- Google Embedding 2 support through client.embed()
- OpenAI batch speech support through client.speak() and client.transcribe()
- Optional retrieval helpers via unified-llm-client/retrieval
- Budget breach policies: throw, warn, or skip
- Usage aggregation export as JSON or CSV
- Edge-safe core imports with Node-only Postgres features loaded lazily
- LLMClient.mock() for deterministic tests
Once this repo is on GitHub, install it in another project with:
pnpm add github:07rjain/LLMlibrary
or:
pnpm add git+https://github.com/07rjain/LLMlibrary.git
The package runs prepare during Git installs, so the consumer project gets a built dist output automatically.
pnpm install
Create a local environment file from the example:
cp .env.example .env
The library reads provider keys from environment variables when they are not passed directly.
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
OPENAI_ORG_ID=
OPENAI_PROJECT_ID=
GEMINI_API_KEY=
DATABASE_URL=
If DATABASE_URL is set, LLMClient automatically uses PostgresSessionStore.fromEnv() for conversation() calls unless you pass an explicit sessionStore.
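The precedence described above can be sketched as a small pure function. This is only an illustration, not the library's code: resolveStoreChoice, StoreChoice, and ResolveInput are hypothetical names, and treating an empty DATABASE_URL as unset is an assumption.

```typescript
// Hypothetical sketch of the store-selection precedence; not the library's implementation.
type StoreChoice = 'explicit' | 'postgres-from-env' | 'none';

interface ResolveInput {
  // Stand-in for a sessionStore passed by the caller.
  explicitStore?: unknown;
  // Stand-in for process.env, injected so the function stays pure and testable.
  env: Record<string, string | undefined>;
}

function resolveStoreChoice({ explicitStore, env }: ResolveInput): StoreChoice {
  // An explicitly passed store always wins over environment detection.
  if (explicitStore !== undefined) return 'explicit';

  // Assumption: an empty or whitespace-only DATABASE_URL counts as "not set".
  const url = env.DATABASE_URL;
  if (typeof url === 'string' && url.trim() !== '') return 'postgres-from-env';

  // This README does not say what happens when neither is present.
  return 'none';
}
```

Injecting env instead of reading process.env directly keeps the sketch usable in edge runtimes and in unit tests.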
import { LLMClient } from 'unified-llm-client';
const client = LLMClient.fromEnv({
defaultModel: 'gpt-4o',
});
const response = await client.complete({
messages: [{ content: 'Say hello in one sentence.', role: 'user' }],
});
console.log(response.text);
console.log(response.usage.costUSD);
Use response.usage.costUSD for arithmetic, alerts, and persistence. response.usage.cost is the pre-formatted display string.
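As a rough illustration of why the numeric field matters, the sketch below sums costUSD across several responses and checks a budget. UsageLike, totalCostUSD, and isOverBudget are hypothetical helpers written for this example; only the cost and costUSD field names come from the text above, and the '$0.25' display format is assumed.

```typescript
// Hypothetical shape: only the two field names are taken from this README.
interface UsageLike {
  cost: string;    // display string, e.g. '$0.25' (format assumed)
  costUSD: number; // numeric value intended for arithmetic
}

// Sum the numeric costs; the display strings are never parsed.
function totalCostUSD(usages: UsageLike[]): number {
  return usages.reduce((sum, usage) => sum + usage.costUSD, 0);
}

// True when the accumulated spend exceeds the given budget in USD.
function isOverBudget(usages: UsageLike[], budgetUSD: number): boolean {
  return totalCostUSD(usages) > budgetUSD;
}

const recorded: UsageLike[] = [
  { cost: '$0.25', costUSD: 0.25 },
  { cost: '$0.50', costUSD: 0.5 },
];
const spent = totalCostUSD(recorded);
const overBudget = isOverBudget(recorded, 0.5);
```

Parsing the display string back into a number would tie alerts to a formatting detail, which is the failure mode the numeric field avoids.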
import { LLMClient, SlidingWindowStrategy } from 'unified-llm-client';
const client = LLMClient.fromEnv({
defaultModel: 'gpt-4o',
});
const conversation = await client.conversation({
contextManager: new SlidingWindowStrategy({
maxMessages: 12,
maxTokens: 16_000,
}),
sessionId: 'customer-support-1',
system: 'You are concise and operational.',
});
await conversation.send('Summarise the last user issue.');
console.log(conversation.totals);
console.log(conversation.toMarkdown());
const stream = client.stream({
messages: [{ content: 'Stream one sentence.', role: 'user' }],
});
for await (const chunk of stream) {
if (chunk.type === 'text-delta') {
process.stdout.write(chunk.delta);
}
}
// Or cancel explicitly if the caller navigates away.
stream.cancel(new Error('Request no longer needed.'));
const csv = await client.exportUsage('csv', {
tenantId: 'tenant-1',
});
console.log(csv);
Speech usage is tracked separately because its units differ from text tokens:
const speechCsv = await client.exportSpeechUsage('csv', {
tenantId: 'tenant-1',
});
OpenAI batch text-to-speech and speech-to-text are available as explicit APIs:
const speech = await client.speak({
input: 'Your appointment is confirmed for 10 AM.',
model: 'gpt-4o-mini-tts',
voice: 'alloy',
format: 'mp3',
estimatedOutputSeconds: 4,
});
const transcript = await client.transcribe({
input: {
data: audioBase64,
filename: 'call.wav',
mediaType: 'audio/wav',
},
inputAudioSeconds: 42,
model: 'gpt-4o-mini-transcribe',
});
console.log(speech.audio); // Uint8Array
console.log(transcript.text);
console.log(speech.usage?.costUSD);
Use usage.costUSD for arithmetic and billing. usage.cost is a formatted display string. Speech audio and transcripts are not stored by the library; keep storage and retention in your application layer.
SummarisationStrategy accepts a summarizer() callback. In production, point that callback at a cheaper model or internal summarisation service.
import { LLMClient, SummarisationStrategy } from 'unified-llm-client';
const client = LLMClient.fromEnv({
defaultModel: 'gpt-4o',
});
const conversation = await client.conversation({
contextManager: new SummarisationStrategy({
keepLastMessages: 2,
maxMessages: 10,
summarizer: async (messages) => {
const summary = await client.complete({
messages: [
{
content: `Summarise this conversation history:\n${JSON.stringify(messages)}`,
role: 'user',
},
],
model: 'gpt-4o-mini',
});
return summary.text;
},
}),
});
import { LLMClient, PostgresSessionStore } from 'unified-llm-client';
const client = new LLMClient({
defaultModel: 'gpt-4o',
sessionStore: PostgresSessionStore.fromEnv(),
});
RedisSessionStore is bring-your-own-client. Pass any Redis client that implements get(), set(), del(), and either scanIterator() or keys().
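To make that contract concrete, here is a minimal sketch of a client shape and an in-memory stand-in that could serve in tests. The README names the methods but not their signatures, so the parameter and return types below are assumptions modelled loosely on common Redis clients; MinimalRedisClient and FakeRedisClient are hypothetical names.

```typescript
// Assumed signatures: the README lists method names only.
interface MinimalRedisClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
  del(key: string): Promise<number>;
  keys(pattern: string): Promise<string[]>;
}

// In-memory stand-in for tests; it does not implement TTLs or scanIterator().
class FakeRedisClient implements MinimalRedisClient {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }

  async set(key: string, value: string): Promise<unknown> {
    this.data.set(key, value);
    return 'OK';
  }

  // Returns the number of keys removed (0 or 1), as Redis DEL does for one key.
  async del(key: string): Promise<number> {
    return this.data.delete(key) ? 1 : 0;
  }

  // Supports an exact key or a single trailing '*' wildcard only.
  async keys(pattern: string): Promise<string[]> {
    const all = Array.from(this.data.keys());
    if (pattern.endsWith('*')) {
      const prefix = pattern.slice(0, -1);
      return all.filter((key) => key.startsWith(prefix));
    }
    return all.filter((key) => key === pattern);
  }
}
```

Whether an object of this shape satisfies RedisSessionStore in practice depends on how the store passes TTLs and key patterns, so check it against the package's types before relying on it.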
import { LLMClient, RedisSessionStore } from 'unified-llm-client';
const sessionStore = new RedisSessionStore({
client: redisClient,
ttlSeconds: 3600,
});
const client = new LLMClient({
defaultModel: 'gpt-4o',
sessionStore,
});
The package also exports a framework-agnostic session API handler. It accepts standard Request objects and returns standard Response objects, so it can be mounted in Express, Fastify, Hono, Next.js route handlers, Cloudflare Workers, or plain Node HTTP adapters.
import { LLMClient, PostgresSessionStore, createSessionApi } from 'unified-llm-client';
const store = PostgresSessionStore.fromEnv();
const client = LLMClient.fromEnv({
defaultModel: 'gpt-4o',
sessionStore: store,
});
const sessionApi = createSessionApi({
client,
sessionStore: store,
});
const response = await sessionApi.handle(
new Request('https://example.test/sessions', {
body: JSON.stringify({ sessionId: 'demo-session', system: 'Be concise.' }),
headers: { 'content-type': 'application/json' },
method: 'POST',
}),
);
Supported endpoints include:
- POST /sessions
- POST /sessions/{id}/message
- GET /sessions/{id}
- GET /sessions/{id}/messages
- DELETE /sessions/{id}
- POST /sessions/{id}/compact
- POST /sessions/{id}/fork
- GET /sessions
For the full endpoint contract and the OpenAI Responses-style mapping notes, see SESSION_API.md.
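When writing an adapter or a test around the handler, it can help to see the endpoint list as a method-plus-path table. The matcher below is a hypothetical illustration of those eight shapes; it is not how the library routes requests, and the route names (createSession, sendMessage, and so on) are invented for this sketch.

```typescript
type SessionRoute =
  | 'createSession' | 'listSessions'
  | 'getSession' | 'deleteSession'
  | 'sendMessage' | 'listMessages'
  | 'compactSession' | 'forkSession';

interface RouteMatch {
  route: SessionRoute;
  sessionId?: string;
}

// Map an HTTP method and URL pathname onto one of the listed endpoints, or null.
function matchSessionRoute(method: string, pathname: string): RouteMatch | null {
  const parts = pathname.split('/').filter((part) => part.length > 0);
  const [root, rawId, action] = parts;
  if (root !== 'sessions' || parts.length > 3) return null;
  const verb = method.toUpperCase();

  // Collection endpoints: /sessions
  if (rawId === undefined) {
    if (verb === 'POST') return { route: 'createSession' };
    if (verb === 'GET') return { route: 'listSessions' };
    return null;
  }

  const sessionId = decodeURIComponent(rawId);

  // Single-session endpoints: /sessions/{id}
  if (action === undefined) {
    if (verb === 'GET') return { route: 'getSession', sessionId };
    if (verb === 'DELETE') return { route: 'deleteSession', sessionId };
    return null;
  }

  // Action endpoints: /sessions/{id}/{action}
  if (verb === 'POST' && action === 'message') return { route: 'sendMessage', sessionId };
  if (verb === 'GET' && action === 'messages') return { route: 'listMessages', sessionId };
  if (verb === 'POST' && action === 'compact') return { route: 'compactSession', sessionId };
  if (verb === 'POST' && action === 'fork') return { route: 'forkSession', sessionId };
  return null;
}
```

A framework adapter would typically take the pathname from new URL(request.url).pathname before handing the original Request to the handler.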
Use client.models.listRemote({ provider }) when you want the provider's current live model list instead of the checked-in local registry.
const googleModels = await client.models.listRemote({
provider: 'google',
});
console.log(googleModels[0]?.id);
console.log(googleModels[0]?.supportedActions);
listRemote() is discovery-only. It does not auto-register those models into the local routing registry, so complete() and stream() still require either a known built-in model or a manual client.models.register(...) step.
Google Embedding 2 is the current embeddings surface for v1.
import { LLMClient } from 'unified-llm-client';
const client = LLMClient.fromEnv({
defaultEmbeddingModel: 'gemini-embedding-2',
});
const embedding = await client.embed({
input: 'Refunds are available for 30 days after purchase.',
purpose: 'retrieval_document',
providerOptions: {
google: {
title: 'Refund Policy',
},
},
});
console.log(embedding.embeddings[0]?.values.length);
console.log(embedding.usage?.inputTokens);
client.embed() is separate from complete() and conversation(). Embedding and generation can use different providers in the same application flow.
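Once vectors are in hand, comparing them is left to the application. A minimal sketch, assuming each embedding's values field is a plain number[] (the README only shows its length): cosineSimilarity below is a generic helper written for this example, not a library export.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length.');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Only compare vectors produced by the same embedding model and output dimensionality; scores across different models are not meaningful.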
The package also ships optional app-layer retrieval helpers. They do not hide retrieval inside LLMClient; they help you compose retrieval before generation.
import { LLMClient } from 'unified-llm-client';
import {
chunkText,
createDenseRetriever,
createInMemoryKnowledgeStore,
createPostgresKnowledgeStore,
formatRetrievedContext,
} from 'unified-llm-client/retrieval';
import { cleanText, stripHtml } from 'unified-llm-client/chunking';
const client = LLMClient.fromEnv({
defaultEmbeddingModel: 'gemini-embedding-2',
defaultModel: 'gpt-4o',
});
const knowledgeStore = createPostgresKnowledgeStore({
connectionString: process.env.DATABASE_URL,
});
await knowledgeStore.ensureSchema();
const retriever = createDenseRetriever({
embed: client,
embedding: {
model: 'gemini-embedding-2',
},
store: knowledgeStore,
});
const results = await retriever.search({
filter: {
botId: 'bot-1',
embeddingProfileId: 'profile-2026-04-24',
knowledgeSpaceId: 'kb-support',
tenantId: 'tenant-1',
},
query: 'What is the refund window?',
topK: 4,
});
const context = formatRetrievedContext(results, {
maxResults: 4,
maxTokens: 900,
});
const answer = await client.complete({
messages: [
{
content: `Question: What is the refund window?\n\n${context.text}`,
role: 'user',
},
],
});
Chunking helpers are now available as a separate subpath:
const cleaned = cleanText(stripHtml('<h1>Refund Policy</h1><p>Refunds last 30 days.</p>'));
const chunks = chunkText(cleaned, {
chunkSize: 900,
overlap: 120,
});
The retrieval module currently includes:
- KnowledgeStore
- Retriever
- chunkText()
- cleanText()
- stripHtml()
- createDenseRetriever()
- createHybridRetriever()
- createInMemoryKnowledgeStore()
- createPostgresKnowledgeStore()
- InMemoryKnowledgeStore
- PostgresKnowledgeStore
- createPgvectorHnswIndexSql()
- mergeRetrievalCandidates()
- formatRetrievedContext()
createDenseRetriever() and createHybridRetriever() now also accept optional rerank hooks, and PostgresKnowledgeStore now exposes active-profile and reindex helpers such as activateEmbeddingProfile(), getActiveEmbeddingProfile(), listKnowledgeSources(), and markKnowledgeSourcesNeedingReindex().
activateEmbeddingProfile() now throws a clear runtime error when the target knowledge space is missing or when the embedding profile does not belong to that scoped space. It no longer fails silently on scope mismatches.
When you use PostgresKnowledgeStore, search requests must stay fully scoped. Pass tenantId, botId, knowledgeSpaceId, and embeddingProfileId, and use the same embedding profile for chunk ingestion and live query embedding. The retrieval helpers intentionally do not take over chunking, ingestion queues, provider-managed reranking services, or automatic retrieval inside complete() / conversation().
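Since chunking decisions stay in your application, it helps to know what a chunkSize/overlap splitter does to a document. The sketch below is a generic character-window splitter written for illustration; chunkWithOverlap is a hypothetical name, and the library's chunkText() may measure size differently (for example in tokens) or respect sentence boundaries.

```typescript
// Split text into windows of chunkSize characters, each sharing `overlap`
// characters with the previous window. Sizes are in characters (an assumption).
function chunkWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive.');
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error('overlap must be at least 0 and smaller than chunkSize.');
  }

  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end, so no tail-only duplicate is emitted.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const demoChunks = chunkWithOverlap('abcdefghij', 4, 1);
```

Changing chunkSize or overlap changes every chunk boundary, so previously stored vectors for that source no longer line up with the new chunks and generally need to be regenerated.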
For local demos, tests, or single-process apps that do not need Postgres yet, you can swap in the in-memory store:
const knowledgeStore = createInMemoryKnowledgeStore();
InMemoryKnowledgeStore keeps chunks and vectors in process memory, supports the same retriever-facing search interface, and mirrors the main upsert helpers. It is useful for local development and examples, but it is not durable and should not replace PostgresKnowledgeStore for production retrieval.
formatRetrievedContext() also supports explicit score display modes so users do not misread raw retrieval scores as probabilities:
const context = formatRetrievedContext(results, {
includeScores: true,
scoreDisplay: 'relative_top_1',
});
- scoreDisplay: 'raw' prints labels such as raw dense similarity, raw lexical relevance, or raw fused rank score
- scoreDisplay: 'relative_top_1' prints a display-only score normalized against the top shown result and clearly marks it as not a probability
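One plausible reading of the relative_top_1 mode is "divide every shown score by the best shown score". The sketch below implements that reading only; it is not taken from the library's source, ScoredResult and toRelativeTop1 are hypothetical names, and the handling of non-positive top scores is an assumption.

```typescript
interface ScoredResult {
  id: string;
  score: number; // raw retrieval score; not a probability
}

interface DisplayedResult extends ScoredResult {
  relativeScore: number; // display-only value, 1 for the top result
}

// Normalise each score against the highest score among the shown results.
function toRelativeTop1(results: ScoredResult[]): DisplayedResult[] {
  if (results.length === 0) return [];
  const top = Math.max(...results.map((result) => result.score));
  return results.map((result) => ({
    ...result,
    // Assumption: when the top score is not positive, report 0 rather than divide.
    relativeScore: top > 0 ? result.score / top : 0,
  }));
}

const displayed = toRelativeTop1([
  { id: 'chunk-a', score: 0.8 },
  { id: 'chunk-b', score: 0.4 },
]);
```

A relative score of 0.5 only says the result scored half as high as the best one shown; it says nothing about how likely the chunk is to be relevant.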
- Edge/browser-safe core surface: LLMClient, Conversation, routing, in-memory storage, utilities, and SessionApi
- Node-only persistence: PostgresSessionStore and PostgresUsageLogger
- Runtime safety probe: pnpm edgecheck
- OpenAI automatic prompt caching works on supported models, and request-side hints are exposed via providerOptions.openai.promptCaching.
- Anthropic block-level and top-level cache_control are exposed for cacheable content and tool definitions.
- Gemini implicit caching benefits supported models automatically, and explicit cache usage is exposed via providerOptions.google.promptCaching.cachedContent plus client.googleCaches.
- Implementation planning lives in docs/PROMPT_CACHING_REPORT.md and the active task tracker lives in prompt_caching_todo.md.
OpenAI request hints:
const openaiResponse = await client.complete({
model: 'gpt-4o',
messages: [{ content: 'Summarize the support FAQ.', role: 'user' }],
providerOptions: {
openai: {
promptCaching: {
key: 'support-faq-v1',
retention: '24h',
},
},
},
});
Anthropic block and tool cache control:
const anthropicResponse = await client.complete({
model: 'claude-sonnet-4-6',
messages: [
{
role: 'user',
content: [
{
type: 'document',
url: 'https://example.com/policy.pdf',
mediaType: 'application/pdf',
cacheControl: { type: 'ephemeral', ttl: '1h' },
},
{
type: 'text',
text: 'Answer using the cached policy document.',
},
],
},
],
providerOptions: {
anthropic: {
cacheControl: { type: 'ephemeral' },
},
},
tools: [
{
name: 'lookup_policy',
description: 'Look up policy details',
cacheControl: { type: 'ephemeral' },
parameters: {
type: 'object',
properties: {
topic: { type: 'string' },
},
required: ['topic'],
},
},
],
});
Gemini explicit cache lifecycle plus reuse:
const cache = await client.googleCaches.create({
model: 'gemini-2.5-flash',
displayName: 'Support FAQ',
messages: [{ content: 'Refunds are available for 30 days.', role: 'user' }],
ttl: '3600s',
});
const geminiResponse = await client.complete({
model: 'gemini-2.5-flash',
messages: [{ content: 'What is the refund window?', role: 'user' }],
providerOptions: {
google: {
promptCaching: {
cachedContent: cache.name,
},
},
},
});
Gemini cache names are returned in the provider format cachedContents/{id} and can be passed back directly as cachedContent. Cache creation accepts the normal library model id such as gemini-2.5-flash; the Gemini adapter normalizes it to models/{model} for the cache API. Per-request generation cost includes cached-read discounts when cachedContentTokenCount is returned, but it does not include cache creation or persistence cost.
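The cached-read discount can be pictured as pricing cached input tokens at a lower rate than fresh input tokens. The sketch below shows that arithmetic under stated assumptions: inputTokens is taken to include the cached tokens, the rates are placeholder numbers rather than real provider prices, and TokenCounts, RatesPerMillion, and generationCostUSD are hypothetical names rather than library exports.

```typescript
interface TokenCounts {
  inputTokens: number;       // total prompt tokens, assumed to include cached ones
  cachedInputTokens: number; // tokens served from the cache
  outputTokens: number;
}

// USD per one million tokens. Placeholder values only; not real pricing.
interface RatesPerMillion {
  input: number;
  cachedInput: number;
  output: number;
}

// Per-request generation cost; cache creation and storage are deliberately excluded.
function generationCostUSD(tokens: TokenCounts, rates: RatesPerMillion): number {
  const uncachedInput = Math.max(0, tokens.inputTokens - tokens.cachedInputTokens);
  const total =
    uncachedInput * rates.input +
    tokens.cachedInputTokens * rates.cachedInput +
    tokens.outputTokens * rates.output;
  return total / 1_000_000;
}

const placeholderRates: RatesPerMillion = { input: 2, cachedInput: 0.5, output: 8 };
const withCache = generationCostUSD(
  { inputTokens: 1_000_000, cachedInputTokens: 500_000, outputTokens: 0 },
  placeholderRates,
);
const withoutCache = generationCostUSD(
  { inputTokens: 1_000_000, cachedInputTokens: 0, outputTokens: 0 },
  placeholderRates,
);
```

Because cache creation and storage are billed outside this per-request figure, total spend for a cached workflow is higher than the sum of these per-request costs.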
- Documentation website: https://07rjain.github.io/LLMlibrary/
- User guide hub: docs/README.md
- Getting started: docs/GETTING_STARTED.md
- Completions and streaming: docs/COMPLETIONS_AND_STREAMING.md
- Conversations and tools: docs/CONVERSATIONS_AND_TOOLS.md
- Persistence and Session API: docs/PERSISTENCE_AND_SESSION_API.md
- Production guide: docs/PRODUCTION_GUIDE.md
- Docs local dev server: pnpm docs:dev
- Docs production build: pnpm docs:build
- API reference source: pnpm docs:api
- Session API contract: SESSION_API.md
- PRD decisions: docs/PRD_DECISIONS.md
- Provider comparison: docs/PROVIDER_COMPARISON.md
- Speech API research report: docs/SPEECH_API_RESEARCH_REPORT.md
- Prompt caching report: docs/PROMPT_CACHING_REPORT.md
- Prompt caching task tracker: prompt_caching_todo.md
- OpenAI Responses migration report: docs/OPENAI_RESPONSES_MIGRATION_REPORT.md
- Migration guide: docs/MIGRATION_GUIDE.md
- Cost and pricing policy: docs/COST_AND_PRICING.md
- Roadmap: docs/ROADMAP.md
- Current project state: PROJECT_STATUS.md
- Validation handoff notes: TEST_AGENT_HANDOFF.md
pnpm sizecheck
pnpm depcheck
pnpm edgecheck
pnpm bench:complete
pnpm bench:first-token
pnpm bench:memory
pnpm bench:concurrency
pnpm pricecheck
Optional live-provider smoke tests stay opt-in:
LIVE_TESTS=1 pnpm test:live
LIVE_TESTS=1 pnpm test:embeddings:live
pnpm test:prompt-caching:live
pnpm typecheck
pnpm lint
pnpm test
pnpm build