Unified LLM Client

Provider-agnostic TypeScript client for Anthropic, OpenAI, and Google Gemini with shared message types, streaming, conversations, cost tracking, and pluggable session storage.

Features

One LLMClient surface for Anthropic, OpenAI, and Gemini
Canonical request/response types, including tools and multimodal parts
OpenAI uses the stateless Responses API under the hood while library-owned conversation state remains the source of truth
defineTool() helper for typed tool definitions
Non-streaming and streaming completions with explicit stream.cancel()
Conversation state with running token and cost totals
Automatic tool execution in conversations, including streaming pause/execute/resume
Context trimming via sliding window or summarisation strategies
Session persistence with InMemorySessionStore, PostgresSessionStore, and RedisSessionStore
Automatic Postgres session persistence when DATABASE_URL is present
Built-in framework-agnostic Session API handler with Request/Response endpoints
Model routing, fallback chains, weighted A/B routing, and usage logging
Live provider model discovery via client.models.listRemote({ provider })
Google Embedding 2 support through client.embed()
OpenAI batch speech support through client.speak() and client.transcribe()
Optional retrieval helpers via unified-llm-client/retrieval
Budget breach policies: throw, warn, or skip
Usage aggregation export as JSON or CSV
Edge-safe core imports with Node-only Postgres features loaded lazily
LLMClient.mock() for deterministic tests

Install

Use As A Library From GitHub

Install it in another project directly from GitHub:

pnpm add github:07rjain/LLMlibrary

or:

pnpm add git+https://github.com/07rjain/LLMlibrary.git

The package's prepare script runs during Git installs, so the consuming project gets a built dist output automatically.

Develop Locally

pnpm install

Create a local environment file from the example:

cp .env.example .env

Environment

The library reads provider keys from environment variables when they are not passed directly.

ANTHROPIC_API_KEY=
OPENAI_API_KEY=
OPENAI_ORG_ID=
OPENAI_PROJECT_ID=
GEMINI_API_KEY=
DATABASE_URL=

If DATABASE_URL is set, LLMClient will automatically use PostgresSessionStore.fromEnv() for conversation() calls unless you pass an explicit sessionStore.

Quick Start

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const response = await client.complete({
  messages: [{ content: 'Say hello in one sentence.', role: 'user' }],
});

console.log(response.text);
console.log(response.usage.costUSD);

Use response.usage.costUSD for arithmetic, alerts, and persistence. response.usage.cost is the pre-formatted display string.
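
As a rough illustration of why the numeric field matters, the sketch below sums per-call cost and compares it with a budget. UsageLike, totalCostUSD, and isOverBudget are hypothetical names written for this example; only the costUSD and cost field names come from the description above, and the display-string format shown is an assumption.

```typescript
// Hypothetical local type: mirrors only the two usage fields named above.
interface UsageLike {
  cost: string; // pre-formatted display string (exact format is an assumption)
  costUSD: number; // numeric value, safe for arithmetic
}

// Sum the numeric cost across calls; never parse the display string.
function totalCostUSD(usages: UsageLike[]): number {
  return usages.reduce((sum, usage) => sum + usage.costUSD, 0);
}

// Simple alert check against a caller-defined budget in USD.
function isOverBudget(usages: UsageLike[], budgetUSD: number): boolean {
  return totalCostUSD(usages) > budgetUSD;
}

const sampleUsages: UsageLike[] = [
  { cost: '$0.0012', costUSD: 0.0012 },
  { cost: '$0.0030', costUSD: 0.003 },
];
```

Calling isOverBudget(sampleUsages, 0.004) returns true here, because the numeric total is about 0.0042.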

Conversations

import { LLMClient, SlidingWindowStrategy } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const conversation = await client.conversation({
  contextManager: new SlidingWindowStrategy({
    maxMessages: 12,
    maxTokens: 16_000,
  }),
  sessionId: 'customer-support-1',
  system: 'You are concise and operational.',
});

await conversation.send('Summarise the last user issue.');
console.log(conversation.totals);
console.log(conversation.toMarkdown());
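
To make the trimming behaviour concrete, here is a minimal sketch of a sliding window over message history. It is not the library's SlidingWindowStrategy implementation: Msg, estimateTokens, and slideWindow are hypothetical, and the four-characters-per-token estimate is a crude assumption.

```typescript
interface Msg {
  content: string;
  role: 'user' | 'assistant';
}

// Crude token estimate (assumption): roughly four characters per token.
function estimateTokens(message: Msg): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the most recent messages that fit both the count and the token limits.
function slideWindow(history: Msg[], maxMessages: number, maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let tokens = 0;
  // Walk backwards so the newest messages are considered first.
  for (let i = history.length - 1; i >= 0 && kept.length < maxMessages; i--) {
    const message = history[i];
    if (message === undefined) break; // guards against sparse arrays
    const cost = estimateTokens(message);
    if (tokens + cost > maxTokens) break; // token budget exhausted
    tokens += cost;
    kept.unshift(message); // preserve chronological order
  }
  return kept;
}
```

Whichever limit is reached first decides the cut, so a few very long messages can shrink the window below maxMessages.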

Streaming

const stream = client.stream({
  messages: [{ content: 'Stream one sentence.', role: 'user' }],
});

for await (const chunk of stream) {
  if (chunk.type === 'text-delta') {
    process.stdout.write(chunk.delta);
  }
}

// To stop early (for example, when the caller navigates away), cancel the stream explicitly.
stream.cancel(new Error('Request no longer needed.'));

Usage Export

const csv = await client.exportUsage('csv', {
  tenantId: 'tenant-1',
});

console.log(csv);

Speech usage is tracked separately because the units are different from text tokens:

const speechCsv = await client.exportSpeechUsage('csv', {
  tenantId: 'tenant-1',
});

Speech

OpenAI batch text-to-speech and speech-to-text are available as explicit APIs:

const speech = await client.speak({
  input: 'Your appointment is confirmed for 10 AM.',
  model: 'gpt-4o-mini-tts',
  voice: 'alloy',
  format: 'mp3',
  estimatedOutputSeconds: 4,
});

const transcript = await client.transcribe({
  input: {
    data: audioBase64,
    filename: 'call.wav',
    mediaType: 'audio/wav',
  },
  inputAudioSeconds: 42,
  model: 'gpt-4o-mini-transcribe',
});

console.log(speech.audio); // Uint8Array
console.log(transcript.text);
console.log(speech.usage?.costUSD);

Use usage.costUSD for arithmetic and billing. usage.cost is a formatted display string. Speech audio and transcripts are not stored by the library; keep storage and retention in your application layer.

Summarisation Strategy

SummarisationStrategy accepts a summarizer() callback. In production, point that callback at a cheaper model or internal summarisation service.

import { LLMClient, SummarisationStrategy } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

const conversation = await client.conversation({
  contextManager: new SummarisationStrategy({
    keepLastMessages: 2,
    maxMessages: 10,
    summarizer: async (messages) => {
      const summary = await client.complete({
        messages: [
          {
            content: `Summarise this conversation history:\n${JSON.stringify(messages)}`,
            role: 'user',
          },
        ],
        model: 'gpt-4o-mini',
      });

      return summary.text;
    },
  }),
});
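
The sketch below shows one plausible reading of how keepLastMessages and maxMessages interact: once history grows past maxMessages, everything except the most recent keepLastMessages entries is handed to the summarizer. planCompaction and CompactionPlan are hypothetical names, and the trigger condition is an assumption rather than the library's documented behaviour.

```typescript
interface CompactionPlan<T> {
  toKeep: T[]; // recent messages preserved verbatim
  toSummarise: T[]; // older messages handed to the summarizer callback
}

// Assumed trigger: compact only when history exceeds maxMessages.
function planCompaction<T>(
  history: T[],
  options: { keepLastMessages: number; maxMessages: number },
): CompactionPlan<T> {
  if (history.length <= options.maxMessages) {
    return { toKeep: history.slice(), toSummarise: [] };
  }
  // Everything before `cut` is summarised; everything from `cut` on is kept.
  const cut = Math.max(0, history.length - options.keepLastMessages);
  return { toKeep: history.slice(cut), toSummarise: history.slice(0, cut) };
}
```

With the settings from the example above (keepLastMessages: 2, maxMessages: 10), a 12-message history would send the first 10 messages to the summarizer and keep the last 2 as they are.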

Session Stores

Postgres

import { LLMClient, PostgresSessionStore } from 'unified-llm-client';

const client = new LLMClient({
  defaultModel: 'gpt-4o',
  sessionStore: PostgresSessionStore.fromEnv(),
});

Redis

RedisSessionStore is bring-your-own-client. Pass any Redis client that implements get(), set(), del(), and either scanIterator() or keys().

import { LLMClient, RedisSessionStore } from 'unified-llm-client';

const sessionStore = new RedisSessionStore({
  client: redisClient,
  ttlSeconds: 3600,
});

const client = new LLMClient({
  defaultModel: 'gpt-4o',
  sessionStore,
});

Session API

The package also exports a framework-agnostic session API handler. It accepts standard Request objects and returns standard Response objects, so it can be mounted in Express, Fastify, Hono, Next.js route handlers, Cloudflare Workers, or plain Node HTTP adapters.

import { LLMClient, PostgresSessionStore, createSessionApi } from 'unified-llm-client';

const store = PostgresSessionStore.fromEnv();
const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
  sessionStore: store,
});

const sessionApi = createSessionApi({
  client,
  sessionStore: store,
});

const response = await sessionApi.handle(
  new Request('https://example.test/sessions', {
    body: JSON.stringify({ sessionId: 'demo-session', system: 'Be concise.' }),
    headers: { 'content-type': 'application/json' },
    method: 'POST',
  }),
);

Supported endpoints include:

POST /sessions
POST /sessions/{id}/message
GET /sessions/{id}
GET /sessions/{id}/messages
DELETE /sessions/{id}
POST /sessions/{id}/compact
POST /sessions/{id}/fork
GET /sessions

For the full endpoint contract and the OpenAI Responses-style mapping notes, see SESSION_API.md.

Remote Model Discovery

Use client.models.listRemote({ provider }) when you want the provider's current live model list instead of the checked-in local registry.

const googleModels = await client.models.listRemote({
  provider: 'google',
});

console.log(googleModels[0]?.id);
console.log(googleModels[0]?.supportedActions);

listRemote() is discovery-only. It does not auto-register those models into the local routing registry, so complete() and stream() still require either a known built-in model or a manual client.models.register(...) step.

Embeddings

In v1, client.embed() supports Google Embedding 2 (model id gemini-embedding-2).

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
});

const embedding = await client.embed({
  input: 'Refunds are available for 30 days after purchase.',
  purpose: 'retrieval_document',
  providerOptions: {
    google: {
      title: 'Refund Policy',
    },
  },
});

console.log(embedding.embeddings[0]?.values.length);
console.log(embedding.usage?.inputTokens);

client.embed() is separate from complete() and conversation(). Embedding and generation can use different providers in the same application flow.

Retrieval Helpers

The package also ships optional app-layer retrieval helpers. They do not hide retrieval inside LLMClient; they help you compose retrieval before generation.

import { LLMClient } from 'unified-llm-client';
import {
  chunkText,
  createDenseRetriever,
  createInMemoryKnowledgeStore,
  createPostgresKnowledgeStore,
  formatRetrievedContext,
} from 'unified-llm-client/retrieval';
import { cleanText, stripHtml } from 'unified-llm-client/chunking';

const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
  defaultModel: 'gpt-4o',
});

const knowledgeStore = createPostgresKnowledgeStore({
  connectionString: process.env.DATABASE_URL,
});

await knowledgeStore.ensureSchema();

const retriever = createDenseRetriever({
  embed: client,
  embedding: {
    model: 'gemini-embedding-2',
  },
  store: knowledgeStore,
});

const results = await retriever.search({
  filter: {
    botId: 'bot-1',
    embeddingProfileId: 'profile-2026-04-24',
    knowledgeSpaceId: 'kb-support',
    tenantId: 'tenant-1',
  },
  query: 'What is the refund window?',
  topK: 4,
});

const context = formatRetrievedContext(results, {
  maxResults: 4,
  maxTokens: 900,
});

const answer = await client.complete({
  messages: [
    {
      content: `Question: What is the refund window?\n\n${context.text}`,
      role: 'user',
    },
  ],
});

Chunking helpers are also available from a separate subpath, unified-llm-client/chunking:

const cleaned = cleanText(stripHtml('<h1>Refund Policy</h1><p>Refunds last 30 days.</p>'));
const chunks = chunkText(cleaned, {
  chunkSize: 900,
  overlap: 120,
});
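
For intuition about the chunkSize and overlap options, here is a minimal character-based sketch. It is not the library's chunkText(): chunkByCharacters is a hypothetical helper, and the real function may measure in tokens or respect sentence boundaries.

```typescript
// Split text into fixed-size character windows that overlap by `overlap` characters.
function chunkByCharacters(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
  }
  const step = chunkSize - overlap; // distance between consecutive window starts
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

Each window starts chunkSize minus overlap characters after the previous one, so neighbouring chunks share their boundary text and a sentence cut at one edge still appears whole in the adjacent chunk.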

The retrieval module currently includes:

KnowledgeStore
Retriever
chunkText()
cleanText()
stripHtml()
createDenseRetriever()
createHybridRetriever()
createInMemoryKnowledgeStore()
createPostgresKnowledgeStore()
InMemoryKnowledgeStore
PostgresKnowledgeStore
createPgvectorHnswIndexSql()
mergeRetrievalCandidates()
formatRetrievedContext()

createDenseRetriever() and createHybridRetriever() also accept optional rerank hooks. PostgresKnowledgeStore exposes active-profile and reindex helpers such as activateEmbeddingProfile(), getActiveEmbeddingProfile(), listKnowledgeSources(), and markKnowledgeSourcesNeedingReindex().

activateEmbeddingProfile() throws a clear runtime error when the target knowledge space is missing or when the embedding profile does not belong to that scoped space, rather than failing silently on scope mismatches.

When you use PostgresKnowledgeStore, search requests must stay fully scoped. Pass tenantId, botId, knowledgeSpaceId, and embeddingProfileId, and use the same embedding profile for chunk ingestion and live query embedding. The retrieval helpers intentionally do not take over chunking, ingestion queues, provider-managed reranking services, or automatic retrieval inside complete() / conversation().
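
One way to enforce full scoping in application code is to check the filter before calling search. The guard below is a sketch only: ScopeFilter and missingScopeFields are hypothetical names for this example, and the library may already perform its own validation.

```typescript
// Hypothetical filter shape using the four scope fields named above.
interface ScopeFilter {
  botId?: string;
  embeddingProfileId?: string;
  knowledgeSpaceId?: string;
  tenantId?: string;
}

const REQUIRED_SCOPE_KEYS = [
  'tenantId',
  'botId',
  'knowledgeSpaceId',
  'embeddingProfileId',
] as const;

// Return the names of any scope fields that are missing or empty.
function missingScopeFields(filter: ScopeFilter): string[] {
  return REQUIRED_SCOPE_KEYS.filter((key) => !filter[key]);
}
```

A caller could refuse to run the search whenever missingScopeFields(filter) returns a non-empty list.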

For local demos, tests, or single-process apps that do not need Postgres yet, you can swap in the in-memory store:

const knowledgeStore = createInMemoryKnowledgeStore();

InMemoryKnowledgeStore keeps chunks and vectors in process memory, supports the same retriever-facing search interface, and mirrors the main upsert helpers. It is useful for local development and examples, but it is not durable and should not replace PostgresKnowledgeStore for production retrieval.

formatRetrievedContext() also supports explicit score display modes so users do not misread raw retrieval scores as probabilities:

const context = formatRetrievedContext(results, {
  includeScores: true,
  scoreDisplay: 'relative_top_1',
});

scoreDisplay: 'raw' prints labels such as raw dense similarity, raw lexical relevance, or raw fused rank score
scoreDisplay: 'relative_top_1' prints a display-only score normalized against the top shown result and clearly marks it as not a probability
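
The idea behind a score normalized against the top shown result can be sketched as follows. This is one plausible reading, not the library's formula: Scored and relativeToTop are hypothetical names, and the sketch assumes non-negative scores.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Divide each score by the highest score so the best result displays as 1.
// The output is a display aid only; it is not a probability.
function relativeToTop(results: Scored[]): Array<Scored & { relative: number }> {
  const top = results.reduce((max, r) => Math.max(max, r.score), 0);
  return results.map((r) => ({
    id: r.id,
    score: r.score,
    relative: top > 0 ? r.score / top : 0, // avoid dividing by zero
  }));
}
```

A result scoring 0.4 next to a top result of 0.8 would display as 0.5, which says only that it scored half as high as the best match.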

Runtime Support

Edge/browser-safe core surface: LLMClient, Conversation, routing, in-memory storage, utilities, and SessionApi
Node-only persistence: PostgresSessionStore and PostgresUsageLogger
Runtime safety probe: pnpm edgecheck

Prompt Caching Status

OpenAI automatic prompt caching works on supported models, and request-side hints are exposed via providerOptions.openai.promptCaching.
Anthropic block-level and top-level cache_control are exposed for cacheable content and tool definitions.
Gemini implicit caching benefits supported models automatically, and explicit cache usage is exposed via providerOptions.google.promptCaching.cachedContent plus client.googleCaches.
Implementation planning lives in docs/PROMPT_CACHING_REPORT.md and the active task tracker lives in prompt_caching_todo.md.

Prompt Caching Examples

OpenAI request hints:

const openaiResponse = await client.complete({
  model: 'gpt-4o',
  messages: [{ content: 'Summarize the support FAQ.', role: 'user' }],
  providerOptions: {
    openai: {
      promptCaching: {
        key: 'support-faq-v1',
        retention: '24h',
      },
    },
  },
});

Anthropic block and tool cache control:

const anthropicResponse = await client.complete({
  model: 'claude-sonnet-4-6',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'document',
          url: 'https://example.com/policy.pdf',
          mediaType: 'application/pdf',
          cacheControl: { type: 'ephemeral', ttl: '1h' },
        },
        {
          type: 'text',
          text: 'Answer using the cached policy document.',
        },
      ],
    },
  ],
  providerOptions: {
    anthropic: {
      cacheControl: { type: 'ephemeral' },
    },
  },
  tools: [
    {
      name: 'lookup_policy',
      description: 'Look up policy details',
      cacheControl: { type: 'ephemeral' },
      parameters: {
        type: 'object',
        properties: {
          topic: { type: 'string' },
        },
        required: ['topic'],
      },
    },
  ],
});

Gemini explicit cache lifecycle plus reuse:

const cache = await client.googleCaches.create({
  model: 'gemini-2.5-flash',
  displayName: 'Support FAQ',
  messages: [{ content: 'Refunds are available for 30 days.', role: 'user' }],
  ttl: '3600s',
});

const geminiResponse = await client.complete({
  model: 'gemini-2.5-flash',
  messages: [{ content: 'What is the refund window?', role: 'user' }],
  providerOptions: {
    google: {
      promptCaching: {
        cachedContent: cache.name,
      },
    },
  },
});

Gemini cache names are returned in the provider format cachedContents/{id} and can be passed back directly as cachedContent. Cache creation accepts the normal library model id such as gemini-2.5-flash; the Gemini adapter normalizes it to models/{model} for the cache API. Per-request generation cost includes cached-read discounts when cachedContentTokenCount is returned, but it does not include cache creation or persistence cost.
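
The two naming formats above are easy to mix up. The sketch below shows the kind of normalization described; toCacheModelName and isCachedContentName are hypothetical helpers written for this example, not exports of the library, and the adapter's actual logic may differ.

```typescript
// Prefix a library model id with "models/" unless it is already prefixed.
function toCacheModelName(model: string): string {
  return model.startsWith('models/') ? model : `models/${model}`;
}

// Check that a cache name follows the cachedContents/{id} format.
function isCachedContentName(name: string): boolean {
  return /^cachedContents\/[^/]+$/.test(name);
}
```

A check like isCachedContentName(cache.name) before reuse can catch the case where a model id was passed as cachedContent by mistake.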

Docs

Documentation website: https://07rjain.github.io/LLMlibrary/
User guide hub: docs/README.md
Getting started: docs/GETTING_STARTED.md
Completions and streaming: docs/COMPLETIONS_AND_STREAMING.md
Conversations and tools: docs/CONVERSATIONS_AND_TOOLS.md
Persistence and Session API: docs/PERSISTENCE_AND_SESSION_API.md
Production guide: docs/PRODUCTION_GUIDE.md
Docs local dev server: pnpm docs:dev
Docs production build: pnpm docs:build
API reference source: pnpm docs:api
Session API contract: SESSION_API.md
PRD decisions: docs/PRD_DECISIONS.md
Provider comparison: docs/PROVIDER_COMPARISON.md
Speech API research report: docs/SPEECH_API_RESEARCH_REPORT.md
Prompt caching report: docs/PROMPT_CACHING_REPORT.md
Prompt caching task tracker: prompt_caching_todo.md
OpenAI Responses migration report: docs/OPENAI_RESPONSES_MIGRATION_REPORT.md
Migration guide: docs/MIGRATION_GUIDE.md
Cost and pricing policy: docs/COST_AND_PRICING.md
Roadmap: docs/ROADMAP.md
Current project state: PROJECT_STATUS.md
Validation handoff notes: TEST_AGENT_HANDOFF.md

Quality And Performance

pnpm sizecheck
pnpm depcheck
pnpm edgecheck
pnpm bench:complete
pnpm bench:first-token
pnpm bench:memory
pnpm bench:concurrency
pnpm pricecheck

Live-provider smoke tests are opt-in:

LIVE_TESTS=1 pnpm test:live
LIVE_TESTS=1 pnpm test:embeddings:live
pnpm test:prompt-caching:live

Testing

pnpm typecheck
pnpm lint
pnpm test
pnpm build
