Provider-agnostic audio transcription + summarization for Bun and Node.js. Auto free-to-paid fallback, persistent storage (local / SQLite / S3), and strong Arabic support out of the box.
- Requirements
- Install
- Quick Start
- Audio Sources
- Transcription Providers
- Summarization Providers
- Storage Backends
- Logging & Diagnostics
- What Gets Stored
- HTTP Integration (Hono)
- Environment Variables
- Node.js Compatibility
- CI/CD — Publish to npm
| Runtime | Minimum version | Notes |
|---|---|---|
| Bun | 1.0.0+ | Recommended. SQLite built-in. |
| Node.js | 18.0.0+ | Requires native fetch. SQLite needs better-sqlite3. |
# Bun
bun add future-audio-intelligence
# Node.js
npm install future-audio-intelligence
Install only what you need:
# S3 storage backend
bun add @aws-sdk/client-s3
# SQLite on Node.js (Bun has built-in bun:sqlite — no package needed)
npm install better-sqlite3
import {
AudioIntelligencePipeline,
AutoFreeToPaidTranscriber,
AutoFreeToPaidSummarizer,
createStorageFromEnv,
} from 'future-audio-intelligence';
const pipeline = new AudioIntelligencePipeline({
transcriber: new AutoFreeToPaidTranscriber({
deepgramApiKey: process.env.DEEPGRAM_API_KEY, // free first
openaiApiKey: process.env.OPENAI_API_KEY, // paid fallback
}),
summarizer: new AutoFreeToPaidSummarizer({
openrouterApiKey: process.env.OPENROUTER_API_KEY, // free first
geminiApiKey: process.env.GEMINI_API_KEY, // free fallback
openaiApiKey: process.env.OPENAI_API_KEY, // paid last resort
}),
storage: createStorageFromEnv(), // reads AUDIO_STORAGE env var
});
const { transcription, summary, saved } = await pipeline.process(
{ type: 'path', path: './meeting.mp3' },
{
transcribe: { language: 'ar' },
summarize: { style: 'bullets' },
},
);
console.log(transcription.text);
console.log(transcription.usage); // { durationSec: 139.4 }
console.log(summary?.summary);
console.log(summary?.usage); // { inputTokens: 221, outputTokens: 60 }
console.log(saved?.id); // uuid-v4
console.log(saved?.location); // outputs/uuid.json
Four source types are supported; providers handle conversion internally:
// Local file
{ type: 'path', path: './audio.mp3' }
// Remote URL (Deepgram fetches it directly — no upload from your server)
{ type: 'url', url: 'https://example.com/audio.mp3' }
// In-memory Buffer (e.g. from a file upload)
{ type: 'buffer', data: buffer, filename: 'audio.mp3', mimeType: 'audio/mpeg' }
// Blob (e.g. from browser File input)
{ type: 'blob', data: blob }
| Provider | Env var | Billing | Limit | Notes |
|---|---|---|---|---|
| Deepgram Nova-3 | DEEPGRAM_API_KEY | Per second | - | Free: 200h/month. Best Arabic dialect support |
| OpenAI Whisper | OPENAI_API_KEY | Per minute | 25 MB | Paid. whisper-1 / gpt-4o-transcribe |
| Local Whisper | - | Free | - | Requires Whisper binary + model file on disk |
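The 25 MB limit in the table mostly matters for in-memory sources, where the size is known up front. As a rough illustration, the sketch below models the four source shapes shown earlier as a TypeScript union and checks a payload against that limit. The type names, the SizedBlob stand-in, and the fitsOpenAiLimit helper are hypothetical and not part of the library's API. Treating filename and mimeType as optional, and counting the limit in binary megabytes, are also assumptions.

```typescript
// Hypothetical types mirroring the four source shapes shown earlier.
interface SizedBlob {
  readonly size: number; // a real Blob satisfies this shape
}

type AudioSource =
  | { type: 'path'; path: string }
  | { type: 'url'; url: string }
  | { type: 'buffer'; data: Uint8Array; filename?: string; mimeType?: string }
  | { type: 'blob'; data: SizedBlob };

// Assumption: the 25 MB limit is counted in binary megabytes.
const OPENAI_MAX_BYTES = 25 * 1024 * 1024;

// Returns the payload size when it is known without disk or network I/O.
function knownSizeBytes(source: AudioSource): number | undefined {
  switch (source.type) {
    case 'buffer':
      return source.data.byteLength;
    case 'blob':
      return source.data.size;
    default:
      return undefined; // path and url sizes need I/O to determine
  }
}

// True when the size is unknown or within the assumed limit.
function fitsOpenAiLimit(source: AudioSource): boolean {
  const size = knownSizeBytes(source);
  return size === undefined || size <= OPENAI_MAX_BYTES;
}
```

A check like this could run before choosing a provider, so that an oversized upload is not sent to a provider that would reject it.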
AutoFreeToPaidTranscriber tries providers in order and falls back automatically on an error or an empty transcript:
new AutoFreeToPaidTranscriber({
deepgramApiKey: process.env.DEEPGRAM_API_KEY, // tried first
openaiApiKey: process.env.OPENAI_API_KEY, // fallback if Deepgram fails
// deepgramModel: 'nova-3', // optional override
// openaiModel: 'whisper-1',
})
import { DeepgramTranscriber, OpenAITranscriber } from 'future-audio-intelligence';
new DeepgramTranscriber({ apiKey: '...', defaultModel: 'nova-3' })
new OpenAITranscriber({ apiKey: '...', defaultModel: 'whisper-1' })
transcribe: {
language: 'ar', // ISO 639-1 code, or 'auto' to detect
diarize: true, // speaker separation (Deepgram)
wordTimestamps: true, // word-level timestamps
vocabulary: ['اسم خاص', 'مصطلح'], // boost domain terms
}
| Provider | Env var | Billing | Default model | Notes |
|---|---|---|---|---|
| OpenRouter | OPENROUTER_API_KEY | Free models available | llama-3.3-70b-instruct:free | Gateway to 300+ models |
| Gemini | GEMINI_API_KEY | Free tier | gemini-2.5-flash | 1M token context |
| OpenAI | OPENAI_API_KEY | Paid | gpt-4o-mini | 128K context |
AutoFreeToPaidSummarizer tries providers in order and falls back on an error or an empty response:
new AutoFreeToPaidSummarizer({
openrouterApiKey: process.env.OPENROUTER_API_KEY, // free first
geminiApiKey: process.env.GEMINI_API_KEY, // free fallback
openaiApiKey: process.env.OPENAI_API_KEY, // paid last resort
// openrouterModel: 'meta-llama/llama-3.3-70b-instruct:free',
// geminiModel: 'gemini-2.5-flash',
// openaiModel: 'gpt-4o-mini',
})
import { OpenRouterSummarizer, GeminiSummarizer, OpenAISummarizer } from 'future-audio-intelligence';
new OpenRouterSummarizer({ apiKey: '...', defaultModel: 'anthropic/claude-sonnet-4' })
new GeminiSummarizer({ apiKey: '...', defaultModel: 'gemini-2.5-pro' })
new OpenAISummarizer({ apiKey: '...', defaultModel: 'gpt-4o' })
summarize: { style: 'brief' } // 2–3 sentences
summarize: { style: 'bullets' } // bullet points
summarize: { style: 'detailed' } // multi-paragraph
summarize: { style: 'action_items' } // decisions + TODOs (structured JSON)
summarize: { style: 'meeting_notes' } // structured meeting format
summarize: {
style: 'custom',
customPrompt: 'Summarize this in Egyptian Arabic with 3 bullet points.',
outputLanguage: 'Egyptian Arabic',
maxTokens: 500,
}
Local storage writes one JSON file per operation: {outputDir}/{id}.json
AUDIO_STORAGE=local
AUDIO_OUTPUT_DIR=./outputs # optional, default: ./outputs
SQLite storage uses a single queryable database file and auto-detects the runtime:
- Bun → uses built-in bun:sqlite (no package needed)
- Node.js → requires better-sqlite3:
npm install better-sqlite3 # Node.js only
AUDIO_STORAGE=sqlite
SQLITE_PATH=./audio-intelligence.db # optional
S3 storage requires @aws-sdk/client-s3:
bun add @aws-sdk/client-s3
# or
npm install @aws-sdk/client-s3
AUDIO_STORAGE=s3
S3_BUCKET=my-bucket
S3_REGION=us-east-1
S3_PREFIX=audio-intelligence/ # optional key prefix
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
# Or use an IAM role; credentials are optional if a role is attached
Each operation is stored under: s3://{bucket}/{prefix}{id}/result.json
const store = createStorageFromEnv();
// Get a single record by UUID
const record = await store.getById('550e8400-e29b-41d4-a716-446655440000');
if (record) {
console.log(record.transcription.text);
console.log(record.summary?.summary);
}
// List recent operations (most recent first)
const history = await store.list({ limit: 20, offset: 0 });
import {
LocalOutputStore,
SQLiteOutputStore,
S3OutputStore,
} from 'future-audio-intelligence';
// Pick one backend; distinct names avoid redeclaring the same const.
const sqliteStore = new SQLiteOutputStore({ dbPath: './my.db' });
const localStore = new LocalOutputStore({ outputDir: './data' });
const s3Store = new S3OutputStore({ bucket: 'my-bucket', region: 'us-east-1' });
AutoFreeToPaidTranscriber and AutoFreeToPaidSummarizer log every attempt and fallback to stderr by default:
[16:46:15.641] [transcriber] starting — source=path, lang=ar, providers=deepgram → openai
[16:46:15.652] [transcriber] ┌─ attempt 1/2 → deepgram
[16:46:22.618] [transcriber] └─ ✓ success (1/2): deepgram | model=nova-3 | lang=ar | chars=498 | 6966ms
[16:46:22.619] [summarizer] starting — 498 chars, style=bullets, providers=openrouter → gemini → openai
[16:46:22.619] [summarizer] ┌─ attempt 1/3 → openrouter
[16:47:44.633] [summarizer] └─ ✗ failed (1/3): openrouter — HTTP 429 — Provider returned error
[16:47:44.633] [summarizer] ┌─ attempt 2/3 → gemini
[16:47:52.126] [summarizer] └─ ✓ success (2/3): gemini | model=gemini-2.5-flash | tokens=221→60 | 7493ms
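The log above reflects a try-in-order loop: each provider is attempted, and an error or an empty result moves on to the next one. The following is a minimal sketch of that pattern, not the library's actual implementation. The Provider interface, the runWithFallback function, and the log wording are all hypothetical.

```typescript
// Hypothetical shapes; the library's real interfaces may differ.
interface Provider<I, O> {
  name: string;
  run(input: I): Promise<O>;
}

type Logger = (msg: string) => void;

// Tries each provider in order; an error or an empty result triggers fallback.
async function runWithFallback<I, O>(
  providers: Provider<I, O>[],
  input: I,
  isEmpty: (output: O) => boolean,
  logger: Logger = (msg) => console.error(msg), // stderr by default
): Promise<{ provider: string; output: O }> {
  const failures: string[] = [];
  for (let i = 0; i < providers.length; i++) {
    const provider = providers[i];
    const label = `${i + 1}/${providers.length}`;
    logger(`attempt ${label} -> ${provider.name}`);
    try {
      const output = await provider.run(input);
      if (!isEmpty(output)) {
        logger(`success (${label}): ${provider.name}`);
        return { provider: provider.name, output };
      }
      failures.push(`${provider.name}: empty result`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
    logger(`failed (${label}): ${failures[failures.length - 1]}`);
  }
  throw new Error(`All providers failed: ${failures.join('; ')}`);
}
```

Passing the logger as a plain function keeps the loop independent of any logging library, which is the same design that makes the no-op and custom loggers shown next possible.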
Silence logs:
new AutoFreeToPaidSummarizer({
// ...
logger: () => {}, // no-op
})
Custom logger (e.g. pipe to your app logger):
new AutoFreeToPaidTranscriber({
// ...
logger: (msg) => myLogger.debug(msg),
})
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"audioName": "meeting",
"processedAt": "2026-05-12T17:00:00.000Z",
"transcription": {
"text": "...",
"language": "ar",
"durationSec": 139.4,
"usage": { "durationSec": 139.4 },
"provider": "deepgram",
"model": "nova-3"
},
"summary": {
"summary": "...",
"actionItems": ["..."],
"topics": ["..."],
"provider": "gemini",
"model": "gemini-2.5-flash",
"usage": { "inputTokens": 221, "outputTokens": 60 }
}
}
| Field | Why excluded | How to enable |
|---|---|---|
| transcription.segments | Timestamps per sentence: large, rarely needed after processing | includeSegments: true |
| transcription.words | Word-level timestamps: very large, for subtitle generation only | includeWords: true |
| transcription.raw | Raw provider response: for debugging only | includeRaw: true |
| summary.raw | Raw LLM response: for debugging only | includeRaw: true |
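The table above lists fields that are left out of stored records unless they are explicitly enabled. One way to picture this is as a filter applied just before a record is saved. The sketch below is an illustration under that assumption, not the library's code. The interfaces and the toStoredRecord function are hypothetical; only the option names come from the table.

```typescript
// Hypothetical shapes based on the fields and options named in the table.
interface StoreOptions {
  includeSegments?: boolean;
  includeWords?: boolean;
  includeRaw?: boolean;
}

interface TranscriptionResult {
  text: string;
  segments?: unknown[];
  words?: unknown[];
  raw?: unknown;
}

interface SummaryResult {
  summary: string;
  raw?: unknown;
}

interface PipelineResult {
  transcription: TranscriptionResult;
  summary?: SummaryResult;
}

// Copies the result, keeping the large or debug-only fields only when asked.
function toStoredRecord(result: PipelineResult, opts: StoreOptions = {}): PipelineResult {
  const { segments, words, raw, ...rest } = result.transcription;
  const transcription: TranscriptionResult = { ...rest };
  if (opts.includeSegments && segments !== undefined) transcription.segments = segments;
  if (opts.includeWords && words !== undefined) transcription.words = words;
  if (opts.includeRaw && raw !== undefined) transcription.raw = raw;

  if (!result.summary) return { transcription };

  const { raw: summaryRaw, ...summaryRest } = result.summary;
  const summary: SummaryResult = { ...summaryRest };
  if (opts.includeRaw && summaryRaw !== undefined) summary.raw = summaryRaw;
  return { transcription, summary };
}
```

Because the function builds a new object, the in-memory result returned to the caller can still carry segments and words even when the stored copy does not.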
new AudioIntelligencePipeline({
transcriber,
summarizer,
storage,
storeOptions: {
includeSegments: true, // default: false
includeWords: true, // default: false
includeRaw: true, // default: false
},
});
See examples/hono-integration.ts for a full example. Key endpoints:
POST /api/transcribe multipart file upload → transcript + summary
POST /api/transcribe-url { url, language?, style? } → transcript + summary
GET /api/records list saved operations (most recent first)
GET /api/records/:id get one operation by UUID
Response shape:
{
"id": "uuid-v4",
"transcript": "...",
"language": "ar",
"usage": {
"transcription": { "durationSec": 139.4 },
"summarization": { "inputTokens": 221, "outputTokens": 60 }
},
"summary": "...",
"actionItems": ["..."],
"topics": ["..."],
"providers": {
"transcription": "deepgram",
"summarization": "gemini"
}
}
Copy .env.example → .env.
- Go to console.deepgram.com
- Sign up → create a project
- API Keys → Create a new API key
- Free tier: 200 hours/month — no credit card required
- Go to platform.openai.com/api-keys
- Sign in → Create new secret key
- Add billing at platform.openai.com/settings/billing
- Used as: Whisper transcription (whisper-1) + gpt-4o-mini summarization
- Go to openrouter.ai → Sign in with GitHub
- Keys → Create Key
- Many models are free (Llama, Mistral, Gemma). No credit card needed for free models.
- Default model used: meta-llama/llama-3.3-70b-instruct:free
- Go to aistudio.google.com
- Sign in with a Google account
- Click Get API key → Create API key
- Free tier is generous (1,500 requests/day for gemini-2.5-flash)
Only needed if AUDIO_STORAGE=s3
- Go to console.aws.amazon.com/iam
- Users → Create user → attach policy AmazonS3FullAccess (or a scoped policy)
- Security credentials → Create access key → choose "Application running outside AWS"
- Copy AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Create a bucket at console.aws.amazon.com/s3 → set S3_BUCKET and S3_REGION
Tip: On EC2 / ECS / Lambda, attach an IAM role instead and omit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY entirely.
| Variable | Required | Default | Description |
|---|---|---|---|
| DEEPGRAM_API_KEY | For Deepgram | - | STT, free 200h/month |
| OPENAI_API_KEY | For OpenAI | - | STT + summarization paid fallback |
| OPENROUTER_API_KEY | For OpenRouter | - | Summarization, free models available |
| GEMINI_API_KEY | For Gemini | - | Summarization, free tier |
| AUDIO_STORAGE | No | local | local / sqlite / s3 |
| AUDIO_OUTPUT_DIR | No | ./outputs | Local storage directory |
| SQLITE_PATH | No | ./audio-intelligence.db | SQLite file path |
| S3_BUCKET | If S3 | - | S3 bucket name |
| S3_REGION | If S3 | - | AWS region (e.g. us-east-1) |
| S3_PREFIX | No | audio-intelligence/ | Key prefix inside the bucket |
| AWS_ACCESS_KEY_ID | If S3, no IAM | - | AWS access key |
| AWS_SECRET_ACCESS_KEY | If S3, no IAM | - | AWS secret key |
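To make the defaults and the "If S3" requirements in the table concrete, here is a small sketch of how storage settings could be resolved from environment variables. It is not the body of createStorageFromEnv. The StorageConfig type and both functions are hypothetical; the variable names, the defaults, and the path layouts are taken from this document.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape, one variant per AUDIO_STORAGE value.
type StorageConfig =
  | { kind: 'local'; outputDir: string }
  | { kind: 'sqlite'; dbPath: string }
  | { kind: 's3'; bucket: string; region: string; prefix: string };

// Applies the documented defaults and rejects incomplete S3 settings.
function resolveStorageConfig(env: Env): StorageConfig {
  const kind = env['AUDIO_STORAGE'] ?? 'local';
  switch (kind) {
    case 'local':
      return { kind: 'local', outputDir: env['AUDIO_OUTPUT_DIR'] ?? './outputs' };
    case 'sqlite':
      return { kind: 'sqlite', dbPath: env['SQLITE_PATH'] ?? './audio-intelligence.db' };
    case 's3': {
      const bucket = env['S3_BUCKET'];
      const region = env['S3_REGION'];
      if (!bucket || !region) {
        throw new Error('S3_BUCKET and S3_REGION are required when AUDIO_STORAGE=s3');
      }
      return { kind: 's3', bucket, region, prefix: env['S3_PREFIX'] ?? 'audio-intelligence/' };
    }
    default:
      throw new Error(`Unsupported AUDIO_STORAGE value: ${kind}`);
  }
}

// Where a record with the given id would land, following the documented layouts.
function locationFor(config: StorageConfig, id: string): string {
  switch (config.kind) {
    case 'local':
      return `${config.outputDir}/${id}.json`;
    case 's3':
      return `s3://${config.bucket}/${config.prefix}${id}/result.json`;
    case 'sqlite':
      return config.dbPath; // all records share one database file
  }
}
```

Taking the environment as a parameter, rather than reading process.env inside the function, keeps the sketch easy to test. In an application the call would likely be resolveStorageConfig(process.env).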
The library targets Node.js 18+ and Bun 1+.
| Feature | Bun | Node.js 18+ | Notes |
|---|---|---|---|
| fetch (global) | ✅ | ✅ | Node 18 added native fetch |
| SQLite | ✅ built-in | better-sqlite3 | Auto-detected at runtime |
| S3 | ✅ | ✅ | Needs @aws-sdk/client-s3 |
| ESM (type: module) | ✅ | ✅ | Uses createRequire, no bare require() |
| FormData / Blob | ✅ | ✅ | Node 18+ has both |
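The "auto-detected at runtime" note for SQLite can be illustrated with a short sketch. It assumes detection relies on the global Bun object that the Bun runtime exposes; the library may detect the runtime differently. The detectRuntime and sqliteModuleFor functions are hypothetical names.

```typescript
type Runtime = 'bun' | 'node';

// Bun exposes a global named Bun; its absence is treated as Node.js here.
// The globals parameter exists only so the check can be exercised in tests.
function detectRuntime(globals: object = globalThis): Runtime {
  return 'Bun' in globals ? 'bun' : 'node';
}

// Maps the runtime to the SQLite module named in the table above.
function sqliteModuleFor(runtime: Runtime): string {
  return runtime === 'bun' ? 'bun:sqlite' : 'better-sqlite3';
}
```

In an ES module, the selected package could then be loaded through createRequire from node:module, which matches the table's note that no bare require() is used.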
Push a version tag to trigger the GitHub Actions workflow:
# 1. Bump version in package.json
bun version patch # or minor / major
# 2. Tag and push
git tag v1.0.1
git push origin v1.0.1
The workflow (.github/workflows/publish.yml) runs:
- bun install
- bun test
- bun run build
- npm publish --access public --provenance
Required secret: Add NPM_TOKEN in your GitHub repository → Settings → Secrets → Actions.
MIT