Notely

Automatically transform video/audio lectures into structured Markdown notes

Notely is a Python SDK that uses ASR, OCR, and LLM technologies to automatically convert lecture videos, audio recordings, and presentations into high-quality Markdown notes.

Core Features

🎯 High-Quality Speech Recognition - FunASR (Chinese CER < 3%), Whisper (multilingual)
📊 Intelligent OCR - PaddleOCR + key frame deduplication
🤖 Multi-LLM Support - OpenAI, Zhipu AI, Anthropic, Moonshot, DeepSeek
🧠 Three-Layer Enhancement Architecture - Comprehension → Structuring → Formatting
✂️ Semantic Chunking - Intelligent text segmentation (2000 tokens, 1000 overlap)
📐 LaTeX Formula Support - Mathematical notation rendering
🌍 Language Auto-Detection - Automatic transcript language detection
⚡ Concurrent Processing - Parallel chunk processing for efficiency
✨ Beautiful Output - Structured Markdown with automatic formatting
🔧 Flexible Configuration - Simple initialization with deep customization support

Quick Start

1. Installation

# Clone the repository
git clone https://github.com/0xarcher/notely.git
cd notely

# Install dependencies (recommended: uv)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --all-extras

# Or use pip
pip install -e ".[all]"

# Install FFmpeg (required)
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

2. Basic Usage

import asyncio

from notely import Notely, NotelyConfig, EnhancerConfig, LLMConfig

# Method 1: From configuration object
config = NotelyConfig(
    enhancer=EnhancerConfig(
        llm=LLMConfig(
            api_key="sk-xxx",
            model="gpt-4o",
        )
    )
)
notely = Notely(config)

# Process lecture video (process() is async, so run it in an event loop)
result = asyncio.run(notely.process("lecture.mp4"))

# Save notes
result.save("notes.md")

# Method 2: From dictionary (simpler)
notely = Notely.from_dict({
    "llm": {
        "api_key": "sk-xxx",
        "model": "gpt-4o",
    }
})

result = asyncio.run(notely.process("lecture.mp4"))
result.save("notes.md")

# Method 3: From YAML file (recommended for complex configs)
# Create config.yaml first, then:
notely = Notely.from_yaml("config.yaml")
result = asyncio.run(notely.process("lecture.mp4"))

3. Usage Flow

Example Output:

# Introduction to Machine Learning

> 📌 Course Info: 45 minutes | Instructor: Prof. Zhang

## 📌 Course Overview

This lecture introduces the basic concepts of machine learning...

## 📚 Key Concepts

### What is Machine Learning

**Machine learning** is a technology that enables computers to learn from data...

### Types of Machine Learning

| Type | Characteristics | Use Cases |
|------|----------------|-----------|
| **Supervised Learning** | Labeled data | Classification, Regression |
| **Unsupervised Learning** | Unlabeled data | Clustering, Dimensionality Reduction |
| **Reinforcement Learning** | Environmental feedback | Games, Robotics |

## 💡 Key Takeaways

1. Machine learning is a core AI technology
2. Algorithm selection depends on data type and task
3. **Feature engineering** is crucial for model performance

Detailed Usage Guide

Initialization

Method 1: From Dictionary (Simplest)

from notely import Notely

# Basic usage
notely = Notely.from_dict({
    "llm": {"api_key": "sk-xxx", "model": "gpt-4o"}
})

# With environment variable
import os
notely = Notely.from_dict({
    "llm": {"api_key": os.getenv("OPENAI_API_KEY"), "model": "gpt-4o"}
})
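
`os.getenv` returns `None` when the variable is unset, so a missing key may only surface later in the run. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper introduced for illustration and is not part of Notely.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the dictionary form shown above:
# notely = Notely.from_dict({
#     "llm": {"api_key": require_env("OPENAI_API_KEY"), "model": "gpt-4o"}
# })
```

Failing at startup keeps the error next to its cause instead of inside a long transcription run.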

Method 2: Switch LLM Provider

import os

# Use Zhipu AI
notely = Notely.from_dict({
    "llm": {
        "api_key": os.getenv("ZHIPU_API_KEY"),
        "provider": "zhipu",
        "model": "glm-4",
    }
})

# Use Anthropic
notely = Notely.from_dict({
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "provider": "anthropic",
        "model": "claude-3-opus-20240229",
    }
})

# Use Moonshot
notely = Notely.from_dict({
    "llm": {
        "api_key": os.getenv("MOONSHOT_API_KEY"),
        "provider": "moonshot",
        "model": "moonshot-v1-8k",
    }
})

Method 3: Custom OpenAI-Compatible Endpoint

notely = Notely.from_dict({
    "llm": {
        "api_key": "sk-xxx",
        "provider": "openai",
        "model": "qwen-plus",
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    }
})

Method 4: Full Configuration

import os

notely = Notely.from_dict({
    # LLM configuration
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "provider": "openai",
        "model": "gpt-4o",
        "base_url": "https://api.openai.com/v1",  # Optional
        "temperature": 0.3,  # Lower for consistency (default: 0.3)
        "max_tokens": 4096,
    },

    # ASR configuration
    "asr": {
        "backend": "funasr",  # Recommended for Chinese: funasr, multilingual: whisper
        "device": "cuda",     # Use cuda with GPU, otherwise cpu
        "model": "iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        "language": "zh",
    },

    # OCR configuration
    "ocr": {
        "backend": "paddleocr",
        "language": "ch",  # Chinese: ch, English: en
        "use_gpu": True,
    },

    # Enhancement settings (NEW)
    "enhancer": {
        "chunk_size": 2000,        # Maximum chunk size in tokens (default: 2000)
        "chunk_overlap": 1000,     # Overlap between chunks in tokens (default: 1000)
        "language": None,          # Output language: 'zh', 'en', or None for auto-detect
        "max_concurrent": 5,       # Maximum concurrent API calls
    },
})
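
The numeric enhancer settings interact: an overlap as large as the chunk size would leave the chunk window unable to advance. The sketch below checks those values before they are passed to `Notely.from_dict`. `validate_enhancer_settings` is a hypothetical helper, and whether Notely performs equivalent checks internally is not stated here. The fallback of 5 for `max_concurrent` is an assumption taken from the example value above.

```python
def validate_enhancer_settings(settings: dict) -> dict:
    """Fill in fallbacks and reject numeric combinations that cannot work."""
    resolved = {
        "chunk_size": settings.get("chunk_size", 2000),
        "chunk_overlap": settings.get("chunk_overlap", 1000),
        # Assumption: 5 is used only because it is the value shown in the example.
        "max_concurrent": settings.get("max_concurrent", 5),
    }
    if resolved["chunk_size"] <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= resolved["chunk_overlap"] < resolved["chunk_size"]:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if resolved["max_concurrent"] < 1:
        raise ValueError("max_concurrent must be at least 1")
    return resolved


# Hypothetical usage:
# enhancer = validate_enhancer_settings({"chunk_size": 1500, "chunk_overlap": 500})
```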

Method 5: From YAML File (Recommended)

Create config.yaml:

llm:
  api_key: sk-xxx
  provider: openai
  model: gpt-4o
  temperature: 0.3
  max_tokens: 4096

asr:
  backend: funasr
  device: cuda
  language: zh

ocr:
  backend: paddleocr
  language: ch
  use_gpu: true

enhancer:
  chunk_size: 2000
  chunk_overlap: 1000
  language: null  # Auto-detect
  max_concurrent: 5

Then load:

notely = Notely.from_yaml("config.yaml")

Supported LLM Providers

| Provider | Provider Value | Recommended Models |
|----------|----------------|--------------------|
| OpenAI | `openai` | gpt-4o, gpt-4-turbo |
| Zhipu AI | `zhipu` | glm-4, glm-4-plus |
| Anthropic | `anthropic` | claude-3-opus, claude-3-sonnet |
| Moonshot | `moonshot` | moonshot-v1-8k, moonshot-v1-32k |
| DeepSeek | `deepseek` | deepseek-chat |
| Custom | `custom` | Any OpenAI-compatible API |

Processing Different Input Formats

Process Video

import asyncio

# Basic usage
result = asyncio.run(notely.process("lecture.mp4"))

# Or use await in async function
async def main():
    result = await notely.process("lecture.mp4")
    result.save("notes.md")

asyncio.run(main())

Process Audio

# Same API for audio files
result = asyncio.run(notely.process("podcast.mp3"))
result.save("notes.md")

Access Processing Results

import asyncio

result = asyncio.run(notely.process("lecture.mp4"))

# Get Markdown content
print(result.markdown)

# Get transcript
print(result.transcript.full_text)
print(f"Duration: {result.transcript.duration:.1f} seconds")
print(f"Segments: {len(result.transcript.segments)}")

# Get OCR results
for ocr_result in result.ocr_results:
    print(ocr_result.full_text)

# Get metadata
print(result.metadata)

# Save to file
result.save("output/notes.md")

How It Works

Processing Pipeline

Architecture Overview

Notely uses a three-layer enhancement pipeline to transform raw transcripts into structured notes:

1. Comprehension Layer - Extracts semantic information from transcript chunks

Minimum 300 words per chunk summary
Preserves all technical details, formulas, and examples
Concurrent processing for efficiency

2. Structuring Layer - Organizes comprehension results into coherent sections

Minimum 200 words per major section
Topic-based organization (not chronological)
Cross-chunk concept merging

3. Formatting Layer - Beautifies markdown with LaTeX formula support

Mathematical notation rendering
Consistent heading hierarchy
Emoji icons for visual clarity

Key Steps:

1. Input Processing - Extract audio and key frames from video
2. ASR Transcription - Speech to text with timestamps (FunASR for Chinese, Whisper for multilingual)
3. OCR Recognition - Extract text from slides/frames using PaddleOCR
4. Semantic Chunking - Split transcript into 2000-token chunks with 1000-token overlap
5. Comprehension - Extract semantic information from each chunk (parallel processing)
6. Structuring - Organize all chunks into coherent sections by topic
7. Format Output - Beautify Markdown with LaTeX support
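
The chunking step can be pictured as a sliding window over the token sequence. The sketch below is a plain fixed-size window splitter, assuming the transcript is already available as a list of tokens. It ignores sentence and topic boundaries, which a semantic chunker would also take into account, and `chunk_tokens` is a hypothetical name rather than a Notely function.

```python
def chunk_tokens(tokens: list, chunk_size: int = 2000, chunk_overlap: int = 1000) -> list:
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the transcript
    return chunks
```

With the default values, a 5000-token transcript yields four chunks that start at tokens 0, 1000, 2000, and 3000, so every token except those in the first and last 1000 appears in two chunks.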

FAQ

1. How do I choose an ASR backend?

Chinese content: funasr is recommended (higher accuracy, CER < 3%)
Multilingual content: Use whisper (supports 99+ languages)
No GPU: Use whisper with device="cpu"

# Chinese lectures
notely = Notely.from_dict({
    "llm": {"api_key": "sk-xxx"},
    "asr": {"backend": "funasr", "device": "cuda"}
})

# English lectures
notely = Notely.from_dict({
    "llm": {"api_key": "sk-xxx"},
    "asr": {"backend": "whisper", "device": "cpu"}
})
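
The guidance above can be folded into a small selection function. This is a sketch under one assumption that the text leaves open: when no GPU is available, the "No GPU" rule takes precedence over the language rule. `choose_asr_config` is a hypothetical helper; only the `backend`, `device`, and `language` keys come from the configuration shown earlier.

```python
def choose_asr_config(language: str, has_gpu: bool) -> dict:
    """Return an "asr" settings dictionary following the FAQ guidance."""
    if not has_gpu:
        # Assumption: without a GPU, fall back to whisper on CPU for any language.
        return {"backend": "whisper", "device": "cpu"}
    if language.lower().startswith("zh"):
        return {"backend": "funasr", "device": "cuda", "language": "zh"}
    return {"backend": "whisper", "device": "cuda"}


# Hypothetical usage:
# notely = Notely.from_dict({
#     "llm": {"api_key": "sk-xxx"},
#     "asr": choose_asr_config("zh", has_gpu=True),
# })
```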

2. How do I reduce costs?

Use cheaper models: gpt-4o-mini, glm-4-flash
Lower max_tokens to limit output length
Use lower-cost Chinese LLM providers (Zhipu, Moonshot, DeepSeek)

import os

notely = Notely.from_dict({
    "llm": {
        "api_key": os.getenv("ZHIPU_API_KEY"),
        "provider": "zhipu",
        "model": "glm-4-flash",  # Cheaper
        "max_tokens": 2048,      # Limit output
    }
})

3. How do I improve processing speed?

Use GPU acceleration: device="cuda"
Reduce chunk size: chunk_size=1500
Increase concurrent processing: max_concurrent=10

notely = Notely.from_dict({
    "llm": {"api_key": "sk-xxx"},
    "asr": {"device": "cuda"},
    "enhancer": {
        "chunk_size": 1500,
        "max_concurrent": 10,
    }
})
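
The `max_concurrent` setting caps how many chunk-level API calls run at once. One common way to get that behavior in asyncio is a semaphore. The sketch below shows the idea with a hypothetical `process_chunks` helper and a caller-supplied `worker` coroutine; it illustrates the technique only and is not taken from Notely's internals.

```python
import asyncio


async def process_chunks(chunks, worker, max_concurrent: int = 5):
    """Run worker(chunk) for every chunk, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(chunk):
        async with semaphore:  # waits here while max_concurrent calls are already running
            return await worker(chunk)

    # gather returns results in the same order as the input chunks
    return await asyncio.gather(*(bounded(chunk) for chunk in chunks))
```

Raising the limit shortens wall-clock time only until the provider's rate limit becomes the bottleneck, so it is worth increasing gradually.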

4. How do I handle long videos?

Notely automatically handles long videos, but it's recommended to:

Ensure sufficient memory and disk space
Use GPU acceleration
Consider splitting videos manually for very long content
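
For manual splitting, FFmpeg's segment muxer can cut a file into fixed-length parts without re-encoding. The sketch below builds such a command; `build_split_command` and `split_video` are hypothetical helpers, and the 30-minute default and `segments` directory are arbitrary choices for illustration.

```python
import subprocess
from pathlib import Path


def build_split_command(source: str, segment_seconds: int = 1800,
                        output_dir: str = "segments") -> list:
    """Build an ffmpeg command that splits source into numbered parts."""
    src = Path(source)
    pattern = Path(output_dir) / f"{src.stem}_%03d{src.suffix}"
    return [
        "ffmpeg", "-i", str(src),
        "-c", "copy",                      # copy streams instead of re-encoding
        "-f", "segment",                   # use the segment muxer
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",          # each part starts at timestamp zero
        str(pattern),
    ]


def split_video(source: str, segment_seconds: int = 1800,
                output_dir: str = "segments") -> None:
    """Create the output directory and run ffmpeg (requires ffmpeg on PATH)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_split_command(source, segment_seconds, output_dir), check=True)
```

Because streams are copied, cuts land on keyframes, so part lengths are approximate. Each part can then be passed to `notely.process` separately.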

5. What input formats are supported?

Any format that FFmpeg can decode, including:

Video: mp4, avi, mov, mkv, flv, wmv, webm
Audio: mp3, wav, m4a, flac, aac, ogg
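
When processing a folder of mixed files, it can help to sort inputs by type before calling `notely.process`. The sketch below classifies a path by its extension using the lists above; `classify_input` is a hypothetical helper, and the extension sets cover only the formats named here, not everything FFmpeg accepts.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def classify_input(path: str) -> str:
    """Return "video", "audio", or "unknown" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unknown"
```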

Project Structure

notely/
├── src/notely/
│   ├── __init__.py          # Main entry point
│   ├── core.py              # Core orchestrator
│   ├── config.py            # Configuration classes
│   ├── models.py            # Data models
│   ├── asr/                 # ASR backends
│   │   ├── funasr.py        # FunASR
│   │   └── whisper.py       # Whisper
│   ├── ocr/                 # OCR backends
│   │   └── paddle.py        # PaddleOCR
│   ├── llm/                 # LLM client
│   │   └── client.py        # OpenAI-compatible client
│   ├── enhancer/            # Three-layer enhancement pipeline
│   │   ├── comprehension.py # Comprehension agent
│   │   ├── structuring.py   # Structuring agent
│   │   ├── enhancer.py      # Main enhancer orchestrator
│   │   └── models.py        # Enhancer data models
│   ├── prompts/             # Prompt management
│   │   ├── comprehension.py # Comprehension prompts
│   │   ├── structuring.py   # Structuring prompts
│   │   └── registry.py      # Prompt registry
│   ├── formatter/           # Markdown formatter
│   └── utils/               # Utility functions
│       ├── audio.py         # Audio processing
│       ├── video.py         # Video processing
│       └── language.py      # Language detection
├── examples/                # Example code
├── tests/                   # Unit tests
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
└── pyproject.toml

Development Guide

Setup Development Environment

# Clone repository
git clone https://github.com/0xarcher/notely.git
cd notely

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install all dependencies
uv sync --all-extras

# Install FFmpeg
brew install ffmpeg  # macOS

Code Standards

# Format code
uv run ruff format .

# Check code
uv run ruff check .

# Auto-fix
uv run ruff check --fix .

Run Tests

# Run all tests
uv run pytest

# Run specific test
uv run pytest tests/test_core.py

# Generate coverage report
uv run pytest --cov=notely --cov-report=html

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

Quick Start:

1. Fork this repository
2. Create feature branch: git checkout -b feature/your-feature
3. Commit changes: git commit -m "feat: add your feature"
4. Push branch: git push origin feature/your-feature
5. Submit Pull Request

License

This project is licensed under the MIT License - see LICENSE file for details.

Acknowledgments

Notely is built on these excellent open-source projects:

FunASR - Alibaba ASR toolkit
PaddleOCR - Baidu OCR toolkit
Whisper - OpenAI speech recognition model
pdfplumber - PDF text extraction

Contact

Made with ❤️ by Archer
