GitHub - eld120/fluffy-funicular

This repository contains ocr_transcribe.py, a small Python script that uses pytesseract to OCR images and create a markdown file (e.g. chapter_1.md).

Prerequisites

Tesseract OCR must be installed on your machine and the tesseract binary must be on PATH.

Install Tesseract (platform-specific)

macOS:
```
brew install tesseract
```

Linux (Debian/Ubuntu) or Windows (WSL):

```
sudo apt update
sudo apt install -y tesseract-ocr libtesseract-dev
```
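
Since the script depends on the tesseract binary being on PATH, a quick check after installation can save a confusing error later. The following is a minimal sketch using only the standard library; the function name `find_tesseract` is hypothetical and is not part of ocr_transcribe.py.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path to the binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before running the script")
    else:
        print(f"tesseract found at {path}")
```

Running `tesseract --version` in a shell gives the same answer; the Python form is only convenient if you want the check inside a script.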

Install the Python dependencies in your virtual environment (on Windows or inside WSL). The commands below install a small set of packages without pinning versions; pin specific versions if you need reproducible installs.

Using pip:

```
python -m pip install numpy pillow pytesseract
```

Using uv (if you prefer to manage packages with uv):

```
uv add numpy pillow pytesseract
```

Run the script on your chapter folder:

```
python ocr_transcribe.py --input-dir static/chapter_1 --output-file chapter_1.md
```

Or run the script with uv run, which executes the command inside the project's uv-managed environment:

```
uv run python ocr_transcribe.py --input-dir static/chapter_1 --output-file chapter_1.md
```
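
To give a rough idea of what a run does, here is a hedged sketch of turning a folder of images into one markdown string. It is not the actual code of ocr_transcribe.py: the names `transcribe_folder`, `ocr_with_tesseract`, and `IMAGE_SUFFIXES`, the per-page heading layout, and the list of file extensions are all assumptions made for illustration. The only library calls used are `PIL.Image.open` and `pytesseract.image_to_string`.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def ocr_with_tesseract(image_path, lang="eng"):
    """OCR one image locally via pytesseract and return the recognized text."""
    # Imported lazily so the rest of the sketch works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def transcribe_folder(input_dir, ocr=ocr_with_tesseract):
    """Return markdown with one section per image, in filename order."""
    pages = sorted(
        p for p in Path(input_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Hypothetical layout: a level-2 heading per image, then its text.
    sections = [f"## {page.name}\n\n{ocr(page).strip()}\n" for page in pages]
    return "\n".join(sections)
```

Under these assumptions, `Path("chapter_1.md").write_text(transcribe_folder("static/chapter_1"), encoding="utf-8")` would write the output file. Passing the OCR function as a parameter keeps the folder-walking logic testable without Tesseract installed.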

Options:

`--resize`: maximum width, in pixels, that images are resized to before OCR (default 2000).

`--lang`: Tesseract language code(s) (default eng).
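
The options above could be wired up roughly as follows. This is a sketch, not the script's real argument parser: `build_parser` and `scaled_size` are hypothetical names, marking `--input-dir` and `--output-file` as required is an assumption, and so is the choice to only shrink images wider than the limit (never enlarge smaller ones).

```python
import argparse


def build_parser():
    """Build a parser with the flags and defaults documented in this README."""
    parser = argparse.ArgumentParser(
        description="OCR a folder of images into a markdown file."
    )
    # Assumption: both paths are required; the real script may differ.
    parser.add_argument("--input-dir", required=True, help="folder of images")
    parser.add_argument("--output-file", required=True, help="markdown file to write")
    parser.add_argument("--resize", type=int, default=2000,
                        help="max width in pixels before OCR")
    parser.add_argument("--lang", default="eng",
                        help="Tesseract language code(s)")
    return parser


def scaled_size(width, height, max_width):
    """Return (width, height) capped at max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height  # assumption: smaller images are left untouched
    scale = max_width / width
    return max_width, max(1, round(height * scale))
```

With Pillow, the result can be applied as `image.resize(scaled_size(*image.size, args.resize))`. Tesseract itself accepts several languages joined with `+`, for example `--lang eng+deu`, provided the matching language data is installed.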

Notes on privacy and cost:

The script runs locally with pytesseract, so your images are not uploaded to any external service and no API charges apply.

This README assumes local processing, which is why it guides Windows users toward running under WSL. A helper to compress or resize images before uploading them to an API is not included.
