
craigtrim/gutenfetchen

gutenfetchen

Python 3.10+ | License: MIT | Code style: ruff | Type checked: mypy

Verb, pseudo-German. gutenfetchen (/ˈɡuːtənˌfɛtʃən/) "to do the good fetching." From guten (good) + fetchen (to fetch), formed as an infinitive as if it were a proper German verb. Because downloading public-domain literature should feel orderly, efficient, and vaguely Teutonic.

Download plain-text e-books from Project Gutenberg with a single command.

Why gutenfetchen?

Other Gutenberg tools, such as Gutenberg and gutenbergpy, require you to build a local metadata database before you can do anything, a process that can take hours. gutenfetchen skips all of that.

  • Zero setup - queries the Gutendex API directly, no local database required
  • Smart deduplication - filters out duplicate editions, keeps the highest-quality version
  • Clean output - strips Project Gutenberg boilerplate headers/footers by default
  • Prefers UTF-8 - automatically selects the best plain-text encoding available
  • Dry-run mode - preview results before downloading anything

Install

pip install gutenfetchen

Usage

Search by title:

gutenfetchen "tale of two cities"

Search by author:

gutenfetchen --author "joseph conrad"

Combine author + title filter:

gutenfetchen "heart" --author "joseph conrad"

Download random e-texts:

gutenfetchen --random 5

Preview without downloading:

gutenfetchen --author "jane austen" --dry-run

Limit results and set output directory:

gutenfetchen --author "mark twain" --n 3 -o ./my_texts/

Keep Gutenberg boilerplate (skip cleaning):

gutenfetchen "moby dick" --no-clean

Clean existing files on disk:

gutenfetchen clean ./gutenberg_texts/
gutenfetchen clean file1.txt file2.txt
gutenfetchen clean --dry-run ./gutenberg_texts/

The clean subcommand runs the same boilerplate-stripping pipeline used during download. It is idempotent: running it on already-clean texts leaves them unchanged.
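A marker-based cleaner with that idempotence property can be sketched as follows. This is an assumption about the general approach, not the actual pipeline: it only recognizes the common `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ... ***` lines (real files have more variants), and `strip_boilerplate` is a hypothetical name. Because the cleaned output contains no markers, a second pass returns it unchanged.

```python
import re

# Common modern markers; older files say "THIS" instead of "THE".
# Real Gutenberg files have further variants this sketch ignores.
_START = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_boilerplate(text: str) -> str:
    """Drop everything through the START marker line and from the END
    marker line onward. Text without markers is only whitespace-trimmed."""
    start = _START.search(text)
    if start:
        text = text[start.end():]
    end = _END.search(text)
    if end:
        text = text[: end.start()]
    return text.strip() + "\n"


raw = (
    "The Project Gutenberg eBook of Moby Dick\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK MOBY DICK ***\n\n"
    "Call me Ishmael.\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK MOBY DICK ***\n"
    "License text...\n"
)
cleaned = strip_boilerplate(raw)
print(repr(cleaned))  # 'Call me Ishmael.\n'
```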

Options

positional:
  title                  Search by title (e.g., 'tale of two cities')

options:
  --author NAME          Search by author name (e.g., 'joseph conrad')
  --random N             Download N random e-texts
  --n N                  Maximum number of texts to download
  -o, --output-dir DIR   Output directory (default: ./gutenberg_texts/)
  --dry-run              List matching books without downloading
  --no-clean             Skip stripping Project Gutenberg boilerplate
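If you are scripting around the CLI, the options above can be mirrored with `argparse` to see how they fit together. This is a hypothetical reconstruction from the table alone, not gutenfetchen's source: it leaves out the `clean` subcommand and does not enforce any rules about which flags may be combined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented options (no `clean` subcommand)."""
    parser = argparse.ArgumentParser(prog="gutenfetchen")
    parser.add_argument("title", nargs="?", help="Search by title")
    parser.add_argument("--author", metavar="NAME", help="Search by author name")
    parser.add_argument("--random", type=int, metavar="N", help="Download N random e-texts")
    parser.add_argument("--n", type=int, metavar="N", help="Maximum number of texts to download")
    parser.add_argument("-o", "--output-dir", metavar="DIR", default="./gutenberg_texts/",
                        help="Output directory")
    parser.add_argument("--dry-run", action="store_true", help="List matches without downloading")
    parser.add_argument("--no-clean", action="store_true", help="Keep Gutenberg boilerplate")
    return parser


args = build_parser().parse_args(["heart", "--author", "joseph conrad", "--n", "3"])
print(args.title, args.author, args.n, args.output_dir, args.dry_run)
# heart joseph conrad 3 ./gutenberg_texts/ False
```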
