Backend

Backend service for the Library project.

This directory contains the FastAPI API, database layer, migrations, background workers, parser jobs, mail workflows, and S3-compatible storage integration.

Stack

  • Python 3.12
  • FastAPI
  • SQLAlchemy asyncio
  • Alembic
  • PostgreSQL
  • Redis
  • RabbitMQ
  • Taskiq
  • MinIO/S3-compatible storage
  • Poetry

Structure

backend/
|-- app/
|   |-- api/v1/          # Route modules
|   |-- core/            # Config, database, broker, scheduler, storage
|   |-- core/models/     # SQLAlchemy models and enums
|   |-- crud/            # Database access helpers
|   |-- dependencies/    # FastAPI dependencies
|   |-- mailing/         # Mail sending and templates
|   |-- schemas/         # Pydantic schemas
|   |-- scripts/         # One-off runnable scripts
|   |-- services/        # Domain services
|   |-- tasks/           # Taskiq background tasks
|   `-- run.py           # Uvicorn entrypoint
|-- alembic/             # Alembic migrations
|-- Dockerfile
|-- docker-compose.yml
|-- pyproject.toml
`-- .env.example

Environment

Create a local environment file before running the backend:

cp .env.example .env

Important environment groups:

  • POSTGRES_* configures PostgreSQL.
  • REDIS_* configures Redis.
  • RMQ_* configures RabbitMQ.
  • MINIO_* configures MinIO and default buckets.
  • MAIL_* configures SMTP mail sending.
  • SECURE_* configures JWT, cookies, CSRF, and token TTL values.
  • CORS_* configures allowed frontend origins and headers.
  • ADMIN_* configures bootstrap admin creation.
  • PARSER_* configures external book parsing.
  • RUN_* configures the API host and port.
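The prefix groups above lend themselves to a quick sanity check of a loaded environment. The following is a minimal sketch, not the project's actual config loader (which lives in app/core/config.py); the helper names and the sample variable names such as RUN_HOST are assumptions made for illustration.

```python
from collections import defaultdict

# Prefixes copied from the list above; variables with other prefixes are ignored.
GROUP_PREFIXES = (
    "POSTGRES_", "REDIS_", "RMQ_", "MINIO_", "MAIL_",
    "SECURE_", "CORS_", "ADMIN_", "PARSER_", "RUN_",
)


def group_settings(environ):
    """Group variables by known prefix, e.g. {"RUN": {"HOST": "0.0.0.0"}}."""
    groups = defaultdict(dict)
    for name, value in environ.items():
        for prefix in GROUP_PREFIXES:
            if name.startswith(prefix):
                groups[prefix.rstrip("_")][name[len(prefix):]] = value
                break
    return dict(groups)


def missing_groups(environ):
    """Return the group names that have no variable set at all."""
    grouped = group_settings(environ)
    return [p.rstrip("_") for p in GROUP_PREFIXES if p.rstrip("_") not in grouped]


if __name__ == "__main__":
    import os
    print("Groups with no variables set:", missing_groups(os.environ))
```

Running it after loading .env gives a quick hint when a whole group was forgotten. It does not validate individual values.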

Automatic Gutendex Parser

The backend contains an automatic parser for importing public-domain book data from Gutendex. The default source is configured by PARSER_API_BASE and points to:

https://gutendex.com

Gutendex is used as a catalog API. The parser reads book metadata from Gutendex JSON responses and follows the plain-text file links provided in each book's formats object.
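As an illustration of reading one catalog record, the sketch below pulls out the fields the parser needs and picks a plain-text link from formats. It assumes the publicly documented Gutendex response shape (title, authors with name/birth_year/death_year, summaries, and formats keyed by MIME type, with the cover under image/jpeg). The function names and the sample record are hypothetical and are not taken from app/services/parser/.

```python
from typing import Optional


def pick_text_url(formats: dict) -> Optional[str]:
    """Return the most suitable plain-text link, or None if there is none."""

    def rank(item):
        mime, url = item
        mime = mime.lower()
        path = url.lower().split("?", 1)[0]
        if mime.startswith("text/plain"):
            tier = 0
        elif path.endswith(".txt.utf-8"):
            tier = 1
        elif path.endswith(".txt"):
            tier = 2
        else:
            tier = 3
        # Within a tier, prefer entries that declare UTF-8.
        return (tier, 0 if "utf-8" in mime else 1)

    candidates = [item for item in formats.items() if rank(item)[0] < 3]
    if not candidates:
        return None
    return min(candidates, key=rank)[1]


def extract_metadata(book: dict) -> dict:
    """Flatten one Gutendex book record into the fields the parser stores."""
    authors = book.get("authors") or []
    first = authors[0] if authors else {}
    summaries = book.get("summaries") or []
    formats = book.get("formats") or {}
    return {
        "title": book.get("title", ""),
        "author": first.get("name"),
        "birth_year": first.get("birth_year"),
        "death_year": first.get("death_year"),
        "annotation": summaries[0] if summaries else None,
        "cover_url": formats.get("image/jpeg"),
        "text_url": pick_text_url(formats),
    }


if __name__ == "__main__":
    # Made-up sample record with placeholder URLs.
    sample = {
        "title": "Sample Book",
        "authors": [{"name": "Doe, Jane", "birth_year": 1800, "death_year": 1850}],
        "summaries": ["A short summary."],
        "formats": {
            "text/html": "https://example.com/1.html",
            "image/jpeg": "https://example.com/1.jpg",
            "text/plain; charset=utf-8": "https://example.com/1.txt.utf-8",
        },
    }
    print(extract_metadata(sample))
```

A book with no usable plain-text link yields text_url set to None, which a caller could treat as a signal to skip the book.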

Parser Flow

  1. Category discovery starts at GET /books/?sort=popular.
  2. The parser reads subjects and bookshelves from popular books and turns them into local categories.
  3. For each discovered category, it requests GET /books/?topic=<category>&sort=popular.
  4. Each returned book is checked against the current category, which filters out unrelated search results.
  5. The parser extracts the title, first author, author birth/death years, first summary as annotation, and cover URL.
  6. It chooses the best plain-text format from formats, preferring text/plain, .txt.utf-8, and .txt links.
  7. It downloads the text file, removes Project Gutenberg start/end boilerplate, normalizes Unicode/control characters, and keeps paragraph breaks.
  8. The cleaned text is split into pages using PARSER_PAGE_CHARS.
  9. Books are saved in batches with their category, author, metadata, cover URL, and generated BookPage rows.
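Steps 7 and 8 can be sketched with the standard library alone. This is an illustrative approximation, not the code in app/services/parser/: the marker patterns cover the common "*** START OF THE/THIS PROJECT GUTENBERG EBOOK" lines, and PARSER_PAGE_CHARS is assumed to be a maximum number of characters per page.

```python
import re
import unicodedata

# Marker lines that usually frame the book body in Project Gutenberg texts.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.I | re.M
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.I | re.M
)


def strip_boilerplate(text):
    """Keep only the text between the start and end marker lines, if present."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    return text


def normalize(text):
    """Normalize Unicode and newlines, drop control characters, keep paragraphs."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove control characters except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of blank lines into a single paragraph break.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


def split_pages(text, page_chars):
    """Pack whole paragraphs into pages of at most page_chars characters."""
    pages, current = [], ""
    for para in text.split("\n\n"):
        # A paragraph longer than one page is cut into page-sized pieces.
        while len(para) > page_chars:
            if current:
                pages.append(current)
                current = ""
            pages.append(para[:page_chars])
            para = para[page_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= page_chars:
            current = candidate
        else:
            pages.append(current)
            current = para
    if current:
        pages.append(current)
    return pages
```

Packing whole paragraphs keeps page breaks at natural boundaries. The hard cut is only a fallback for paragraphs that cannot fit on one page.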

Duplicate And Limit Handling

The parser avoids importing the same book twice by checking the combination of title, author, and category twice: once before downloading the text and again before saving.
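The duplicate check can be pictured as a lookup on a normalized (title, author, category) key. The sketch below keeps the keys in memory, whereas the real parser presumably queries the database. The case and whitespace normalization, and the names dedupe_key and SeenBooks, are assumptions for illustration.

```python
import unicodedata


def dedupe_key(title, author, category):
    """Build a comparison key that ignores case and whitespace differences."""

    def norm(value):
        value = unicodedata.normalize("NFKC", value or "")
        return " ".join(value.casefold().split())

    return (norm(title), norm(author), norm(category))


class SeenBooks:
    """In-memory stand-in for the 'already imported?' lookup."""

    def __init__(self, existing=()):
        self._keys = {dedupe_key(*row) for row in existing}

    def is_duplicate(self, title, author, category):
        return dedupe_key(title, author, category) in self._keys

    def mark_saved(self, title, author, category):
        self._keys.add(dedupe_key(title, author, category))
```

Checking before the download avoids fetching a large text file for a book that already exists. Checking again before saving covers books that were added while the batch was being prepared.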

It also limits the import size so a scheduled run remains controlled:

  • PARSER_DISCOVERY_PAGES_LIMIT limits how many popular catalog pages are scanned for categories.
  • PARSER_MAX_CATEGORIES limits the number of categories imported per run.
  • PARSER_MAX_AUTHORS_PER_CATEGORY limits author variety inside one category.
  • PARSER_MAX_BOOKS_PER_AUTHOR limits how many books one author can contribute to one category.
  • PARSER_BATCH_SIZE controls how many parsed books are saved per database batch.
  • PARSER_REQUEST_RETRIES controls retry attempts for Gutendex JSON and text requests.
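To make the per-category caps and batching concrete, here is a small sketch of how PARSER_MAX_AUTHORS_PER_CATEGORY, PARSER_MAX_BOOKS_PER_AUTHOR, and PARSER_BATCH_SIZE might be applied to the books of one category. The function names and the "first seen wins" ordering are assumptions and may differ from the actual parser.

```python
from collections import Counter
from itertools import islice


def limit_category(books, max_authors, max_books_per_author):
    """Yield books in order while respecting both per-category caps."""
    per_author = Counter()
    for book in books:
        author = book["author"]
        # Skip authors that would exceed the author cap for this category.
        if author not in per_author and len(per_author) >= max_authors:
            continue
        # Skip further books once an author has reached the per-author cap.
        if per_author[author] >= max_books_per_author:
            continue
        per_author[author] += 1
        yield book


def batched(items, size):
    """Group items into lists of at most `size` elements for batch saves."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Because the category request is sorted by popularity, a first-seen-wins policy like this one would keep the most popular authors and titles in each category.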

How It Runs

The parser implementation lives in app/services/parser/.

Main entry points:

  • app/services/parser/runner.py runs the full parser flow.
  • app/tasks/parser_tasks.py exposes the scheduled Taskiq task.
  • app/scripts/run_parser_once.py queues one parser task manually through the broker.

Docker services related to the parser:

  • worker executes parser and email tasks from RabbitMQ.
  • scheduler queues the parser on the configured cron schedule.
  • parser_bootstrap queues one parser run when started.

Run parser bootstrap once:

poetry run python -m app.scripts.run_parser_once

Run the worker that executes parser jobs:

poetry run taskiq worker app.core.broker:broker app.tasks.parser_tasks app.tasks.email_tasks --log-level INFO --max-prefetch 1 --ack-type when_executed --shutdown-timeout 30

Run the scheduler:

poetry run taskiq scheduler app.core.scheduler:scheduler app.tasks.parser_tasks --log-level INFO

Docker Run

From the repository root, start the whole project:

docker compose up -d --build

From this directory (backend/), the same command starts only the backend stack:

docker compose up -d --build

The API container runs migrations before starting the server:

alembic -c app/alembic.ini upgrade head && python -m app.run

Default local URLs from .env.example:

  • API: http://localhost:8000
  • OpenAPI docs: http://localhost:8000/docs
  • RabbitMQ UI: http://localhost:15672
  • MinIO UI: http://localhost:9001

Local Development

Install dependencies:

poetry install

Run migrations:

poetry run alembic -c app/alembic.ini upgrade head

Run the API:

poetry run python -m app.run

Run a worker:

poetry run taskiq worker app.core.broker:broker app.tasks.parser_tasks app.tasks.email_tasks --log-level INFO --max-prefetch 1 --ack-type when_executed --shutdown-timeout 30

Run the scheduler:

poetry run taskiq scheduler app.core.scheduler:scheduler app.tasks.parser_tasks --log-level INFO

Run parser bootstrap once:

poetry run python -m app.scripts.run_parser_once

Migrations

Create a migration after model changes:

poetry run alembic -c app/alembic.ini revision --autogenerate -m "describe change"

Apply migrations:

poetry run alembic -c app/alembic.ini upgrade head

Common Places

  • API routes: app/api/v1/
  • Application factory: app/api/main.py
  • Runtime config: app/core/config.py
  • Database models: app/core/models/
  • CRUD layer: app/crud/
  • Pydantic schemas: app/schemas/
  • Auth/security helpers: app/services/, app/dependencies/
  • Background tasks: app/tasks/
  • Parser implementation: app/services/parser/
  • Scheduled parser task: app/tasks/parser_tasks.py
  • One-time parser queue script: app/scripts/run_parser_once.py
  • Migrations: alembic/versions/