Integrate Hugging Face summarization pipeline #14

@justinmadison

Description

Summary

Add an abstractive summarization step so each article gets a short “TL;DR.”

Motivation

  • Helps end-users absorb long articles quickly.
  • Demonstrates multi-document summarization (e.g. daily digest).

Scope

None

Acceptance Criteria

  • summarize(text) returns a concise summary of fewer than 200 characters
  • summarize_task(article_id) saves "summary" to the article record
  • CLI summarize command runs without errors and prints confirmation
  • Tests pass in CI and README clearly describes both execution paths

Additional Context

Details

  • Category: nlp
  • Priority: P1
  • Estimate: 2d
  • Dependencies:
    • Database connection module (nlp/db.py) in place
    • Articles already normalized and persisted

Tasks

  1. Add dependencies
    • Add transformers and torch to /nlp/requirements.txt.
  2. Core function signature
    • Define in /nlp/core.py:
      def summarize(text: str) -> str: ...
      
  3. Celery task hook
    • In /nlp/tasks.py, register:
      @app.task
      def summarize_task(article_id: str) -> str: ...
      
  4. CLI entrypoint
    • In /nlp/cli.py, expose:
      python -m nlp.cli summarize --article-id=<id>
      
  5. Tests & documentation
    • Unit test that summarize() returns a non-empty string under 200 chars.
    • Test that summarize_task() updates the DB with a "summary" field.
    • Update /nlp/README.md with:
      • Installation steps
      • How to run the Celery task
      • How to invoke the CLI command
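
The core function from task 2 could be sketched as follows. This is a minimal sketch, not a settled design: it assumes the default model that the transformers `pipeline("summarization")` call selects, and the generation settings (`max_length=60`, `min_length=10`) are illustrative. The optional `summarizer` parameter and the helpers `_default_summarizer` and `_clip` are hypothetical additions so the function can be tested without downloading a model. Because `max_length` counts tokens rather than characters, the sketch clips the result separately to stay under 200 characters.

```python
from functools import lru_cache
from typing import Callable, Optional

MAX_SUMMARY_CHARS = 199  # acceptance criterion: strictly under 200 characters


@lru_cache(maxsize=1)
def _default_summarizer() -> Callable:
    """Build the Hugging Face pipeline once and reuse it (hypothetical helper)."""
    # Imported lazily so importing this module does not load torch or a model.
    from transformers import pipeline

    return pipeline("summarization")


def _clip(summary: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Normalize whitespace and cut at a word boundary if the text is too long."""
    summary = " ".join(summary.split())
    if len(summary) <= limit:
        return summary
    cut = summary[: limit - 3]  # leave room for the trailing "..."
    head = cut.rsplit(" ", 1)[0] if " " in cut else cut
    return head.rstrip(" ,;:") + "..."


def summarize(text: str, summarizer: Optional[Callable] = None) -> str:
    """Return an abstractive summary of `text` shorter than 200 characters.

    `summarizer` is an assumed extension point for tests; by default the
    transformers summarization pipeline is used.
    """
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    model = summarizer or _default_summarizer()
    # max_length/min_length are token counts, so the character limit is
    # enforced separately by _clip().
    result = model(text, max_length=60, min_length=10, do_sample=False, truncation=True)
    return _clip(result[0]["summary_text"])
```

Passing a stub callable as `summarizer` keeps the unit test from task 5 fast, since CI does not have to fetch model weights to check the length constraint.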

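Tasks 3 and 4 might be wired together roughly as shown here. Everything about storage is an assumption: the `ARTICLES` dict is a hypothetical stand-in for whatever nlp/db.py exposes, `"text"` is an assumed field name, and `summarize_article`, `build_parser`, and `main` are illustrative names. The `@app.task` registration appears only as a comment so the sketch runs without Celery or a broker.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-in for the article store behind nlp/db.py.
ARTICLES: Dict[str, dict] = {}


def summarize_article(
    article_id: str,
    summarize_fn: Callable[[str], str],
    store: Optional[Dict[str, dict]] = None,
) -> str:
    """Summarize one stored article and save the result under "summary"."""
    store = ARTICLES if store is None else store
    if article_id not in store:
        raise KeyError(f"unknown article id: {article_id}")
    article = store[article_id]
    summary = summarize_fn(article["text"])  # "text" is an assumed field name
    article["summary"] = summary
    return summary


# In /nlp/tasks.py the Celery hook could be a thin wrapper (assumed wiring):
#
#   @app.task
#   def summarize_task(article_id: str) -> str:
#       return summarize_article(article_id, summarize)


def build_parser() -> argparse.ArgumentParser:
    """Parser for: python -m nlp.cli summarize --article-id=<id>"""
    parser = argparse.ArgumentParser(prog="python -m nlp.cli")
    commands = parser.add_subparsers(dest="command", required=True)
    cmd = commands.add_parser("summarize", help="summarize one stored article")
    cmd.add_argument("--article-id", required=True, help="id of the article")
    return parser


def main(argv: Optional[List[str]], run_task: Callable[[str], str]) -> int:
    """Run the requested command and print a confirmation line."""
    args = build_parser().parse_args(argv)
    summary = run_task(args.article_id)
    print(f"Saved summary for article {args.article_id}: {summary}")
    return 0
```

In /nlp/cli.py, `main(None, summarize_task)` could be called under an `if __name__ == "__main__":` guard; passing `None` makes argparse read `sys.argv`. Calling the task object directly runs it in-process, which suits a CLI check, while the Celery path would use `summarize_task.delay(article_id)`.
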
Metadata

  • Assignees: none
  • Labels: none
  • Type: none
  • Project status: Ready
  • Milestone: none
  • Relationships: none
  • Development: no branches or pull requests