Integrate Retrieval-Augmented Generation (RAG) QA pipeline #34

@justinmadison

Summary

Build a RAG-style QA system over the news corpus so users can ask questions (e.g. “What happened to Company X?”) and receive synthesized, context-grounded answers.

Motivation

  • Provides a powerful, natural-language interface to the article database.
  • Demonstrates end-to-end retrieval, prompt construction, and LLM integration.
  • Lays groundwork for advanced analytics and conversational features.

Scope

None

Acceptance Criteria

  • retrieve_context(question) returns at least top_k context snippets

  • generate_answer(question, contexts) returns a coherent answer string

  • qa_task(question) executes end-to-end and returns the generated answer

  • CLI qa command runs without errors and prints the answer

  • All tests pass in CI and README clearly documents the HF-based RAG workflow
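
The function contracts listed above could be sketched roughly as follows. This is a minimal illustration, not the intended implementation: a bag-of-words cosine similarity stands in for the sentence-transformers and FAISS retrieval, an optional caller-supplied callable stands in for the Hugging Face model, and the in-memory `CORPUS`, the `build_prompt` helper, the `generator` parameter, and the prompt wording are all assumptions made so the sketch runs without model weights. The Celery decorator on `qa_task` is omitted here.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical in-memory corpus; the real pipeline would read articles
# through nlp/db.py and a prebuilt vector store.
CORPUS: List[str] = [
    "Company X announced a merger with Company Y on Monday.",
    "Regulators opened an inquiry into Company X's accounting.",
    "Local bakery wins regional award for sourdough.",
]


def _bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_context(question: str, top_k: int = 5) -> List[str]:
    """Return the top_k snippets most similar to the question.

    Fewer than top_k snippets come back only if the corpus is smaller.
    """
    query = _bow(question)
    ranked = sorted(CORPUS, key=lambda doc: _cosine(query, _bow(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble a context-grounded prompt (wording is an assumption)."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


def generate_answer(
    question: str,
    contexts: List[str],
    generator: Optional[Callable[[str], str]] = None,
) -> str:
    """Generate an answer; `generator` is a hypothetical hook for the LLM call."""
    prompt = build_prompt(question, contexts)
    if generator is None:
        # Placeholder behaviour: echo the best-ranked snippet instead of calling a model.
        return contexts[0] if contexts else "No relevant context found."
    return generator(prompt).strip()


def qa_task(question: str, top_k: int = 5) -> str:
    """Run retrieval, then generation, and return the answer string."""
    contexts = retrieve_context(question, top_k=top_k)
    return generate_answer(question, contexts)
```

Passing the model call in as a callable keeps the generation test independent of any downloaded weights; a real implementation might instead wrap a transformers pipeline behind the same two-argument signature.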

Additional Context

Details

  • Category: nlp
  • Priority: P1
  • Estimate: 3d
  • Dependencies:
    • Embedding & vector-store pipeline in place
    • Database connection module (nlp/db.py)
    • HF model weights available locally or via Hugging Face Hub

Tasks

  1. Add dependencies
    • Add sentence-transformers, faiss-cpu, and transformers to /nlp/requirements.txt.
  2. Core function signatures (/nlp/core.py)
    • def retrieve_context(question: str, top_k: int = 5) -> List[str]
    • def generate_answer(question: str, contexts: List[str]) -> str
  3. Celery task hook (/nlp/tasks.py)
    • @app.task def qa_task(question: str, top_k: int = 5) -> str
    • Should call retrieve_context() then generate_answer().
  4. CLI entrypoint (/nlp/cli.py)
    python -m nlp.cli qa --question="What happened to Company X?" --top-k=5
  5. Tests & documentation
    • Retrieval test (/nlp/tests/test_retrieval.py): assert retrieve_context() returns at least top_k snippets.
    • Generation test (/nlp/tests/test_generation.py): with sample contexts, assert generate_answer() returns a non-empty string.
    • Task test (/nlp/tests/test_qa_task.py): mock core functions, assert qa_task() returns the expected answer.
    • Update /nlp/README.md with installation steps, vector-store setup, Celery usage, and CLI example.
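
The CLI entry point from task 4 might look something like the sketch below. It assumes argparse, since the issue does not say which CLI library is intended, and `qa_task` here is a stub standing in for the real task in /nlp/tasks.py so the example runs on its own. The `build_parser` and `main` names are illustrative.

```python
import argparse
from typing import List, Optional


def qa_task(question: str, top_k: int = 5) -> str:
    """Stub standing in for nlp.tasks.qa_task (assumption for this sketch)."""
    return f"(stub answer for {question!r} using top_k={top_k})"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single `qa` subcommand."""
    parser = argparse.ArgumentParser(prog="nlp.cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    qa = subparsers.add_parser("qa", help="Answer a question over the news corpus")
    qa.add_argument("--question", required=True, help="Natural-language question")
    qa.add_argument(
        "--top-k", type=int, default=5, help="Number of context snippets to retrieve"
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, run the QA task, and print the answer."""
    args = build_parser().parse_args(argv)
    if args.command == "qa":
        # Called synchronously here; with Celery the real task could instead
        # be dispatched to a worker via qa_task.delay(...).
        print(qa_task(args.question, top_k=args.top_k))
    return 0
```

In /nlp/cli.py, `main()` would be invoked under the usual `if __name__ == "__main__":` guard so that the `python -m nlp.cli qa ...` command shown in task 4 works. Having `main` accept `argv` makes the command easy to exercise from tests without spawning a subprocess.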
