End-to-End Text Summarization System

License: MIT · Python 3.10+ · FastAPI · Docker

A production-ready, end-to-end text summarization system built with BART-base, deployed on AWS with FastAPI and Streamlit. This project demonstrates MLOps best practices with modular architecture, configuration-driven pipelines, and containerized deployment.

🌟 Highlights

  • State-of-the-art NLP: Fine-tuned Facebook's BART-base model for abstractive summarization
  • Production Architecture: Clean, modular design following software engineering best practices
  • MLOps Integration: Configuration-driven pipelines with reproducible experiments
  • Dual Interface: REST API (FastAPI) + Interactive Web UI (Streamlit)
  • Cloud Deployment: Fully deployed on AWS with Docker containers
  • Performance: Achieved ROUGE-1: 42.73, ROUGE-2: 20.29, ROUGE-L: 39.70

🚀 Features

Core Capabilities

  • Abstractive Summarization: Generates human-like summaries using transformer architecture
  • Configurable Pipeline: YAML-based configuration for easy experimentation
  • Artifact Management: Smart training skip logic based on existing artifacts
  • Comprehensive Evaluation: ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum)
  • REST API: Production-ready FastAPI endpoints
  • Interactive UI: User-friendly Streamlit interface with latency monitoring

MLOps Features

  • Modular component-based architecture
  • Configuration-driven training and inference
  • Automated data validation
  • Model versioning and artifact tracking
  • Docker containerization for reproducibility
  • CI/CD ready structure

🏗 Architecture

┌─────────────────────────────────────────────────────────────┐
│                    User Interface Layer                     │
│  ┌──────────────────────┐      ┌───────────────────────┐    │
│  │  Streamlit Web UI    │      │   REST API (FastAPI)  │    │
│  │  (Port 8501)         │◄────►│   (Port 8000)         │    │
│  └──────────────────────┘      └───────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Prediction  │  │  Training    │  │  Evaluation  │       │
│  │  Pipeline    │  │  Pipeline    │  │  Pipeline    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       Component Layer                       │
│  ┌──────────┐  ┌────────────┐  ┌─────────┐  ┌──────────┐    │
│  │  Data    │  │   Data     │  │  Model  │  │  Model   │    │
│  │ Ingestion│─►│Validation  │─►│ Trainer │─►│Evaluation│    │
│  └──────────┘  └────────────┘  └─────────┘  └──────────┘    │
└─────────────────────────────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Model Layer                         │
│          BART-base (facebook/bart-base) Fine-tuned          │
│                  on CNN/DailyMail Dataset                   │
└─────────────────────────────────────────────────────────────┘

🛠 Technology Stack

Core Technologies

  • Framework: PyTorch
  • Model: Hugging Face Transformers (BART-base)
  • API: FastAPI
  • UI: Streamlit
  • Containerization: Docker, Docker Compose

ML/NLP Stack

  • transformers (Hugging Face)
  • datasets (Hugging Face)
  • evaluate (ROUGE metrics)
  • torch
  • tokenizers

DevOps & Cloud

  • AWS EC2 (compute)
  • AWS S3 (model storage)
  • Docker (containerization)
  • GitHub Actions (CI/CD)

Data & Utilities

  • pandas
  • numpy
  • PyYAML (configuration)
  • python-box (config access)
  • ensure (data validation)

📁 Project Structure

text-summarization-system/
│
├── config/
│   └── config.yaml              # Pipeline configuration
│
├── src/summarizer/
│   ├── components/              # Core components
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   ├── model_trainer.py
│   │   └── model_evaluation.py
│   │
│   ├── pipeline/                # Pipeline orchestration
│   │   ├── stage_01_data_ingestion.py
│   │   ├── stage_02_data_validation.py
│   │   ├── stage_03_data_transformation.py
│   │   ├── stage_04_model_trainer.py
│   │   ├── stage_05_model_evaluation.py
│   │   └── prediction.py
│   │
│   ├── config/                  # Configuration management
│   │   └── configuration.py
│   │
│   ├── entity/                  # Data models
│   │   └── config_entity.py
│   │
│   ├── utils/                   # Utilities
│   │   └── common.py
│   │
│   ├── constants/               # Constants
│   │   └── __init__.py
│   │
│   └── logging/                 # Logging setup
│       └── __init__.py
│
├── artifacts/                   # Generated artifacts (gitignored)
│   ├── data_ingestion/
│   ├── data_validation/
│   ├── data_transformation/
│   ├── model_trainer/
│   └── model_evaluation/
│
├── research/                    # Jupyter notebooks for experimentation
│
├── .github/workflows/           # CI/CD pipelines
│
├── app.py                       # FastAPI application
├── streamlit_app.py             # Streamlit application
├── main.py                      # Training pipeline entry point
├── params.yaml                  # Training hyperparameters
├── requirements.txt             # Python dependencies
├── setup.py                     # Package setup
├── pyproject.toml               # Project metadata
│
├── api.Dockerfile               # FastAPI container
├── streamlit.Dockerfile         # Streamlit container
├── docker-compose.yml           # Multi-container orchestration
│
└── README.md                    # This file

💻 Installation

Prerequisites

  • Python 3.10 or higher
  • CUDA-capable GPU (recommended, NVIDIA RTX 4060 or better)
  • 8GB+ RAM
  • Docker (for containerized deployment)

Local Setup

  1. Clone the repository
git clone https://github.com/jsuryanm/text-summarization-system.git
cd text-summarization-system
  2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Install the package in editable mode
pip install -e .

🎯 Usage

Training Pipeline

Run the complete training pipeline:

python main.py

This executes all stages sequentially:

  1. Data Ingestion: Downloads CNN/DailyMail dataset
  2. Data Validation: Validates dataset structure
  3. Data Transformation: Tokenizes and prepares data
  4. Model Training: Fine-tunes BART-base model
  5. Model Evaluation: Computes ROUGE metrics

The pipeline checks for existing artifacts before each stage: if a stage has already completed, it is skipped automatically.
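
The skip behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `run_stage` and its arguments are hypothetical names, and the sketch assumes a stage counts as complete when its output artifact exists on disk. The real pipeline may use a different completion check.

```python
from pathlib import Path
from typing import Callable


def run_stage(name: str, artifact: Path, stage_fn: Callable[[], None]) -> bool:
    """Run stage_fn unless its output artifact already exists.

    Hypothetical helper: returns True if the stage ran, False if it was skipped.
    """
    if artifact.exists():
        print(f"[skip] {name}: found existing artifact {artifact}")
        return False
    print(f"[run]  {name}")
    stage_fn()
    return True
```

A check like this keeps repeated `python main.py` runs cheap, at the cost of having to delete the relevant folder under artifacts/ to force a stage to run again.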

API Server

Start the FastAPI server:

uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Access API documentation at: http://localhost:8000/docs

Streamlit UI

Launch the interactive web interface:

streamlit run streamlit_app.py

Access the UI at: http://localhost:8501

Programmatic Usage

from summarizer.pipeline.prediction import PredictionPipeline

# Initialize pipeline
predictor = PredictionPipeline()

# Generate summary
text = """
Your long article text here...
"""

summary = predictor.predict(text)
print(f"Summary: {summary}")
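
The API reports an inference_time_ms field, and the same measurement can be taken around a direct call. The sketch below is an assumption about how such timing could be done, not the repository's code: `timed_predict` is a hypothetical helper that wraps any callable taking a string and returning a string, such as `predictor.predict`.

```python
import time
from typing import Callable


def timed_predict(predict: Callable[[str], str], text: str) -> tuple[str, float]:
    """Call predict(text) and return (summary, elapsed milliseconds).

    Hypothetical helper: perf_counter is a monotonic clock suited to short intervals.
    """
    start = time.perf_counter()
    summary = predict(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return summary, elapsed_ms
```

With the pipeline above this would be called as `summary, ms = timed_predict(predictor.predict, text)`.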

📊 Model Performance

Evaluation Metrics

The model was evaluated on the CNN/DailyMail test set using ROUGE metrics:

| Metric     | Score | Interpretation                                 |
|------------|-------|------------------------------------------------|
| ROUGE-1    | 39.43 | Strong unigram overlap - good content coverage |
| ROUGE-2    | 17.65 | Solid bigram matching - maintains fluency      |
| ROUGE-L    | 26.89 | Good structural similarity                     |
| ROUGE-Lsum | 36.34 | High summary-level coherence                   |

Understanding ROUGE Metrics

  • ROUGE-1: Measures word-level overlap between generated and reference summaries
  • ROUGE-2: Evaluates phrase-level (bigram) similarity and fluency
  • ROUGE-L: Based on longest common subsequence, captures word order
  • ROUGE-Lsum: Summary-level ROUGE-L, standard for CNN/DailyMail
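
To make these definitions concrete, here is a simplified pure-Python sketch of ROUGE-N and ROUGE-L as F1 scores. It assumes lowercased whitespace tokenization with no stemming and a single reference, so its numbers will not match the evaluate library exactly. All function names are illustrative. Scores come out in the range 0 to 1, whereas the table above reports them multiplied by 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between prediction and reference."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))
```

For example, "the cat sat on the mat" against the reference "the cat lay on the mat" shares five of six unigrams and three of five bigrams, so ROUGE-1 is higher than ROUGE-2 even though only one word differs.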

Training Details

  • Base Model: facebook/bart-base (139M parameters)
  • Dataset: CNN/DailyMail (10% used due to hardware constraints)
  • Hardware: NVIDIA RTX 4060
  • Training Epochs: Configurable via params.yaml
  • Optimizer: AdamW
  • Scheduler: Linear warmup with decay

Note: Training was performed on 10% of the dataset due to GPU memory limitations. With a more powerful GPU (e.g., A100, V100), you can train on the full dataset for improved performance.


🔌 API Documentation

Endpoints

Health Check

GET /

Response:

{
  "status": "healthy",
  "message": "Text Summarization API is running"
}

Predict Summary

POST /predict
Content-Type: application/json

Request Body:

{
  "text": "Your long article or document text here..."
}

Response:

{
  "summary": "Concise generated summary of the input text",
  "inference_time_ms": 234.5
}

Error Response:

{
  "detail": "Error message"
}

Example using cURL

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your article text here"}'

Example using Python

import requests

url = "http://localhost:8000/predict"
payload = {
    "text": "Your long article text here..."
}

response = requests.post(url, json=payload)
result = response.json()

print(f"Summary: {result['summary']}")
print(f"Inference Time: {result['inference_time_ms']}ms")
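
The requests example above assumes the call succeeds. The following sketch handles both documented response shapes: the success body with summary and inference_time_ms, and the error body with detail. `SummarizationError` and `parse_predict_response` are hypothetical names introduced for illustration, and treating any non-200 status as an error is an assumption.

```python
import json


class SummarizationError(Exception):
    """Raised when the API returns an error response (hypothetical helper type)."""


def parse_predict_response(status_code: int, body: str) -> dict:
    """Turn a /predict response into a result dict, or raise with the error detail."""
    data = json.loads(body)
    if status_code != 200:
        # FastAPI error bodies carry a "detail" field; it may be a string or a list.
        detail = data.get("detail", "unknown error") if isinstance(data, dict) else data
        raise SummarizationError(f"HTTP {status_code}: {detail}")
    return {
        "summary": data["summary"],
        "inference_time_ms": float(data["inference_time_ms"]),
    }
```

With requests this would be called as `parse_predict_response(response.status_code, response.text)`.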

🐳 Docker Deployment

Build Images

Build FastAPI container:

docker build -f api.Dockerfile -t summarization-api:latest .

Build Streamlit container:

docker build -f streamlit.Dockerfile -t summarization-ui:latest .

Run Containers

Run API server:

docker run -d -p 8000:8000 --name api-server summarization-api:latest

Run Streamlit UI:

docker run -d -p 8501:8501 --name ui-server summarization-ui:latest

Docker Compose

For running both services together:

docker-compose up -d

This will start:

  • FastAPI server on http://localhost:8000
  • Streamlit UI on http://localhost:8501

Stop services:

docker-compose down

☁️ AWS Deployment

Architecture Overview

Internet
   │
   ├─► AWS EC2 Instance (FastAPI) → Port 8000
   │
   └─► AWS EC2 Instance (Streamlit) → Port 8501
        │
        └─► AWS S3 (Model Artifacts)

Deployment Steps

  1. Prepare EC2 Instances

    • Launch 2 EC2 instances (t2.medium or better)
    • Configure security groups (allow ports 8000, 8501, 22)
    • Install Docker and Docker Compose
  2. Configure a GitHub Actions self-hosted runner on EC2

  3. Deploy Containers

    • SSH into EC2 instances
    • Pull Docker images or build from source
  4. Configure Load Balancer (Optional)

    • Set up Application Load Balancer
    • Configure health checks
    • Enable auto-scaling

GitHub Secrets

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
ECR_API_REPO=ecr_repo_name
ECR_UI_REPO=ecr_repo_name
AWS_ACCOUNT_ID=account_id
AWS_REGION=region

⚙️ Configuration

config/config.yaml

Controls pipeline behavior:

artifacts_root: artifacts

data_ingestion:
  root_dir: artifacts/data_ingestion
  source_URL: https://github.com/entbappy/Branching-tutorial/raw/master/summarizer-data.zip
  local_data_file: artifacts/data_ingestion/data.zip
  unzip_dir: artifacts/data_ingestion

data_validation:
  root_dir: artifacts/data_validation
  STATUS_FILE: artifacts/data_validation/status.txt
  ALL_REQUIRED_FILES: ["train", "test", "validation"]

# ... additional configuration
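
The data_validation keys above suggest a stage that checks for the required splits and records the outcome in the status file. The sketch below shows one way that could work. It is an assumption, not the repository's code: `validate_required_files` is a hypothetical function, and the status line format is invented for illustration.

```python
from pathlib import Path


def validate_required_files(data_dir: Path, required: list[str], status_file: Path) -> bool:
    """Check that every required entry exists in data_dir and record the result.

    Hypothetical helper mirroring ALL_REQUIRED_FILES and STATUS_FILE from config.yaml.
    """
    present = {p.name for p in data_dir.iterdir()} if data_dir.is_dir() else set()
    missing = [name for name in required if name not in present]
    status = not missing
    status_file.parent.mkdir(parents=True, exist_ok=True)
    status_file.write_text(f"Validation status: {status}\n")
    return status
```

Writing the result to a file lets later stages, or a later run, read the validation outcome without repeating the check.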

params.yaml

Defines training hyperparameters. The snippet below is abbreviated: the actual run set many more TrainingArguments parameters, so refer to params.yaml in the repository for the exact values.

TrainingArguments:
  num_train_epochs: 1
  warmup_steps: 500
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1
  weight_decay: 0.01
  logging_steps: 10
  evaluation_strategy: steps
  eval_steps: 500
  save_steps: 1000000  # integer form; PyYAML would load a bare 1e6 as a string
  gradient_accumulation_steps: 16
  fp16: true  # Mixed precision training

Tip: Adjust per_device_train_batch_size and gradient_accumulation_steps based on your GPU memory.
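
The trade-off in this tip comes down to simple arithmetic: the effective batch size is the per-device batch size times the accumulation steps times the number of devices. The sketch below works that out. The function names are illustrative, the 10,000-example dataset size is a made-up figure, and the step count is an approximation that may differ slightly from what the Hugging Face Trainer reports.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(num_examples: int, per_device_batch: int,
                              grad_accum_steps: int, num_devices: int = 1) -> int:
    """Approximate optimizer updates per epoch (rounds the last partial batch up)."""
    batch = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / batch)


# Values from the params.yaml example: batch size 1, 16 accumulation steps, one GPU.
print(effective_batch_size(1, 16))               # 16
# Hypothetical dataset of 10,000 training examples.
print(optimizer_steps_per_epoch(10_000, 1, 16))  # 625
```

Halving gradient_accumulation_steps while doubling per_device_train_batch_size keeps the effective batch size the same but needs more GPU memory per step.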


Development Guidelines

  • Follow PEP 8 style guidelines
  • Add unit tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting PR

📝 Future Enhancements

  • Upload model artifacts to S3 for persistence
  • Implement A/B testing for model versions
  • Add support for multi-document summarization
  • Integrate monitoring with Prometheus/Grafana
  • Add support for multiple languages
  • Implement user feedback loop
  • Create mobile-friendly UI
  • Add batch processing endpoints

🐛 Known Issues

  • Training on full dataset requires high-memory GPU (16GB+ VRAM)
  • Initial model loading takes 10-15 seconds
  • Large input texts (>1000 tokens) may have longer inference times


📧 Contact

Jayasuryan Mutyala - @jsuryanm

Project Link: https://github.com/jsuryanm/text-summarization-system


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Hugging Face team for the Transformers library
  • Facebook AI Research for the BART model
  • CNN/DailyMail dataset creators
  • FastAPI and Streamlit communities

If you find this project helpful, please consider giving it a star! ⭐
