A production-ready, end-to-end text summarization system built with BART-base, deployed on AWS with FastAPI and Streamlit. This project demonstrates MLOps best practices with modular architecture, configuration-driven pipelines, and containerized deployment.
- State-of-the-art NLP: Fine-tuned Facebook's BART-base model for abstractive summarization
- Production Architecture: Clean, modular design following software engineering best practices
- MLOps Integration: Configuration-driven pipelines with reproducible experiments
- Dual Interface: REST API (FastAPI) + Interactive Web UI (Streamlit)
- Cloud Deployment: Fully deployed on AWS with Docker containers
- Performance: Achieved ROUGE-1: 42.73, ROUGE-2: 20.29, ROUGE-L: 39.70
- Features
- Architecture
- Technology Stack
- Project Structure
- Installation
- Usage
- Model Performance
- API Documentation
- Docker Deployment
- AWS Deployment
- Configuration
- Contributing
- License
- Abstractive Summarization: Generates human-like summaries using transformer architecture
- Configurable Pipeline: YAML-based configuration for easy experimentation
- Artifact Management: Smart training skip logic based on existing artifacts
- Comprehensive Evaluation: ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum)
- REST API: Production-ready FastAPI endpoints
- Interactive UI: User-friendly Streamlit interface with latency monitoring
- Modular component-based architecture
- Configuration-driven training and inference
- Automated data validation
- Model versioning and artifact tracking
- Docker containerization for reproducibility
- CI/CD ready structure
    ┌─────────────────────────────────────────────────────────────┐
    │                    User Interface Layer                     │
    │  ┌──────────────────────┐      ┌───────────────────────┐    │
    │  │  Streamlit Web UI    │      │  REST API (FastAPI)   │    │
    │  │  (Port 8501)         │─────►│  (Port 8000)          │    │
    │  └──────────────────────┘      └───────────────────────┘    │
    └─────────────────────────────────────────────────────────────┘
                                   ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                      Application Layer                      │
    │   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
    │   │  Prediction  │   │   Training   │   │  Evaluation  │    │
    │   │   Pipeline   │   │   Pipeline   │   │   Pipeline   │    │
    │   └──────────────┘   └──────────────┘   └──────────────┘    │
    └─────────────────────────────────────────────────────────────┘
                                   ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                       Component Layer                       │
    │   ┌──────────┐  ┌────────────┐  ┌─────────┐  ┌──────────┐   │
    │   │   Data   │  │    Data    │  │  Model  │  │  Model   │   │
    │   │ Ingestion│─►│ Validation │─►│ Trainer │─►│Evaluation│   │
    │   └──────────┘  └────────────┘  └─────────┘  └──────────┘   │
    └─────────────────────────────────────────────────────────────┘
                                   ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                         Model Layer                         │
    │          BART-base (facebook/bart-base) Fine-tuned          │
    │                  on CNN/DailyMail Dataset                   │
    └─────────────────────────────────────────────────────────────┘
- Framework: PyTorch
- Model: Hugging Face Transformers (BART-base)
- API: FastAPI
- UI: Streamlit
- Containerization: Docker, Docker Compose
- transformers (Hugging Face)
- datasets (Hugging Face)
- evaluate (ROUGE metrics)
- torch
- tokenizers
- AWS EC2 (compute)
- AWS S3 (model storage)
- Docker (containerization)
- GitHub Actions (CI/CD)
- pandas
- numpy
- PyYAML (configuration)
- python-box (config access)
- ensure (data validation)
    text-summarization-system/
    │
    ├── config/
    │   └── config.yaml                  # Pipeline configuration
    │
    ├── src/summarizer/
    │   ├── components/                  # Core components
    │   │   ├── data_ingestion.py
    │   │   ├── data_validation.py
    │   │   ├── data_transformation.py
    │   │   ├── model_trainer.py
    │   │   └── model_evaluation.py
    │   │
    │   ├── pipeline/                    # Pipeline orchestration
    │   │   ├── stage_01_data_ingestion.py
    │   │   ├── stage_02_data_validation.py
    │   │   ├── stage_03_data_transformation.py
    │   │   ├── stage_04_model_trainer.py
    │   │   ├── stage_05_model_evaluation.py
    │   │   └── prediction.py
    │   │
    │   ├── config/                      # Configuration management
    │   │   └── configuration.py
    │   │
    │   ├── entity/                      # Data models
    │   │   └── config_entity.py
    │   │
    │   ├── utils/                       # Utilities
    │   │   └── common.py
    │   │
    │   ├── constants/                   # Constants
    │   │   └── __init__.py
    │   │
    │   └── logging/                     # Logging setup
    │       └── __init__.py
    │
    ├── artifacts/                       # Generated artifacts (gitignored)
    │   ├── data_ingestion/
    │   ├── data_validation/
    │   ├── data_transformation/
    │   ├── model_trainer/
    │   └── model_evaluation/
    │
    ├── research/                        # Jupyter notebooks for experimentation
    │
    ├── .github/workflows/               # CI/CD pipelines
    │
    ├── app.py                           # FastAPI application
    ├── streamlit_app.py                 # Streamlit application
    ├── main.py                          # Training pipeline entry point
    ├── params.yaml                      # Training hyperparameters
    ├── requirements.txt                 # Python dependencies
    ├── setup.py                         # Package setup
    ├── pyproject.toml                   # Project metadata
    │
    ├── api.Dockerfile                   # FastAPI container
    ├── streamlit.Dockerfile             # Streamlit container
    ├── docker-compose.yml               # Multi-container orchestration
    │
    └── README.md                        # This file
- Python 3.10 or higher
- CUDA-capable GPU (recommended, NVIDIA RTX 4060 or better)
- 8GB+ RAM
- Docker (for containerized deployment)
- Clone the repository

      git clone https://github.com/jsuryanm/text-summarization-system.git
      cd text-summarization-system

- Create virtual environment

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies

      pip install -r requirements.txt

- Install the package in editable mode

      pip install -e .

Run the complete training pipeline:

    python main.py

This executes all stages sequentially:
- Data Ingestion: Downloads CNN/DailyMail dataset
- Data Validation: Validates dataset structure
- Data Transformation: Tokenizes and prepares data
- Model Training: Fine-tunes BART-base model
- Model Evaluation: Computes ROUGE metrics
The pipeline checks for existing artifacts before each stage: a stage that has already completed is skipped automatically.
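The skip behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `run_stage` and `fake_ingestion` are hypothetical names, and the rule "a stage is complete when its artifact directory exists and is non-empty" is an assumption.

```python
from pathlib import Path
from typing import Callable


def run_stage(name: str, artifact_dir: Path, stage_fn: Callable[[Path], None]) -> bool:
    """Run stage_fn unless artifact_dir already holds output.

    Returns True if the stage ran and False if it was skipped.
    """
    # Assumption: a non-empty artifact directory means the stage already finished.
    if artifact_dir.is_dir() and any(artifact_dir.iterdir()):
        print(f"[skip] {name}: artifacts found in {artifact_dir}")
        return False
    artifact_dir.mkdir(parents=True, exist_ok=True)
    print(f"[run ] {name}")
    stage_fn(artifact_dir)
    return True


def fake_ingestion(out_dir: Path) -> None:
    # Hypothetical stand-in for a real stage: writes a single placeholder file.
    (out_dir / "data.zip").write_bytes(b"placeholder")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "artifacts" / "data_ingestion"
        run_stage("data_ingestion", target, fake_ingestion)  # first call runs the stage
        run_stage("data_ingestion", target, fake_ingestion)  # second call is skipped
```

One caveat with a check like this: a stage that crashed halfway leaves a non-empty directory and would be treated as complete. Writing a marker file only after the stage succeeds is a more robust variant.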
Start the FastAPI server:

    uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Access API documentation at: http://localhost:8000/docs

Launch the interactive web interface:

    streamlit run streamlit_app.py

Access the UI at: http://localhost:8501

    from summarizer.pipeline.prediction import PredictionPipeline

    # Initialize pipeline
    predictor = PredictionPipeline()

    # Generate summary
    text = """
    Your long article text here...
    """
    summary = predictor.predict(text)
    print(f"Summary: {summary}")

The model was evaluated on the CNN/DailyMail test set using ROUGE metrics:

| Metric | Score | Interpretation |
|---|---|---|
| ROUGE-1 | 39.43 | Strong unigram overlap - good content coverage |
| ROUGE-2 | 17.65 | Solid bigram matching - maintains fluency |
| ROUGE-L | 26.89 | Good structural similarity |
| ROUGE-Lsum | 36.34 | High summary-level coherence |
- ROUGE-1: Measures word-level overlap between generated and reference summaries
- ROUGE-2: Evaluates phrase-level (bigram) similarity and fluency
- ROUGE-L: Based on longest common subsequence, captures word order
- ROUGE-Lsum: Summary-level ROUGE-L, standard for CNN/DailyMail
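
To make the metric definitions above concrete, here is a simplified pure-Python sketch of ROUGE-N and ROUGE-L as F1 scores. It assumes lowercase whitespace tokenization and no stemming, so it will not reproduce the numbers from the `evaluate` library exactly, and it returns values between 0 and 1 while the table appears to report scores on a 0-100 scale. ROUGE-Lsum, which splits summaries into sentences first, is not covered.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, pred_total, ref_total):
    """F1 from a match count and the prediction/reference totals."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    """ROUGE-N F1: clipped n-gram overlap between prediction and reference."""
    pred = _ngrams(prediction.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # Counter intersection clips repeated n-grams
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    a, b = prediction.lower().split(), reference.lower().split()
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for tok in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if tok == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(a), len(b))


if __name__ == "__main__":
    pred = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(pred, ref, 1), rouge_n(pred, ref, 2), rouge_l(pred, ref))
```

For this sentence pair, five of six unigrams and three of five bigrams match, and the longest common subsequence has five tokens, so the three scores come out to roughly 0.83, 0.60, and 0.83.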
- Base Model: facebook/bart-base (139M parameters)
- Dataset: CNN/DailyMail (10% used due to hardware constraints)
- Hardware: NVIDIA RTX 4060
- Training Epochs: Configurable via `params.yaml`
- Optimizer: AdamW
- Scheduler: Linear warmup with decay
Note: Training was performed on 10% of the dataset due to GPU memory limitations. With a more powerful GPU (e.g., A100, V100), you can train on the full dataset for improved performance.

    GET /

Response:

    {
      "status": "healthy",
      "message": "Text Summarization API is running"
    }

Summarization endpoint:

    POST /predict
    Content-Type: application/json

Request Body:

    {
      "text": "Your long article or document text here..."
    }

Response:

    {
      "summary": "Concise generated summary of the input text",
      "inference_time_ms": 234.5
    }

Error Response:

    {
      "detail": "Error message"
    }

Example request with curl:

    curl -X POST "http://localhost:8000/predict" \
      -H "Content-Type: application/json" \
      -d '{"text": "Your article text here"}'

Example request with Python:

    import requests

    url = "http://localhost:8000/predict"
    payload = {
        "text": "Your long article text here..."
    }

    response = requests.post(url, json=payload)
    response.raise_for_status()  # Fail early on an error response instead of a KeyError below
    result = response.json()

    print(f"Summary: {result['summary']}")
    print(f"Inference Time: {result['inference_time_ms']}ms")

Build FastAPI container:

    docker build -f api.Dockerfile -t summarization-api:latest .

Build Streamlit container:

    docker build -f streamlit.Dockerfile -t summarization-ui:latest .

Run API server:

    docker run -d -p 8000:8000 --name api-server summarization-api:latest

Run Streamlit UI:

    docker run -d -p 8501:8501 --name ui-server summarization-ui:latest

For running both services together:

    docker-compose up -d

This will start:
- FastAPI server on http://localhost:8000
- Streamlit UI on http://localhost:8501

Stop services:

    docker-compose down

AWS deployment architecture:

    Internet
       │
       ├─► AWS EC2 Instance (FastAPI) → Port 8000
       │
       ├─► AWS EC2 Instance (Streamlit) → Port 8501
       │
       └─► AWS S3 (Model Artifacts)

- Prepare EC2 Instances
  - Launch 2 EC2 instances (t2.medium or better)
  - Configure security groups (allow ports 8000, 8501, 22)
  - Install Docker and Docker Compose
- Configure a GitHub Actions self-hosted runner on EC2
- Deploy Containers
  - SSH into EC2 instances
  - Pull Docker images or build from source
- Configure Load Balancer (Optional)
  - Set up Application Load Balancer
  - Configure health checks
  - Enable auto-scaling
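
For a quick manual check after deploying, a small script can probe the `GET /` endpoint and confirm that it returns the `"status": "healthy"` body documented earlier. This is a sketch using only the standard library. The function names `is_healthy` and `check_endpoint` are hypothetical, and the load balancer's own health checks are configured in AWS, not by this script.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when the JSON body reports status 'healthy'."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Fetch url and report whether it answers 200 with a healthy body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # connection errors, HTTP errors, and timeouts
        return False


if __name__ == "__main__":
    # Assumes the API is reachable on the default local port from this README.
    print(check_endpoint("http://localhost:8000/"))
```
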

Deployment credentials and settings (replace the placeholder values):

    AWS_ACCESS_KEY_ID=your_access_key
    AWS_SECRET_ACCESS_KEY=your_secret_key
    ECR_API_REPO=ecr_repo_name
    ECR_UI_REPO=ecr_repo_name
    AWS_ACCOUNT_ID=account_id
    AWS_REGION=region

`config/config.yaml` controls pipeline behavior:

    artifacts_root: artifacts

    data_ingestion:
      root_dir: artifacts/data_ingestion
      source_URL: https://github.com/entbappy/Branching-tutorial/raw/master/summarizer-data.zip
      local_data_file: artifacts/data_ingestion/data.zip
      unzip_dir: artifacts/data_ingestion

    data_validation:
      root_dir: artifacts/data_validation
      STATUS_FILE: artifacts/data_validation/status.txt
      ALL_REQUIRED_FILES: ["train", "test", "validation"]

    # ... additional configuration

The training example below is abridged. The actual run used more parameters for `TrainingArguments`; refer to `params.yaml` for the exact values.
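
As an illustration of how the `data_validation` settings in `config/config.yaml` could be used, the sketch below checks that every entry of `ALL_REQUIRED_FILES` exists in a dataset directory and records the result in a status file. The function name `validate_dataset`, the status-file wording, and the directory layout are assumptions for this example, not the project's actual code.

```python
from pathlib import Path
from typing import Sequence

# Mirrors ALL_REQUIRED_FILES from config/config.yaml.
REQUIRED_FILES = ("train", "test", "validation")


def validate_dataset(
    data_dir: Path,
    status_file: Path,
    required: Sequence[str] = REQUIRED_FILES,
) -> bool:
    """Check that each required entry exists in data_dir and write the outcome."""
    present = {p.name for p in data_dir.iterdir()} if data_dir.is_dir() else set()
    missing = [name for name in required if name not in present]
    ok = not missing

    # Hypothetical status format: one line with the result, one listing gaps.
    lines = [f"Validation status: {ok}"]
    if missing:
        lines.append(f"Missing: {', '.join(missing)}")
    status_file.parent.mkdir(parents=True, exist_ok=True)
    status_file.write_text("\n".join(lines) + "\n")
    return ok
```

A later stage could read the status file and refuse to start training when validation failed.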
`params.yaml` defines training hyperparameters:

    TrainingArguments:
      num_train_epochs: 1
      warmup_steps: 500
      per_device_train_batch_size: 1
      per_device_eval_batch_size: 1
      weight_decay: 0.01
      logging_steps: 10
      evaluation_strategy: steps
      eval_steps: 500
      save_steps: 1e6
      gradient_accumulation_steps: 16
      fp16: true  # Mixed precision training

Tip: Adjust `per_device_train_batch_size` and `gradient_accumulation_steps` based on your GPU memory.
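
The two settings in the tip trade memory for time: gradients from several small batches are accumulated before each optimizer update, so the effective batch size is their product. The sketch below works through that arithmetic for the values shown above. The dataset size of 10,000 examples is a made-up figure for illustration, and the step count is an approximation of what a trainer would report.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(
    num_examples: int,
    per_device_batch: int,
    grad_accum_steps: int,
    num_devices: int = 1,
) -> int:
    """Approximate number of optimizer updates in one pass over the data."""
    batch = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / batch)


if __name__ == "__main__":
    # Values from params.yaml above: batch size 1, 16 accumulation steps, one GPU.
    print(effective_batch_size(1, 16))
    # Hypothetical dataset of 10,000 training examples.
    print(optimizer_steps_per_epoch(10_000, 1, 16))
```

With the values above, the effective batch size is 16. On the hypothetical 10,000-example dataset that gives 625 optimizer steps per epoch, which is worth comparing against `warmup_steps: 500` when shrinking the dataset. Doubling `per_device_train_batch_size` while halving `gradient_accumulation_steps` keeps the effective batch size the same but needs more GPU memory per step.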
- Follow PEP 8 style guidelines
- Add unit tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
- Upload model artifacts to S3 for persistence
- Implement A/B testing for model versions
- Add support for multi-document summarization
- Integrate monitoring with Prometheus/Grafana
- Add support for multiple languages
- Implement user feedback loop
- Create mobile-friendly UI
- Add batch processing endpoints
- Training on full dataset requires high-memory GPU (16GB+ VRAM)
- Initial model loading takes 10-15 seconds
- Large input texts (>1000 tokens) may have longer inference times
Jayasuryan Mutyala - @jsuryanm
Project Link: https://github.com/jsuryanm/text-summarization-system
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face team for the Transformers library
- Facebook AI Research for the BART model
- CNN/DailyMail dataset creators
- FastAPI and Streamlit communities
If you find this project helpful, please consider giving it a star! ⭐