A highly available, fault-tolerant distributed job scheduling system built with Java 21 and Spring Boot, demonstrating advanced distributed systems concepts.
This project showcases:
- Leader Election: Redis-based leader election with automatic failover
- Distributed Locking: Redlock algorithm to prevent duplicate job execution
- Fault Tolerance: Automatic recovery from node failures
- Observability: Comprehensive metrics, logging, and health checks
- Production-Ready: Docker and Kubernetes deployment, CI/CD-ready
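Leader election with automatic failover is usually built on a lease with a TTL: the leader renews it on every heartbeat, and if it stops, the lease expires and another node takes over. It is often paired with fencing tokens so that a deposed leader's late writes can be rejected. The following is a minimal in-memory sketch, not the project's implementation. `LeaseStore` and `FencedResource` are hypothetical names standing in for Redis and a downstream resource, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.Optional;

// Hypothetical in-memory stand-in for the Redis state behind a leader lease.
// In Redis, tryAcquire roughly maps to SET key nodeId NX PX ttl and the counter to INCR.
final class LeaseStore {
    private String holder;          // node currently holding the lease, or null
    private long expiresAtMillis;   // absolute expiry time of the current lease
    private long fencingCounter;    // grows on every successful acquisition, never reset

    /** Acquires the lease if it is free or expired and returns a fresh fencing token. */
    synchronized Optional<Long> tryAcquire(String nodeId, long nowMillis, long ttlMillis) {
        if (holder != null && nowMillis < expiresAtMillis) {
            return Optional.empty(); // a live lease exists; the caller stays a follower
        }
        holder = nodeId;
        expiresAtMillis = nowMillis + ttlMillis;
        return Optional.of(++fencingCounter);
    }

    /** Heartbeat: extends the lease only if nodeId still holds it and it has not expired. */
    synchronized boolean renew(String nodeId, long nowMillis, long ttlMillis) {
        if (nodeId.equals(holder) && nowMillis < expiresAtMillis) {
            expiresAtMillis = nowMillis + ttlMillis;
            return true;
        }
        return false; // lease lost: the node must stop acting as leader
    }
}

// Hypothetical downstream resource that rejects writes carrying an older fencing token.
final class FencedResource {
    private long highestTokenSeen;

    synchronized boolean write(long fencingToken) {
        if (fencingToken < highestTokenSeen) {
            return false; // a newer leader has already written; reject the stale one
        }
        highestTokenSeen = fencingToken;
        return true;
    }
}
```

In this model a leader that stops renewing loses the lease at expiry, and the next node to call `tryAcquire` leads with a higher token. Against a real Redis, the renew step would need to be atomic (for example a Lua script that checks the holder and extends the TTL), because a separate read followed by a write could race with another node's acquisition.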
See ARCHITECTURE.md for comprehensive architecture documentation including:
- High-level system design
- Component interactions
- Leader election process
- Job execution flow
- Database schema
- Deployment architecture
Prerequisites:
- Java 21 (OpenJDK or Oracle JDK)
- Maven 3.9+
- Docker & Docker Compose
Quick start:
- Start the infrastructure services:
  cd deployment/docker
  docker-compose up -d mysql redis
- Build the project (from the repository root):
  mvn clean install
- Run the application:
  mvn spring-boot:run
- Access the application
Run a 3-node cluster with Docker Compose:
cd deployment/docker
docker-compose up -d
This starts:
- 3 scheduler nodes (ports 8080, 8081, 8082)
- MySQL database
- Redis cluster
Note: Prometheus and Grafana are disabled by default and will be enabled in Phase 4 (Observability). See docs/OBSERVABILITY_STRATEGY.md.
Documentation:
- DEVELOPMENT.md - Development progress tracker and phase index
- ARCHITECTURE.md - Comprehensive architecture documentation
- DIAGRAMS_ASCII.md - ASCII architecture diagrams
- docs/features/ - Feature-specific documentation
Testing:
# Run all tests
mvn test
# Run only unit tests (excludes *IntegrationTest classes, which also end in "Test")
mvn test -Dtest='*Test,!*IntegrationTest'
# Run only integration tests
mvn test -Dtest='*IntegrationTest'
# Run with coverage
mvn test jacoco:report
Backend:
- Java 21 (Virtual Threads, Records, Pattern Matching)
- Spring Boot 3.2.3
- Spring Data JPA (Hibernate 6.4)
- Redisson 3.27.0 (Redis client)
- Flyway 10.8.1 (Database migrations)
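Java 21 virtual threads make a thread per job affordable for I/O-heavy work, since a blocking call parks the virtual thread rather than an OS thread. The sketch below is illustrative only: `JobRequest`, `VirtualThreadJobRunner`, and the `execute` placeholder are hypothetical names; `Executors.newVirtualThreadPerTaskExecutor()` and `Thread.isVirtual()` are standard JDK 21 APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical job input; the project's Job entity is richer.
record JobRequest(long id, String payload) {}

final class VirtualThreadJobRunner {
    /** Runs each job on its own virtual thread and returns the results in submission order. */
    static List<String> runAll(List<JobRequest> jobs) throws InterruptedException, ExecutionException {
        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (JobRequest job : jobs) {
                futures.add(executor.submit(() -> execute(job)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that job; order matches submission
            }
            return results;
        }
    }

    // Placeholder job body; Thread.sleep stands in for blocking I/O.
    private static String execute(JobRequest job) throws InterruptedException {
        Thread.sleep(10);
        return job.id() + ":" + job.payload().toUpperCase() + ":virtual=" + Thread.currentThread().isVirtual();
    }
}
```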
Infrastructure:
- Redis 7.2+ (Coordination)
- MySQL 8.0+ (Persistence)
- Prometheus + Grafana (Observability)
- Docker & Kubernetes (Deployment)
Roadmap:
- Project structure and Maven setup
- Database schema with Flyway (V1-V4 migrations)
- Core domain entities (Job, JobExecution, SchedulerNode)
- JPA repositories
- Coordination layer (leader election, distributed locking)
- REST API controllers
- Job executor with virtual threads
- Redis-based leader election
- Heartbeat mechanism
- Automatic failover
- Fencing tokens
- Redlock implementation
- Idempotency service
- Retry logic with exponential backoff
- Job state machine
- Prometheus metrics
- Custom health indicators
- Structured logging
- Distributed tracing
Note: Observability features are deferred so that work can focus on core distributed systems functionality first.
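The Redlock item above refers to the algorithm described in the Redis distributed-lock documentation: set the same key with a random token on N independent masters, and treat the lock as held only if a majority granted it and the remaining validity time (TTL minus elapsed time minus a clock-drift allowance) is still positive. This sketch models that with hypothetical in-memory nodes (`LockNode`, `InMemoryLockNode`) rather than a real Redis client. The 1% + 2 ms drift allowance mirrors common Redlock client libraries and is an assumption here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.LongSupplier;

// Hypothetical view of one independent Redis master.
interface LockNode {
    boolean tryLock(String resource, String token, long ttlMillis, long nowMillis);
    void unlock(String resource, String token);
}

// In-memory stand-in for SET resource token NX PX ttl plus a token-checked delete.
final class InMemoryLockNode implements LockNode {
    private record Entry(String token, long expiresAtMillis) {}
    private final Map<String, Entry> locks = new HashMap<>();
    boolean reachable = true; // set to false to simulate a failed node

    @Override
    public synchronized boolean tryLock(String resource, String token, long ttlMillis, long nowMillis) {
        if (!reachable) return false;
        Entry current = locks.get(resource);
        if (current != null && nowMillis < current.expiresAtMillis()) return false; // still held
        locks.put(resource, new Entry(token, nowMillis + ttlMillis));
        return true;
    }

    @Override
    public synchronized void unlock(String resource, String token) {
        if (!reachable) return;
        Entry current = locks.get(resource);
        if (current != null && current.token().equals(token)) locks.remove(resource); // owner only
    }
}

final class Redlock {
    private static final double CLOCK_DRIFT_FACTOR = 0.01; // assumed, as in common Redlock clients
    private final List<? extends LockNode> nodes;
    private final LongSupplier clockMillis;

    Redlock(List<? extends LockNode> nodes, LongSupplier clockMillis) {
        this.nodes = nodes;
        this.clockMillis = clockMillis;
    }

    /** Returns the lock token if a majority granted the lock with validity time left, else null. */
    String tryAcquire(String resource, long ttlMillis) {
        String token = UUID.randomUUID().toString(); // unique value so only this owner can release
        long start = clockMillis.getAsLong();
        int granted = 0;
        for (LockNode node : nodes) {
            if (node.tryLock(resource, token, ttlMillis, clockMillis.getAsLong())) granted++;
        }
        long elapsed = clockMillis.getAsLong() - start;
        long drift = (long) (ttlMillis * CLOCK_DRIFT_FACTOR) + 2;
        long validity = ttlMillis - elapsed - drift;
        if (granted >= nodes.size() / 2 + 1 && validity > 0) return token;
        release(resource, token); // undo partial acquisitions so others need not wait for the TTL
        return null;
    }

    void release(String resource, String token) {
        for (LockNode node : nodes) node.unlock(resource, token);
    }
}
```

On failure the sketch releases the key on every node, including the ones that granted it, so a partial acquisition does not block other nodes until the TTL runs out.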
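One common shape for an idempotency service is to derive a key from the job and its scheduled fire time and let only the first claimant run that logical execution. This is a hypothetical sketch: `IdempotencyGuard` and the key format are assumptions. The `ConcurrentHashMap` only deduplicates within one JVM; a multi-node cluster would need a shared store with the same "insert if absent" semantics, such as a unique-key insert in MySQL or `SET key value NX` in Redis.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical idempotency guard; the map stands in for a shared store.
final class IdempotencyGuard {
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    /** One key per logical run: the same job firing at the same scheduled time. */
    static String keyFor(long jobId, Instant scheduledFireTime) {
        return "job:" + jobId + ":" + scheduledFireTime.toEpochMilli();
    }

    /** Atomically records the claim; true only for the first caller with this key. */
    boolean tryClaim(String key, String nodeId) {
        return claims.putIfAbsent(key, nodeId) == null;
    }

    /** Node that won the claim, or null if the run has not been claimed. */
    String ownerOf(String key) {
        return claims.get(key);
    }
}
```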
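Retry with exponential backoff typically doubles the wait after each failed attempt, up to a cap. A minimal sketch, assuming a hypothetical `RetryPolicy` with a base delay, a maximum delay, and a maximum attempt count. Production schedulers often add random jitter so that many failing jobs do not retry in lockstep; it is omitted here to keep the delays deterministic.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical retry policy: delay before retry n is baseDelay * 2^(n-1), capped at maxDelay.
final class RetryPolicy {
    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;

    RetryPolicy(int maxAttempts, Duration baseDelay, Duration maxDelay) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
    }

    /** Delay before the given retry (1-based), doubling each time up to maxDelay. */
    Duration delayBeforeRetry(int retry) {
        int shift = Math.max(0, Math.min(retry - 1, 30)); // clamp to avoid negative or overflowing shifts
        Duration delay = baseDelay.multipliedBy(1L << shift);
        return delay.compareTo(maxDelay) > 0 ? maxDelay : delay;
    }

    /** Runs the task, retrying failures until it succeeds or maxAttempts is reached. */
    <T> T execute(Callable<T> task) throws Exception {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayBeforeRetry(attempt).toMillis()); // back off before the next try
                }
            }
        }
        throw lastFailure; // every attempt failed: surface the last error
    }
}
```

With a 100 ms base delay and a 1 s cap, the waits before retries 1 through 5 are 100, 200, 400, 800, and 1000 ms.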
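A job state machine can be expressed as an enum whose constants list their legal successors, with every state change going through one guarded method. The states below (`PENDING`, `RUNNING`, `RETRY_SCHEDULED`, `SUCCEEDED`, `FAILED`, `CANCELLED`) and the `JobExecutionRecord` holder are hypothetical and may not match the project's actual `JobExecution` states.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical execution states; the project's real states may differ.
enum ExecutionState {
    PENDING, RUNNING, RETRY_SCHEDULED, SUCCEEDED, FAILED, CANCELLED;

    /** States reachable from this one in a single legal transition. */
    Set<ExecutionState> allowedNext() {
        return switch (this) {
            case PENDING -> EnumSet.of(RUNNING, CANCELLED);
            case RUNNING -> EnumSet.of(SUCCEEDED, FAILED, RETRY_SCHEDULED);
            case RETRY_SCHEDULED -> EnumSet.of(RUNNING, CANCELLED);
            case SUCCEEDED, FAILED, CANCELLED -> EnumSet.noneOf(ExecutionState.class); // terminal
        };
    }

    boolean isTerminal() {
        return allowedNext().isEmpty();
    }
}

// Hypothetical holder that enforces the transitions in memory.
final class JobExecutionRecord {
    private ExecutionState state = ExecutionState.PENDING;

    ExecutionState state() {
        return state;
    }

    /** Moves to the target state, rejecting transitions the state machine does not allow. */
    void transitionTo(ExecutionState target) {
        if (!state.allowedNext().contains(target)) {
            throw new IllegalStateException("Illegal transition " + state + " -> " + target);
        }
        state = target;
    }
}
```

With several nodes, each transition would normally also be persisted with a conditional update (succeeding only if the stored state is still the expected one), so two nodes cannot both advance the same execution.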
This is a portfolio project. Contributions are welcome!
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Authors:
- Scheduler Team - Initial work
Acknowledgments:
- Inspired by production job schedulers like Quartz, Airflow, and Temporal
- Built to demonstrate distributed systems expertise
- Designed with production-ready patterns and best practices
Status: 🚧 Phase 1 in progress
Last Updated: 2026-03-07