A powerful FastAPI-based application for managing and searching data leak information stored in Elasticsearch. This system provides comprehensive search capabilities for forum posts, link validation, and system health monitoring.
LeaksBackend is designed to index and search forum posts from various sources, allowing users to efficiently query leaked data across multiple dimensions including dates, authors, forums, and content. The application leverages Elasticsearch for high-performance search operations and provides a clean REST API interface.
- Multi-dimensional Search: Search by date ranges, usernames, forums, text content, and advanced filters
- Text Content Search: Full-text search capabilities across post titles and content
- Link Validation: Check if specific post links exist in the database
- Pagination Support: Efficient token-based pagination for large datasets
- Health Monitoring: Real-time Elasticsearch connectivity status
- Input Validation: Robust data validation and sanitization
- Structured Logging: Comprehensive request/response logging
For Docker Setup (Recommended):
- Docker & Docker Compose
For Local Development:
- Python 3.12+
- UV package manager
- Elasticsearch 8.17.0+
- Clone and navigate to the project
git clone https://github.com/MrStarkEG/Leaks-Backend
cd leaks_backend
- Configure environment variables
cp .env.example .env
# Edit .env with your configuration (Elasticsearch URL will be overridden for Docker)
- Start the complete stack
docker-compose up -d
This will start:
- LeaksBackend API: http://localhost:8080
- Elasticsearch: http://localhost:9200 (⚠️ Empty database - for testing only)
- Kibana Dashboard: http://localhost:5601
⚠️ Important Note: The Docker Elasticsearch starts completely empty. API endpoints that search for data (like /posts/search/* and /posts/check/link/) will return 500 errors with index_not_found_exception until you add test data. This is expected behavior for a fresh Docker setup.
- Useful Docker commands
# Check service status
docker-compose ps
# View logs
docker-compose logs -f leaks-backend
# Stop the stack
docker-compose down
# Rebuild application after code changes
docker-compose up -d --build leaks-backend
- Clone and navigate to the project
git clone https://github.com/MrStarkEG/Leaks-Backend
cd leaks_backend
- Install dependencies using UV
uv sync
- Configure environment variables
cp .env.example .env
# Edit .env with your Elasticsearch configuration
- Run the application
uv run main.py
The API will be available at http://localhost:8080
The application uses a .env file for configuration. Copy .env.example to .env and configure:
# Elasticsearch Configuration
ELASTIC_URL=http://localhost:9200
ELASTIC_USERNAME=elastic
ELASTIC_PASSWORD=your_password_here
# API Authentication
API_KEY=your_api_key_here
# Logging Configuration (Optional)
LOG_LEVEL=INFO
Note for Docker users: The ELASTIC_URL will be automatically overridden to http://elasticsearch:9200 for container networking.
Expected Docker Behavior:
- ✅ Health endpoints work: / and /elastic/ping
- ❌ Search endpoints return 500 errors (no data available)
- 💡 To test with real data, use local mode: uv run main.py
Note: Local development runs on port 8080. For Windows users, if port binding fails, try running as administrator or use a different port.
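As a rough illustration of how the variables above might be consumed, the sketch below reads them from the environment into a small settings object. The names `Settings` and `load_settings` are hypothetical and not taken from the project, and treating `ELASTIC_URL` and `API_KEY` as the only required values is an assumption; only `LOG_LEVEL` is marked optional in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env values listed in this README."""
    elastic_url: str
    elastic_username: str
    elastic_password: str
    api_key: str
    log_level: str = "INFO"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Assumption: the app cannot start without these two values.
    missing = [key for key in ("ELASTIC_URL", "API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        elastic_url=env["ELASTIC_URL"],
        elastic_username=env.get("ELASTIC_USERNAME", "elastic"),
        elastic_password=env.get("ELASTIC_PASSWORD", ""),
        api_key=env["API_KEY"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.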
- GET / - Basic application health check
  - Response: {"message": "Welcome to the LeaksBackEnd API, Please refer to the docs for any help!"}
- GET /elastic/ping - Test Elasticsearch connectivity
  - Authentication: Required (Bearer token)
  - Success Response: {"message": "pong"}
  - Error Response: 503 Service Unavailable if Elasticsearch is unreachable
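A minimal client-side sketch of the connectivity check described above, using only the Python standard library. The helper names `build_ping_request` and `is_elastic_up` are illustrative, and the base URL is the local default from this README; adjust it for your deployment.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # local default from this README


def build_ping_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare an authenticated GET request for /elastic/ping."""
    return urllib.request.Request(
        f"{base_url}/elastic/ping",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def is_elastic_up(api_key: str, opener=urllib.request.urlopen) -> bool:
    """Return True on {"message": "pong"}, False on a 503 response."""
    try:
        with opener(build_ping_request(api_key), timeout=5) as resp:
            return json.load(resp).get("message") == "pong"
    except urllib.error.HTTPError as exc:
        if exc.code == 503:  # Elasticsearch unreachable, per the endpoint description
            return False
        raise  # 401, 429 and other errors are left to the caller
```

The `opener` parameter exists only so the function can be tested with a stub instead of a live server.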
- POST /posts/search/date/ - Search posts within a date range
Headers:
Authorization: Bearer your_api_key_here
Request Body:
{
"start_date": "2024-01-01", // Required: YYYY-MM-DD format
"end_date": "2024-12-31", // Required: YYYY-MM-DD format
"sort_field": "date", // Optional: Field to sort by
"sort_order": "desc", // Optional: asc/desc
"size": 10, // Optional: Results per page (1-100)
"offset": 0, // Optional: Pagination offset
"search_next_posts_token": "..." // Optional: Pagination token
}
Response:
{
"start_date": "2024-01-01T00:00:00",
"end_date": "2024-12-31T00:00:00",
"size": 10,
"offset": 0,
"total_found": 150,
"has_more": true,
"search_next_posts_token": "1234567890,post_id",
"posts": [...]
}
- POST /posts/search/username/{username} - Search posts by author
Path Parameters:
username (string): Username to search for
Request Body: Same as date search (excluding required dates)
Response: Posts array with username-specific metadata
- POST /posts/search/forum/{forum} - Search posts from a specific forum
Path Parameters:
forum (string): Forum name to search for
Request Body: Same as date search structure
Response: Posts array with forum-specific metadata
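The date, username, and forum searches share the same request body shape, so a client can validate it once before sending. The sketch below is one way to do that; the function names are hypothetical, and the `start_date <= end_date` check is an assumption about what the server expects rather than documented behavior.

```python
from datetime import datetime
from urllib.parse import quote


def _parse_day(value: str):
    """Parse a YYYY-MM-DD string; raises ValueError if it is not a valid date."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_date_search_body(start_date: str, end_date: str, size: int = 10,
                           sort_field: str = "date", sort_order: str = "desc") -> dict:
    """Return a JSON-serializable body for POST /posts/search/date/."""
    start, end = _parse_day(start_date), _parse_day(end_date)
    if start > end:  # assumption: an inverted range is never intended
        raise ValueError("start_date must not be after end_date")
    if not 1 <= size <= 100:  # documented range for results per page
        raise ValueError("size must be between 1 and 100")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "sort_field": sort_field,
        "sort_order": sort_order,
        "size": size,
    }


def username_search_path(username: str) -> str:
    """Percent-encode the username so it is safe as a single path segment."""
    return f"/posts/search/username/{quote(username, safe='')}"
```

Failing early on the client avoids spending one of the rate-limited requests on a body the API would reject with a 422.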
- POST /posts/search/advanced/ - Multi-criteria search
Request Body:
{
"author": "username", // Optional: Filter by author
"source": "darkforums", // Optional: Filter by source
"start_date": "2024-01-01", // Optional: Date range start
"end_date": "2024-12-31", // Optional: Date range end
"sort_field": "date",
"sort_order": "desc",
"size": 10,
"offset": 0,
"search_next_posts_token": "..."
}
Response: Posts array with applied filters and metadata
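Because every filter in the advanced search is optional, a client only needs to send the ones it actually uses. The following sketch builds such a request with the standard library; `build_advanced_search_request` is a hypothetical helper, and it assumes this endpoint takes the same Bearer header as the other search endpoints.

```python
import json
import urllib.request

# Field names taken from the request body documented above.
ALLOWED_FILTERS = {
    "author", "source", "start_date", "end_date", "sort_field",
    "sort_order", "size", "offset", "search_next_posts_token",
}


def build_advanced_search_request(api_key: str,
                                  base_url: str = "http://localhost:8080",
                                  **filters) -> urllib.request.Request:
    """Prepare a POST to /posts/search/advanced/, dropping unset filters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"Unknown filter(s): {', '.join(sorted(unknown))}")
    body = {key: value for key, value in filters.items() if value is not None}
    return urllib.request.Request(
        f"{base_url}/posts/search/advanced/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For example, `build_advanced_search_request(key, author="username", source="darkforums", size=10)` yields a request that can be passed to `urllib.request.urlopen`.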
- POST /posts/search/text/ - Search posts by text in title and content
Headers:
Authorization: Bearer your_api_key_here
Request Body:
{
"text": "data breach", // Required: Text to search for
"start_date": "2024-01-01", // Optional: YYYY-MM-DD format
"end_date": "2024-12-31", // Optional: YYYY-MM-DD format
"source": "darkforums", // Optional: Source filter
"author": "username", // Optional: Author filter
"size": 10, // Optional: Results per page (1-100)
"offset": 0, // Optional: Pagination offset
"search_next_posts_token": "..." // Optional: Pagination token
}
Response:
{
"text": "data breach",
"start_date": "2024-01-01",
"end_date": "2024-12-31",
"source": "darkforums",
"author": "username",
"size": 10,
"offset": 0,
"search_next_posts_token": "...",
"total_found": 42,
"has_more": false,
"posts": [...]
}
- POST /posts/check/link/ - Verify if a post link exists
Request Body:
{
"link": "https://example.com/post/123" // Required: Valid HTTP/HTTPS URL
}
Response:
{
"exists": true, // Boolean: Post existence status
"post": {...}, // Object: Post data if exists (null if not)
"link": "https://example.com/post/123" // String: Cleaned/validated link
}
Each post object returned by the API has the following structure:
{
"type": "post",
"title": "Post Title",
"content": "Post content...",
"link": "https://example.com/post/123",
"date": "2024-01-01T12:00:00",
"author": {
"name": "username",
"link": "https://example.com/user/username"
},
"forum": {
"name": "Forum Name",
"link": "https://example.com/forum"
},
"source": "darkforums",
"replies": [...]
}
All search endpoints support token-based pagination:
- Use search_next_posts_token from the previous response to request the next page
- Check the has_more flag to determine if more results are available
- total_found provides the total count of matching records
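The pagination rules above can be sketched as a small generator that keeps requesting pages until `has_more` is false. `iter_posts` and its `fetch_page` callback are hypothetical names; `fetch_page` stands in for whatever function sends the POST and returns the decoded JSON response. The `max_pages` cap is an added safety net, not part of the API.

```python
from typing import Callable, Iterator


def iter_posts(fetch_page: Callable[[dict], dict], body: dict,
               max_pages: int = 50) -> Iterator[dict]:
    """Yield posts across pages by following search_next_posts_token."""
    request_body = dict(body)  # never mutate the caller's dict
    for _ in range(max_pages):
        page = fetch_page(request_body)
        yield from page.get("posts", [])
        token = page.get("search_next_posts_token")
        # Stop when the API reports no more results or gives no token.
        if not page.get("has_more") or not token:
            break
        request_body = {**body, "search_next_posts_token": token}
```

With the documented limit of 5 requests per minute per IP, a real `fetch_page` would likely need to pause between calls when walking large result sets.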
- Framework: FastAPI 0.115.13
- Database: Elasticsearch 8.17.0
- Validation: Pydantic 2.11.7+
- Server: Uvicorn 0.34.3
- Retry Logic: Tenacity 9.1.2
- Bearer token authentication (API key required)
- Rate limiting (5 requests per minute per IP)
- URL validation and sanitization
- Input validation with Pydantic models
- Request size limits (1MB max request, 512KB max JSON)
- Elasticsearch authentication support
- Secure error handling and logging
- Efficient Elasticsearch queries with proper indexing
- Token-based pagination for large datasets
- Connection pooling and retry mechanisms
- Configurable result sizes
The API provides standardized error responses:
- 400 Bad Request: Invalid input parameters
- 401 Unauthorized: Missing or invalid API key
- 422 Unprocessable Entity: Validation errors
- 429 Too Many Requests: Rate limit exceeded (5 requests/minute)
- 500 Internal Server Error: Server-side issues
- 503 Service Unavailable: Elasticsearch connectivity issues
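One way a client might react to these status codes is to retry only the transient ones (429 and 503) and fail fast on the rest. The sketch below assumes a hypothetical `send` callable that returns a `(status_code, payload)` tuple; the 12-second base delay is an assumption derived from the 5 requests/minute limit, not a value the API prescribes.

```python
import time

RETRYABLE = {429, 503}  # rate limit exceeded, Elasticsearch unavailable


def call_with_retry(send, max_attempts: int = 3, base_delay: float = 12.0,
                    sleep=time.sleep):
    """Call send() until it succeeds or a non-retryable error occurs."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, payload = send()
        if status < 400:
            return payload
        if status in RETRYABLE and attempt < max_attempts:
            sleep(base_delay * attempt)  # linear backoff: 12s, 24s, ...
            continue
        # 400, 401, 422 and 500 will not improve on retry; surface them.
        raise RuntimeError(f"API request failed with HTTP {status}: {payload}")
```

Injecting `sleep` keeps the helper testable without real delays.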
uv run pre-commit run --all-files
Interactive API documentation is available at:
- Swagger UI: http://localhost:8080/docs (alternatively, import http://127.0.0.1:8080/openapi.json into Postman to load the endpoints)
- ReDoc: http://localhost:8080/redoc