MrStarkEG/Leaks-Backend

LeaksBackend API

A FastAPI-based application for managing and searching data leak information stored in Elasticsearch. It provides search over forum posts, link validation, and system health monitoring.

Overview

LeaksBackend is designed to index and search forum posts from various sources, allowing users to efficiently query leaked data across multiple dimensions including dates, authors, forums, and content. The application leverages Elasticsearch for high-performance search operations and provides a clean REST API interface.

Key Features

  • Multi-dimensional Search: Search by date ranges, usernames, forums, text content, and advanced filters
  • Text Content Search: Full-text search capabilities across post titles and content
  • Link Validation: Check if specific post links exist in the database
  • Pagination Support: Efficient token-based pagination for large datasets
  • Health Monitoring: Real-time Elasticsearch connectivity status
  • Input Validation: Robust data validation and sanitization
  • Structured Logging: Comprehensive request/response logging

Quick Start

Prerequisites

For Docker Setup (Recommended):

  • Docker & Docker Compose

For Local Development:

  • Python 3.12+
  • UV package manager
  • Elasticsearch 8.17.0+

Installation & Setup

Option 1: Docker Setup (Recommended)

  1. Clone and navigate to the project

    git clone https://github.com/MrStarkEG/Leaks-Backend
    cd Leaks-Backend
  2. Configure environment variables

    cp .env.example .env
    # Edit .env with your configuration (Elasticsearch URL will be overridden for Docker)
  3. Start the complete stack

    docker-compose up -d

This will start:

  • LeaksBackend API: http://localhost:8080

  • Elasticsearch: http://localhost:9200 (⚠️ Empty database - for testing only)

  • Kibana Dashboard: http://localhost:5601

    ⚠️ Important Note: The Docker Elasticsearch starts completely empty. API endpoints that search for data (like /posts/search/* and /posts/check/link/) will return 500 errors with index_not_found_exception until you add test data. This is expected behavior for a fresh Docker setup.

  4. Useful Docker commands

    # Check service status
    docker-compose ps
    
    # View logs
    docker-compose logs -f leaks-backend
    
    # Stop the stack
    docker-compose down
    
    # Rebuild application after code changes
    docker-compose up -d --build leaks-backend

Option 2: Local Development

  1. Clone and navigate to the project

    git clone https://github.com/MrStarkEG/Leaks-Backend
    cd Leaks-Backend
  2. Install dependencies using UV

    uv sync
  3. Configure environment variables

    cp .env.example .env
    # Edit .env with your Elasticsearch configuration
  4. Run the application

    uv run main.py

The API will be available at http://localhost:8080

Environment Configuration

The application uses a .env file for configuration. Copy .env.example to .env and configure:

# Elasticsearch Configuration
ELASTIC_URL=http://localhost:9200
ELASTIC_USERNAME=elastic
ELASTIC_PASSWORD=your_password_here

# API Authentication
API_KEY=your_api_key_here

# Logging Configuration (Optional)
LOG_LEVEL=INFO

Note for Docker users: The ELASTIC_URL will be automatically overridden to http://elasticsearch:9200 for container networking.
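
As a rough illustration of how these variables might be consumed, the sketch below reads them from a mapping such as os.environ. The Settings class, the load_settings helper, and the choice of which variables are treated as required are assumptions made for this example, not the project's actual configuration code. It also assumes the values from .env have already been exported into the environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env.example."""
    elastic_url: str
    elastic_username: str
    elastic_password: str
    api_key: str
    log_level: str = "INFO"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Assumption: only the Elasticsearch URL and the API key are mandatory.
    missing = [name for name in ("ELASTIC_URL", "API_KEY") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        elastic_url=env["ELASTIC_URL"],
        elastic_username=env.get("ELASTIC_USERNAME", "elastic"),
        elastic_password=env.get("ELASTIC_PASSWORD", ""),
        api_key=env["API_KEY"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of os.environ keeps the helper easy to test.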

Expected Docker Behavior:

  • ✅ Health endpoints work: / and /elastic/ping
  • ❌ Search endpoints return 500 errors (no data available)
  • 💡 To test with real data, run locally (uv run main.py) against an Elasticsearch instance that already holds indexed posts

Note: Local development runs on port 8080. For Windows users, if port binding fails, try running as administrator or use a different port.

API Documentation

System Endpoints

Health Check

  • GET / - Basic application health check
    • Response: {"message": "Welcome to the LeaksBackEnd API, Please refer to the docs for any help!"}

Elasticsearch Monitoring

Connectivity Check

  • GET /elastic/ping - Test Elasticsearch connectivity
    • Authentication: Required (Bearer token)
    • Success Response: {"message": "pong"}
    • Error Response: 503 Service Unavailable if Elasticsearch is unreachable
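
A minimal client-side sketch of the connectivity check, using only the Python standard library. The helper names (build_ping_request, elastic_is_reachable) are made up for this example. The base URL and API key are whatever you configured.

```python
import json
import urllib.error
import urllib.request


def build_ping_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare GET /elastic/ping with the Bearer token header."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/elastic/ping",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def elastic_is_reachable(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Return True on {"message": "pong"}, False on 503; re-raise other HTTP errors."""
    request = build_ping_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("message") == "pong"
    except urllib.error.HTTPError as exc:
        if exc.code == 503:
            return False
        raise
```

For example, elastic_is_reachable("http://localhost:8080", "your_api_key_here") performs the check against a locally running instance.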

Forum Data Endpoints

Date-Based Search

  • POST /posts/search/date/ - Search posts within a date range

Headers:

Authorization: Bearer your_api_key_here

Request Body:

{
  "start_date": "2024-01-01",           // Required: YYYY-MM-DD format
  "end_date": "2024-12-31",             // Required: YYYY-MM-DD format
  "sort_field": "date",                 // Optional: Field to sort by
  "sort_order": "desc",                 // Optional: asc/desc
  "size": 10,                           // Optional: Results per page (1-100)
  "offset": 0,                          // Optional: Pagination offset
  "search_next_posts_token": "..."      // Optional: Pagination token
}

Response:

{
  "start_date": "2024-01-01T00:00:00",
  "end_date": "2024-12-31T00:00:00",
  "size": 10,
  "offset": 0,
  "total_found": 150,
  "has_more": true,
  "search_next_posts_token": "1234567890,post_id",
  "posts": [...]
}
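
The request body above could be assembled and sanity-checked on the client before sending. The sketch below is one way to do that. build_date_search is a hypothetical helper, and the check that start_date does not come after end_date is an assumption about sensible input rather than a documented server rule.

```python
from datetime import date
from typing import Optional


def build_date_search(start_date: str, end_date: str, size: int = 10,
                      sort_order: str = "desc",
                      token: Optional[str] = None) -> dict:
    """Return a request body for POST /posts/search/date/."""
    # fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:  # assumption: reversed ranges are never intended
        raise ValueError("start_date must not be after end_date")
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    payload = {
        "start_date": start.isoformat(),  # normalised to YYYY-MM-DD
        "end_date": end.isoformat(),
        "size": size,
        "sort_order": sort_order,
    }
    if token is not None:
        payload["search_next_posts_token"] = token
    return payload
```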

Username-Based Search

  • POST /posts/search/username/{username} - Search posts by author

Path Parameters:

  • username (string): Username to search for

Request Body: Same as date search (excluding required dates)

Response: Posts array with username-specific metadata


Forum-Based Search

  • POST /posts/search/forum/{forum} - Search posts from specific forum

Path Parameters:

  • forum (string): Forum name to search for

Request Body: Same as date search structure

Response: Posts array with forum-specific metadata


Advanced Search

  • POST /posts/search/advanced/ - Multi-criteria search

Request Body:

{
  "author": "username",                 // Optional: Filter by author
  "source": "darkforums",               // Optional: Filter by source
  "start_date": "2024-01-01",           // Optional: Date range start
  "end_date": "2024-12-31",             // Optional: Date range end
  "sort_field": "date",
  "sort_order": "desc",
  "size": 10,
  "offset": 0,
  "search_next_posts_token": "..."
}

Response: Posts array with applied filters and metadata


Text Content Search

  • POST /posts/search/text/ - Full-text search across post titles and content

Headers:

Authorization: Bearer your_api_key_here

Request Body:

{
  "text": "data breach",                  // Required: Text to search for
  "start_date": "2024-01-01",           // Optional: YYYY-MM-DD format
  "end_date": "2024-12-31",             // Optional: YYYY-MM-DD format
  "source": "darkforums",               // Optional: Source filter
  "author": "username",                 // Optional: Author filter
  "size": 10,                           // Optional: Results per page (1-100)
  "offset": 0,                          // Optional: Pagination offset
  "search_next_posts_token": "..."      // Optional: Pagination token
}

Response:

{
  "text": "data breach",
  "start_date": "2024-01-01",
  "end_date": "2024-12-31",
  "source": "darkforums",
  "author": "username",
  "size": 10,
  "offset": 0,
  "search_next_posts_token": "...",
  "total_found": 42,
  "has_more": false,
  "posts": [...]
}
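
To show how the headers and body fit together, here is a sketch that prepares the text search request with the standard library. build_text_search_request is a hypothetical helper. Dropping filters whose value is None is a convenience of this example, not documented API behavior.

```python
import json
import urllib.request


def build_text_search_request(base_url: str, api_key: str, text: str,
                              **filters) -> urllib.request.Request:
    """Prepare POST /posts/search/text/ with a JSON body and Bearer token."""
    if not text.strip():
        raise ValueError("text is required")
    # Keep only the optional filters that were actually provided.
    payload = {"text": text}
    payload.update({key: value for key, value in filters.items() if value is not None})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/posts/search/text/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The prepared request can then be sent with urllib.request.urlopen(request) and the response decoded with json.load.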

Link Existence Check

  • POST /posts/check/link/ - Verify if a post link exists

Request Body:

{
  "link": "https://example.com/post/123"  // Required: Valid HTTP/HTTPS URL
}

Response:

{
  "exists": true,                       // Boolean: Post existence status
  "post": {...},                        // Object: Post data if exists (null if not)
  "link": "https://example.com/post/123" // String: Cleaned/validated link
}
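
A small sketch of pre-validating the link on the client and reading the response. The helper names are invented for this example, and the server's own URL validation and cleaning rules may be stricter or different.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_http_url(link: str) -> bool:
    """True if the link has an http/https scheme and a host part."""
    try:
        parts = urlsplit(link.strip())
    except ValueError:  # raised for some malformed URLs
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_link_check(link: str) -> dict:
    """Return a request body for POST /posts/check/link/."""
    if not is_http_url(link):
        raise ValueError("link must be a valid HTTP/HTTPS URL")
    return {"link": link.strip()}


def title_if_exists(response: dict) -> Optional[str]:
    """Return the post title when the link exists, otherwise None."""
    post = response.get("post")
    if response.get("exists") and post:
        return post.get("title")
    return None
```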

Data Models

Post Structure

{
  "type": "post",
  "title": "Post Title",
  "content": "Post content...",
  "link": "https://example.com/post/123",
  "date": "2024-01-01T12:00:00",
  "author": {
    "name": "username",
    "link": "https://example.com/user/username"
  },
  "forum": {
    "name": "Forum Name",
    "link": "https://example.com/forum"
  },
  "source": "darkforums",
  "replies": [...]
}
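
Client code may prefer typed objects over raw dictionaries. The dataclasses below mirror the structure shown above. They are an illustrative assumption, not the project's Pydantic models, and replies are left as plain data because their shape is not documented here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NamedLink:
    """Shared shape of the "author" and "forum" objects."""
    name: str
    link: str


@dataclass
class Post:
    title: str
    content: str
    link: str
    date: datetime
    author: NamedLink
    forum: NamedLink
    source: str
    replies: list = field(default_factory=list)


def parse_post(raw: dict) -> Post:
    """Convert one item of the "posts" array into a Post."""
    return Post(
        title=raw["title"],
        content=raw["content"],
        link=raw["link"],
        date=datetime.fromisoformat(raw["date"]),
        author=NamedLink(raw["author"]["name"], raw["author"]["link"]),
        forum=NamedLink(raw["forum"]["name"], raw["forum"]["link"]),
        source=raw["source"],
        replies=list(raw.get("replies", [])),
    )
```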

Pagination

All search endpoints support token-based pagination:

  • Pass the search_next_posts_token from the previous response to request the next page
  • Check the has_more flag to determine whether more results are available
  • total_found gives the total number of matching records
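
The pagination rules above can be sketched as a generator that keeps requesting pages until has_more is false. fetch_page is a placeholder for whatever function sends the request body to a search endpoint and returns the decoded JSON response. The max_pages cap is a safety limit added for this example.

```python
from typing import Callable, Iterator


def iter_posts(fetch_page: Callable[[dict], dict], payload: dict,
               max_pages: int = 100) -> Iterator[dict]:
    """Yield posts from every page, following search_next_posts_token."""
    body = dict(payload)
    for _ in range(max_pages):
        page = fetch_page(body)
        yield from page.get("posts", [])
        token = page.get("search_next_posts_token")
        # Stop when the API reports no more results or gives no token.
        if not page.get("has_more") or not token:
            return
        body = {**payload, "search_next_posts_token": token}
```

Because it is a generator, callers can stop early (for example after the first match) without fetching the remaining pages.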

Technical Details

Architecture

  • Framework: FastAPI 0.115.13
  • Database: Elasticsearch 8.17.0
  • Validation: Pydantic 2.11.7+
  • Server: Uvicorn 0.34.3
  • Retry Logic: Tenacity 9.1.2

Security Features

  • Bearer token authentication (API key required)
  • Rate limiting (5 requests per minute per IP)
  • URL validation and sanitization
  • Input validation with Pydantic models
  • Request size limits (1MB max request, 512KB max JSON)
  • Elasticsearch authentication support
  • Secure error handling and logging

Performance

  • Efficient Elasticsearch queries with proper indexing
  • Token-based pagination for large datasets
  • Connection pooling and retry mechanisms
  • Configurable result sizes

Error Handling

The API provides standardized error responses:

  • 400 Bad Request: Invalid input parameters
  • 401 Unauthorized: Missing or invalid API key
  • 422 Unprocessable Entity: Validation errors
  • 429 Too Many Requests: Rate limit exceeded (5 requests/minute)
  • 500 Internal Server Error: Server-side issues
  • 503 Service Unavailable: Elasticsearch connectivity issues
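
One possible client-side reaction to these status codes is to retry only the transient ones (429 and 503) with exponential backoff and fail fast on the rest. The sketch below assumes a send callable that returns a (status_code, body) tuple. ApiError and call_with_retry are names invented for this example.

```python
import time

RETRYABLE = {429, 503}  # rate limit exceeded, Elasticsearch unavailable


class ApiError(Exception):
    """Raised when the API answers with a non-retryable or final error."""

    def __init__(self, status: int, detail: str = ""):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status


def call_with_retry(send, attempts: int = 3, base_delay: float = 1.0,
                    sleep=time.sleep):
    """Call send() until it succeeds, retrying 429/503 with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return body
        if status in RETRYABLE and attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
            continue
        raise ApiError(status, str(body))
```

Given the limit of 5 requests per minute, a base_delay well above the one-second default is probably more appropriate for 429 responses.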

Code Quality

uv run pre-commit run --all-files

API Documentation

Interactive API documentation available at:

  • Swagger UI: http://localhost:8080/docs (alternatively, import http://127.0.0.1:8080/openapi.json into Postman to get the endpoints)
  • ReDoc: http://localhost:8080/redoc

About

This is a tool that fetches data from a specific Elasticsearch cluster and filters it.
