hemathens/OHLCV-Signal-Generation

OHLCV Signal Generation

This project implements a Python program that processes OHLCV (Open, High, Low, Close, Volume) data, computes rolling means, generates binary trading signals, and outputs structured metrics with detailed logging.

Project Structure

.
├── run.py              # Main application script
├── config.yaml         # Configuration file
├── data.csv            # Input dataset (10,000 rows of OHLCV data)
├── requirements.txt    # Python dependencies
├── Dockerfile          # Docker configuration
├── metrics.json        # Sample output metrics (generated)
├── run.log             # Sample log file (generated)
└── README.md           # This file

Requirements

  • Python 3.9 or higher
  • Docker (for containerized execution)

Local Installation and Execution

1. Install Dependencies

pip install -r requirements.txt

2. Run the Program

python run.py --input data.csv --config config.yaml --output metrics.json --log-file run.log

Command-Line Arguments

  • --input: Path to input CSV file containing OHLCV data
  • --config: Path to YAML configuration file
  • --output: Path to output JSON metrics file
  • --log-file: Path to log file
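
The four flags above could be parsed with the standard-library argparse module. This is a minimal sketch, not the actual contents of run.py: the function name build_parser and the choice to mark every flag as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="OHLCV signal generation")
    # Assumption: all four flags are mandatory; the README does not say so.
    parser.add_argument("--input", required=True,
                        help="Path to input CSV file containing OHLCV data")
    parser.add_argument("--config", required=True,
                        help="Path to YAML configuration file")
    parser.add_argument("--output", required=True,
                        help="Path to output JSON metrics file")
    # argparse would derive "log_file" automatically; dest is spelled out for clarity.
    parser.add_argument("--log-file", required=True, dest="log_file",
                        help="Path to log file")
    return parser
```

Calling build_parser().parse_args() with the command shown in step 2 would yield a namespace with the attributes input, config, output, and log_file.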

Docker Execution

1. Build the Docker Image

docker build -t mlops-task .

2. Run the Docker Container

docker run --rm mlops-task

The container will execute the program and print the final metrics JSON to stdout.

3. Extract Output Files (Optional)

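To copy the generated files out of the container, run it under a fixed name and without --rm so that it persists after exiting, copy the files, then remove it: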

docker run --name mlops-run mlops-task
docker cp mlops-run:/app/metrics.json ./metrics.json
docker cp mlops-run:/app/run.log ./run.log
docker rm mlops-run

Configuration

The config.yaml file contains the following parameters:

seed: 42          # Random seed for reproducibility
window: 5         # Rolling window size for mean calculation
version: "v1"     # Version identifier
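
One way to validate these three parameters once the YAML has been parsed into a dictionary is sketched below. It assumes the parsing itself happens elsewhere (for example with PyYAML's yaml.safe_load, if that library is among the dependencies); the names REQUIRED_FIELDS and validate_config, and the rule that window must be at least 1, are illustrative assumptions.

```python
# Hypothetical mapping of required field names to their expected types.
REQUIRED_FIELDS = {"seed": int, "window": int, "version": str}


def validate_config(cfg):
    """Return a cleaned config dict or raise ValueError (illustrative only)."""
    if not isinstance(cfg, dict):
        raise ValueError("configuration must be a mapping")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in cfg:
            raise ValueError(f"missing required configuration field: {name}")
        value = cfg[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name!r} must be of type {expected.__name__}")
    # Assumption: a rolling window smaller than 1 is meaningless.
    if cfg["window"] < 1:
        raise ValueError("window must be >= 1")
    return {name: cfg[name] for name in REQUIRED_FIELDS}
```

With the values shown above, validate_config({"seed": 42, "window": 5, "version": "v1"}) returns the same three fields unchanged.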

Input Data Format

The input CSV file must contain the following columns:

  • timestamp: Timestamp of the data point
  • open: Opening price
  • high: Highest price
  • low: Lowest price
  • close: Closing price (required for calculations)
  • volume_btc: Volume in BTC
  • volume_usd: Volume in USD

Only the close column is used for signal generation, and it is the only column whose absence is listed as an error case under Error Handling.
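
As an illustration of the column check, the sketch below reads close prices with the standard-library csv module. It is not taken from run.py, which may well use a different library; the function name load_close_prices and the error messages are hypothetical.

```python
import csv
import io


def load_close_prices(stream):
    """Read the close column from a CSV text stream (hypothetical helper)."""
    reader = csv.DictReader(stream)
    # fieldnames is None when the stream has no header row at all.
    if reader.fieldnames is None:
        raise ValueError("input file is empty")
    if "close" not in reader.fieldnames:
        raise ValueError("missing required column: close")
    closes = [float(row["close"]) for row in reader]
    if not closes:
        raise ValueError("input file contains no data rows")
    return closes


# Made-up two-row sample using the documented column names.
sample = io.StringIO(
    "timestamp,open,high,low,close,volume_btc,volume_usd\n"
    "1,1.0,2.0,0.5,1.5,10,15\n"
    "2,1.5,2.0,1.0,1.8,5,9\n"
)
closes = load_close_prices(sample)
```

For a real file, the same function could be called as load_close_prices(open("data.csv", newline="")).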

Processing Logic

  1. Configuration Loading: Validates and loads the configuration from the YAML file
  2. Data Loading: Reads and validates the CSV input file
  3. Rolling Mean Calculation: Computes the rolling mean of close prices using the configured window size
    • The first (window-1) rows have a NaN rolling mean and are excluded from signal computation
  4. Signal Generation: Creates a binary signal where:
    • signal = 1 if close > rolling_mean
    • signal = 0 if close <= rolling_mean
  5. Metrics Computation: Calculates signal rate and processing latency
  6. Output Generation: Writes metrics JSON and detailed logs
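
Steps 3 to 5 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: the helper names are hypothetical, and the definition of signal rate as the fraction of 1s among rows with a defined rolling mean is a guess, since the README does not define it.

```python
import math


def rolling_mean(values, window):
    """Trailing mean over `window` values; NaN for the first window-1 rows."""
    means = []
    for i in range(len(values)):
        if i < window - 1:
            means.append(math.nan)
        else:
            chunk = values[i - window + 1 : i + 1]
            means.append(sum(chunk) / window)
    return means


def generate_signal(closes, means):
    """1 if close > rolling mean, else 0; rows with a NaN mean are skipped."""
    return [1 if c > m else 0
            for c, m in zip(closes, means)
            if not math.isnan(m)]


def signal_rate(signal):
    """Assumed definition: share of rows whose signal is 1."""
    return sum(signal) / len(signal) if signal else 0.0
```

For closes [1, 2, 3, 2, 1] and a window of 2, the rolling means are [NaN, 1.5, 2.5, 2.5, 1.5], the signal is [1, 1, 0, 0], and the signal rate under the assumed definition is 0.5.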

Output Format

Success Output (metrics.json)

{
  "version": "v1",
  "rows_processed": 10000,
  "metric": "signal_rate",
  "value": 0.4991,
  "latency_ms": 25,
  "seed": 42,
  "status": "success"
}

Error Output (metrics.json)

{
  "version": "v1",
  "status": "error",
  "error_message": "Description of what went wrong"
}
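
The two payload shapes above could be assembled with small helper functions like the ones below. The function names are hypothetical, and rounding the value to four decimal places is an assumption inferred from the sample value 0.4991.

```python
import json


def success_metrics(version, rows_processed, signal_rate, latency_ms, seed):
    """Build the success payload with the keys shown in the README."""
    return {
        "version": version,
        "rows_processed": rows_processed,
        "metric": "signal_rate",
        # Assumption: four decimal places, matching the sample output.
        "value": round(signal_rate, 4),
        "latency_ms": latency_ms,
        "seed": seed,
        "status": "success",
    }


def error_metrics(version, message):
    """Build the error payload with the keys shown in the README."""
    return {"version": version, "status": "error", "error_message": message}


# Serialize in the same indented style as the samples above.
payload = json.dumps(success_metrics("v1", 10000, 0.49912, 25, 42), indent=2)
```

Keeping both builders next to each other makes it easier to check that the success and error payloads share the version and status keys.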

Logging

The application generates detailed logs including:

  • Job start and end timestamps
  • Configuration validation and loading
  • Dataset loading and validation
  • Processing steps (rolling mean, signal generation)
  • Metrics summary
  • Error messages and stack traces (if applicable)
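
A file-based logger that records timestamps and stack traces might be configured as in the sketch below, using the standard-library logging module. The logger name, the message format, and the helper names are assumptions; the actual format of run.log is not documented here.

```python
import logging
import os
import tempfile


def configure_logging(log_file):
    """Attach a single file handler to a named logger (hypothetical setup)."""
    logger = logging.getLogger("ohlcv_signal")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Drop handlers left over from an earlier call to avoid duplicate lines.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_file, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def demo_log_lines():
    """Write a start message and a failure with traceback; return the lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run.log")
        logger = configure_logging(path)
        logger.info("Job started")
        try:
            raise FileNotFoundError("data.csv")
        except FileNotFoundError:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Job failed")
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
            handler.close()
        with open(path, encoding="utf-8") as fh:
            return fh.read().splitlines()
```

The %(asctime)s field supplies the timestamps, and logger.exception supplies the stack trace mentioned in the list above.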

Error Handling

The program handles the following error cases:

  • Missing input file
  • Invalid CSV format
  • Empty input file
  • Missing required column (close)
  • Invalid configuration structure
  • Missing required configuration fields

In all error cases, the program:

  • Writes error metrics to the output JSON file
  • Logs detailed error information
  • Exits with a non-zero exit code (1)
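
The behaviour described above (write error metrics, then signal failure through the exit code) can be sketched as a wrapper around the job. This is a simplified illustration: run_job and missing_file_job are hypothetical names, logging is left out, and the output is written to a file-like object so that the example does not touch the filesystem.

```python
import io
import json


def run_job(job, out, version="v1"):
    """Run `job`, write its metrics (or an error payload) to `out`.

    Returns the exit code: 0 on success, 1 on any error.
    """
    try:
        metrics = job()
        code = 0
    except Exception as exc:
        metrics = {
            "version": version,
            "status": "error",
            "error_message": str(exc),
        }
        code = 1
    json.dump(metrics, out, indent=2)
    return code


def missing_file_job():
    """Stand-in for a job that fails because the input file is absent."""
    raise FileNotFoundError("input file not found: data.csv")


buffer = io.StringIO()
exit_code = run_job(missing_file_job, buffer)
```

In a script, the returned code would typically be passed to sys.exit, and out would be the opened --output file.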

Determinism

The program ensures deterministic results by:

  • Setting a fixed random seed from configuration
  • Using consistent rolling mean calculation
  • Producing reproducible signal generation

Running the program multiple times with the same inputs will produce identical results.
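
The effect of a fixed seed can be illustrated with the standard-library random module. Which random number generator run.py actually seeds is not stated, so this sketch only demonstrates the general principle; seeded_sample is a hypothetical helper.

```python
import random


def seeded_sample(seed, n=3):
    """Draw n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, no global state
    return [rng.random() for _ in range(n)]


# Two runs with the configured seed (42) produce the same sequence.
first = seeded_sample(42)
second = seeded_sample(42)
```

The rule signal = 1 if close > rolling_mean contains no randomness of its own, so the seed matters only for steps that draw random numbers.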

Exit Codes

  • 0: Success
  • 1: Error (with details in metrics.json and logs)
