This project implements a Python program that processes OHLCV (Open, High, Low, Close, Volume) data, computes rolling means, generates binary trading signals, and outputs structured metrics with detailed logging.
.
├── run.py # Main application script
├── config.yaml # Configuration file
├── data.csv # Input dataset (10,000 rows of OHLCV data)
├── requirements.txt # Python dependencies
├── Dockerfile # Docker configuration
├── metrics.json # Sample output metrics (generated)
├── run.log # Sample log file (generated)
└── README.md # This file
- Python 3.9 or higher
- Docker (for containerized execution)
pip install -r requirements.txt
python run.py --input data.csv --config config.yaml --output metrics.json --log-file run.log
- --input: Path to input CSV file containing OHLCV data
- --config: Path to YAML configuration file
- --output: Path to output JSON metrics file
- --log-file: Path to log file
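The four options could be parsed with the standard library's argparse module. This is a minimal sketch, not the actual contents of run.py: the function name `build_parser` is hypothetical, and marking every option as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four documented command-line options."""
    parser = argparse.ArgumentParser(description="Compute a signal rate from OHLCV data.")
    # Assumption: all four options are mandatory.
    parser.add_argument("--input", required=True, help="Path to input CSV file containing OHLCV data")
    parser.add_argument("--config", required=True, help="Path to YAML configuration file")
    parser.add_argument("--output", required=True, help="Path to output JSON metrics file")
    parser.add_argument("--log-file", required=True, help="Path to log file")
    return parser


# Parse an explicit argument list that mirrors the command shown above.
args = build_parser().parse_args(
    ["--input", "data.csv", "--config", "config.yaml",
     "--output", "metrics.json", "--log-file", "run.log"]
)
# argparse exposes --log-file as the attribute log_file.
print(args.input, args.config, args.output, args.log_file)
```

In a real entry point, `parse_args()` would be called without arguments so that it reads `sys.argv`.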
docker build -t mlops-task .
docker run --rm mlops-task
The container will execute the program and print the final metrics JSON to stdout.
To copy the generated files from the container:
docker run --name mlops-run mlops-task
docker cp mlops-run:/app/metrics.json ./metrics.json
docker cp mlops-run:/app/run.log ./run.log
docker rm mlops-run
The config.yaml file contains the following parameters:
seed: 42 # Random seed for reproducibility
window: 5 # Rolling window size for mean calculation
version: "v1" # Version identifier
The input CSV file must contain the following columns:
- timestamp: Timestamp of the data point
- open: Opening price
- high: Highest price
- low: Lowest price
- close: Closing price (required for calculations)
- volume_btc: Volume in BTC
- volume_usd: Volume in USD
Only the close column is used for signal generation.
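As an illustration of reading the close column and rejecting unusable input, here is a minimal sketch that uses only the standard library's csv module. The helper name `load_close_prices`, the error messages, and the two sample rows are made up for the example; run.py may well use a different library and different wording.

```python
import csv
import io

REQUIRED_COLUMN = "close"


def load_close_prices(handle) -> list[float]:
    """Read the close column from an open CSV file and return it as floats."""
    reader = csv.DictReader(handle)
    # fieldnames is None when the file has no header row at all.
    if reader.fieldnames is None:
        raise ValueError("Input file is empty")
    if REQUIRED_COLUMN not in reader.fieldnames:
        raise ValueError(f"Missing required column: {REQUIRED_COLUMN}")
    closes = [float(row[REQUIRED_COLUMN]) for row in reader]
    if not closes:
        raise ValueError("Input file has a header but no data rows")
    return closes


# Illustrative data only; these rows are not taken from data.csv.
sample = io.StringIO(
    "timestamp,open,high,low,close,volume_btc,volume_usd\n"
    "2024-01-01 00:00:00,100.0,101.0,99.0,100.5,1.2,120.6\n"
    "2024-01-01 00:01:00,100.5,102.0,100.0,101.5,0.8,81.2\n"
)
closes = load_close_prices(sample)
print(closes)
```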
- Configuration Loading: Validates and loads configuration from YAML file
- Data Loading: Reads and validates CSV input file
- Rolling Mean Calculation: Computes rolling mean on close prices using the specified window size
  - The first (window - 1) rows have NaN rolling means and are excluded from signal computation
- Signal Generation: Creates binary signal where:
  - signal = 1 if close > rolling_mean
  - signal = 0 if close <= rolling_mean
- Metrics Computation: Calculates signal rate and processing latency
- Output Generation: Writes metrics JSON and detailed logs
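The rolling mean, signal, and signal rate steps above can be sketched in plain Python. The function names are hypothetical, and two points are assumptions rather than documented behavior: the window is trailing (it ends at the current row), and the signal rate is the share of 1s among the rows that have a defined rolling mean.

```python
import math


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing mean over `window` values; NaN for the first (window - 1) rows."""
    means = []
    for i in range(len(values)):
        if i < window - 1:
            means.append(math.nan)
        else:
            means.append(sum(values[i - window + 1 : i + 1]) / window)
    return means


def make_signals(closes: list[float], means: list[float]) -> list[int]:
    """1 where close > rolling mean, 0 where close <= rolling mean; NaN rows are skipped."""
    return [1 if c > m else 0 for c, m in zip(closes, means) if not math.isnan(m)]


def signal_rate(signals: list[int]) -> float:
    """Fraction of signals equal to 1 (assumed definition of signal_rate)."""
    return sum(signals) / len(signals) if signals else 0.0


# Illustrative prices only, with a window of 3 to keep the arithmetic short.
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 13.0]
means = rolling_mean(closes, window=3)
signals = make_signals(closes, means)
rate = signal_rate(signals)
print(signals, rate)  # [1, 0, 0, 1] 0.5
```

If the real implementation counts the leading NaN rows as 0 instead of excluding them, the denominator changes and the rate shifts slightly.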
{
"version": "v1",
"rows_processed": 10000,
"metric": "signal_rate",
"value": 0.4991,
"latency_ms": 25,
"seed": 42,
"status": "success"
}
{
"version": "v1",
"status": "error",
"error_message": "Description of what went wrong"
}
The application generates detailed logs including:
- Job start and end timestamps
- Configuration validation and loading
- Dataset loading and validation
- Processing steps (rolling mean, signal generation)
- Metrics summary
- Error messages and stack traces (if applicable)
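A file logger that covers these items could be set up with the standard logging module. This is a sketch under assumptions: the logger name `mlops_task`, the message format, and the log messages are invented for illustration, and the example writes to a temporary directory rather than to run.log.

```python
import logging
import os
import tempfile


def setup_logger(log_file: str) -> logging.Logger:
    """Return a logger that writes timestamped records to log_file."""
    logger = logging.getLogger("mlops_task")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_file, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "run.log")
logger = setup_logger(log_path)
logger.info("Job started")
try:
    raise FileNotFoundError("data.csv not found")
except FileNotFoundError:
    # logger.exception records the message together with the stack trace.
    logger.exception("Job failed")
logger.info("Job ended")

with open(log_path, encoding="utf-8") as handle:
    log_text = handle.read()
print(log_text)
```

The `%(asctime)s` field supplies the start and end timestamps without extra code.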
The program handles the following error cases:
- Missing input file
- Invalid CSV format
- Empty input file
- Missing required column (close)
- Invalid configuration structure
- Missing required configuration fields
In all error cases, the program:
- Writes error metrics to the output JSON file
- Logs detailed error information
- Exits with non-zero exit code
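The error path can be sketched as follows for the two configuration cases. The config is assumed to be already parsed from YAML into a Python object. The names `validate_config` and `run`, the error message wording, and the fallback version "v1" are assumptions for this example, and logging is left out to keep it short.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("seed", "window", "version")  # fields shown in config.yaml


def validate_config(config) -> dict:
    """Raise ValueError if the parsed config is not a mapping or lacks a field."""
    if not isinstance(config, dict):
        raise ValueError("Invalid configuration structure: expected a mapping")
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError("Missing required configuration fields: " + ", ".join(missing))
    return config


def run(config, output_path: str) -> int:
    """Return exit code 0 on success, or write error metrics and return 1."""
    try:
        validate_config(config)
    except ValueError as exc:
        # Assumption: fall back to "v1" when the config cannot supply a version.
        version = config.get("version", "v1") if isinstance(config, dict) else "v1"
        error = {"version": version, "status": "error", "error_message": str(exc)}
        with open(output_path, "w", encoding="utf-8") as handle:
            json.dump(error, handle, indent=2)
        return 1
    return 0


output_path = os.path.join(tempfile.mkdtemp(), "metrics.json")
exit_code = run({"seed": 42, "version": "v1"}, output_path)  # "window" is missing
with open(output_path, encoding="utf-8") as handle:
    written = json.load(handle)
print(exit_code, written)
```

A real entry point would end with `sys.exit(exit_code)` so that the shell or Docker sees the non-zero status.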
The program ensures deterministic results by:
- Setting a fixed random seed from configuration
- Computing the rolling mean with the fixed window size from configuration
- Generating signals with a fixed comparison rule (close > rolling_mean)
Running the program multiple times with the same inputs will produce identical results.
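A quick way to see the effect of a fixed seed is to draw random numbers twice with the same seed and compare the results. This README does not say which random number generator run.py seeds, so the sketch uses the standard library's random module as a stand-in; the function name `seeded_sample` is hypothetical.

```python
import random


def seeded_sample(seed: int, n: int = 5) -> list[float]:
    """Seed the global generator, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = seeded_sample(42)
second = seeded_sample(42)
other = seeded_sample(43)
print(first == second)  # True: same seed, same sequence
```

The same kind of check applies to the whole program: two runs with identical inputs should write identical metric values, apart from timing fields such as latency_ms.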
Exit codes:
- 0: Success
- 1: Error (with details in metrics.json and logs)