Forecasting Time-Varying Intermarket Dependencies Between Cryptocurrencies and Conventional Assets Using Machine Learning

Master's Thesis · Bogdan Babaev · M.Sc. Artificial Intelligence · University of Kragujevac, Serbia (2023–2026)

Overview

This project investigates whether cryptocurrency market dynamics can be used to forecast time-varying dependency structures between Bitcoin and conventional financial assets: equity indices, precious-metal ETFs, and a U.S. dollar index ETF.

A fully reproducible machine-learning pipeline is developed covering: data acquisition, return construction, rolling-correlation target generation, feature engineering, walk-forward model evaluation, and benchmarking against a leakage-safe DCC-GARCH(1,1) specification.
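The return-construction and target-generation steps can be sketched as follows. This is a minimal illustration rather than the code in thesis_app/pipeline.py: the function name `rolling_fisher_z`, the use of log returns, and the clipping bound are assumptions, and the prices are synthetic so the snippet runs without a download.

```python
import numpy as np
import pandas as pd


def rolling_fisher_z(prices: pd.DataFrame, base: str, other: str, window: int) -> pd.Series:
    """Rolling Pearson correlation of log returns, mapped through arctanh (Fisher-z)."""
    # Log returns on the dates where both assets have a price (assumed return definition).
    returns = np.log(prices[[base, other]].dropna()).diff().dropna()
    rho = returns[base].rolling(window).corr(returns[other])
    # Clip so that |rho| == 1 does not map to +/- infinity.
    rho = rho.clip(-0.999999, 0.999999)
    return np.arctanh(rho).rename(f"z_{other}_{window}")


# Synthetic prices only; real data would come from the cached CSV.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
shocks = rng.normal(0.0, 0.01, size=(300, 2))
shocks[:, 1] = 0.5 * shocks[:, 0] + shocks[:, 1]  # induce positive dependence
prices = pd.DataFrame(
    100 * np.exp(shocks.cumsum(axis=0)), index=dates, columns=["BTC-USD", "^GSPC"]
)

z = rolling_fisher_z(prices, "BTC-USD", "^GSPC", window=30)
rho_back = np.tanh(z)  # invert the transform to read values as correlations again
```

Forecasting in z-space and mapping back with tanh keeps every predicted correlation inside (-1, 1), which a model fitted directly on the raw correlation does not guarantee.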

Key finding: Intermarket dependency is forecastable out of sample. Ridge, AR(1), and HAR form a near-indistinguishable top cluster (average RMSE 0.0656–0.0659, R² ≈ 0.942–0.943), which points to serial persistence as the dominant driver. All ML models substantially outperform the DCC-GARCH econometric benchmark (average RMSE 0.2136).
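To illustrate why persistence-based models can be hard to beat on such a target, the sketch below fits a HAR-style regression on a synthetic persistent series and compares it with carrying the last value forward. The lag set (1, 5, 22) is the conventional HAR choice and is an assumption here, as are the helper name `har_design` and the simulated data; none of the numbers it produces relate to the thesis results.

```python
import numpy as np


def har_design(z: np.ndarray, lags=(1, 5, 22)):
    """Build HAR regressors: trailing means of z over each horizon, predicting z[t+1]."""
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag - 1, len(z) - 1):
        # Intercept plus the mean of the last h observations up to and including t.
        rows.append([1.0] + [z[t - h + 1 : t + 1].mean() for h in lags])
        targets.append(z[t + 1])
    return np.array(rows), np.array(targets)


# Synthetic persistent series standing in for a Fisher-z correlation.
rng = np.random.default_rng(1)
z = np.zeros(600)
for t in range(1, 600):
    z[t] = 0.95 * z[t - 1] + rng.normal(0.0, 0.05)

X, y = har_design(z)
split = 400  # single train/test split, for brevity only
beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
pred = X[split:] @ beta

rmse_har = float(np.sqrt(np.mean((y[split:] - pred) ** 2)))
# Column 1 is the one-period "mean", i.e. the last observed value (naive forecast).
rmse_naive = float(np.sqrt(np.mean((y[split:] - X[split:, 1]) ** 2)))
print(f"HAR RMSE: {rmse_har:.4f}  naive RMSE: {rmse_naive:.4f}")
```

On a highly persistent series the two errors tend to sit close together, which is the pattern the key finding describes for the top cluster.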

Asset Universe

Symbol	Asset	Type
BTC-USD	Bitcoin	Crypto (base)
ETH-USD	Ethereum	Crypto (reference)
^GSPC	S&P 500	Equity index
^IXIC	NASDAQ Composite	Equity index
GLD	SPDR Gold Shares ETF	Precious metal
SLV	iShares Silver Trust ETF	Precious metal
UUP	Invesco DB US Dollar Index Bullish Fund (ETF)	Currency

Daily prices 2017–2026 · source: Yahoo Finance

Methodology

Target: rolling Pearson correlation (BTC vs each asset) — windows: 14 / 30 / 60 / 90 days
Transform: Fisher-z (arctanh) for variance stabilization
Features: momentum, volatility, and return-based predictors derived from the dependency series
Models: Naive_Last, AR(1), HAR, ElasticNet, Ridge, Adaptive Ensemble, Random Forest, GBM, and XGBoost, compared against a DCC-GARCH(1,1) benchmark (10 specifications in total)
Evaluation: walk-forward expanding window (no data leakage)
Statistical tests: Diebold-Mariano with Newey-West correction
Signal layer: logistic classifier for investor stress-day detection on traditional assets
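The statistical-test step above can be sketched as a Diebold-Mariano statistic with a Newey-West (Bartlett-kernel) long-run variance. This is an illustration under stated assumptions, not the code used in 04_DM_Tests_Visuals.ipynb: the function name `diebold_mariano`, the squared-error loss, the lag truncation of 5, and the normal approximation for the p-value are all choices made here.

```python
import math

import numpy as np


def diebold_mariano(e1, e2, lag: int = 0):
    """DM statistic for equal squared-error loss, with a Newey-West variance estimate."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    d = e1**2 - e2**2  # loss differential; negative values favour forecast 1
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    lrv = dc @ dc / n  # lag-0 autocovariance
    for k in range(1, lag + 1):
        gamma_k = dc[k:] @ dc[:-k] / n
        lrv += 2.0 * (1.0 - k / (lag + 1.0)) * gamma_k  # Bartlett weights
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, standard normal
    return stat, p_value


# Synthetic forecast errors: model A is deliberately more accurate than model B.
rng = np.random.default_rng(2)
err_a = rng.normal(0.0, 1.0, size=500)
err_b = rng.normal(0.0, 2.0, size=500)

stat, p_value = diebold_mariano(err_a, err_b, lag=5)
stat_rev, _ = diebold_mariano(err_b, err_a, lag=5)
print(f"DM = {stat:.2f}, p = {p_value:.3g}")
```

A negative statistic indicates that the first set of errors has the lower average squared loss; swapping the arguments flips the sign and leaves the p-value unchanged.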

Project Structure

├── main.py                  # Entry point — runs full pipeline
├── run_all.py               # Full reproducibility runner (pipeline + notebooks)
├── config.yaml              # All settings
├── requirements.txt
│
├── thesis_app/
│   ├── pipeline.py          # Core ML pipeline
│   ├── dcc.py               # DCC-GARCH(1,1) benchmark
│   ├── dcc_walk.py          # Walk-forward DCC wrapper
│   ├── signal_layer.py      # Investor signal layer
│   ├── regime_analysis.py   # Regime detection utilities
│   ├── data_quality.py      # Data validation
│   └── notebook_helpers.py  # Shared notebook utilities
│
├── notebooks/
│   ├── 01_EDA_Dataset.ipynb        # Exploratory data analysis
│   ├── 02_GridSearch.ipynb         # Hyperparameter search (TimeSeriesSplit)
│   ├── 03_Model_Comparison.ipynb   # RMSE/R² across all pairs & windows
│   ├── 04_DM_Tests_Visuals.ipynb   # Diebold-Mariano tests & thesis figures
│   ├── 05_XGB_vs_DCC.ipynb         # XGBoost vs DCC-GARCH deep dive
│   ├── 06_Regime_Analysis.ipynb    # Dependency regime analysis
│   ├── 07_Robustness_Checks.ipynb  # Sensitivity & robustness
│   └── 08_Market_Events_Showcase.ipynb  # Landmark market-event case studies
│
├── data/                    # Auto-created on first run
├── outputs/                 # Auto-created: figures, results, predictions
└── models/                  # Auto-created: saved model artefacts

Quickstart

# 1. Clone
git clone https://github.com/b0gdaan/master-thesis.git
cd master-thesis

# 2. Create virtual environment (Python 3.14 recommended)
python -m venv .venv
.venv\Scripts\activate       # Windows
# source .venv/bin/activate   # Linux / macOS

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run full pipeline
python main.py

First run downloads ~10 years of price data from Yahoo Finance (~30 sec). Subsequent runs use the cached CSV in data/raw/.

No GPU? Set xgb_device: "cpu" in config.yaml. No DCC-GARCH? Set use_dcc_garch: false in config.yaml.

Notebooks

Run after main.py (outputs must exist):

jupyter notebook

Notebook	Purpose
`01_EDA_Dataset.ipynb`	Price/return EDA, ADF stationarity tests, correlation overview
`02_GridSearch.ipynb`	Hyperparameter tuning with TimeSeriesSplit CV
`03_Model_Comparison.ipynb`	RMSE / R² heatmaps, model ranking, LaTeX tables
`04_DM_Tests_Visuals.ipynb`	DM significance tests, publication-quality forecast plots
`05_XGB_vs_DCC.ipynb`	Error analysis, rolling RMSE, scatter diagnostics
`06_Regime_Analysis.ipynb`	Dependency regime detection and characterization
`07_Robustness_Checks.ipynb`	Sensitivity to rolling window length and refit frequency
`08_Market_Events_Showcase.ipynb`	Landmark market-event case studies (COVID, FTX, SVB, tariff shock, …)

Configuration

All parameters are in config.yaml:

Key	Default	Description
`base_asset`	`BTC-USD`	Base cryptocurrency
`rolling_windows`	`[14,30,60,90]`	Correlation window sizes (days)
`use_fisher_transform`	`true`	Fisher-z transform on correlation target
`use_dcc_garch`	`true`	Include DCC-GARCH(1,1) benchmark
`use_xgboost`	`true`	Include XGBoost model
`xgb_device`	`cuda`	`cuda` for GPU, `cpu` for CPU
`min_train_size`	`800`	Minimum training observations (walk-forward)
`refit_every`	`20`	Model refit frequency (trading days)
`n_parallel_workers`	`8`	Parallel experiments (ThreadPoolExecutor)
`enable_signal_layer`	`true`	Run investor stress-warning signal layer
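As a rough illustration of how `min_train_size` and `refit_every` could interact in a walk-forward expanding window, the sketch below refits a closed-form ridge regression every 20 steps and forecasts one step ahead in between. How thesis_app/pipeline.py actually interprets these keys is not shown here, so the function `walk_forward`, the ridge penalty `alpha`, and the synthetic AR(1) data are all assumptions.

```python
import numpy as np


def walk_forward(X, y, min_train_size=800, refit_every=20, alpha=1.0):
    """Expanding-window one-step forecasts from a closed-form ridge regression.

    Row t of X must hold only predictors that are known before y[t] is observed.
    """
    n, p = X.shape
    preds = np.full(n, np.nan)
    refit_points = []
    penalty = alpha * np.eye(p)
    penalty[0, 0] = 0.0  # leave the intercept (assumed to be column 0) unpenalised
    beta = None
    for t in range(min_train_size, n):
        if (t - min_train_size) % refit_every == 0:
            # Fit on rows strictly before t, so the forecast for t sees no future data.
            X_train, y_train = X[:t], y[:t]
            beta = np.linalg.solve(X_train.T @ X_train + penalty, X_train.T @ y_train)
            refit_points.append(t)
        preds[t] = X[t] @ beta
    return preds, refit_points


# Synthetic AR(1) series; the only feature is the previous value.
rng = np.random.default_rng(3)
z = np.zeros(1000)
for t in range(1, 1000):
    z[t] = 0.9 * z[t - 1] + rng.normal(0.0, 0.1)

X = np.column_stack([np.ones(999), z[:-1]])  # intercept and lagged value
y = z[1:]

preds, refit_points = walk_forward(X, y, min_train_size=800, refit_every=20)
print(f"{len(refit_points)} refits, first at t={refit_points[0]}")
```

A smaller `refit_every` tracks structural change more closely at the cost of more fits; the training window only ever grows, so earlier forecasts are never revised with later data.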

Tech Stack

Python 3.14 · pandas 3.0 · numpy 2.4 · scikit-learn 1.8 · XGBoost 3.2 · arch 8.0 · scipy 1.17 · yfinance 1.2 · matplotlib · seaborn · Jupyter

Author

Bogdan Babaev
M.Sc. student · Artificial Intelligence · University of Kragujevac, Serbia (2023–2026)
github.com/b0gdaan · linkedin.com/in/b0gdaan
