The reference document. Every component, every decision rule, every parameter. Read linearly to understand the whole system; jump to a section to look something up.
Companion documents: methodology.html (executive deck, 12 slides) · PRESENTATION.md (running order for the in-class talk) · README.md (portfolio-grade summary).
- What the app does, end-to-end
- The asset universe
- The intake — three tiers, three axes
- Profile scoring — how answers become a number
- The five profiles
- The five allocation engines
- The graph-methods overlay
- The regime-aware tilt — the headline feature
- The macro layer
- The backtest
- The explainability layer — IPS, parameters, risk declaration
- The UI — six tabs
- What's deliberately not in the system
- Honest limitations
- Module map
A user opens the Streamlit app. They are shown three intake options — Quick (6 questions), Standard (15), Detailed (24). They pick a depth and answer the questions.
Their answers feed a transparent scoring rule that produces an integer 1–5 (the profile). The profile determines which of five allocation engines runs. The engine reads the last 252 days of returns from the asset universe, solves an optimisation problem appropriate to the profile, and produces a vector of weights summing to 1.
The result page shows the user six things:
- The allocation — pie chart, class breakdown, expected return, expected vol, Sharpe.
- A risk declaration — what this portfolio is designed to do, what it is not, and what could break it.
- The parameters explained — every numerical knob the engine used, with rationale and limitation.
- The asset network — the correlation graph, browsable across time windows and graph types.
- The macro dashboard — current regime, indicator history, asset × macro correlations.
- The backtest — the strategy played forward over the last 5+ years, instantly or animated.
There is also an optional toggle: graph-based methods. With this on, the engine uses Hierarchical Risk Parity for the Balanced profile (and for the Capital Preservation profile's risky sleeve), and a centrality-aware tilt for the Conservative, Growth, and Aggressive profiles. A second toggle, the regime-aware tilt, makes that tilt strength a function of the network's tightness, so the tilt strengthens automatically when correlations cluster.
That's the whole product. No LLM. No black box. Every output traces back to either a published technique with a citation or a hand-written rule we can defend.
The system holds 14 ETFs chosen to span the major liquid asset classes. All are listed on US exchanges, all priced in USD, all available free via yfinance. The choice was made for breadth of asset class with minimum redundancy, not for any view on what should outperform.
| Class | Tickers | What they are |
|---|---|---|
| US Equity | SPY, IWM | S&P 500, Russell 2000 (small caps) |
| Developed International | EFA | EAFE — Europe, Australasia, Far East |
| Emerging Markets | EEM | MSCI Emerging Markets |
| Equity Factors | MTUM, IVE, QUAL | Momentum, Value, Quality |
| Long-Duration Treasury | TLT | 20+ year US Treasury |
| Intermediate Treasury | IEF | 7–10 year US Treasury |
| Investment-Grade Credit | LQD | iBoxx investment-grade corporate |
| High-Yield Credit | HYG | iBoxx high-yield corporate |
| Gold | GLD | Physical gold |
| Real Estate | VNQ | US REITs |
| Cash | BIL | 1–3 month T-bills |
That is 7 equity sleeves, 4 fixed-income sleeves, and 3 real-asset/cash sleeves. It's enough variety to give the graph methods something to work with (different cluster structures emerge across regimes), but few enough nodes that hand-built strategies remain interpretable.
A second dictionary in the codebase, MARKET_CAPS, holds approximate cap weights for the Black-Litterman implied-equilibrium step. These are static rather than live (a known simplification — see §14).
Why 14 specifically? Small enough that an analyst can reason about each holding, large enough that diversification is non-trivial, and structurally varied enough that the asset network has interesting topology. With 14 nodes we get clear clusters — equity bloc, duration bloc, credit bloc, real-asset bloc, cash — that emerge unsupervised from the correlation matrix.
The user picks how thorough they want to be. The three tiers exist because in real onboarding flows there's a tension: too few questions and the recommendation is unjustifiable; too many and most users abandon. We let the user pick the trade-off.
The minimum viable intake. Three questions on capacity (horizon, goal, income stability), two on tolerance (drawdown response, vol preference), one on knowledge (familiarity with markets).
Adds:
- About you — age, employment status, dependents
- Financial situation — emergency fund, savings rate
- Goal — withdrawal need
- Risk tolerance — second behavioral question (loss recovery scenario)
- Knowledge — years of experience, products used
Adds:
- About you — household structure
- Balance sheet — net worth band, share of wealth, debt situation
- Goal — goal type (retirement, house, education, legacy)
- Risk tolerance — Grable-Lytton-style lottery question, regret aversion
- Constraints — ESG preference, asset-class exclusions
The constraint questions (ESG, exclusions) are recorded for the IPS but do not currently filter the asset universe. They're in the schema so the IPS export is complete.
Every question is tagged with one of four axes:
- Capacity — objective ability to bear risk (income, horizon, dependents, emergency fund). Not negotiable; if income is unstable, capacity is low regardless of preference.
- Tolerance — subjective willingness to bear risk (drawdown response, regret aversion). Behavioral.
- Knowledge — familiarity with markets and products. Acts as a regulatory soft cap.
- Constraint — recorded preferences that don't enter the score (ESG, exclusions).
The fundamental design rule comes from Reilly & Brown's Investment Analysis and Portfolio Management: a real Investment Policy Statement is built from capacity, willingness, and knowledge. Capacity tells you what risk the client can bear; tolerance tells you what risk they will bear. Knowledge tells you what products they understand. The lower of capacity and tolerance is the binding constraint; knowledge gates against putting them in products they can't evaluate.
The scoring rule is intentionally simple. Three steps:
Each question option has a score from 1 (low) to 5 (high). For each axis (capacity, tolerance, knowledge), take the unweighted mean of the answers tagged to that axis.
cap̄ = mean(scores of capacity-tagged answers)
tol̄ = mean(scores of tolerance-tagged answers)
know̄ = mean(scores of knowledge-tagged answers)
raw = 0.4 · cap̄ + 0.4 · tol̄ + 0.2 · know̄
The 40/40/20 weighting reflects the Reilly & Brown framework: capacity and tolerance carry equal weight in determining the appropriate risk level; knowledge is a smaller modifier because it's a competence question, not a risk-bearing question.
if know̄ < 3: raw = min(raw, 4) # no Aggressive bucket for low-knowledge clients
if know̄ < 2: raw = min(raw, 3) # no Growth either, only up to Balanced
profile = clip(round(raw), 1, 5)
This implements the spirit of MiFID II Article 25(2) and the ESMA Suitability Guidelines: a product must be appropriate to the client's knowledge and experience, not just their stated risk preference. A client who self-reports as "no knowledge of markets" cannot be put into a Max-Sharpe portfolio even if they answer aggressively elsewhere.
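The three steps are small enough to write out in full. The sketch below is a minimal Python version that assumes answers arrive as hypothetical `(axis, score)` pairs; the app's real data structures are not shown in this document. Python's built-in `round` sends ties to the nearest even integer, so the sketch rounds half-up explicitly. Which tie rule the app uses is not stated.

```python
import math
from statistics import mean


def score_profile(answers):
    """Map a list of (axis, score) pairs, score in 1..5, to a profile 1..5.

    The (axis, score) input shape is a hypothetical stand-in for the app's schema.
    Constraint-tagged answers are skipped: they never enter the score.
    """
    by_axis = {"capacity": [], "tolerance": [], "knowledge": []}
    for axis, score in answers:
        if axis in by_axis:
            by_axis[axis].append(score)

    # Step 1: unweighted mean per axis.
    cap = mean(by_axis["capacity"])
    tol = mean(by_axis["tolerance"])
    know = mean(by_axis["knowledge"])

    # Step 2: 40/40/20 blend.
    raw = 0.4 * cap + 0.4 * tol + 0.2 * know

    # Step 3: knowledge caps.
    if know < 3:
        raw = min(raw, 4)  # no Aggressive bucket for low-knowledge clients
    if know < 2:
        raw = min(raw, 3)  # no Growth either, only up to Balanced

    # Round half-up (an assumption of this sketch), then clip to 1..5.
    return min(max(math.floor(raw + 0.5), 1), 5)
```

Because one function serves every tier, a Quick answer set and a Detailed answer set with the same axis means land in the same bucket. For example, maximal capacity and tolerance with a knowledge mean of 1 still returns 3.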
The crucial design property: the scoring rule is identical across Quick, Standard, and Detailed. A user who scores Profile 3 in the Quick tier and a user who scores Profile 3 in the Detailed tier get the same allocation. The deeper tiers add resolution (more answers contribute to each axis mean) but don't change the bucket boundaries. This makes results commensurable across tiers.
It does not use machine learning. It does not learn from past clients. It does not embed neural-network-style preference models. It is a rule a compliance officer can read and approve in 30 seconds, which is the point.
The five buckets are defined by the binding allocation philosophy, not by a target allocation. Each profile gets a label, a strategy summary, a target risk level, and a recommended horizon.
| # | Label | Engine | Vol target | Max DD target | Horizon |
|---|---|---|---|---|---|
| 1 | Capital Preservation | CPPI overlay | < 3% annualised | < 5% | < 3 years |
| 2 | Conservative | GMV with shrinkage | 4–6% | < 10% | 3–7 years |
| 3 | Balanced | Equal Risk Contribution | 6–9% | < 15% | 5–10 years |
| 4 | Growth | Black-Litterman | 9–13% | < 25% | 7–15 years |
| 5 | Aggressive | Max-Sharpe (capped) | 13–18% | < 35% | > 10 years |
The progression is monotonic: each profile takes more risk than the previous one in exchange for more expected return. The drawdown targets are guardrails for the IPS — if a strategy materially exceeds its DD target out-of-sample, the IPS makes that visible to the client.
What the profile does not capture: age, lifecycle, glide path, withdrawal phase. A 25-year-old Aggressive client and a 60-year-old Aggressive client get the same allocation today. In production this would need to be fixed with a target-date overlay. We document this as an honest limitation rather than build a half-baked version.
This is the core of the system. Each engine is one of the canonical solutions to the portfolio construction problem, chosen to match the profile it serves.
What it does. Constant Proportion Portfolio Insurance. A dynamic allocation rule that guarantees a capital floor under continuous rebalancing.
The formula.
e_t = m · (a_t − f_t)
where e_t is the dollar allocation to the risky sleeve, a_t is current wealth, f_t is the floor (which drifts up at the risk-free rate), and m is the multiplier.
Parameters used.
- Floor = 90% of initial capital, drifting up at 2%/year
- Multiplier m = 3
- Risky sleeve = ERC-balanced portfolio of risk assets (equity + credit + real assets); HRP under graph mode
- Safe sleeve = BIL (1–3 month T-bills)
Why m = 3. The multiplier sets the maximum tolerable one-period drop. With m = 3, the strategy survives a ~33% instantaneous drop in the risky sleeve before the floor is breached. Lower m = more protection but less upside; higher m = more upside but more risk of breaching the floor in a fast crash.
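A minimal discrete-time sketch of the CPPI rule makes the 1/m threshold concrete. It assumes the risky sleeve's per-period returns are given (in the app that sleeve is the ERC or HRP portfolio), rebalances every period, and caps exposure at current wealth so there is no leverage; the no-leverage cap and the function name are assumptions of this sketch, not documented settings.

```python
def cppi_path(risky_returns, safe_returns, m=3.0, floor_frac=0.90,
              floor_growth=0.02, periods_per_year=252):
    """Simulate CPPI on per-period returns; returns (wealth_path, floor_path).

    Wealth starts at 1.0 and the floor at floor_frac, drifting up at floor_growth
    per year. Exposure is capped at current wealth (no leverage): an assumption.
    """
    wealth, floor = 1.0, floor_frac
    step = (1.0 + floor_growth) ** (1.0 / periods_per_year)  # per-period floor drift
    wealth_path, floor_path = [wealth], [floor]
    for r_risky, r_safe in zip(risky_returns, safe_returns):
        cushion = max(wealth - floor, 0.0)
        exposure = min(m * cushion, wealth)  # e_t = m * (a_t - f_t)
        safe = wealth - exposure
        wealth = exposure * (1.0 + r_risky) + safe * (1.0 + r_safe)
        floor *= step
        wealth_path.append(wealth)
        floor_path.append(floor)
    return wealth_path, floor_path
```

With the defaults, the starting cushion is 10% and the risky exposure 30%. A one-period risky drop of 30% leaves wealth at 0.91, above the floor; a 40% drop leaves 0.88, below it. That is the 1/m ≈ 33% threshold in action.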
What could break it.
- A gap-down move larger than 1/m before the next rebalance can pierce the floor.
- Illiquidity in the safe asset (e.g. a repo dislocation affecting BIL) disrupts the protection mechanism.
- Negative real rates erode the floor in real terms — the nominal floor stays, but inflation can eat 5–10% of real value over a multi-year stress.
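Honest limitation. CPPI gives up upside in exchange for downside protection. With a 10% cushion and m = 3 it starts with only 30% in the risky sleeve, so a static 60/40 portfolio outperforms it early in a bull market. Perold & Sharpe show that CPPI does best in sustained trends (including sharp sell-offs, which is why it protects in a crash) and worst in choppy, sideways markets, where it repeatedly buys after rallies and sells after dips. It's a behavioral product as much as a financial one: the client values the floor.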
Citation. Perold & Sharpe (1988), Dynamic Strategies for Asset Allocation, Financial Analysts Journal.
What it does. Solves for the long-only portfolio with the lowest variance. No view on returns is needed — only the covariance matrix is estimated.
The optimisation.
w* = arg min w⊤ Σ̂ w
subject to w_1 + … + w_N = 1, 0 ≤ w_i ≤ 0.30
where Σ̂ is a shrunken covariance matrix.
Why shrinkage. The sample covariance matrix is unstable when the number of observations is not much larger than the number of assets. With 14 assets and 252 days, our T/N ratio is 18 — usable but noisy. Ledoit & Wolf (2003) showed that shrinking the sample covariance toward a structured target (here, a constant-correlation matrix with sample variances) reduces estimation error in the eigenvalues, which is precisely the source of GMV's instability.
Parameters used.
- Shrinkage intensity δ = 0.5 (50% sample, 50% target)
- Target = constant-correlation matrix
- Estimation window = 252 trading days
- Per-asset weight cap = 30%
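A sketch of the Conservative engine in NumPy and SciPy, assuming `returns` is a T × N array of daily returns. The target keeps the sample variances and sets every off-diagonal correlation to the average sample correlation. The intensity is the fixed δ = 0.5 from the parameter list rather than the data-driven Ledoit-Wolf estimate. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def constant_correlation_target(sample_cov):
    """Constant-correlation target: sample variances, average off-diagonal correlation."""
    vol = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(vol, vol)
    n = len(vol)
    rho_bar = (corr.sum() - np.trace(corr)) / (n * (n - 1))
    target = rho_bar * np.outer(vol, vol)
    np.fill_diagonal(target, vol ** 2)
    return target


def shrunk_cov(returns, delta=0.5):
    """delta * target + (1 - delta) * sample; delta = 0.5 is a 50/50 blend."""
    sample = np.cov(returns, rowvar=False)
    return delta * constant_correlation_target(sample) + (1.0 - delta) * sample


def gmv_weights(cov, cap=0.30):
    """Long-only minimum-variance weights with a per-asset cap, via SLSQP."""
    n = cov.shape[0]
    result = minimize(
        lambda w: w @ cov @ w,
        x0=np.full(n, 1.0 / n),
        jac=lambda w: 2.0 * cov @ w,
        method="SLSQP",
        bounds=[(0.0, cap)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        # Daily variances are tiny (~1e-5), so tighten ftol to avoid stopping early.
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x
```

A 30% cap is only feasible with at least four assets. With 14 it rarely binds except on the lowest-volatility holdings.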
Expected behavior. Weights concentrate in low-volatility, low-correlation assets — typically short-duration treasuries, cash, and quality factor ETFs. Equity allocation is modest by design.
Honest limitation. GMV ignores expected returns. It will hold an asset with zero expected return if doing so lowers portfolio variance. This is a feature in stable regimes, but a liability when "safe" assets re-rate adversely (the canonical example: 2022, when long-duration treasuries lost ~30% as rates rose — GMV would have been heavily long them through the loss).
Citation. Ledoit & Wolf (2003), Honey, I Shrunk the Sample Covariance Matrix, Journal of Portfolio Management.
What it does. Risk Parity. Each asset contributes the same amount to total portfolio variance.
The optimisation. Define the marginal risk contribution of asset i as MRC_i = (Σw)_i / √(w⊤Σw). The risk contribution is RC_i = w_i · MRC_i. ERC solves for weights such that all RC_i are equal.
Why this is the balanced default. ERC makes no assumptions about expected returns. It produces a portfolio that's diversified in risk space rather than in dollar space, which means bonds end up dollar-overweight (because they're low-vol) and equities end up dollar-underweight (because they're high-vol). For a non-expert investor who wants "balance" without committing to a specific equity/bond mix, ERC is a sensible, assumption-light default.
Solver. scipy.optimize.minimize on the squared deviation from equal risk contributions, with sum-to-one and long-only constraints. Seeded from inverse-volatility weights, SLSQP method, tolerance 1e-9 on the RC dispersion.
Parameters used.
- Covariance estimator = sample, 252-day window
- Seed weights = 1/σ_i, normalised
- Solver tolerance = 1e-9
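Following the solver description above, a sketch of ERC with SciPy's SLSQP: minimise the squared deviation of each asset's share of risk from 1/N, seeded from normalised inverse-volatility weights. One detail is an assumption of this sketch rather than a documented setting: risk contributions are expressed as fractions of portfolio volatility, so the 1e-9 tolerance does not depend on whether returns are daily or annual.

```python
import numpy as np
from scipy.optimize import minimize


def risk_contributions(w, cov):
    """RC_i = w_i * (Sigma w)_i / sqrt(w' Sigma w); the RC_i sum to portfolio vol."""
    port_vol = np.sqrt(w @ cov @ w)
    return w * (cov @ w) / port_vol


def erc_weights(cov, tol=1e-9):
    n = cov.shape[0]
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    x0 = inv_vol / inv_vol.sum()  # seed: normalised inverse-volatility weights

    def dispersion(w):
        share = w * (cov @ w) / (w @ cov @ w)  # RC_i / portfolio vol, sums to 1
        return np.sum((share - 1.0 / n) ** 2)

    result = minimize(
        dispersion, x0, method="SLSQP",
        bounds=[(0.0, 1.0)] * n,  # long-only
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": tol, "maxiter": 1000},
    )
    return result.x
```

When the assets are uncorrelated, ERC and inverse-volatility weights coincide, so the seed is already the answer. Correlation is what moves the solution away from it.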
Graph variant. Hierarchical Risk Parity (López de Prado 2016) — see §7.3.
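Honest limitation. ERC coincides with the maximum-Sharpe portfolio only when all assets share the same Sharpe ratio and the same pairwise correlation (Maillard, Roncalli & Teïletche 2010); otherwise it implicitly treats risk-adjusted returns as equal. In regimes where one asset class earns disproportionately better risk-adjusted returns (e.g. late-cycle equities), ERC under-weights that asset relative to what would be optimal.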
Citation. Maillard, Roncalli & Teïletche (2010), The Properties of Equally Weighted Risk Contribution Portfolios, Journal of Portfolio Management.
What it does. A Bayesian update of the market-cap-implied equilibrium with explicit views. It addresses the central problem of mean-variance optimisation: garbage-in, garbage-out from sample-mean estimates of expected returns.
The three-step formulation.
Step 1: Reverse-optimise the cap-weighted portfolio to get the implied equilibrium returns:
π = δ · Σ · w_mkt
where δ is the representative investor's risk aversion and w_mkt is the cap-weight vector.
Step 2: Update with views via Bayes:
μ_BL = [(τΣ)⁻¹ + P⊤Ω⁻¹P]⁻¹ · [(τΣ)⁻¹π + P⊤Ω⁻¹q]
where (P, q, Ω) encode the views: P selects which assets the view is on, q is the view's expected value, and Ω is the view's uncertainty.
Step 3: Mean-variance optimise using μ_BL and Σ.
Why this is the growth default. Growth investors expect a positive equity risk premium. Black-Litterman lets us encode that view explicitly with calibrated confidence, without abandoning the diversification structure of the equilibrium. It gives the user a portfolio that's tilted toward equities relative to the cap-weight, but not catastrophically so.
Parameters used.
- τ = 0.05 (overall uncertainty in the prior)
- δ = 2.5 (risk aversion of the representative investor)
- Default view = US equities outperform long-duration treasuries by 2%/year
- View confidence: Ω = 0.5 · PτΣP⊤ (moderate confidence)
- Per-asset cap = 30%
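Steps 1 and 2 translate directly into NumPy. The sketch assumes annualised inputs and the document's Ω = 0.5 · PτΣP⊤; Step 3 would then pass `mu_bl` and Σ to a capped mean-variance optimiser. The three-asset example (equity, long treasury, gold), its covariance, and its cap weights are made up for illustration.

```python
import numpy as np


def implied_returns(cov, w_mkt, delta=2.5):
    """Step 1: reverse optimisation, pi = delta * Sigma * w_mkt."""
    return delta * cov @ w_mkt


def black_litterman_mean(cov, pi, P, q, tau=0.05, omega_scale=0.5):
    """Step 2: posterior mean with Omega = omega_scale * P (tau Sigma) P'."""
    tau_cov = tau * cov
    omega_inv = np.linalg.inv(omega_scale * P @ tau_cov @ P.T)
    prior_prec = np.linalg.inv(tau_cov)
    lhs = prior_prec + P.T @ omega_inv @ P
    rhs = prior_prec @ pi + P.T @ omega_inv @ q
    return np.linalg.solve(lhs, rhs)


# Illustrative three-asset example; all numbers are hypothetical.
vol = np.array([0.16, 0.14, 0.15])             # equity, long treasury, gold
corr = np.array([[1.0, -0.2, 0.05],
                 [-0.2, 1.0, 0.25],
                 [0.05, 0.25, 1.0]])
cov = corr * np.outer(vol, vol)
w_mkt = np.array([0.6, 0.3, 0.1])
P = np.array([[1.0, -1.0, 0.0]])               # view portfolio: equity minus treasury
q = np.array([0.02])                           # expected outperformance: 2%/year
pi = implied_returns(cov, w_mkt)
mu_bl = black_litterman_mean(cov, pi, P, q)
```

Setting Ω to half the prior variance of the view portfolio means the posterior equity-minus-treasury spread lands exactly two-thirds of the way from the equilibrium spread to the 2% view. That is one concrete reading of "moderate confidence".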
Honest limitation. Requires market caps for the implied equilibrium. We use a fixed cap-weight vector for our 14 ETFs rather than live AUMs — a known simplification. The default view is hard-coded; in production, a real robo-advisor would either elicit views from the user or remove views entirely (in which case BL collapses to the equilibrium).
Citation. Black & Litterman (1992), Global Portfolio Optimization, Financial Analysts Journal.
What it does. Solves for the portfolio with the highest reward-to-risk ratio on the long-only efficient frontier.
The optimisation.
w* = arg max (μ̂⊤w − r_f) / √(w⊤Σ̂w)
subject to w_1 + … + w_N = 1, 0 ≤ w_i ≤ 0.30
Why bounded. The unbounded tangency portfolio is famously unstable — small changes in the estimated mean produce large weight shifts (Michaud 1989's "error-maximisation" critique of mean-variance). The 30% per-asset cap is a regularisation: it bounds out-of-sample turnover and prevents the optimiser from concentrating in a single high-Sharpe outlier that may be a noise estimate.
Parameters used.
- r_f = 2% annualised
- μ estimator = sample mean over 252 days, with exponential decay (λ = 0.94)
- Σ estimator = sample covariance, 252-day window
- Per-asset cap = 30%
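Why exponential decay on μ. The sample mean is a notoriously noisy estimator of the true mean: it has high variance. We don't fix that, but we do make μ more responsive to recent regime changes by exponentially down-weighting old observations. This is a known compromise: more reactive but noisier. Note how aggressive the decay is: λ = 0.94 (the RiskMetrics daily value) gives a half-life of about 11 trading days, so roughly 98% of the total weight falls on the most recent 63 days and the 252-day window is largely nominal.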
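A sketch of the Aggressive engine's estimation and optimisation pieces, assuming daily inputs annualised by 252 before optimising. The exponential weights give the newest day weight 1 and each older day a factor λ less, then normalise; `half_life` shows how short λ = 0.94 makes the effective window. The optimiser maximises the Sharpe ratio directly with SLSQP under the 30% cap. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def ew_mean(returns, lam=0.94):
    """Exponentially weighted mean of a T x N return array (newest row last)."""
    T = returns.shape[0]
    weights = lam ** np.arange(T - 1, -1, -1)  # oldest row gets lam**(T-1), newest gets 1
    return (weights / weights.sum()) @ returns


def half_life(lam):
    """Number of periods for an observation's weight to halve."""
    return np.log(0.5) / np.log(lam)


def max_sharpe_weights(mu, cov, rf=0.02, cap=0.30):
    """Long-only, capped tangency portfolio; mu and cov must be annualised."""
    n = len(mu)

    def neg_sharpe(w):
        return -(mu @ w - rf) / np.sqrt(w @ cov @ w)

    result = minimize(
        neg_sharpe, x0=np.full(n, 1.0 / n), method="SLSQP",
        bounds=[(0.0, cap)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-10, "maxiter": 500},
    )
    return result.x
```

Typical usage: `mu = ew_mean(daily) * 252`, `cov = np.cov(daily, rowvar=False) * 252`, then `max_sharpe_weights(mu, cov)`. If the returned vector sits at 0.30 in the same asset rebalance after rebalance, that is the "concentration at the cap" warning sign described below.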
What could break it.
- Regime change in expected returns. If the past 252 days were a bull market and the next 252 are a bear, the optimiser was looking at the wrong target.
- Covariance breakdown. In stress, correlations spike toward 1, and the low off-diagonal entries of Σ that the optimiser relies on for diversification disappear.
- Concentration at the cap. If the optimiser is repeatedly pinned at 30% in one asset, the cap is binding — meaning the unconstrained answer would be far more concentrated. In production, you'd want this to trigger a manual review.
Honest limitation. Mean-variance optimisation rewards estimation error. The "best" portfolio in-sample is rarely the best out-of-sample. Bounding the weights helps, but doesn't fix it. For an aggressive client, we accept this risk in exchange for the upside.
Citation. Markowitz (1952), Sharpe (1966), Merton (1972).
A toggle in the sidebar. With graph methods on, the system applies graph-theoretic techniques to the correlation matrix as either an alternative engine (HRP) or a post-processing tilt (centrality). All of this runs on classical graph theory — it is not graph neural networks. We chose this deliberately; see §13.
Convert the correlation matrix into a distance matrix using the Mantegna (1999) metric:
d_ij = √(2 · (1 − ρ_ij))
This converts a correlation matrix into a proper metric space — d_ii = 0, d_ij ≥ 0, and the triangle inequality holds. With this distance, two assets that move together perfectly have distance 0; two that are uncorrelated have distance √2; two that move opposite have distance 2.
The network is then a fully connected graph on the 14 assets, with edge weights equal to these distances.
The MST is the subset of n−1 edges that connect all 14 nodes with minimum total distance. This surfaces the dominant correlation structure — the single most-similar neighbour for each asset, chained together.
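A dependency-light sketch of the distance transform and the MST. It uses Prim's algorithm on the dense 14 × 14 distance matrix, a standard textbook algorithm swapped in for illustration; the app may well use a graph library instead. The toy correlation matrix in the test is made up.

```python
import numpy as np


def mantegna_distance(corr):
    """d_ij = sqrt(2 (1 - rho_ij)); the clip guards against tiny negative rounding."""
    return np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))


def mst_edges(dist):
    """Prim's algorithm on a dense distance matrix; returns n - 1 (i, j) edges."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()           # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best      # nodes now reached more cheaply through j
        parent[closer] = j
        best = np.minimum(best, dist[j])
    return edges
```

On a toy universe of two tightly correlated "equities" and two "bonds", the tree keeps both within-bloc edges and a single cross-bloc edge, which is the cluster structure described above.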
In normal regimes the MST shows clear cluster structure: equities form one chain, bonds another, gold sits as a peripheral node. In crisis regimes the chains shorten and equity-bond cross-edges appear — a visualisation of correlation breakdown.
HRP (López de Prado 2016) is an alternative to ERC for the Balanced profile. It works in three steps:
- Tree clustering — agglomerative clustering on the distance matrix produces a binary hierarchical tree.
- Quasi-diagonalisation — reorder the rows/columns of Σ according to the tree, so similar assets are adjacent.
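- Recursive bisection — starting from the full universe, split into two halves at the top of the tree. Allocate between the two halves in inverse proportion to each half's variance, where a half's variance is that of its inverse-variance-weighted sub-portfolio. Recurse on each half.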
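The three steps above can be sketched with SciPy's hierarchical-clustering tools. In this sketch the tree comes from single linkage on the Mantegna distances, quasi-diagonalisation is implicit in walking the tree, and the bisection splits at each tree branch as the text describes. López de Prado's original paper instead bisects the quasi-diagonal ordering into equal-sized halves, so treat this as a variant. Function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def cluster_var(cov, idx):
    """Variance of the inverse-variance-weighted sub-portfolio on assets idx."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return ivp @ sub @ ivp


def hrp_weights(cov):
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # Step 1: tree clustering (single linkage on the condensed distance vector).
    tree = to_tree(linkage(squareform(dist, checks=False), method="single"))
    weights = np.ones(cov.shape[0])

    def bisect(node):
        # Step 3: split at this branch, allocate by inverse cluster variance, recurse.
        if node.is_leaf():
            return
        left = node.get_left().pre_order()   # leaf ids under the left branch
        right = node.get_right().pre_order()
        v_left, v_right = cluster_var(cov, left), cluster_var(cov, right)
        alpha = 1.0 - v_left / (v_left + v_right)
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        bisect(node.get_left())
        bisect(node.get_right())

    bisect(tree)
    return weights
```

Because every split divides a parent's weight between its two children, the final weights are positive and sum to one without any solver. For two uncorrelated assets the result reduces to inverse-variance weights.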
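Why it might be better than ERC in some regimes. HRP never inverts Σ, and each allocation step uses only the covariances inside the two clusters being compared, so it degrades gracefully when the covariance matrix is noisy or near-singular (the failure mode of inversion-based optimisers such as GMV and Max-Sharpe). It produces weights that respect the hierarchical structure: assets in the same cluster get treated as a group rather than independently.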
Why it might be worse in others. HRP doesn't enforce equal risk contribution at the asset level, only at each split. In universes with very unequal cluster sizes, this can produce concentration that ERC would have avoided.
We expose both — HRP for graph mode, ERC for standard. Users can compare.
Eigenvector centrality is the leading eigenvector of the network's adjacency matrix, which is built from the correlation matrix. A node has high eigenvector centrality if it co-moves with many other high-centrality nodes, i.e. it's systemic.
In a typical 14-ETF universe, SPY and IWM have the highest centrality (they're the equity-bloc anchors); BIL has the lowest (cash is structurally uncorrelated). HYG often shows up as a high-centrality bridge between equity and credit.
We use centrality as the input to the centrality tilt, which down-weights central assets and up-weights peripheral ones. The intuition: in a stress event, the central nodes are the contagion vectors; concentration in them is structurally risky.
For each weight w_i from the chosen engine, multiply by exp(−s · z_i) where z_i is the standardised eigenvector centrality of asset i and s is the tilt strength. Then renormalise.
w_i_tilted = w_i · exp(−s · z_i)
w_final = w_tilted / sum(w_tilted)
With s = 0, no tilt. With s = 0.6, central assets are aggressively down-weighted. The default static value is s = 0.30.
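The leading eigenvector can be computed with `numpy.linalg.eigh`. The sketch below assumes an adjacency of absolute correlations with self-loops removed, because the document does not pin down the exact adjacency. The tilt follows the two formulas above, with z standardised using the population standard deviation (another assumption). Function names are hypothetical.

```python
import numpy as np


def eigenvector_centrality(corr):
    """Leading eigenvector of |corr| with a zeroed diagonal (assumed adjacency)."""
    adj = np.abs(corr).copy()
    np.fill_diagonal(adj, 0.0)
    _, eigvecs = np.linalg.eigh(adj)  # eigenvalues come back in ascending order
    v = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    return np.abs(v)                  # non-negative matrix: fix the arbitrary sign


def centrality_tilt(weights, centrality, s=0.30):
    """w_i * exp(-s * z_i), renormalised; z is the standardised centrality."""
    z = (centrality - centrality.mean()) / centrality.std()
    tilted = weights * np.exp(-s * z)
    return tilted / tilted.sum()
```

Because the tilt is multiplicative and then renormalised, a zero weight stays zero and the tilt never creates a position the engine did not choose.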
Where the tilt is applied. When graph mode is on, the tilt is applied to Profiles 2, 4, and 5 (GMV, Black-Litterman, Max-Sharpe). Profile 3 uses HRP instead, and Profile 1 uses HRP for its risky sleeve.
The static centrality tilt is cosmetic — it adjusts weights but doesn't react to anything. The regime-aware tilt is the genuine novelty in this project: the tilt strength becomes a function of the network's tightness, so the same model behaves differently in different regimes.
At each rebalance, compute:
ρ̄ = mean(|ρ_ij|) for all i ≠ j
The average absolute off-diagonal correlation across the asset universe. In calm regimes this sits around 0.30 for our 14-asset multi-class universe. In a crisis it spikes toward 0.65 or higher — every asset's correlation with every other asset moves toward the equity-equity correlation.
This single number is the system's regime indicator. It's deterministic, computed from the same data the engines already use, and has a clean financial interpretation.
A linear ramp:
if ρ̄ ≤ low: s = base
if ρ̄ ≥ high: s = max
otherwise: s = base + (ρ̄ − low) / (high − low) · (max − base)
Calibrated thresholds for our 14-ETF universe:
- low = 0.30 (calm — no tilt beyond the baseline)
- high = 0.65 (full crisis — maximum tilt)
- base = 0.10 (always some structural anti-concentration)
- max = 0.60 (aggressive de-risking when needed)
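The regime indicator and the ramp fit in a few lines. A sketch with the calibrated thresholds as defaults; the parameter `s_max` stands in for "max" to avoid shadowing Python's built-in, and both function names are hypothetical.

```python
import numpy as np


def mean_abs_offdiag(corr):
    """rho_bar: mean of |rho_ij| over all i != j."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[mask]).mean())


def tilt_strength(rho_bar, low=0.30, high=0.65, base=0.10, s_max=0.60):
    """Linear ramp from base (calm) to s_max (crisis)."""
    if rho_bar <= low:
        return base
    if rho_bar >= high:
        return s_max
    return base + (rho_bar - low) / (high - low) * (s_max - base)
```

At the midpoint of the band (ρ̄ = 0.475) the strength is exactly halfway between base and max, 0.35.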
These thresholds are calibrated for our specific universe by inspecting the empirical distribution of ρ̄ across the 2014–2024 sample. They would need recalibration for a different universe (a single-asset-class universe would have higher baseline tightness; a more diversified one lower).
Same as the static tilt, but with s now varying:
w_i_tilted = w_i · exp(−s(ρ̄) · z_i)
w_final = w_tilted / sum(w_tilted)
In March 2020, the COVID crash, ρ̄ spiked from ~0.32 (pre-crisis baseline) to ~0.55 in late March. The tilt strength rose from 0.10 to ~0.42. The high-centrality assets (SPY, IWM, HYG) were down-weighted; the peripheral assets (GLD, TLT, BIL) were up-weighted. The portfolio still lost money — the tilt doesn't avoid losses — but the loss was less concentrated.
In 2022's rate-driven drawdown, ρ̄ rose modestly but didn't spike (it was more of a slow re-rating than a panic). The tilt strengthened mildly but stayed in its low range. The portfolio took the bond losses in stride.
A static centrality adjustment is an aesthetic choice; it doesn't earn its keep. A regime-conditional adjustment is a dynamic risk control — its behavior in 2020 is materially different from its behavior in 2017. That's the difference between "we used graph methods" and "we used graph methods to do something useful."
This is the headline of the methodology deck and the README. It's also the most defensible novelty of the project, because the formula is fully explicit, every parameter is calibrated to historical data we can show, and the mechanism is interpretable end-to-end.
A separate tab in the UI shows macro indicators alongside the asset universe. The macro data does not drive allocation in the current build (see §13 for why). It serves three purposes: indicator history, a rule-based snapshot of the current regime, and asset × macro correlations.
Five series, all available without an API key via yfinance:
- ^VIX — CBOE volatility index (equity vol regime)
- ^TNX — 10-year US Treasury yield (rates level)
- ^TYX — 30-year US Treasury yield (long rates)
- DX-Y.NYB — US Dollar Index (currency regime)
- CL=F — WTI crude oil futures (commodity / inflation proxy)
Plus two derived series:
- Term spread = 30Y − 10Y
- VIX z-score (1-year rolling)
Caveat: these are proxies, not the underlying variables. ^TNX is the CBOE 10-year yield index — close to the actual 10-year yield but not identical. For research-grade work you'd use FRED's DGS10. For a teaching demo with no API keys, ^TNX is fine.
A simple, transparent rule-based classifier categorises the current macro state across four dimensions:
- Volatility: VIX < 15 = calm, < 25 = normal, ≥ 25 = stressed
- Rates: 10Y < 2% = low, < 4.5% = neutral, ≥ 4.5% = high
- Curve: spread < 0 = inverted, < 0.4% = flat, < 1.0% = normal, ≥ 1.0% = steep
- Dollar: percentile of DXY over the sample — bottom third = weak, middle = neutral, top = strong
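The four rules translate one-for-one into code. A sketch, assuming yields are quoted in percent (4.2 means 4.2%) and the DXY percentile is a rank in [0, 1] over the sample. `percentile_rank` and `classify_regime` are hypothetical helpers.

```python
def percentile_rank(history, current):
    """Share of historical observations at or below the current value."""
    return sum(1 for x in history if x <= current) / len(history)


def classify_regime(vix, ten_year, thirty_year, dxy_percentile):
    """Rule-based snapshot across the four dimensions; yields in percent."""
    vol = "calm" if vix < 15 else "normal" if vix < 25 else "stressed"
    rates = "low" if ten_year < 2.0 else "neutral" if ten_year < 4.5 else "high"
    spread = thirty_year - ten_year  # term spread, percentage points
    curve = ("inverted" if spread < 0 else "flat" if spread < 0.4
             else "normal" if spread < 1.0 else "steep")
    usd = ("weak" if dxy_percentile < 1 / 3 else "neutral" if dxy_percentile < 2 / 3
           else "strong")
    return {"vol": vol, "rates": rates, "curve": curve, "usd": usd}
```

For example, VIX at 30, a 4.2% 10-year, a 4.5% 30-year and the dollar in its top third read as "vol = stressed, rates = neutral, curve = flat, USD = strong".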
The classification is presented as four cards on the Macro Dashboard. It's not predictive; it's a snapshot. The point is to give the user (and the demo audience) language for the current regime: "vol = stressed, rates = neutral, curve = flat, USD = strong" is more useful than seeing five charts and trying to integrate them mentally.
A heatmap showing each asset's correlation with each macro factor over the last 3 years. For yields, the dollar index, and oil we use changes; for VIX we use the level.
This is purely diagnostic — it lets you confirm that the data is sane (TLT should be strongly negatively correlated with the 10Y yield change; if it isn't, there's a data problem) and surfaces asset behaviors worth knowing (gold's correlation with the dollar, oil's correlation with EEM, etc.).
A walk-forward simulation that estimates weights from a rolling window, holds for a fixed period, then rebalances.
- At each rebalance date t, look at the prior 252 trading days of returns (the estimation window).
- Run the profile's engine on this window to produce weights.
- Hold those weights for 63 trading days (~3 months, quarterly rebalance).
- After 63 days, re-estimate from the new prior 252 days. Repeat.
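This is out-of-sample at every step: at no point does the engine see returns beyond its estimation window. (Fixed settings such as the regime-tilt thresholds were calibrated on the full 2014–2024 sample, so those are not out-of-sample.) The strategy's daily P&L is pnl_t = w_t⊤ · r_t, where w_t is the most recent set of weights and r_t is the realised return vector for day t. Because w_t is held constant between rebalances, this is equivalent to resetting to the target weights every day rather than letting them drift with prices.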
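The walk-forward loop, as a sketch. `engine` is any function mapping a 252 × N window of returns to N weights (a hypothetical interface). The slice `returns[start - window:start]` is what guarantees the engine never sees the day it starts trading or anything after it.

```python
import numpy as np


def walk_forward(returns, engine, window=252, hold=63):
    """Estimate on the prior `window` rows, hold for `hold` rows, repeat.

    Returns daily P&L from row `window` onward, with pnl_t = w' r_t and w held
    constant between rebalances.
    """
    T = returns.shape[0]
    pnl = np.full(T, np.nan)
    for start in range(window, T, hold):
        w = engine(returns[start - window:start])  # only rows strictly before `start`
        end = min(start + hold, T)
        pnl[start:end] = returns[start:end] @ w
    return pnl[window:]
```

With 400 days of data the engine is called three times (at rows 252, 315 and 378), each time on a full 252-day window.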
The strategy is plotted alongside two benchmarks:
- Equal Weight — naive 1/N across all 14 ETFs, rebalanced daily.
- SPY (100% equity) — the lazy benchmark.
These bracket the reasonable space: if your sophisticated strategy can't beat 1/N, something's wrong. If it can't beat SPY in raw return but does so with materially less drawdown, that's the trade-off you sold to the client.
A second button on the Backtest tab replays the walk-forward simulation day by day, with a slider you can scrub. This is not just a UI gimmick; it's the "place yourself two days ago" feature. Drag the slider to March 12, 2020 and you see exactly what the strategy looked like as the COVID crash unfolded. The animation is downsampled to 120 frames so it stays responsive.
- Annualised return = mean(daily_pnl) · 252
- Annualised volatility = std(daily_pnl) · √252
- Sharpe ratio = (ann_return − rf) / ann_vol, with rf = 2%
- Max drawdown = the maximum peak-to-trough decline over the path
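The four statistics as a sketch. Two choices are assumptions the text leaves open: the sample standard deviation (ddof = 1), and measuring drawdown from a starting wealth of 1.0 so a loss on day one counts. Max drawdown is returned as a negative fraction. The function name is hypothetical.

```python
import numpy as np


def performance_stats(daily_pnl, rf=0.02, periods=252):
    """Annualised return, volatility, Sharpe ratio, and max drawdown."""
    pnl = np.asarray(daily_pnl, dtype=float)
    ann_return = pnl.mean() * periods
    ann_vol = pnl.std(ddof=1) * np.sqrt(periods)
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + pnl)))  # start from 1.0
    peak = np.maximum.accumulate(wealth)
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": (ann_return - rf) / ann_vol,
        "max_drawdown": (wealth / peak - 1.0).min(),  # e.g. -0.2 for a 20% drawdown
    }
```

For the path +10%, −20%, +5%, wealth goes 1.0 → 1.10 → 0.88 → 0.924, and the maximum drawdown is 0.88 / 1.10 − 1 = −20%.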
- No transaction costs. This overstates real-world performance, especially for the higher-turnover engines (Max-Sharpe in particular). For a 14-ETF universe at quarterly rebalance, realistic costs would be in the 5–15 basis points per year range.
- No taxes, no wash-sale logic. A real robo-advisor would do tax-loss harvesting; we don't.
- No slippage on rebalance days. All rebalances assume execution at close.
- Quarterly rebalance is hard-coded. Not optimised. Monthly would mean more turnover and more reactivity; annual would mean more drift and more momentum exposure.
Three components, all built from hand-written content with no LLM in the loop: the Risk Declaration tab, the Parameters Explained tab, and the downloadable IPS.
A plain-language statement per profile covering:
- Strategy summary — one paragraph on what the engine does.
- Volatility target — the band we expect (e.g. "9–13% annualised").
- Max drawdown target — the worst case we plan for (e.g. "< 25%").
- Recommended horizon — how long the client should commit.
- Key risks — three to five bullet points specific to this strategy.
- What could break it — three to five bullet points on regime conditions where the engine fails.
These are written in the Reilly & Brown / MiFID II style: tell the client what the product is, what it isn't, and what could go wrong. No marketing copy.
Every numerical parameter in the engine has an entry covering:
- What it does — the technical role of the parameter.
- Why this value — the rationale for the specific number we picked.
- Limitation — what assumption this parameter encodes that may not hold.
Example, for the CPPI multiplier:
Multiplier (m) = 3
- What it does: Sets the leverage of the risky-sleeve allocation relative to the cushion above the floor. Each 1% of cushion buys m% of risky exposure.
- Why this value: m = 3 survives a one-period drop of up to ~33% in the risky sleeve before the floor is breached. Common range in practice is 2 to 5; m = 3 is a moderate choice that gives meaningful upside participation while preserving the floor under typical equity-class drawdowns.
- Limitation: Assumes continuous rebalancing. A gap-down move larger than 1/m between rebalances can pierce the floor.
Every knob in every engine has one of these entries. The glossary is built once at module load; it does not change per user.
The user can download a Markdown Investment Policy Statement that combines:
- Their profile and tier
- The chosen engine and its parameters
- Their actual allocation in dollars and percentages
- The full risk declaration for their profile
- The relevant parameter glossary entries
This is what a real onboarding flow would produce as the regulator-facing artifact. It's templated string formatting, not generated content — every line was written by us in the source code, with the user's specific numbers plugged in.
The result page is organised into six tabs. Each tab answers a specific question.
| Tab | The question it answers |
|---|---|
| Allocation | "What did the model recommend?" |
| Risk Declaration | "What is this portfolio designed to do, and what isn't it?" |
| Parameters Explained | "Why is every number what it is?" |
| Asset Network | "What's the correlation structure I'm sitting on?" |
| Macro Dashboard | "What regime are we in right now?" |
| Backtest | "How would this have done over the last 5+ years?" |
The tabs are deliberately not sequential — there's no required order. A novice user might only look at Allocation. A more sophisticated one might dig into Parameters Explained. A skeptic might go straight to Backtest. Each tab stands alone.
The sidebar (collapsed by default) holds the two graph-methods toggles (graph methods on/off, regime-aware tilt on/off) and the centrality tilt strength slider when in static-tilt mode. These are advanced controls that don't intrude on the default experience.
The intake (tier picker → questionnaire → result) is a single linear flow with a "Start over" button at the bottom of the result page.
Several capabilities are absent on purpose. The reasoning matters more than the absence.
No LLM, no Claude, no GPT, no foundation-model API anywhere in the codebase. The risk declarations and parameter glossary entries are hand-written; the IPS export is templated string formatting. This is a deliberate design choice — every output traces back to either a published technique with a citation or text we authored.
In May 2026, every robo-advisor demo says "AI-powered." We chose not to. For quant-finance audiences this lands well: classical, traceable, defensible.
We use graph theory (MST, eigenvector centrality, hierarchical clustering). We do not use graph neural networks. With 14 nodes and ~10 years of daily data, a GNN would be overfit to the point of meaninglessness, and we couldn't defend its outputs in front of a regulator. Classical graph theory gives us interpretable, deterministic answers.
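To make "interpretable, deterministic answers" concrete, here is a numpy-only sketch of three classical steps: the Mantegna distance d_ij = √(2(1 − ρ_ij)), a minimum spanning tree built with Prim's algorithm, and eigenvector centrality. The function names are illustrative rather than graph_methods.py's API, and computing centrality on the unweighted MST (rather than on the full correlation matrix) is an assumption. Given the same correlation matrix, every step returns the same output, with ties broken by index.

```python
import numpy as np


def mantegna_distance(corr):
    """d_ij = sqrt(2 * (1 - rho_ij)): 0 for rho = 1, 2 for rho = -1."""
    corr = np.clip(np.asarray(corr, dtype=float), -1.0, 1.0)
    return np.sqrt(2.0 * (1.0 - corr))


def minimum_spanning_tree(dist):
    """Prim's algorithm from node 0; returns the n - 1 edges (parent, child)."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()             # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))  # argmin picks the lowest index on ties
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best         # does the new node offer a cheaper link?
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


def eigenvector_centrality(edges, n):
    """Leading eigenvector of the unweighted MST adjacency, normalised to sum 1."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    _, eigvecs = np.linalg.eigh(adj)  # symmetric matrix, eigenvalues ascending
    v = np.abs(eigvecs[:, -1])        # Perron vector; abs() fixes the sign
    return v / v.sum()


corr = np.array([[1.0, 0.8, 0.7, 0.6],
                 [0.8, 1.0, 0.1, 0.1],
                 [0.7, 0.1, 1.0, 0.1],
                 [0.6, 0.1, 0.1, 1.0]])
edges = minimum_spanning_tree(mantegna_distance(corr))
print(edges)                                     # [(0, 1), (0, 2), (0, 3)]
print(eigenvector_centrality(edges, 4).argmax())  # 0
```

On this toy matrix, asset 0 is the most correlated with everything else, so the tree is a star centred on it and it receives the highest centrality score. Every number can be traced back to a correlation entry.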
The macro layer is diagnostic, not prescriptive. The course pack includes a macro_factors_logistic.py file that demonstrates regime classification via logistic regression — we do not wire it into the live allocation engine. Doing so would require careful calibration and out-of-sample validation that exceeds the scope of a teaching demo. We document this as a clean extension hook for future work.
A production robo-advisor would do tax-loss harvesting, wash-sale avoidance, and account-type-aware rebalancing (taxable vs IRA). We do not. This would substantially complicate the optimisation problem and is out of scope.
The five profiles are static. A 25-year-old Aggressive client and a 60-year-old Aggressive client get the same allocation. In production this needs a target-date overlay; we document it and don't build it.
The Macro Dashboard includes a financial headlines panel that pulls recent stories from free RSS feeds (Yahoo Finance, MarketWatch). We display the headlines. We do not analyse them.
This is a deliberate choice. The literature on news-driven asset return prediction starts with Tetlock (2007) and the subsequent decade of refinements; doing it responsibly requires:
- A financial-text-specific sentiment lexicon (Loughran-McDonald is the canonical one, but its word lists date as financial language shifts)
- Careful out-of-sample validation across multiple regimes
- Calibration to the specific publication style of each source (Bloomberg-style headlines differ from MarketWatch-style)
- Ongoing maintenance because publication conventions drift
We can't do all of that in scope, and doing it half-way would produce confident-looking outputs with no real information content — the kind of thing that looks like quant-finance work but isn't defensible under scrutiny. We choose silence over noise. The headlines are presented as ambient context, not as a signal.
We considered and rejected:
- Twitter / Reddit sentiment — dirty data, no defensible filtering ruleset, results would be sensitive to which accounts we sample.
- GDELT or similar event databases — large historical noise, calibrating impact on a 14-asset universe is a research project.
- Prediction-market data (Polymarket) — sparse liquidity, unstable API, connecting political markets to asset allocation requires careful causal reasoning we don't have time for.
- Real-time market data beyond yfinance — we're an allocation recommender, not a trading system. NBBO routing, order-book depth, exchange-specific data are out of scope.
Each of these is the kind of feature that sounds impressive in a pitch but doesn't survive a quant interviewer asking "show me your validation."
Three deliberate, defensible features were added in lieu of the rejected options above:
Statistical shock detection (macro.detect_shocks). Each macro indicator's 5-day change is z-scored against the rolling 60-day distribution of comparable 5-day changes. A move beyond 2σ is flagged. This is textbook hypothesis-test statistics: no model, no learning, no opinion. We surface the shock and leave its interpretation to the user, who can read the description of the regime-aware mechanism for what it might mean. It is not a forecast.
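The shock test fits in a few lines of pandas. This is a sketch, not macro.detect_shocks itself: the function name is hypothetical, and excluding the current change from its own reference window is an assumption (it stops a large move from inflating the standard deviation it is measured against).

```python
import numpy as np
import pandas as pd


def detect_shocks_sketch(series, horizon=5, window=60, threshold=2.0):
    """Z-score each 5-day change against the previous 60 five-day changes."""
    change = series.diff(horizon)
    # Reference window ends the day before, so a shock cannot inflate its own std.
    reference = change.shift(1).rolling(window)
    z = (change - reference.mean()) / reference.std()
    z = z.replace([np.inf, -np.inf], np.nan)  # zero-variance window: no verdict
    return pd.DataFrame({
        "change": change,
        "z": z,
        "shock": z.abs() > threshold,  # NaN compares False, so it is never flagged
    })
```

Applied to one indicator's daily series, this returns one row per date; rows where `shock` is True are the ones a dashboard would highlight. The first 65 rows carry no verdict because the reference window is not yet full.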
Weekly-changes table (macro.weekly_changes_table). The same data presented as a scan: current level, 5-day change, 5-day percent change, z-score. Lets the user see at a glance whether anything is moving abnormally without reading paragraphs of commentary.
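A comparable sketch of the weekly scan, one row per indicator. The function and column names are hypothetical. The z-score is computed as described for shock detection (5-day change against the trailing 60 changes), and sorting so the most abnormal movers come first is an assumed presentation choice.

```python
import numpy as np
import pandas as pd


def weekly_changes_sketch(levels, horizon=5, window=60):
    """One row per indicator column: level, 5-day change, 5-day % change, z-score."""
    change = levels.diff(horizon)
    reference = change.shift(1).rolling(window)
    z = ((change - reference.mean()) / reference.std()).replace([np.inf, -np.inf], np.nan)
    pct_change = (levels / levels.shift(horizon) - 1.0) * 100.0
    table = pd.DataFrame({
        "level": levels.iloc[-1],
        "chg_5d": change.iloc[-1],
        "pct_chg_5d": pct_change.iloc[-1],
        "z": z.iloc[-1],
    })
    # Most abnormal movers first; indicators without a z-score sort last.
    return table.sort_values("z", key=lambda col: col.abs(), ascending=False)
```

Because the table carries the raw z-score rather than a verdict, the user can judge how unusual a move is without any commentary attached.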
Financial headlines panel (news_feed). Pulls recent stories from public RSS feeds. Displayed as a reader's panel. We do not score, summarise, or interpret them. Their function is to give the user something to click — not to provide allocation guidance.
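"Display only" means parsing and nothing else. The module map notes that news_feed uses feedparser as an optional dependency. This sketch instead uses the standard library's `xml.etree.ElementTree` so it runs with no extras. It takes an RSS string rather than fetching a URL, and its function name is hypothetical. A malformed feed yields an empty panel rather than an error, which is one plausible reading of "falls back gracefully".

```python
import xml.etree.ElementTree as ET


def parse_headlines(rss_xml, limit=10):
    """Extract (title, link) pairs from an RSS 2.0 document. No scoring, no NLP."""
    try:
        root = ET.fromstring(rss_xml)
    except ET.ParseError:
        return []  # degrade to an empty panel rather than crash the tab
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title:  # skip untitled entries; keep everything else verbatim
            items.append((title, link))
        if len(items) >= limit:
            break
    return items
```

The output is a list of strings to render as links. There is deliberately no hook where a sentiment score could be attached.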
All three additions share the same philosophy: surface the data, name the mechanism, let the user form their own view. No "this is what the market is telling you" claims. No predictive scoring. No LLM in the loop.
Beyond what's deliberately excluded, here's what's in but imperfect:
- The sample period (2014–2024) is mostly the falling-rate era. Higher-rate regimes are under-represented, which affects the calibration of all engines but especially Max-Sharpe and Black-Litterman.
- yfinance can be flaky on cloud hosts. The app falls back to calibrated synthetic returns when this happens — fine for the demo, not fine for production.
- 252-day estimation windows are reactive, not predictive. Weights respond to past stress, not future stress.
- Quarterly rebalance is hard-coded, not optimised.
- The 30% per-asset cap is a regularisation choice, not a derived bound.
- The CPPI floor (90%) and multiplier (m=3) are heuristic, not optimised.
- Profiles are static (no glide path).
- The knowledge gate is a soft cap, not a substitute for a regulatory suitability assessment.
- ESG and exclusion answers are recorded but do not yet filter the asset universe.
- Black-Litterman views are a fixed default, not user-elicited.
- Regime-tilt thresholds (low=0.30, high=0.65) are calibrated for a 14-asset multi-class universe. They would need recalibration for any other universe.
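The CPPI entry can be made concrete with the textbook CPPI rule, risky exposure E = m(V − F). With the floor at 90% of a starting value of 100 and m = 3, the cushion is 10 and the risky exposure is 30 (a 30% weight). When the exposure is uncapped and the safe asset returns 0, V′ − F = (V − F)(1 + m·r). So the floor can only be breached between rebalances if the risky asset falls by more than 1/m ≈ 33% in one period. This is a sketch: anchoring the floor to starting wealth, capping the risky weight at 100% (no leverage), and the function names are assumptions, not the engine's confirmed implementation.

```python
def cppi_risky_weight(value, floor_value, multiplier=3.0, max_weight=1.0):
    """CPPI rule E = m * (V - F), returned as a weight of V and capped (no leverage)."""
    if value <= 0:
        return 0.0
    cushion = max(value - floor_value, 0.0)  # at or below the floor: all safe asset
    return min(multiplier * cushion / value, max_weight)


def simulate_cppi(risky_returns, safe_return=0.0, start=100.0, floor_frac=0.90, m=3.0):
    """Rebalance each period; the floor is assumed fixed at 90% of starting wealth."""
    floor_value = floor_frac * start
    value = start
    path = [value]
    for r in risky_returns:
        w = cppi_risky_weight(value, floor_value, m)
        value *= 1.0 + w * r + (1.0 - w) * safe_return
        path.append(value)
    return path
```

A run of −30% periods shows why both parameters are heuristic. Each crash shrinks the cushion by a factor of 1 − 3 × 0.3 = 0.1, so wealth approaches 90 but stays above it, while the portfolio ends up almost entirely in the safe asset.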
None of these limitations invalidate the project as a teaching demo. All of them would need to be addressed before any production use. Documenting them is what makes the demo defensible — a real compliance officer could read this section and either approve the limitations as acceptable scope-restrictions or flag them as blockers, but in either case they have what they need.
The codebase is nine Python modules plus the Streamlit app:
| Module | What it owns |
|---|---|
| roboadvisor.py | Asset universe definition, the six allocation engines, score-modulation, the backtest with optional frictions (expense ratios, transaction costs), the Allocation dataclass, summary stats including Sortino/Calmar/Information Ratio, drawdown calculation. The brain. |
| intake.py | The three intake tiers, the question schemas, the scoring rule with knowledge gate, the continuous-score result for parameter modulation, P6 routing logic. |
| macro.py | Macro indicator downloads, synthetic-data fallback, the regime classifier, profile-aware regime interpretation, statistical shock detection, weekly-changes table, asset-macro correlations. |
| graph_methods.py | Mantegna distance, MST construction, eigenvector centrality, HRP, the centrality tilt, the regime-aware tilt strength function. |
| explainability.py | The risk declarations (six profiles), the parameter glossary, the IPS rendering. All hand-written content, no LLM. |
| monte_carlo.py | Bootstrap forward simulation. Path generation, percentile bands, goal-tracking (probability of hitting target wealth, time-to-target), regular contributions. |
| news_feed.py | RSS-based financial headline fetcher. Display only, no analysis. Optional dependency on feedparser; falls back gracefully if unavailable. |
| app.py | The Streamlit UI. CSS, masthead, tier picker, questionnaire, seven result tabs (Allocation, Risk Declaration, Parameters, Asset Network, Macro Dashboard, Backtest, Forward Projection). |
| clap_to_launch.py | Optional cold-open gimmick, unrelated to the modelling. |
Plus:
- requirements.txt: pinned dependencies.
- notebook_modelling.ipynb: Jupyter notebook for the in-class technical deep-dive.
- architecture.svg: the system architecture diagram.
- methodology.html: the executive deck (13 slides).
- landing/index.html: the portfolio landing page.
- profile_analytics.py: standalone analytics script that generates per-profile reports.
That's the system, end to end. Read this document linearly and you understand every piece.