System Maintenance: Market Metadata Refresh & Data Hygiene #196

Description

@evanlow

Summary

Build a safety-first System Maintenance module for TWS Robot to keep market metadata fresh and auditable without touching order placement, strategy execution, or autonomous trading controls.

This issue was originally scoped as a monthly index-constituent refresh. The recommended implementation is broader and more useful: a Maintenance Console with a manual dashboard tab, dry-run/apply workflow, validation, reports, and integration with the existing market-events refresh path.

The module should help keep TWS Robot abreast of:

  • Index constituents: S&P 500, STI, HSI
  • Market events: earnings, dividends, FOMC, market holidays / early closes
  • Data hygiene: stale metadata files, duplicate symbols, invalid symbols, suspicious count changes
  • Maintenance reports: what changed, what failed, and whether files were safely updated

Problem

Index constituent files and event metadata can become stale over time. This can cause incomplete screening coverage, misleading market context, or missing upcoming event warnings.

Examples:

  • HSI constituent list lagging current membership
  • STI/S&P 500 CSV files not refreshed in a controlled way
  • Earnings/dividend/FOMC events not refreshed consistently
  • Current refresh scripts are ad hoc and scattered rather than part of one auditable maintenance workflow

Existing scripts to consolidate:

  • scripts/refresh_sp500_constituents.py
  • deployment_scripts/refresh_hsi_constituents.py

Existing event refresh capability to expose through the maintenance UI/API:

  • POST /api/market-events/refresh
  • GET /api/market-events/sync-log
  • data/market_events.py

Recommended Product Direction

Add a new Maintenance dashboard page, preferably under the existing Monitoring or Settings area.

Suggested route:

/maintenance

Suggested API namespace:

/api/maintenance/*

The Maintenance page should not be mixed into the Stock Analysis screener pages. Stock Analysis should remain focused on screening and ticker analysis. Maintenance should be a separate operator/admin function.

Key Operating Model

The maintenance workflow should be manually invokable and safe by default.

Recommended cadence:

  • Manually run every 2–3 days, or daily if desired
  • Prefer off-peak / non-market hours
  • Running during market hours is allowed because the workflow is metadata-only, but warn the user if the market appears open
  • A future optional scheduled workflow can run daily or every few days, but the manual button should come first

Recommended operator flow:

  1. Operator opens Maintenance tab.
  2. Operator clicks Dry Run All or selects specific tasks.
  3. System fetches external metadata into memory / temp files only.
  4. System validates proposed outputs.
  5. System shows proposed changes: added symbols, removed symbols, count changes, warnings, source, duration.
  6. Operator clicks Apply Refresh only if validation passes.
  7. System writes files atomically, backs up previous files, invalidates relevant caches, and writes maintenance reports.
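The change preview in step 5 can be sketched as a set difference between the current and proposed symbol lists. This is a minimal illustration only; `diff_constituents` is a hypothetical helper name, and the tickers are placeholders rather than real membership data.

```python
def diff_constituents(current, proposed):
    """Summarise what a refresh would change, without writing anything."""
    current_set, proposed_set = set(current), set(proposed)
    return {
        "before_count": len(current_set),
        "after_count": len(proposed_set),
        # Sorted so that reports are stable between runs.
        "added": sorted(proposed_set - current_set),
        "removed": sorted(current_set - proposed_set),
    }


# Placeholder symbols, for illustration only.
preview = diff_constituents(["AAA", "BBB", "CCC"], ["AAA", "BBB", "DDD"])
print(preview)
```

The keys mirror the `before_count`, `after_count`, `added`, and `removed` fields of the structured task result described under Backend Design.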

Dashboard UI Requirements

Create a Maintenance Console with status cards and action buttons.

Suggested status cards:

| Area | Status fields |
|---|---|
| S&P 500 constituents | last refreshed, row count, source, validation status |
| STI constituents | last refreshed, row count, source, validation status |
| HSI constituents | last refreshed, row count, source, validation status |
| Market events | last refreshed, fetched/upserted/stale/error counts |
| Latest maintenance report | timestamp, overall status, warning/error count |

Suggested buttons:

  • Dry Run All
  • Dry Run Constituents
  • Apply Constituents Refresh
  • Refresh Market Events
  • Validate Metadata Only
  • View Latest Report

The UI should clearly separate:

  • dry_run=true preview actions
  • apply actions that actually update files/DB

The UI should display warnings before apply, especially:

  • Market appears open
  • Large constituent count drop
  • Source fetch failed
  • Validation failure
  • Partial market-events sync failure

Backend Design

Add a maintenance orchestration package.

Suggested structure:

web/maintenance/
  __init__.py
  __main__.py
  runner.py
  tasks.py
  validators.py
  reports.py
  sources/
    __init__.py
    sp500.py
    sti.py
    hsi.py

The runner should expose a task registry, allowing individual or grouped tasks.

Example task names:

sp500_constituents
sti_constituents
hsi_constituents
market_events
metadata_validation
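One possible shape for the task registry is a dictionary populated by a decorator, sketched here under assumptions: `TASK_REGISTRY`, `register_task`, and `run_tasks` are hypothetical names, and the task bodies are placeholders that do no real fetching.

```python
from typing import Callable, Dict, List

# Maps a task name to a callable that accepts dry_run and returns a result dict.
TASK_REGISTRY: Dict[str, Callable[[bool], dict]] = {}


def register_task(name: str):
    """Decorator that adds a task function to the registry under `name`."""
    def decorator(func):
        TASK_REGISTRY[name] = func
        return func
    return decorator


@register_task("sp500_constituents")
def refresh_sp500(dry_run: bool) -> dict:
    # Placeholder: a real task would fetch, normalize, and validate here.
    return {"task": "sp500_constituents", "status": "success", "dry_run": dry_run}


@register_task("hsi_constituents")
def refresh_hsi(dry_run: bool) -> dict:
    # Placeholder body, as above.
    return {"task": "hsi_constituents", "status": "success", "dry_run": dry_run}


def run_tasks(names: List[str], dry_run: bool = True) -> List[dict]:
    """Run the named tasks in order; reject unknown names before running any."""
    unknown = [n for n in names if n not in TASK_REGISTRY]
    if unknown:
        raise KeyError(f"Unknown maintenance task(s): {unknown}")
    return [TASK_REGISTRY[n](dry_run) for n in names]
```

Checking all names up front means a typo in one task name cannot leave a grouped run half-applied.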

Each task should return a structured result:

{
  "task": "sp500_constituents",
  "status": "success",
  "dry_run": true,
  "source": "https://...",
  "started_at": "2026-06-24T00:00:00Z",
  "finished_at": "2026-06-24T00:00:04Z",
  "duration_seconds": 4.2,
  "before_count": 503,
  "after_count": 503,
  "added": [],
  "removed": [],
  "validation": {
    "status": "passed",
    "warnings": [],
    "errors": []
  },
  "warnings": [],
  "errors": []
}
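The structured result above could be modelled as a dataclass so that every task returns the same fields. The sketch below is one assumed layout; the class names `TaskResult` and `ValidationResult` are hypothetical, and the field defaults are illustrative choices rather than requirements from this issue.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ValidationResult:
    status: str = "passed"
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


@dataclass
class TaskResult:
    task: str
    status: str
    dry_run: bool = True  # safe default: preview only
    source: Optional[str] = None
    started_at: Optional[str] = None
    finished_at: Optional[str] = None
    duration_seconds: float = 0.0
    before_count: int = 0
    after_count: int = 0
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    validation: ValidationResult = field(default_factory=ValidationResult)
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() also converts the nested ValidationResult to a plain dict.
        return asdict(self)


result = TaskResult(task="sp500_constituents", status="success", before_count=503, after_count=503)
print(json.dumps(result.to_dict(), indent=2))
```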

CLI Requirements

Add a command interface so the same functionality can be used outside the dashboard.

Examples:

python -m web.maintenance run --dry-run
python -m web.maintenance run --task sp500_constituents --dry-run
python -m web.maintenance run --task hsi_constituents --apply
python -m web.maintenance run --task market_events --apply
python -m web.maintenance validate

Dry run should be the default unless --apply is explicitly supplied.
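A minimal `argparse` sketch of this interface follows, assuming the `run` and `validate` subcommands shown above. `build_parser` is a hypothetical function name, and allowing `--task` to be repeated is an assumption, not something this issue specifies.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m web.maintenance")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run maintenance tasks")
    # Repeatable; when omitted, the caller can treat None as "all tasks".
    run.add_argument("--task", action="append", default=None)
    mode = run.add_mutually_exclusive_group()
    # --dry-run is accepted for readability; omitting both flags means dry run.
    mode.add_argument("--dry-run", action="store_true")
    mode.add_argument("--apply", action="store_true")

    sub.add_parser("validate", help="validate metadata only")
    return parser


args = build_parser().parse_args(["run", "--task", "sp500_constituents"])
dry_run = not args.apply  # writes happen only when --apply is given
print(args.task, dry_run)
```

Deriving `dry_run` solely from `--apply` keeps the safe behaviour in one place: no combination of flags other than an explicit `--apply` can trigger a write.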

API Requirements

Add API endpoints for the dashboard.

Suggested endpoints:

GET  /api/maintenance/status
POST /api/maintenance/run
GET  /api/maintenance/reports
GET  /api/maintenance/reports/<report_id>

Example request:

{
  "tasks": ["sp500_constituents", "sti_constituents", "hsi_constituents"],
  "dry_run": true
}

Example response:

{
  "status": "completed",
  "dry_run": true,
  "report_id": "maintenance_20260624_001500",
  "results": [...],
  "warnings": [],
  "errors": []
}

State-changing endpoints must remain CSRF-protected like the existing web API pattern.
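The request body for `POST /api/maintenance/run` could be checked with a small framework-independent helper before any task runs. This is a sketch under assumptions: `parse_run_request` is a hypothetical name, and treating a missing `dry_run` as `true` and a missing `tasks` list as "all tasks" are assumed behaviours chosen to match the CLI default.

```python
KNOWN_TASKS = (
    "sp500_constituents",
    "sti_constituents",
    "hsi_constituents",
    "market_events",
    "metadata_validation",
)


def parse_run_request(payload: dict):
    """Return (tasks, dry_run) from a JSON body, or raise ValueError."""
    tasks = payload.get("tasks")
    if tasks is None:
        tasks = list(KNOWN_TASKS)
    if not isinstance(tasks, list) or not tasks:
        raise ValueError("'tasks' must be a non-empty list")
    unknown = [t for t in tasks if t not in KNOWN_TASKS]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")

    # Assumption: omitting dry_run means preview, never apply.
    dry_run = payload.get("dry_run", True)
    if not isinstance(dry_run, bool):
        raise ValueError("'dry_run' must be a boolean")
    return tasks, dry_run


print(parse_run_request({"tasks": ["sp500_constituents"], "dry_run": True}))
```

Requiring a real boolean avoids a string such as `"false"` being interpreted as truthy or falsy by accident.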

Index Constituent Refresh Requirements

Refresh at minimum:

  • S&P 500
  • STI
  • HSI

Required output files:

data/sp500_constituents.csv
data/sti_constituents.csv
data/hsi_constituents.csv

Required CSV columns:

symbol
security
sector
sub_industry

Additional market-specific display fields are allowed where already used:

display_symbol

Market-specific symbol handling:

  • S&P 500: yfinance-compatible US tickers; replace . with - where needed, e.g. BRK.B -> BRK-B
  • STI: SGX/yfinance .SI symbols; preserve display symbol separately
  • HSI: HKEX/yfinance .HK symbols; preserve zero-padded four-digit display symbol separately
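The symbol rules above might be implemented along these lines. The function names are hypothetical, and the sketch assumes raw HSI codes arrive as numeric strings, possibly with extra leading zeros or an existing `.HK` suffix.

```python
def normalize_sp500(raw: str) -> str:
    """US tickers: class-share dots become dashes, e.g. BRK.B -> BRK-B."""
    return raw.strip().upper().replace(".", "-")


def normalize_sti(raw: str) -> dict:
    """SGX codes: append .SI for the symbol and keep the bare code for display."""
    display = raw.strip().upper()
    if display.endswith(".SI"):
        display = display[:-3]
    return {"symbol": f"{display}.SI", "display_symbol": display}


def normalize_hsi(raw: str) -> dict:
    """HKEX codes: zero-pad to four digits, then append .HK for the symbol."""
    code = raw.strip().upper()
    if code.endswith(".HK"):
        code = code[:-3]
    # int() raises ValueError on non-numeric input, which callers can treat
    # as an invalid symbol.
    display = f"{int(code):04d}"
    return {"symbol": f"{display}.HK", "display_symbol": display}


print(normalize_sp500("BRK.B"))
print(normalize_sti("D05"))
print(normalize_hsi("700"))
```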

Validation Requirements

Before applying any refreshed file, validate:

  • Required columns are present
  • Row count is above configurable minimum threshold
  • Row count change is not suspiciously large
  • Symbols are non-empty
  • Duplicate symbols are rejected, or at minimum reported as a prominent warning
  • Symbol format matches market rules
  • At least one source table / source record was parsed
  • No output file is replaced if validation fails
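The structural checks in this list could look roughly like the following, operating on rows as dictionaries (for example from `csv.DictReader`). `validate_rows` is a hypothetical name, and this sketch treats duplicates as errors, which is one of the two options the list allows.

```python
from collections import Counter

REQUIRED_COLUMNS = ("symbol", "security", "sector", "sub_industry")


def validate_rows(rows, columns):
    """Check required columns, non-empty symbols, and duplicate symbols."""
    errors = []

    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        errors.append(f"missing required columns: {missing}")

    if not rows:
        errors.append("no source records were parsed")

    symbols = [(row.get("symbol") or "").strip() for row in rows]
    empty = sum(1 for s in symbols if not s)
    if empty:
        errors.append(f"{empty} row(s) have an empty symbol")

    duplicates = sorted(s for s, n in Counter(symbols).items() if s and n > 1)
    if duplicates:
        errors.append(f"duplicate symbols: {duplicates}")

    return {"status": "failed" if errors else "passed", "warnings": [], "errors": errors}
```

The returned dictionary matches the `validation` object in the structured task result, so it can be embedded directly.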

Suggested minimum count thresholds:

sp500_constituents: >= 450
sti_constituents: >= 25
hsi_constituents: >= 70

Suggested suspicious-change thresholds:

warn if count changes by > 10%
fail if count changes by > 25%, unless an explicit override is provided
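The thresholds above translate into a small check such as the one below. `check_count` and the `override` parameter name are hypothetical; this sketch also assumes an overridden change above 25% is downgraded to a warning rather than silenced.

```python
MIN_COUNTS = {
    "sp500_constituents": 450,
    "sti_constituents": 25,
    "hsi_constituents": 70,
}
WARN_CHANGE = 0.10
FAIL_CHANGE = 0.25


def check_count(task, before, after, override=False):
    """Apply the minimum-count and suspicious-change rules to one task."""
    warnings, errors = [], []

    minimum = MIN_COUNTS[task]
    if after < minimum:
        errors.append(f"{task}: {after} rows is below the minimum of {minimum}")

    if before > 0:  # no baseline to compare against on a first run
        change = abs(after - before) / before
        if change > FAIL_CHANGE and not override:
            errors.append(f"{task}: count changed by {change:.0%} (limit {FAIL_CHANGE:.0%})")
        elif change > WARN_CHANGE:
            warnings.append(f"{task}: count changed by {change:.0%}")

    status = "failed" if errors else "passed"
    return {"status": status, "warnings": warnings, "errors": errors}


print(check_count("hsi_constituents", before=82, after=72))
```

The override only affects the change-ratio rule; the minimum-count rule still applies, so an override cannot push through a near-empty file.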

Safe File Write Requirements

Apply mode must be atomic and reversible.

Required behavior:

  1. Fetch source data.
  2. Normalize into a DataFrame/list.
  3. Write proposed output to a temp file.
  4. Validate temp file.
  5. Create timestamped backup of existing file.
  6. Replace existing file only after validation passes.
  7. Invalidate relevant screener cache.
  8. Write JSON and Markdown maintenance reports.

Suggested backup path:

data/backups/constituents/YYYYMMDD_HHMMSS/<filename>.csv

Suggested report paths:

reports/maintenance/YYYYMMDD_HHMMSS.json
reports/maintenance/YYYYMMDD_HHMMSS.md
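Steps 3 to 6 of the required behavior can be sketched with the standard library: write to a temp file in the target directory, validate it, back up the existing file, then swap with `os.replace`. `apply_refresh` and its `validate` callback (which returns a list of error strings) are hypothetical, and using UTC for the backup timestamp is an assumption.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def apply_refresh(target, new_text, backup_root, validate):
    """Replace `target` with `new_text` only if the temp copy validates.

    Returns the backup path, or None when there was no previous file.
    """
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(new_text)

        errors = validate(tmp_path)
        if errors:
            raise ValueError(f"validation failed, {target.name} left untouched: {errors}")

        backup_path = None
        if target.exists():
            stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
            backup_dir = Path(backup_root) / stamp
            backup_dir.mkdir(parents=True, exist_ok=True)
            backup_path = backup_dir / target.name
            shutil.copy2(target, backup_path)

        os.replace(tmp_path, target)  # atomic rename over the old file
        return backup_path
    finally:
        # Clean up the temp file on any failure; after a successful replace
        # it no longer exists.
        if tmp_path.exists():
            tmp_path.unlink()
```

Because the backup is taken only after validation passes, a failed run leaves neither a changed target nor a stray backup directory.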

Market Events Refresh Requirements

Use the existing market-events service rather than building a parallel event system.

Maintenance should expose a button/API task to refresh:

  • Earnings
  • Dividends
  • FOMC dates
  • Market holidays / early closes

Important caution:

Do not fetch earnings/dividends for all S&P 500 constituents by default. That may be slow and fragile with yfinance.

Initial recommended scope for earnings/dividends refresh:

  • Open positions
  • Active strategy symbols
  • Watchlist symbols, if available
  • Optional manually supplied symbols

FOMC and market holiday refresh can be broad/static because they do not require a separate fetch per symbol.

The market-events maintenance task should call into the existing data.market_events service and record provider sync results in the maintenance report.
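The default symbol scope for earnings/dividends refresh could be assembled as an ordered, de-duplicated union of the sources listed above. `build_event_symbols` is a hypothetical helper; how positions, strategy symbols, and watchlists are actually obtained from TWS Robot is outside this sketch.

```python
def build_event_symbols(open_positions, strategy_symbols, watchlist=(), manual=()):
    """Merge symbol sources into one list, keeping first-seen order."""
    ordered, seen = [], set()
    for group in (open_positions, strategy_symbols, watchlist, manual):
        for raw in group:
            symbol = raw.strip().upper()
            if symbol and symbol not in seen:
                seen.add(symbol)
                ordered.append(symbol)
    return ordered


# Placeholder inputs, for illustration only.
scope = build_event_symbols(["AAPL", "MSFT"], ["msft", "NVDA"], manual=[" d05.si "])
print(scope)
```

Keeping the list bounded to these sources is what avoids the slow, fragile all-constituents fetch cautioned against above.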

Cache Invalidation Requirements

After successful apply of constituent files, invalidate relevant screener caches:

  • S&P 500 screener cache
  • STI screener cache
  • HSI screener cache

This ensures the next screener load uses the refreshed universe instead of a stale in-memory cache.

Safety / Prime Directive Requirements

This feature must be strictly metadata-only.

The maintenance module must not:

  • Place orders
  • Modify order/execution logic
  • Start or stop strategies
  • Change autonomous trading configuration
  • Bypass emergency-stop / trading-safety controls
  • Touch live/paper trading execution paths

Allowed write targets:

  • Constituent CSV metadata files
  • Backup files
  • Maintenance report files
  • Market-events DB rows via the existing market-events service
  • Maintenance logs

Default failure mode:

  • Abort the task
  • Preserve existing files
  • Write a failed report
  • Surface warnings/errors clearly in the UI

Scope

In scope

  • Maintenance runner and task registry
  • Refactor existing S&P 500 refresh script into the framework
  • Refactor existing HSI refresh script into the framework
  • Add STI refresh task
  • Validation utilities
  • Safe atomic file replacement and backups
  • JSON and Markdown report writer
  • CLI entrypoint
  • Web Maintenance dashboard tab
  • Maintenance API endpoints
  • Market-events refresh button backed by existing service
  • Cache invalidation after successful constituent apply
  • Tests and documentation

Out of scope

  • Live order placement changes
  • Strategy behavior changes
  • Autonomous trading behavior changes
  • Intraday/real-time constituent refresh
  • Fetching earnings/dividends for every index constituent by default
  • Paid/licensed market-data vendor integration, unless added later as a separate issue

Acceptance Criteria

  • A new Maintenance dashboard page exists and is reachable from the web UI.
  • Operator can run a dry-run refresh for S&P 500, STI, and HSI constituents.
  • Operator can apply a validated constituent refresh from the dashboard.
  • CLI supports equivalent dry-run/apply operations.
  • Existing S&P 500 and HSI refresh scripts are refactored into or replaced by the maintenance module.
  • STI refresh task is implemented.
  • Validation fails with a clear, visible error when refreshed output looks suspicious.
  • Existing constituent files are preserved if validation fails.
  • Successful apply creates a timestamped backup of previous files.
  • Successful apply writes JSON and Markdown maintenance reports.
  • Successful apply invalidates relevant screener caches.
  • Market-events refresh is exposed through the Maintenance tab using the existing event service.
  • Maintenance report includes per-task counts, source, duration, added/removed symbols, warnings, and errors.
  • Tests cover success path, source failure path, validation failure path, dry-run behavior, apply behavior, backup behavior, and cache invalidation.
  • No regression in trading safety controls or autonomous execution gating.

Implementation Checklist

Phase 1 — Core maintenance framework

  • Create web/maintenance/ package.
  • Add task result dataclass / structured result model.
  • Add task registry and runner.
  • Add dry-run/apply execution modes.
  • Add JSON and Markdown report writer.
  • Add CLI entrypoint.

Phase 2 — Constituent refresh tasks

  • Refactor S&P 500 refresh logic into web/maintenance/sources/sp500.py.
  • Refactor HSI refresh logic into web/maintenance/sources/hsi.py.
  • Add STI refresh logic in web/maintenance/sources/sti.py.
  • Normalize all outputs to app-compatible CSV schemas.
  • Add safe temp-file validation and atomic replace.
  • Add timestamped backups.

Phase 3 — Validation and hygiene

  • Add required-column validator.
  • Add row-count threshold validator.
  • Add duplicate-symbol validator.
  • Add market-specific symbol-format validators.
  • Add suspicious-count-change warning/fail logic.
  • Add metadata-only validation task.

Phase 4 — Web API and dashboard

  • Add web/routes/maintenance.py page route.
  • Add web/templates/maintenance/index.html.
  • Add web/routes/api_maintenance.py.
  • Register blueprints in web/__init__.py.
  • Add Maintenance link under Monitoring or Settings navigation.
  • Add UI status cards and buttons.
  • Add display for latest report and task-level results.

Phase 5 — Market events integration

  • Add maintenance task wrapping existing market-events refresh service.
  • Show provider sync summary in maintenance report.
  • Ensure earnings/dividends refresh is limited to portfolio/strategy/watchlist/manual symbols by default.
  • Include FOMC and market-holiday refresh status.

Phase 6 — Tests and docs

  • Add unit tests for validators.
  • Add unit tests for dry-run behavior.
  • Add unit tests for apply + backup behavior.
  • Add unit tests for source failure behavior.
  • Add unit tests for validation failure preserving existing files.
  • Add route/API tests for Maintenance page and API endpoints.
  • Add docs explaining how and when to run maintenance.
  • Update relevant user guide / operations docs.

Notes for Future Enhancement

Potential future improvements, not required for first implementation:

  • GitHub Actions manual workflow to run dry-run maintenance and upload report artifact
  • Optional scheduled off-peak maintenance
  • More authoritative source providers for official index membership
  • Paid/licensed data provider integration
  • Watchlist management UI feeding event refresh symbols
  • Maintenance health badge on main dashboard
