
fix(data): implement market data layer; stop .gitignore swallowing src/data (#1) #19

Open
bradsmithmba wants to merge 1 commit into cloudtrainerwork:master from bradsmithmba:fix/data-providers
Summary

src/data/ was missing __init__.py, providers.py, models.py, cache.py, and schema.py. Any import of FeatureEngineering — and therefore the entire feature-engineering and regime-detection pipeline — failed at runtime with ImportError.

Closes #1.

Root cause: a .gitignore footgun

This was not simply unwritten code. .gitignore line 165 had an unanchored rule:

data/

An unanchored directory pattern matches a directory of that name at every level, so it matched src/data/ as well as the intended root-level runtime cache directory (config.cache.cache_db_path = data/cache.db). Any module added under src/data/ was silently ignored by git and never committed.

Evidence it was written and then lost:

  • tests/test_providers.py and tests/test_cache.py already exist and specify a detailed API for YFinanceProvider, RateLimiter, CacheManager, and the ORM schema.
  • requirements.txt already declares yfinance>=0.2.28 and sqlalchemy>=2.0.0.
  • The three older modules (data_utils.py, regime_labeler.py, training_data.py) predate the rule, so they remained tracked — which is why the directory looked partially populated.

The fix anchors the rule to the repository root:

/data/

This still ignores the runtime cache (data/cache.db) but no longer touches src/data/. Without this change, any future module added under src/data/ would silently vanish again.
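
The difference between the two rules can be illustrated with a deliberately simplified model. The sketch below is not git's implementation: it handles only literal single-name directory rules such as data/ and /data/ (no wildcards, negation, or nested patterns), and the function name dir_rule_ignores is hypothetical.

```python
from pathlib import PurePosixPath


def dir_rule_ignores(rule: str, path: str) -> bool:
    """Simplified model of a single-name gitignore directory rule.

    Assumption: `rule` is either 'name/' (unanchored) or '/name/' (anchored).
    Real gitignore matching supports far more than this.
    """
    anchored = rule.startswith("/")
    name = rule.strip("/")
    # Only the directory components of the path can match a directory rule.
    parents = PurePosixPath(path).parts[:-1]
    if anchored:
        # Anchored: the directory must sit directly under the repository root.
        return len(parents) > 0 and parents[0] == name
    # Unanchored: a directory with that name at any depth matches.
    return name in parents


for rule in ("data/", "/data/"):
    for path in ("data/cache.db", "src/data/providers.py"):
        print(f"{rule:7} {path:24} ignored={dir_rule_ignores(rule, path)}")
```

In a real checkout, `git check-ignore -v <path>` reports which rule, if any, matches a given path, which is a more reliable check than any model.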

Changes

| File | Contents |
| --- | --- |
| .gitignore | data/ → /data/ (anchor to repo root) |
| src/data/models.py | PriceData, OptionChainData dataclasses |
| src/data/providers.py | YFinanceProvider, RateLimiter, DataProviderError/InvalidSymbol/DataNotAvailable, MarketDataRequest, get_default_provider() |
| src/data/schema.py | SQLAlchemy ORM: PriceHistory, OptionsData, CacheMetadata |
| src/data/cache.py | SQLite-backed CacheManager with TTL freshness, hit/miss stats, and cleanup |
| src/data/__init__.py | Package exports consumed by src/features/base.py |

The implementation was written to the contract defined by the pre-existing test files, not invented — the tests are the spec.
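
For readers unfamiliar with the pattern, here is a minimal sketch of a SQLite-backed cache with TTL freshness, hit/miss counters, and cleanup. It is an illustration only, not the CacheManager in this PR: the class name TTLCache, its method names, the table layout, and the injectable clock are all assumptions, and it uses the standard-library sqlite3 module rather than the SQLAlchemy ORM from schema.py.

```python
import sqlite3
import time


class TTLCache:
    """Hypothetical minimal cache; not the PR's CacheManager API."""

    def __init__(self, db_path=":memory:", ttl_seconds=300.0, clock=time.time):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self.hits = 0
        self.misses = 0

    def set(self, key, value):
        """Store a value and stamp it with the current time."""
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, value, self._clock()),
        )
        self._conn.commit()

    def get(self, key):
        """Return the value if present and no older than the TTL, else None."""
        row = self._conn.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and self._clock() - row[1] <= self._ttl:
            self.hits += 1
            return row[0]
        self.misses += 1
        return None

    def cleanup(self):
        """Delete entries older than the TTL; return how many were removed."""
        cutoff = self._clock() - self._ttl
        cur = self._conn.execute("DELETE FROM cache WHERE stored_at < ?", (cutoff,))
        self._conn.commit()
        return cur.rowcount
```

A stale entry counts as a miss in this sketch and stays on disk until cleanup() runs; whether the real CacheManager behaves the same way is determined by tests/test_cache.py, not by this example.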

Testing

$ pytest tests/test_providers.py tests/test_cache.py -q
21 passed

The exact reproduction from the issue now imports cleanly:

from src.features.base import FeatureEngineering          # OK
from src.data import get_default_provider, CacheManager   # OK
from src.data.regime_labeler import RegimeType            # OK
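
The same check can be scripted as a small import smoke test. This is a sketch under assumptions: the helper name check_imports is hypothetical, and it only attempts to import each module by name and records any ImportError.

```python
import importlib


def check_imports(module_names):
    """Return {module_name: None on success, or an error string on ImportError}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so it lands here too.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

Run from the repository root, `check_imports(["src.features.base", "src.data", "src.data.regime_labeler"])` would be expected to return None for every entry once this change is applied.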

Broader run across tests/{test_providers,test_cache}.py, tests/features/, and tests/models/ (excluding one unrelated missing module): 262 passed. Before this change, almost none of these tests could even be collected, because collection failed on the same ImportError.

Related findings (separate issues, not addressed here)

  • The same unanchored-pattern bug also affects models/ (line 168), which silently ignores src/models/. Filed separately.
  • src/models/integrated_selector.py is genuinely absent and breaks recommendation_engine import. Filed separately.
  • transfer_trainer has a torch API drift (ReduceLROnPlateau constructor arg). Filed separately.

🤖 Generated with Claude Code

…/data

src/data was missing __init__.py, providers.py, models.py, cache.py, and
schema.py, so importing FeatureEngineering (and the whole feature pipeline)
failed at runtime. Root cause: an unanchored `data/` rule in .gitignore
matches a directory named data at every level, including src/data/, so
modules added under src/data were silently ignored and never committed.

- Anchor the rule to `/data/` (ignores only the root runtime cache dir,
  config cache_db_path = data/cache.db; no longer src/data/).
- Implement the data layer to the contract in the existing, previously
  unrunnable tests/test_providers.py and tests/test_cache.py:
  - models.py: PriceData, OptionChainData
  - providers.py: YFinanceProvider, RateLimiter, error hierarchy,
    MarketDataRequest, get_default_provider
  - schema.py: SQLAlchemy ORM (PriceHistory, OptionsData, CacheMetadata)
  - cache.py: SQLite-backed CacheManager with TTL, stats, cleanup
  - __init__.py: package exports consumed by src/features/base.py

Unblocks src.features and src.data.regime_labeler. 21 passed.

Closes cloudtrainerwork#1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Critical: src/data module missing, entire feature pipeline fails to import at runtime
