
Backend Modernization #47

Open
manuvanegas wants to merge 20 commits into openskope:main from manuvanegas:titiler

Conversation

@manuvanegas

@manuvanegas manuvanegas commented Mar 30, 2026

  • Set up TiTiler as internal tile server
  • Revise Pydantic models: stateful vs. stateless data objects
  • Explore local filesystem store options (manual)
  • Divide core services into self-explanatory files
  • Tests

manuvanegas and others added 11 commits August 29, 2025 17:51
All deploy yamls in a single dir, next to timeseries. Dockerfile only copies the necessary yaml configs.
Drop Numba; add anyio, httpx, aioboto3. Pin pydantic==2.x and pydantic-settings. Replace custom yaml_config_settings_source with YamlConfigSettingsSource.
JSON-serializable models for the background task pipeline. Add TimeseriesAnalyzeRequest to support a workflow that decouples raster data extraction and summarization from downstream analysis (extract-once/analyze-many).
Job store for background tasks, registry and lookup dict loaders, data reader abstraction to support S3 and local json dict loading
…ny logic

validation: regex + registry ID filtering, geometry cell-count estimation.
slice_resolver: temporal range → file+band mapping via lookup dict.
tiles: async TiTiler proxy using app-level httpx.AsyncClient.
timeseries_processing: anyio CapacityLimiter(10) + to_thread for concurrent
rasterio reads; z-score and rolling-average transforms.
timeseries_tasks: background task orchestrator for full pipeline lifecycle.
Add:
- store/: test_jobs (atomic writes, stale cleanup), test_data_reader (S3/local
  factory), test_lookup_dicts (schema + ISO-8601 ordering validation, cache)
- core/: test_registry (YAML validation, transform lengths, temporal resolution),
  test_validation (CRS detection heuristics, cell-count estimation, geom size),
  test_timeseries_processing (band chunking, z-score/rolling transforms,
  execute_analyze_request edge cases)
- schemas/: test_geometry_schemas (bbox bounds, DE-9IM, reprojection),
  test_timeseries_schemas (pattern matching, path-traversal/injection rejection,
  smoother width constraints)
pytest.ini: asyncio_mode=auto; integration marker for tests needing raster I/O.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
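
For orientation, the z-score and rolling-average transforms named in the test list could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: it uses the population standard deviation, a centered window with an odd width, and None for positions without a full window. The function names are illustrative.

```python
from statistics import fmean, pstdev
from typing import List, Optional


def z_score(series: List[float]) -> List[float]:
    """Standardize a series to zero mean and unit (population) standard deviation."""
    mean = fmean(series)
    std = pstdev(series)
    if std == 0:
        # A constant series has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in series]
    return [(value - mean) / std for value in series]


def rolling_average(series: List[float], width: int) -> List[Optional[float]]:
    """Centered moving average; positions without a full window are None."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed: List[Optional[float]] = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            smoothed.append(None)  # not enough neighbors on one side
        else:
            smoothed.append(fmean(series[i - half : i + half + 1]))
    return smoothed


standardized = z_score([1.0, 2.0, 3.0])
smoothed = rolling_average([1.0, 2.0, 3.0, 4.0, 5.0], width=3)
```

Whether the real transforms pad the edges, use a trailing window, or use the sample standard deviation is not stated in the commit message, so those choices here are placeholders.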
@manuvanegas manuvanegas changed the title from "Entire Backend Modernization" to "Backend Modernization" Apr 2, 2026
@manuvanegas manuvanegas marked this pull request as ready for review April 10, 2026 17:01
@manuvanegas
Author

graph LR
    UI["skopeui"]

    subgraph offline["skope-datasets (offline pipeline)"]
        Pipeline["'convert to stac' pipeline"]
    end

    subgraph skope-api["skope-api (FastAPI)"]
        subgraph endpoints["v3 Router endpoints"]
            EP1["GET /metadata"]
            EP2["GET /tiles/{ds}/{var}/{t}/{z}/{x}/{y}"]
            EP3["POST /timeseries/extract"]
            EP4["POST /timeseries/analyze"]
            EP5["GET /timeseries/status/{id}"]
        end
        
        Lookup[("Lookup Cache\n(JSON/Dict)")]
        BgTask["Background Task\n(Worker Process)"]
        Jobs[("Job Store\n(JSON/JobStore)")]
        Titiler["TiTiler (internal)"]
    end

    subgraph storage["Storage"]
        subgraph localst["local"]
            MetaYAML["metadata.yml\n(local registry)"]
        end
        subgraph s3st["AWS S3"]
            COGs["COGs + STAC\n(paleocar_v3/)"]
            lookupdict["Lookup Dict"]
        end
    end

    %% ==========================================
    %% ACTUAL CONNECTIONS (No Hacks)
    %% ==========================================
    UI --> EP1
    UI --> EP2
    UI --> EP3
    UI --> EP4
    UI --> EP5

    EP1 -.->|"registry (loaded at startup)"| MetaYAML

    EP2 -->|"A2: request then proxy tile"| Titiler
    EP2 -->|"A1: resolve URI + band"| Lookup
    Titiler -->|"A3: read COG tile"| COGs

    EP3 -->|"B2: dispatch"| BgTask
    EP3 -->|"B1: write PENDING"| Jobs
    BgTask -->|"B3: resolve URIs + bands"| Lookup
    BgTask -->|"B4: rasterio reads\n(anyio ≤10 concurrent)"| COGs
    Lookup -.->|"cache miss"| lookupdict
    BgTask -->|"B5: compute zonal stats, write results + SUCCESS"| Jobs

    EP4 -->|"C: read base_series and apply transform/smoothing"| Jobs
    EP5 -->|"B6: poll job status + get results"| Jobs
    
    Pipeline ---|"write COG STAC + lookup.json"| Junc(( ))
    Junc --> COGs
    Junc --> lookupdict

    %% Style the junction to look like a small black dot
    style Junc fill:#888,stroke:#888,stroke-width:1px
    
    style MetaYAML fill:#fff,stroke:#333,stroke-dasharray: 5 5
    style COGs fill:#fff,stroke:#333,stroke-dasharray: 5 5
    style lookupdict fill:#fff,stroke:#333,stroke-dasharray: 5 5
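
Steps B1, B5, and B6 in the diagram all revolve around a job record that moves from PENDING to SUCCESS. A rough sketch of a file-backed store with atomic writes follows, assuming one JSON file per job. The FileJobStore class, its method names, and the record fields are illustrative and are not taken from the PR.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


class FileJobStore:
    """Illustrative file-backed job store: one JSON file per job ID."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def update(self, job_id: str, record: dict) -> None:
        # Write to a temp file in the same directory, then rename over the target.
        # os.replace is atomic on the same filesystem, so a reader never sees
        # a partially written record.
        fd, tmp_name = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as handle:
                json.dump(record, handle)
            os.replace(tmp_name, self.root / f"{job_id}.json")
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    def get(self, job_id: str) -> Optional[dict]:
        path = self.root / f"{job_id}.json"
        if not path.exists():
            return None
        return json.loads(path.read_text())


store = FileJobStore(Path(tempfile.mkdtemp()))
store.update("job-1", {"status": "PENDING"})                         # B1
store.update("job-1", {"status": "SUCCESS", "results": [1.0, 2.0]})  # B5
job = store.get("job-1")                                             # B6
```

The sketch assumes job IDs have already been validated, since an unchecked ID would be interpolated directly into a file path.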

@manuvanegas manuvanegas requested a review from alee April 10, 2026 22:58
@manuvanegas manuvanegas marked this pull request as draft April 15, 2026 18:26
Introduce a Redis-backed job store and enable Redis in deployment. Adds a Redis service to docker-compose and exposes REDIS_URL to the app, adds redis dependency to requirements, and adds redis_url to app settings. Implements RedisJobStore (with 24h TTL) and updates get_job_store to prefer Redis when redis_url is configured. Also bumps default_max_cells to 1,000,000.
@manuvanegas manuvanegas marked this pull request as ready for review April 15, 2026 21:13
Introduce a pytest fixture for RedisJobStore and add comprehensive tests for Redis-backed job storage. Updates conftest to import RedisJobStore and provide redis_job_store with teardown flushing. Tests cover update/get semantics, overwrite behavior, missing keys, TTL enforcement (using _JOB_TTL_SECONDS), key namespacing, and handling of full success payloads to ensure Redis store correctness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add fallback and safeguards for geometries that don't cover pixel centers. In timeseries_processing.py use geometry_mask(all_touched=True) when the initial mask has zero covered pixels, and ensure the computed window has at least 1x1 size. In validation.py treat lists of Shapely Point geometries as always within size limits by returning early. These changes avoid zero-coverage/mask issues for point or very small polygons and prevent zero-dimension windows.
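
The fallback described above can be illustrated without rasterio. The sketch below is a pure-Python analogy, not the PR's code: it assumes unit-sized pixels with the grid origin at (0, 0), restricts geometries to axis-aligned boxes, and uses True to mean "covered". All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Mask = List[List[bool]]


def center_mask(box: Box, rows: int, cols: int) -> Mask:
    """True where the pixel center lies inside the box."""
    min_x, min_y, max_x, max_y = box
    return [
        [min_x <= c + 0.5 <= max_x and min_y <= r + 0.5 <= max_y for c in range(cols)]
        for r in range(rows)
    ]


def touched_mask(box: Box, rows: int, cols: int) -> Mask:
    """True where the pixel overlaps or touches the box at all."""
    min_x, min_y, max_x, max_y = box
    return [
        [c <= max_x and c + 1 >= min_x and r <= max_y and r + 1 >= min_y for c in range(cols)]
        for r in range(rows)
    ]


def coverage_mask(box: Box, rows: int, cols: int) -> Mask:
    """Prefer the center-based mask; fall back to all-touched when it is empty."""
    mask = center_mask(box, rows, cols)
    if not any(any(row) for row in mask):
        mask = touched_mask(box, rows, cols)
    return mask


def covered_count(mask: Mask) -> int:
    return sum(cell for row in mask for cell in row)


small = coverage_mask((1.1, 1.1, 1.3, 1.3), rows=3, cols=3)  # misses every center
large = coverage_mask((0.0, 0.0, 2.0, 2.0), rows=3, cols=3)  # covers four centers
```

A box smaller than one pixel that avoids every pixel center would otherwise produce an empty mask, which is the zero-coverage case the commit guards against.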
Replace heuristic geographic detection with pyproj.CRS and compute geometry area robustly. Add dataset_crs parameter to calculate_spatial_coverage, union shapes with shapely.unary_union and use geodetic area for geographic CRSs (fall back to planar area otherwise). Update caller to pass dataset_crs. Remove legacy COMMON_GEOGRAPHIC_EPSG/is_geographic heuristic and update estimate_cell_count/validate_geom_size to accept CRS strings and rely on CRS.is_geographic().
Add timestep precision detection and normalization helpers to verify and coerce incoming ISO-8601 timesteps to the dataset's resolution (supports year/month/day/datetime). Integrate normalization into slice resolvers (resolve_temporal_slice and resolve_uri_single_band) to enforce canonical suffixes and provide clearer errors for mismatched precisions. Also remove the internal 'base_series' field from job payloads returned by the get_job_status endpoint.
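
One plausible shape for the precision detection and normalization helpers is sketched below. It is an assumption-laden illustration: the helper names, the exact accepted formats, and the policy of truncating finer timesteps while rejecting coarser ones are not confirmed by the commit message, which only names the four supported precisions.

```python
import re

# Accepted formats per precision (assumed); names follow the commit message.
_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "month": re.compile(r"\d{4}-\d{2}"),
    "day": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "datetime": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
}
# Character length of the canonical form at each precision.
_LENGTH = {"year": 4, "month": 7, "day": 10, "datetime": 19}


def detect_precision(timestep: str) -> str:
    """Return the precision name whose pattern matches the whole string."""
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(timestep):
            return name
    raise ValueError(f"unrecognized timestep format: {timestep!r}")


def normalize_timestep(timestep: str, resolution: str) -> str:
    """Coerce a timestep to the dataset resolution by truncation (assumed policy)."""
    precision = detect_precision(timestep)
    if _LENGTH[precision] < _LENGTH[resolution]:
        # A coarser input cannot be refined without inventing information.
        raise ValueError(
            f"timestep {timestep!r} has {precision} precision, "
            f"but the dataset requires {resolution}"
        )
    return timestep[: _LENGTH[resolution]]


monthly = normalize_timestep("1985-07-15", "month")
yearly = normalize_timestep("1985", "year")
```

Rejecting coarser inputs with an explicit message matches the commit's stated goal of clearer errors for mismatched precisions.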
Custom colormaps are defined once in deploy/colormaps/custom.json. A custom TiTiler image bakes them into rio-tiler's registry at build time so tile requests use colormap_name without per-request overhead. resolve_colormaps() reads the same file at startup and injects colormap_stops into the registry so the frontend colorbar can read color stops from /metadata with no additional endpoint. Colormap names are declared per variable in metadata.yml. A colormap that is not built into TiTiler must be defined in deploy/colormaps/custom.json.

Copilot AI left a comment


Pull request overview

This PR modernizes the backend by introducing an internal TiTiler tile server, migrating the timeseries API to a new v3 pipeline design (registry + lookup dicts + job store), and restructuring deployment/configuration and tests accordingly.

Changes:

  • Add TiTiler customization + build-time colormap baking for internal tile streaming.
  • Replace prior timeseries service layers/routers with v3 endpoints (/metadata, /tiles, /timeseries/extract + /timeseries/analyze + /timeseries/status) backed by a job store and cached lookup dictionaries.
  • Rework deployment files (Dockerfiles, compose, settings, requirements) and add an expanded unit/integration test suite with raster fixtures.

Reviewed changes

Copilot reviewed 72 out of 87 changed files in this pull request and generated 7 comments.

File Description
titiler_custom/main.py Exposes TiTiler app entrypoint for custom image.
titiler_custom/build_colormaps.py Build-time script to bake custom colormaps into rio-tiler.
timeseries/metadata.yml Adds crs/transform and variable colormap references for datasets.
timeseries/docker/.gitignore Removed docker ignore rules under timeseries/docker.
timeseries/deploy/settings/dev.yml Removed legacy timeseries deploy settings (moved to deploy/).
timeseries/deploy/settings/base.yml Removed legacy base settings (moved to deploy/).
timeseries/deploy/requirements/prod.txt Removed legacy prod requirements (moved to deploy/).
timeseries/deploy/requirements/dev.txt Removed legacy dev requirements (moved to deploy/).
timeseries/deploy/requirements/base.txt Removed legacy base requirements (moved to deploy/).
timeseries/deploy/metadata/prod.yml Removed legacy metadata split (now uses metadata.yml registry).
timeseries/deploy/metadata/dev.yml Removed legacy dev metadata.
timeseries/deploy/Dockerfile Removed legacy timeseries Dockerfile (replaced by deploy/Dockerfile).
timeseries/data/requests/yearly_prod.json Removed legacy request sample.
timeseries/data/requests/yearly.json Removed legacy request sample.
timeseries/data/requests/timeseriesv1.json Removed legacy request sample.
timeseries/data/requests/monthly.json Removed legacy request sample.
timeseries/app/tests/test_stores.py Removed legacy store tests.
timeseries/app/tests/store/test_lookup_dicts.py New tests for lookup-dict caching + validation.
timeseries/app/tests/store/test_jobs.py New tests for filesystem/redis job stores + cleanup.
timeseries/app/tests/store/test_data_reader.py New tests for Local/S3 DataReader and factory.
timeseries/app/tests/store/__init__.py Test package init.
timeseries/app/tests/schemas/test_timeseries_schemas.py New schema validation tests (TimeRange, smoother, request ID patterns).
timeseries/app/tests/schemas/test_geometry_schemas.py New geometry validation + reprojection tests.
timeseries/app/tests/schemas/__init__.py Test package init.
timeseries/app/tests/routers/test_datasets.py Removed legacy router tests (v1/v2).
timeseries/app/tests/pipeline/test_pipeline.py New end-to-end pipeline tests for extract/analyze/status.
timeseries/app/tests/pipeline/data/test-monthly/lookup.json Monthly lookup fixture for pipeline tests.
timeseries/app/tests/pipeline/data/test-annual/lookup.json Annual lookup fixture for pipeline tests.
timeseries/app/tests/pipeline/data/monthly_5x5x60_dataset_int16_variable.tif Raster fixture for pipeline tests.
timeseries/app/tests/pipeline/data/monthly_5x5x60_dataset_float32_variable.tif Raster fixture for pipeline tests.
timeseries/app/tests/pipeline/data/annual_5x5x5_dataset_uint16_variable.tif Raster fixture for pipeline tests.
timeseries/app/tests/pipeline/data/annual_5x5x5_dataset_float32_variable_uncertainty.tif.aux.xml Raster aux fixture for pipeline tests.
timeseries/app/tests/pipeline/data/annual_5x5x5_dataset_float32_variable_uncertainty.tif Raster fixture for pipeline tests.
timeseries/app/tests/pipeline/data/annual_5x5x5_dataset_float32_variable.tif.aux.xml Raster aux fixture for pipeline tests.
timeseries/app/tests/pipeline/data/annual_5x5x5_dataset_float32_variable.tif Raster fixture for pipeline tests.
timeseries/app/tests/pipeline/conftest.py Pipeline fixtures and dependency overrides for integration tests.
timeseries/app/tests/pipeline/__init__.py Test package init.
timeseries/app/tests/core/test_validation.py New validation tests (dataset/variable + geom sizing).
timeseries/app/tests/core/test_timeseries_processing.py New unit tests for processing pipeline utilities/transforms.
timeseries/app/tests/core/test_registry.py New unit tests for registry loading + slice/URI resolution.
timeseries/app/tests/core/__init__.py Test package init.
timeseries/app/tests/conftest.py Global fixtures for registry/lookup/series/job store/data reader.
timeseries/app/store/jobs.py New filesystem + redis job store implementation + cleanup.
timeseries/app/store/index_loaders.py New registry loader, colormap resolver, lookup cache/fetch/validate.
timeseries/app/store/data_reader.py New Local/S3 JSON readers and reader factory.
timeseries/app/schemas/timeseries.py Pydantic v2 schema refactor and request/response models.
timeseries/app/schemas/geometry.py Geometry model refactor; validation + reprojection helpers.
timeseries/app/schemas/dataset.py Removed legacy dataset manager/metadata resolver.
timeseries/app/schemas/common.py Removed legacy common schemas (BandRange/TimeRange/date-based).
timeseries/app/routers/v3/api.py New v3 API router (metadata, tiles, extract/analyze/status).
timeseries/app/routers/v3/__init__.py Router package init.
timeseries/app/routers/v2/api.py Removed legacy v2 API.
timeseries/app/routers/v1/api.py Removed legacy v1 API.
timeseries/app/pytest.ini Adds pytest configuration and integration marker.
timeseries/app/main.py App lifecycle initialization + v3 router wiring + httpx client + registry load.
timeseries/app/exceptions.py Updates validation error handling for FastAPI/Pydantic v2.
timeseries/app/core/validation.py New request validation helpers (IDs + geometry sizing).
timeseries/app/core/timeseries_tasks.py New background task for extract job execution + JobStore updates.
timeseries/app/core/timeseries_processing.py New extraction/analyze processing pipeline (chunking, masking, transforms).
timeseries/app/core/tiles.py New TiTiler proxy streaming for XYZ tiles.
timeseries/app/core/slice_resolver.py New timestep normalization + lookup slice/URI resolution.
timeseries/app/core/services.py Removed legacy service layer.
timeseries/app/config.py Migrates settings to pydantic-settings YAML source + new config fields.
geoserver/settings/web.xml Removed GeoServer settings from repo.
geoserver/settings/server.xml Removed GeoServer settings from repo.
geoserver/.gitignore Removed GeoServer ignore file.
docker/README.md Removed legacy docker shared mount README.
deploy/settings/prod.yml Updates prod settings with tile server + storage base URL.
deploy/settings/dev.yml Adds new dev settings file.
deploy/requirements/prod.txt Adds new pinned prod requirements.
deploy/requirements/dev.txt Adds new pinned dev requirements.
deploy/requirements/base.txt Adds new pinned base requirements (FastAPI + Pydantic v2 + geo stack).
deploy/logging/prod.yml Adds prod logging config.
deploy/logging/dev.yml Adds dev logging config.
deploy/dev.yml Removed legacy compose fragment.
deploy/compose/prod.yml Removes GeoServer service from prod compose.
deploy/compose/dev.yml Adds dev compose definition (server + titiler volumes).
deploy/compose/base.yml Adds base compose (server + redis + titiler + networks).
deploy/colormaps/custom.json Adds custom colormap definitions.
deploy/base.yml Removed legacy base compose.
deploy/Dockerfile.titiler Adds TiTiler image build with baked colormaps.
deploy/Dockerfile Adds new server image build for refactored app.
configure Removes legacy configure script.
README.md Updates request examples/paths.
Makefile Updates build/deploy flow and default config generation.
.gitignore Adds macOS/log paths.
.dockerignore Adds docker ignore rules (excludes tests and local artifacts).


Comment thread timeseries/app/schemas/timeseries.py
Comment thread timeseries/app/core/timeseries_tasks.py Outdated
Comment thread timeseries/app/exceptions.py
Comment thread timeseries/app/core/tiles.py Outdated
Comment thread timeseries/app/tests/core/test_validation.py
Comment thread titiler_custom/build_colormaps.py Outdated
Comment thread timeseries/app/schemas/timeseries.py
Add environment-specific metadata files (deploy/metadata/dev.yml and deploy/metadata/prod.yml) and update Dockerfile to COPY deploy/metadata/${ENVIRONMENT}.yml to /code/metadata.yml. Remove the previous COPY of timeseries/metadata.yml so the image uses the environment-specific metadata baked into the container.


2 participants