hcmlab/discover-utils

DISCOVER-Utils

DISCOVER-Utils is a Python utility package for data handling, processing, and annotation of multimedia data. It is designed to work with the DISCOVER framework or as a stand-alone library.

Features

  • Data handling — Unified access to streams (audio, video, sensor data) and annotations (discrete, continuous) via file, MongoDB, or URL backends
  • Multiple video backends — Choose between decord, imageio, moviepy, or pyav for video decoding
  • Dataset management — Iterate over multi-session datasets with DatasetManager and DatasetIterator
  • Processing pipeline — Run DISCOVER server modules from the command line for feature extraction and prediction
  • SSI compatibility — Read and write SSI trainer files and XML configurations

Installation

pip install hcai-discover-utils

Optional video backends

# Fast video decoding with decord
pip install hcai-discover-utils[decord]

# PyAV (FFmpeg bindings)
pip install hcai-discover-utils[pyav]

# MoviePy
pip install hcai-discover-utils[pymovie]

Getting Started

Command-line tools

Process data with DISCOVER server modules:

du-process \
  --dataset "my_dataset" \
  --db_host "127.0.0.1" --db_port "27017" \
  --db_user "user" --db_password "pass" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session1", "session2"]' \
  --data '[{"src": "db:anno", "scheme": "transcript", "annotator": "test", "role": "testrole"}]'

File mode (no database)

Read inputs and write outputs directly from/to disk, without a NOVA database. Use file: sources and supply a path via uri (static, single session) or uri_template (per-session paths via {dataset} and {session} placeholders):

du-process \
  --dataset "my_study" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session_a", "session_b"]' \
  --data '[
    {
      "id": "video",
      "type": "input",
      "src": "file:stream:video",
      "uri_template": "/data/{dataset}/{session}/video.mp4"
    },
    {
      "id": "valence",
      "type": "output",
      "src": "file:annotation:continuous",
      "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
      "sample_rate": 30,
      "min_val": -1,
      "max_val": 1
    }
  ]'
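Hand-quoting a multi-line JSON value in the shell is easy to get wrong. As a minimal sketch, the same file-mode invocation can be assembled in Python, letting json.dumps produce the --sessions and --data values. It uses only the flags and descriptor keys shown above; the helper name build_du_process_args is hypothetical and not part of DISCOVER-Utils.

```python
import json
import shlex


def build_du_process_args(dataset, trainer_file_path, sessions, data):
    """Assemble the argument list for the du-process call shown above.

    Hypothetical helper for illustration only; not part of discover-utils.
    """
    return [
        "du-process",
        "--dataset", dataset,
        "--trainer_file_path", trainer_file_path,
        "--sessions", json.dumps(sessions),  # JSON list of session names
        "--data", json.dumps(data),          # JSON list of data descriptors
    ]


# The same two descriptors as in the shell example above.
data = [
    {
        "id": "video",
        "type": "input",
        "src": "file:stream:video",
        "uri_template": "/data/{dataset}/{session}/video.mp4",
    },
    {
        "id": "valence",
        "type": "output",
        "src": "file:annotation:continuous",
        "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
        "sample_rate": 30,
        "min_val": -1,
        "max_val": 1,
    },
]

args = build_du_process_args(
    "my_study", "path/to/trainer.trainer", ["session_a", "session_b"], data
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(args))
```

To actually run the command, the list can be passed to subprocess.run(args, check=True), which avoids shell quoting altogether.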

Each session resolves its own input and output paths. Output annotation descriptors may carry scheme metadata that is used when no annotation file exists yet:

  • file:annotation:continuous: sample_rate, min_val, max_val (defaults: 1, 0, 1).

  • file:annotation:discrete: classes as a map from class id to a dict of per-class XML attributes (typically name, optionally color, etc.). The outer key is the canonical id; the writer injects it into the XML automatically. For example:

    // fragment of a data description entry
    "classes": {
      "0": {"name": "neutral", "color": "#888"},
      "1": {"name": "happiness", "color": "#ffd700"}
    }

    Legacy {id: name} strings are also accepted and normalized internally to the canonical form.

This scheme metadata matters for modules that resample continuous outputs to the scheme's sample_rate: without an explicit sample_rate, outputs default to 1 Hz.
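The scheme metadata rules above can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not the library's implementation: the function names continuous_scheme and normalize_classes are hypothetical, and converting class ids to strings is an assumption made here because JSON object keys are always strings.

```python
# Documented defaults for file:annotation:continuous descriptors.
CONTINUOUS_DEFAULTS = {"sample_rate": 1, "min_val": 0, "max_val": 1}


def continuous_scheme(descriptor):
    """Return sample_rate, min_val and max_val, using the defaults for missing keys.

    Hypothetical helper; not part of discover-utils.
    """
    return {
        key: descriptor.get(key, default)
        for key, default in CONTINUOUS_DEFAULTS.items()
    }


def normalize_classes(classes):
    """Convert legacy {id: name} entries to the canonical {id: {"name": ...}} form.

    Hypothetical helper; entries already in canonical form are copied unchanged.
    """
    normalized = {}
    for class_id, value in classes.items():
        if isinstance(value, str):
            # Legacy form: the value is just the class name.
            normalized[str(class_id)] = {"name": value}
        else:
            # Canonical form: a dict of per-class attributes.
            normalized[str(class_id)] = dict(value)
    return normalized


explicit = continuous_scheme({"sample_rate": 30, "min_val": -1, "max_val": 1})
fallback = continuous_scheme({})  # no metadata given, so the defaults apply
mixed = normalize_classes(
    {"0": "neutral", "1": {"name": "happiness", "color": "#ffd700"}}
)
```

In this sketch, fallback carries a sample_rate of 1, which is the 1 Hz case described above.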

Notes:

  • uri and uri_template are filesystem paths (absolute or relative to the working directory). There is no implicit base directory.
  • If uri_template references {dataset} or {session}, the corresponding value must be non-empty; otherwise resolve_file_uri raises ValueError.
  • uri_template takes precedence over uri when both are present.
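The path rules in these notes can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not the library's resolve_file_uri, whose signature is not shown here; the name resolve_path_sketch and its parameters are hypothetical.

```python
def resolve_path_sketch(descriptor, dataset=None, session=None):
    """Resolve a filesystem path from a data descriptor.

    Hypothetical illustration of the notes above; not discover-utils code.
    """
    template = descriptor.get("uri_template")
    if template is not None:
        # uri_template takes precedence over uri when both are present.
        values = {"dataset": dataset, "session": session}
        for name, value in values.items():
            placeholder = "{" + name + "}"
            if placeholder in template and not value:
                raise ValueError(f"uri_template uses {placeholder} but no value was given")
            template = template.replace(placeholder, value or "")
        return template
    # Static path, used as given (absolute or relative to the working directory).
    # What happens when neither key is present is not documented above;
    # this sketch simply lets the KeyError propagate.
    return descriptor["uri"]


video = {"uri_template": "/data/{dataset}/{session}/video.mp4"}
for session in ["session_a", "session_b"]:
    print(resolve_path_sketch(video, dataset="my_study", session=session))
```

The sketch uses plain string replacement rather than str.format so that any other braces in a path are left untouched.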

Python API

from discover_utils.data.provider.data_manager import DatasetManager

# Set up a dataset manager for your sessions
dm = DatasetManager(
    dataset="my_dataset",
    db_host="127.0.0.1",
    db_port=27017,
    db_user="user",
    db_password="pass",
    sessions=["session1"],
    data_description=[...],
)

Documentation

Full API documentation is available at hcmlab.github.io/discover-utils/docbuild/.

Citation

If you use DISCOVER or DISCOVER-Utils in your research, please cite:

@article{hallmen2025discover,
  title     = {DISCOVER: a Data-driven Interactive System for Comprehensive
               Observation, Visualization, and ExploRation of human behavior},
  author    = {Hallmen, Tobias and Schiller, Dominik and others},
  journal   = {Frontiers in Digital Health},
  volume    = {7},
  pages     = {1638539},
  year      = {2025},
  publisher = {Frontiers}
}

License

This project is licensed under the GNU General Public License v3.0.
