DISCOVER-Utils

DISCOVER-Utils is a Python utility package for data handling, processing, and annotation of multimedia data. It is designed to work with the DISCOVER framework or as a stand-alone library.

Features

Data handling — Unified access to streams (audio, video, sensor data) and annotations (discrete, continuous) via file, MongoDB, or URL backends
Multiple video backends — Choose between decord, imageio, moviepy, or pyav for video decoding
Dataset management — Iterate over multi-session datasets with DatasetManager and DatasetIterator
Processing pipeline — Run DISCOVER server modules from the command line for feature extraction and prediction
SSI compatibility — Read and write SSI trainer files and XML configurations

Installation

pip install hcai-discover-utils

Optional video backends

# Fast video decoding with decord
pip install hcai-discover-utils[decord]

# PyAV (FFmpeg bindings)
pip install hcai-discover-utils[pyav]

# MoviePy
pip install hcai-discover-utils[pymovie]

Getting Started

Command-line tools

Process data with DISCOVER server modules:

du-process \
  --dataset "my_dataset" \
  --db_host "127.0.0.1" --db_port "27017" \
  --db_user "user" --db_password "pass" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session1", "session2"]' \
  --data '[{"src": "db:anno", "scheme": "transcript", "annotator": "test", "role": "testrole"}]'

File mode (no database)

Read inputs and write outputs directly from/to disk, without a NOVA database. Use file: sources and supply a path via uri (static, single session) or uri_template (per-session paths via {dataset} and {session} placeholders):

du-process \
  --dataset "my_study" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session_a", "session_b"]' \
  --data '[
    {
      "id": "video",
      "type": "input",
      "src": "file:stream:video",
      "uri_template": "/data/{dataset}/{session}/video.mp4"
    },
    {
      "id": "valence",
      "type": "output",
      "src": "file:annotation:continuous",
      "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
      "sample_rate": 30,
      "min_val": -1,
      "max_val": 1
    }
  ]'
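
Quoting multi-line JSON by hand in a shell is error-prone, so the same invocation can be assembled from Python. This is a minimal sketch that uses only the flags and descriptor keys shown above; the variable names (data, sessions, argv, command_line) are illustrative, and it only builds and prints the command rather than running it.

```python
import json
import shlex

# Descriptors mirror the file-mode example above.
data = [
    {
        "id": "video",
        "type": "input",
        "src": "file:stream:video",
        "uri_template": "/data/{dataset}/{session}/video.mp4",
    },
    {
        "id": "valence",
        "type": "output",
        "src": "file:annotation:continuous",
        "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
        "sample_rate": 30,
        "min_val": -1,
        "max_val": 1,
    },
]
sessions = ["session_a", "session_b"]

# Each flag and its value are separate list items, so no manual quoting is needed.
argv = [
    "du-process",
    "--dataset", "my_study",
    "--trainer_file_path", "path/to/trainer.trainer",
    "--sessions", json.dumps(sessions),
    "--data", json.dumps(data),
]

# shlex.join quotes every argument for a POSIX shell, e.g. for logging or copy-paste.
command_line = shlex.join(argv)
print(command_line)
```

To run the command instead of printing it, argv can be passed to subprocess.run(argv, check=True), assuming du-process is on the PATH.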

Each session resolves its own input and output paths. Output annotation descriptors may carry scheme metadata that is used when no annotation file exists yet:

file:annotation:continuous: sample_rate, min_val, max_val (defaults: 1, 0, 1).
file:annotation:discrete: classes as a map from class id to a dict of per-class XML attributes (typically name, optionally color, etc.). The outer key is the canonical id; the writer injects it into the XML automatically. For example:
```
// fragment of a data description entry
"classes": {
  "0": {"name": "neutral", "color": "#888"},
  "1": {"name": "happiness", "color": "#ffd700"}
}
```
Legacy {id: name} strings are also accepted and normalized internally to the canonical form.

Continuous scheme metadata matters for modules that resample continuous outputs to the scheme's sample_rate: without explicit metadata, outputs default to 1 Hz.
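
The scheme metadata rules above can be illustrated with a short sketch. The helper names continuous_scheme and normalize_classes are hypothetical and are not part of the discover_utils API; the code only mirrors the documented defaults (1, 0, 1) and the legacy {id: name} normalization, and the library's own implementation may differ.

```python
# Defaults documented above for file:annotation:continuous.
CONTINUOUS_DEFAULTS = {"sample_rate": 1, "min_val": 0, "max_val": 1}


def continuous_scheme(descriptor):
    """Return continuous scheme metadata, filling missing keys with the defaults."""
    return {
        key: descriptor.get(key, default)
        for key, default in CONTINUOUS_DEFAULTS.items()
    }


def normalize_classes(classes):
    """Map legacy {id: name} entries to the canonical {id: {"name": name}} form."""
    normalized = {}
    for class_id, attrs in classes.items():
        if isinstance(attrs, str):
            # Legacy form: the value is just the class name.
            attrs = {"name": attrs}
        normalized[str(class_id)] = dict(attrs)
    return normalized


print(continuous_scheme({"sample_rate": 30}))
print(normalize_classes({"0": "neutral", "1": {"name": "happiness", "color": "#ffd700"}}))
```

A descriptor that sets only sample_rate keeps the default value range, and mixed legacy and canonical class entries end up in one uniform shape.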

Notes:

uri and uri_template are filesystem paths (absolute or relative to the working directory). There is no implicit base directory.
When uri_template references {dataset} or {session}, the corresponding value must be non-empty; otherwise resolve_file_uri raises ValueError.
uri_template takes precedence over uri when both are present.
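
The resolution rules in these notes can be sketched as follows. resolve_uri_sketch is a hypothetical stand-in written for illustration; it is not the library's resolve_file_uri, whose signature and error messages may differ. It assumes the descriptor is a plain dict and that templates contain only the {dataset} and {session} placeholders.

```python
def resolve_uri_sketch(descriptor, dataset, session):
    """Resolve a file path from a data descriptor (illustrative only)."""
    template = descriptor.get("uri_template")
    if template is None:
        # No template: fall back to the static path.
        return descriptor["uri"]

    # uri_template wins over uri when both are present.
    values = {"dataset": dataset, "session": session}
    for name, value in values.items():
        if "{" + name + "}" in template and not value:
            raise ValueError(f"uri_template references {{{name}}} but its value is empty")
    return template.format(**values)


descriptor = {
    "uri": "/data/static/video.mp4",
    "uri_template": "/data/{dataset}/{session}/video.mp4",
}
print(resolve_uri_sketch(descriptor, "my_study", "session_a"))
```

The returned path is used as given: consistent with the first note, the sketch does not prepend any base directory.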

Python API

from discover_utils.data.provider.data_manager import DatasetManager

# Set up a dataset manager for your sessions
dm = DatasetManager(
    dataset="my_dataset",
    db_host="127.0.0.1",
    db_port=27017,
    db_user="user",
    db_password="pass",
    sessions=["session1"],
    data_description=[...],
)

Documentation

Full API documentation is available at hcmlab.github.io/discover-utils/docbuild/.

Citation

If you use DISCOVER or DISCOVER-Utils in your research, please cite:

@article{hallmen2025discover,
  title     = {DISCOVER: a Data-driven Interactive System for Comprehensive
               Observation, Visualization, and ExploRation of human behavior},
  author    = {Hallmen, Tobias and Schiller, Dominik and others},
  journal   = {Frontiers in Digital Health},
  volume    = {7},
  pages     = {1638539},
  year      = {2025},
  publisher = {Frontiers}
}

License

This project is licensed under the GNU General Public License v3.0.
