chore(deps): Update datasets requirement from >=2.18.0 to >=5.0.0 #28

Open
dependabot[bot] wants to merge 1 commit into master from dependabot/pip/datasets-gte-5.0.0

dependabot[bot] (Contributor) commented on behalf of github on Jun 15, 2026

Updates the requirements on datasets to permit the latest version.
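A `>=` requirement is a lower bound, and the title of this PR moves that bound from 2.18.0 to 5.0.0. The following is a minimal sketch of what that comparison means, assuming plain dotted numeric version strings only. The helper names `parse_version` and `satisfies_minimum` are hypothetical, and the candidate strings are illustrative; real tooling should rely on a full PEP 440 implementation rather than this simplification.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release string such as '5.0.0' into (5, 0, 0).

    Assumption: no pre-release, post-release, or local version suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(version: str, minimum: str) -> bool:
    """Return True when `version` meets a '>= minimum' constraint."""
    return parse_version(version) >= parse_version(minimum)


OLD_MINIMUM = "2.18.0"  # lower bound before this PR
NEW_MINIMUM = "5.0.0"   # lower bound after this PR

# Illustrative version strings, compared against both bounds.
for candidate in ["2.18.0", "4.0.0", "5.0.0"]:
    print(
        candidate,
        "old bound:", satisfies_minimum(candidate, OLD_MINIMUM),
        "new bound:", satisfies_minimum(candidate, NEW_MINIMUM),
    )
```

Under this sketch, an environment still resolving to a release below 5.0.0 would satisfy the old bound but not the new one.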

Release notes

Sourced from datasets's releases.

5.0.0

Datasets Features

Agent traces

  • Parse Agent traces messages for SFT using teich by @lhoestq in huggingface/datasets#8232

    • Agent traces from claude_code/pi/codex and others can now be loaded with load_dataset
    • Using the teich library (new optional dependency), traces are parsed to messages to enable training on traces using e.g. trl
    • Load the data:
    >>> from datasets import load_dataset
    >>> ds = load_dataset("lhoestq/agent-traces-example", split="train")
    >>> ds[0]["messages"]
    [{'role': 'user', 'content': 'Download a random dataset from Hugging Face, use DuckDB to inspect it, and come back with a short report about it. Be concise and include: dataset name, what files/format you found, row count or rough size if you can determine it,...'
     ...]
    • Train on agent traces:
    trl sft --dataset-name lhoestq/agent-traces-example ...

Next-level shuffling in streaming mode

  • Use multiple input shards for shuffle buffer by @lhoestq in huggingface/datasets#8194

    ds = load_dataset(..., streaming=True)
    ds = ds.shuffle(seed=42)
    # or configure local buffer shuffling manually, default is:
    ds = ds.shuffle(seed=42, buffer_size=1000, max_buffer_input_shards=10)

    before👎 / after✨: (comparison images not preserved)

    toy example comparison

    from datasets import IterableDataset
    ds = IterableDataset.from_dict({"i": range(123_456_789)}, num_shards=1024)
    ds = ds.shuffle(seed=42)
    print("Cold start ids:")

... (truncated)
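The idea behind feeding a shuffle buffer from several input shards can be pictured with a small pure-Python sketch. This is not the `datasets` implementation; it only illustrates the general technique under stated assumptions. The functions `read_round_robin` and `buffer_shuffle` are hypothetical, and the only names taken from the release notes are the concepts behind `buffer_size` and `max_buffer_input_shards`.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def read_round_robin(shards: Sequence[Sequence[int]], max_input_shards: int) -> Iterator[int]:
    """Yield items by cycling over at most `max_input_shards` open shards.

    Hypothetical helper: when an open shard runs out, the next unopened
    shard (if any) takes its place.
    """
    pending = iter(shards)
    active = [iter(shard) for shard in islice(pending, max_input_shards)]
    while active:
        still_open: List[Iterator[int]] = []
        for source in active:
            try:
                item = next(source)
            except StopIteration:
                replacement = next(pending, None)
                if replacement is not None:
                    still_open.append(iter(replacement))
                continue
            yield item
            still_open.append(source)
        active = still_open


def buffer_shuffle(items: Iterable[int], buffer_size: int, seed: int) -> Iterator[int]:
    """Classic buffer shuffle: emit a random buffered item, refill its slot."""
    rng = random.Random(seed)
    buffer: List[int] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


# Ten toy shards of 100 consecutive ids each.
shards = [list(range(start, start + 100)) for start in range(0, 1000, 100)]

one_shard = list(islice(buffer_shuffle(read_round_robin(shards, 1), 20, seed=42), 10))
five_shards = list(islice(buffer_shuffle(read_round_robin(shards, 5), 20, seed=42), 10))
print("1 input shard :", one_shard)
print("5 input shards:", five_shards)
```

With a single input shard, the first outputs can only come from the start of the first shard, because the buffer has seen nothing else yet. Reading from several shards at once lets early outputs draw on ids from across those shards.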

Commits
  • 68ac1a9 Release: 5.0.0 (#8239)
  • cfe4492 Support composed splits in streaming datasets (#8220)
  • fd67320 Keep None as a real null in Json() columns instead of the string "null" (#8231)
  • 10cdc81 Fix iterable skip over full Arrow blocks (#8236)
  • b7c064d Parse agent traces messages for SFT using teich (#8232)
  • 31e92f1 fix: embed_external_files=True for mesh support (#8224)
  • d168d5f feat: add TsFile (Apache IoTDB) packaged builder with per-device wide format ...
  • 992f3cf fix(map): fix progress bar exceeding total when load_from_cache_file=False (#...
  • 8474a91 Fix single lance file form pylance 7.0 (#8225)
  • d4284e9 feat: add 3D mesh support and MeshFolder builder (#8055)
  • Additional commits viewable in compare view
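Commit fd67320 above distinguishes a real null from the string "null". The sketch below illustrates that distinction using only Python's standard `json` module. It does not reproduce the `datasets` `Json()` feature or its internals, which are not shown in this PR.

```python
import json

# A real JSON null decodes to Python's None.
real_null = json.loads('{"value": null}')["value"]

# The string "null" is an ordinary four-character string.
string_null = json.loads('{"value": "null"}')["value"]

print(real_null is None)
print(isinstance(string_null, str))

# The two also serialize differently: one is a bare token, the other is quoted.
print(json.dumps(None))
print(json.dumps("null"))
```

Code that filters missing values with `is None` would treat these two cases differently, which is why the distinction matters to downstream consumers.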

dependabot[bot] added the dependencies label (Dependency update (Dependabot)) on Jun 15, 2026
dependabot[bot] requested a review from jinujon007 as a code owner on June 15, 2026 10:26
jinujon007 (Owner) commented:

@dependabot rebase

dependabot[bot] force-pushed the dependabot/pip/datasets-gte-5.0.0 branch from 3c5398a to 074ec0e on June 15, 2026 12:44
jinujon007 (Owner) commented:

@dependabot rebase

Updates the requirements on [datasets](https://github.com/huggingface/datasets) to permit the latest version.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@2.18.0...5.0.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 5.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] force-pushed the dependabot/pip/datasets-gte-5.0.0 branch from 074ec0e to 1f85bb6 on June 15, 2026 12:52
Labels

dependencies Dependency update (Dependabot).
