chore(deps): bump transformers from 5.9.0 to 5.10.2 #82

Merged
SahilKumar75 merged 1 commit into main from dependabot/pip/transformers-5.10.2 on Jun 8, 2026
Conversation

@dependabot dependabot Bot commented on behalf of github Jun 8, 2026

Contributor

Bumps transformers from 5.9.0 to 5.10.2.

Release notes

Sourced from transformers's releases.

Patch release v5.10.2

There was a major bug in the conversion of CLIP-related models; it affected models such as SAM3 and others. Please make sure to update 🙏

Full Changelog: huggingface/transformers@v5.10.1...v5.10.2

Release v5.10.1

v5.10.0 was yanked because it was published from a corrupted branch. Sorry everyone, this happens when we rush a release!

New Model additions

Gemma4 Unified + Gemma4 MTP

Gemma 4 12B Unified is an encoder-free multimodal model with pretrained and instruction-tuned variants. Unlike standard Gemma 4, which uses dedicated encoder towers, Gemma 4 12B Unified projects raw inputs directly into the language model's embedding space through lightweight linear pipelines. This results in a simpler architecture while maintaining strong multimodal performance.

Key differences from standard Gemma 4:

  • No Vision Tower: Raw pixel patches are projected directly into LM space via a Dense + LayerNorm pipeline with factorized 2D positional embeddings, replacing the vision encoder.
  • No Audio Tower: Raw 16 kHz waveform samples are chunked into fixed-length frames and projected through a simple RMSNorm → Linear pipeline, replacing the mel spectrogram + Conformer encoder.
  • Shared Multimodal Pipeline: Both vision and audio use the same Gemma4UnifiedMultimodalEmbedder (RMSNorm → Linear) for the final projection to text hidden space.
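
The audio path described above (chunk the waveform into fixed-length frames, then RMSNorm → Linear) can be illustrated with a small pure-Python sketch. This is not the transformers implementation: the function names, the toy frame length, the hidden size, and the weights are all hypothetical and chosen only to show the shape of the computation.

```python
import math

def chunk_frames(waveform, frame_len):
    """Split a 1-D waveform into fixed-length frames, dropping any trailing remainder."""
    n_frames = len(waveform) // frame_len
    return [waveform[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale each dimension by a weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def linear(x, matrix):
    """Project x (length d_in) with a d_out x d_in matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]

def embed_audio(waveform, frame_len, norm_weight, proj):
    """Hypothetical helper: frames -> RMSNorm -> Linear, one embedding per frame."""
    return [linear(rms_norm(frame, norm_weight), proj)
            for frame in chunk_frames(waveform, frame_len)]

# Toy sizes (assumptions, not the model's real dimensions).
waveform = [0.1 * ((i % 7) - 3) for i in range(10)]
frame_len, hidden = 4, 3
norm_weight = [1.0] * frame_len
proj = [[0.5] * frame_len for _ in range(hidden)]

embeddings = embed_audio(waveform, frame_len, norm_weight, proj)
print(len(embeddings), len(embeddings[0]))
```

With 10 samples and a frame length of 4, the sketch yields two frames, each projected to the toy hidden size of 3; the two leftover samples are dropped, which is one possible policy among several (padding is another).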

You can find the original Gemma 4 12B Unified checkpoints under the Gemma 4 release.

Sapiens2

Sapiens2 is a family of high-resolution vision transformers pretrained on ~1 billion curated human images, designed for human-centric computer vision tasks including pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation. The models scale from 0.4B to 5B parameters and train at native 1K resolution, with hierarchical 4K variants for extended spatial reasoning. Sapiens2 achieves substantial improvements over its predecessor with +4 mAP in pose estimation, +24.3 mIoU in body-part segmentation, and 45.6% error reduction in normal estimation.

Links: Documentation | Paper

DeepSeek-OCR-2

DeepSeek-OCR-2 is an OCR-specialized vision-language model built on a distinctive architecture that combines a SAM ViT-B vision encoder with a Qwen2 hybrid attention encoder, connected through an MLP projector to a DeepSeek-V2 Mixture-of-Experts (MoE) language model. The model features a hybrid attention mechanism that applies bidirectional attention over image tokens and causal attention over query tokens, enabling efficient and accurate document understanding. It supports both plain OCR tasks and grounding capabilities with coordinate-aware output for document conversion to markdown format.
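
A hybrid attention pattern of the kind described above can be sketched as a boolean mask. The sketch assumes that image tokens come first in the sequence and that query tokens may attend to every image token; neither detail is stated in the release notes, and the function name is hypothetical.

```python
def hybrid_attention_mask(num_image, num_query):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout: image tokens occupy positions [0, num_image),
    query tokens occupy the positions after them.
    """
    n = num_image + num_query
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < num_image:
                # Image tokens attend bidirectionally, but only to image tokens.
                mask[i][j] = j < num_image
            else:
                # Query tokens attend causally: every earlier position
                # (all image tokens and preceding queries) plus themselves.
                mask[i][j] = j <= i
    return mask

mask = hybrid_attention_mask(num_image=2, num_query=2)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

The top-left block of the mask is fully set (bidirectional over image tokens), while the rows for query tokens form the usual lower-triangular causal pattern.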

Links: Documentation

Mellum

Mellum is a code-focused Mixture-of-Experts language model developed by JetBrains. It is derived from the Qwen3-MoE architecture with per-layer-type RoPE and interleaved sliding window attention. The model has 12B total parameters with 2.5B active parameters per token, using 64 routed experts with 8 activated per token across 28 layers.
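
To make "64 routed experts with 8 activated per token" concrete, here is a minimal top-k routing sketch. It assumes a common convention (pick the k highest router logits, then softmax over just those k); whether Mellum normalizes its gate weights this way is not stated in the release notes, and the function name and toy logits are hypothetical.

```python
import math

def route_top_k(router_logits, k):
    """Pick the k experts with the highest logits and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

NUM_EXPERTS, TOP_K = 64, 8  # figures taken from the description above
logits = [math.sin(i) for i in range(NUM_EXPERTS)]  # toy router output for one token
selected = route_top_k(logits, TOP_K)

print(len(selected), "of", NUM_EXPERTS, "experts active")
print("gate weights sum to", round(sum(w for _, w in selected), 6))
```

Only the selected experts run for a given token, which is how a model can hold 12B parameters in total while using roughly 2.5B per token.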

Links: Documentation

... (truncated)

Commits

@dependabot dependabot Bot requested a review from SahilKumar75 as a code owner June 8, 2026 06:00

@github-actions github-actions Bot left a comment

Automated Review Checklist

Before approving this PR, confirm each item:

Code quality

  • Logic is correct and edge cases are handled
  • No debug logs, dead code, or commented-out blocks left in
  • Naming is clear — no abbreviations that need a comment to decode

Tests

  • New behaviour is covered by tests, or existing tests updated
  • poetry run pytest passes locally

Frontend (if applicable)

  • UI renders correctly across light/dark mode
  • No layout regressions on narrow viewports

Infrastructure / config (if applicable)

  • Secrets/env vars are not hardcoded
  • Docker build still passes (docker build .)

Docs

  • CHANGELOG or PR description explains the why, not just the what
  • Public API changes are reflected in docs

Review, check off what applies, then submit your formal Approve or Request Changes.

@dependabot dependabot Bot force-pushed the dependabot/pip/transformers-5.10.2 branch 3 times, most recently from 67ac09d to dbd901f Compare June 8, 2026 07:04
Bumps [transformers](https://github.com/huggingface/transformers) from 5.9.0 to 5.10.2.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v5.9.0...v5.10.2)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.10.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
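
The `update-type: version-update:semver-minor` field above can be checked by hand. The following is a minimal sketch of how such a label could be derived from two plain `MAJOR.MINOR.PATCH` strings; it is not Dependabot's own logic, the function name is hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
def classify_bump(old, new):
    """Classify an upgrade between two MAJOR.MINOR.PATCH version strings."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH")
    # Compare as integer tuples: as strings, "5.10.2" would sort before "5.9.0".
    if new_parts <= old_parts:
        raise ValueError("new version is not an upgrade")
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"

print(classify_bump("5.9.0", "5.10.2"))  # minor
```

Because the minor component changes from 9 to 10, the bump is a minor update that also happens to land on a patch release (5.10.2).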
@dependabot dependabot Bot force-pushed the dependabot/pip/transformers-5.10.2 branch from dbd901f to f87a131 Compare June 8, 2026 07:25

@SahilKumar75 SahilKumar75 left a comment

Owner

Approved: Minor version upgrade (5.9.0 → 5.10.2) that includes the v5.10.2 patch fix for the CLIP-related model conversion bug. Core tests passed.

@SahilKumar75 SahilKumar75 merged commit 1933243 into main Jun 8, 2026
9 checks passed
@dependabot dependabot Bot deleted the dependabot/pip/transformers-5.10.2 branch June 8, 2026 07:41