Deepseek: skip first N evals by denys-fridman · Pull Request #891 · mlcommons/training

denys-fridman · 2026-06-29T12:57:51Z

We'd like to propose a change to the deepseek eval schedule. The current setup is problematic because the validation dataset is small (1024 samples), so it is very inefficient to run eval at a scale larger than the number of validation samples. The training GBS is set at 15k+, and because the training dataset is enormous, even large systems can support that. The validation dataset, on the other hand, has only 1k samples, so the largest GBS that makes sense for eval is 1k. To handle this mismatch you have to either duplicate the validation dataset multiple times so that training and validation use the same GBS, which reduces scaling efficiency through redundant work, or implement a different parallelization scheme for validation, which nobody does because it adds overhead.
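A rough sketch of the redundant-work argument. It assumes that every eval step processes one full global batch, so the 1024-sample validation set is duplicated or padded up to a whole number of GBS-sized batches. The function name `redundant_eval_fraction` is hypothetical, and the model ignores per-step overheads.

```python
import math

VAL_SAMPLES = 1024  # validation set size stated in the PR description


def redundant_eval_fraction(gbs: int, val_samples: int = VAL_SAMPLES) -> float:
    """Fraction of eval compute spent on duplicated/padded samples.

    Assumption: each eval step processes a full global batch of `gbs`
    samples, so the validation set is padded up to ceil(val/gbs) batches.
    """
    eval_steps = math.ceil(val_samples / gbs)
    processed = eval_steps * gbs
    return 1.0 - val_samples / processed


for gbs in (1024, 4096, 16384):
    # prints 0.00%, 75.00%, 93.75% respectively
    print(gbs, f"{redundant_eval_fraction(gbs):.2%}")
```

Under this assumption, at GBS = 16384 only 1 in 16 evaluated samples is unique, which is the inefficiency the proposal targets.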

Here's the proposal:
Validation starts after $N(\mathrm{GBS}) = \mathrm{GBS} \cdot \lfloor 42 + 24576/\mathrm{GBS} \rfloor$ samples. This gives us 2-3 spare evaluations before we hit the RCP min, and ~4 before we hit the RCP avg. The slope was obtained by fitting a straight line to the RCP avg at different GBS values. The bias was set empirically: we lowered it until the first eval lands 2-3 evaluations below the lowest RCP point at every GBS, while ensuring that total_eval_cost/total_train_cost is still reduced by approximately 90%. The green line on the plot shows the function.
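A minimal sketch of the schedule formula, computing the iteration count used for START_EVAL_AT_ITER and the corresponding sample count N(GBS). The helper names are hypothetical and are not taken from the run configs.

```python
def start_eval_at_iter(gbs: int) -> int:
    """floor(42 + 24576 / GBS): training iterations before the first eval."""
    if gbs <= 0:
        raise ValueError("GBS must be a positive integer")
    # 42 is an integer, so floor(42 + x) == 42 + floor(x);
    # for positive ints, // is exactly floor division.
    return 42 + 24576 // gbs


def first_eval_samples(gbs: int) -> int:
    """N(GBS) = GBS * floor(42 + 24576 / GBS): samples seen before the first eval."""
    return gbs * start_eval_at_iter(gbs)


for gbs in (1024, 2048, 16384):
    print(f"GBS={gbs:>5}  START_EVAL_AT_ITER={start_eval_at_iter(gbs)}  N={first_eval_samples(gbs)}")
```

For example, GBS = 1024 gives 66 iterations (67,584 samples) and GBS = 16384 gives 43 iterations (704,512 samples). Because the floor rounds down to a whole iteration, N(GBS) ≤ 42·GBS + 24576, with equality whenever GBS divides 24576.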

Related PRs:

- logging#467: Add 6.1.0 and FIRST_CHECK support
- training_policies#587: Add first eval sample requirement for deepseekv3_671b

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T12:58:00Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

denys-fridman and others added 3 commits June 29, 2026 14:07

- ceac6a1 deepseek: add start_eval_at_iter patch to skip first N evaluations
- 46610f0 deepseek: expose START_EVAL_AT_ITER env var to skip first N evals
- 6c2ba88 deepseek: set START_EVAL_AT_ITER=floor(42+24576/GBS) in run configs

All three commits carry the trailer `Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>`.
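The commit list does not include the patch contents, so the following is only a sketch of how a START_EVAL_AT_ITER gate could work. It assumes that evals run every `eval_interval` iterations once the start iteration is reached, and that 0 (or unset) keeps the original schedule. `read_start_eval_at_iter`, `should_run_eval`, and `eval_interval` are hypothetical names, not the actual patch or any framework API.

```python
import os


def read_start_eval_at_iter(default: int = 0) -> int:
    """Read START_EVAL_AT_ITER from the environment (hypothetical parsing).

    An unset or empty variable falls back to `default`; 0 keeps the
    original schedule with no skipped evals.
    """
    raw = os.environ.get("START_EVAL_AT_ITER", "").strip()
    return int(raw) if raw else default


def should_run_eval(iteration: int, eval_interval: int, start_eval_at_iter: int) -> bool:
    """Hypothetical gate: first eval at start_eval_at_iter, then every eval_interval.

    `iteration` is assumed to be the 1-based count of completed training steps.
    """
    if start_eval_at_iter <= 0:
        # Assumed original behaviour: eval on every multiple of eval_interval.
        return iteration % eval_interval == 0
    if iteration < start_eval_at_iter:
        return False  # skip the early, wasteful evals
    return (iteration - start_eval_at_iter) % eval_interval == 0


os.environ["START_EVAL_AT_ITER"] = "43"
start = read_start_eval_at_iter()
evals = [it for it in range(1, 60) if should_run_eval(it, eval_interval=5, start_eval_at_iter=start)]
print(evals)  # [43, 48, 53, 58]
```

Anchoring later evals to the start iteration, rather than to multiples of `eval_interval`, makes the first eval happen exactly at the configured iteration. That choice is an assumption of this sketch.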

denys-fridman requested a review from a team as a code owner June 29, 2026 12:57

denys-fridman mentioned this pull request Jun 30, 2026 in mlcommons/logging#467 (Add 6.1.0 and FIRST_CHECK support, open)

denys-fridman wants to merge 3 commits into mlcommons:master from denys-fridman:dfridman/ds-skip-first-n-evals
