Deepseek: skip first N evals#891

Open
denys-fridman wants to merge 3 commits into
mlcommons:masterfrom
denys-fridman:dfridman/ds-skip-first-n-evals

@denys-fridman denys-fridman commented Jun 29, 2026

Contributor

We'd like to propose a change to the eval schedule of deepseek. The current setup is problematic because the validation dataset is small (1024 samples), so running eval at a scale larger than the number of validation samples is very inefficient. The training GBS is set at 15k+, and since the training dataset is enormous, even large systems can support that. The validation dataset, on the other hand, has only 1k samples, so the largest GBS that makes sense for eval is 1k. To handle this mismatch you have to either duplicate the validation dataset multiple times to keep the same GBS for training and validation, which reduces scaling efficiency by doing redundant work, or implement a different parallelisation scheme for validation, which nobody does because it adds overhead.
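To put a rough number on the redundant work mentioned above, here is a minimal sketch. It assumes the validation set is simply padded with duplicates up to a whole number of global batches; the function name `redundant_eval_fraction` and the example GBS of 16384 are illustrative and not taken from the benchmark code.

```python
import math

VAL_SAMPLES = 1024  # validation set size quoted in this PR


def redundant_eval_fraction(gbs: int, val_samples: int = VAL_SAMPLES) -> float:
    """Fraction of eval work spent on duplicated samples.

    Assumption: the validation set is padded with duplicates until it
    fills a whole number of global batches of size `gbs`.
    """
    if gbs <= 0 or val_samples <= 0:
        raise ValueError("gbs and val_samples must be positive")
    padded = math.ceil(val_samples / gbs) * gbs
    return (padded - val_samples) / padded


if __name__ == "__main__":
    for gbs in (1024, 4096, 16384):
        print(gbs, redundant_eval_fraction(gbs))
```

Under that assumption, an eval pass at a GBS of 16384 would spend 15/16 of its work on duplicates, while a GBS of 1024 wastes nothing.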

Here's the proposal:
Validation starts after N(GBS) = GBS*FLOOR(42+24576/GBS) samples. This gives us 2-3 spare evaluations before we hit the RCP min, and ~4 before we hit the RCP avg. The slope of the equation was obtained by fitting a straight line to the RCP avg at different GBS. The bias was obtained empirically by lowering it until the line sat 2-3 evaluations below the lowest RCP point at all GBS, to make sure the reduction of total_eval_cost/total_train_cost is approximately 90%. The green line on the plot shows the function.
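As a quick illustration of the formula, the sketch below evaluates N(GBS) for a few batch sizes. The function name `first_eval_samples` and the chosen GBS values are hypothetical, and the code is not taken from the commits in this PR. Since 42 is an integer and GBS is a positive integer, FLOOR(42+24576/GBS) equals 42 + 24576 // GBS, which avoids floating point.

```python
def first_eval_samples(gbs: int) -> int:
    """Samples to train on before the first eval: GBS * FLOOR(42 + 24576 / GBS).

    For a positive integer gbs, FLOOR(42 + 24576 / gbs) == 42 + 24576 // gbs.
    """
    if gbs <= 0:
        raise ValueError("gbs must be a positive integer")
    return gbs * (42 + 24576 // gbs)


if __name__ == "__main__":
    # Illustrative batch sizes only; the result divided by GBS is the
    # number of training steps skipped before the first eval.
    for gbs in (1024, 8192, 16384):
        n = first_eval_samples(gbs)
        print(f"GBS={gbs}: first eval after {n} samples ({n // gbs} steps)")
```

The result is always a whole multiple of GBS, so the first eval lands on a step boundary.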

Related PRs:

[Image: plot showing the proposed function as a green line]

denys-fridman and others added 3 commits June 29, 2026 14:07
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@denys-fridman denys-fridman requested a review from a team as a code owner June 29, 2026 12:57
@github-actions


MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅
