Popular repositories
EssenceBench (Public, forked from gszfwsb/EssenceBench)
Official PyTorch implementation of the paper "Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data" (EssenceBench) in ICLR 2026.
Python
MMLU-Pro (Public, forked from TIGER-AI-Lab/MMLU-Pro)
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Python