Releases: AISBench/benchmark
v3.1-20260630-master
🌟 Release Note
📦 Docker Images For This Release
| name | python version | offline resources | image size | offline resource size |
|---|---|---|---|---|
| v3.1-20260630-master-openeuler22.03-py310 | 3.10 | aarch64, x86_64 | aarch64: 3.01 GB, x86_64: 3.17 GB | aarch64: 822 MB, x86_64: 902 MB |
| v3.1-20260630-master-ubuntu22.04-py310 | 3.10 | aarch64, x86_64 | aarch64: 2.38 GB, x86_64: 2.59 GB | aarch64: 653 MB, x86_64: 723 MB |
| v3.1-20260630-master-openeuler24.03-py311 | 3.11 | aarch64, x86_64 | aarch64: 2.75 GB, x86_64: 2.95 GB | aarch64: 759 MB, x86_64: 849 MB |
| v3.1-20260630-master-ubuntu24.04-py312 | 3.12 | aarch64, x86_64 | aarch64: 2.37 GB, x86_64: 2.75 GB | aarch64: 701 MB, x86_64: 777 MB |
👉 Click Here For Docker Images Usage Guidance
🔍 Install This Version From PyPI
pip3 install ais_bench_benchmark==3.1.20260630
📄 Document For This Release
https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.1-20260415-master/
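The release tag and the PyPI version for this release appear to carry the same date stamp (`v3.1-20260630-master` and `3.1.20260630`). The sketch below assumes that mapping holds for other releases as well, which this page does not state. `tag_to_pypi_version` and `pip_requirement` are hypothetical helpers for illustration, not part of AISBench.

```python
import re

# Assumed tag layout: "v<major>.<minor>-<YYYYMMDD>-<branch>", e.g. "v3.1-20260630-master".
# The mapping to "<major>.<minor>.<YYYYMMDD>" is inferred from this release page only.
_TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)-(\d{8})-\w+$")


def tag_to_pypi_version(tag: str) -> str:
    """Convert a release tag into the matching PyPI version string (hypothetical helper)."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    major, minor, date_stamp = match.groups()
    return f"{major}.{minor}.{date_stamp}"


def pip_requirement(tag: str, package: str = "ais_bench_benchmark") -> str:
    """Build the requirement specifier to pass to `pip3 install`."""
    return f"{package}=={tag_to_pypi_version(tag)}"


if __name__ == "__main__":
    print(pip_requirement("v3.1-20260630-master"))
```

Only this release lists a PyPI install command, so earlier tags may have no published package even where the helper returns a well-formed version string.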
What's Changed
- [bugfix] Let trust_remote_code take effect by @SJTUyh in #226
- Update textvqa.py by @1037husterljx in #230
- 【Feature】 Adapt util/worker config typing and defaults for custom cfg by @GaoHuaZhang in #239
- 【Feature】add SWE-bench eval task, dataset loader, and summarizer integration by @GaoHuaZhang in #240
- 【Feature】Support SWE-Bench benchmark pipeline and Mini SWE Agent integration by @GaoHuaZhang in #241
- 【Feature】add SWE-bench example configs and bilingual user guide by @GaoHuaZhang in #191
- [Bugfix] support vita model by @Huangzjun in #237
- [feature][for merge][part0]tau2 bench code by @SJTUyh in #249
- [feature][for merge][part1]tau2 bench docs by @SJTUyh in #250
- [feature][for merge][part2] tau2 bench test cases by @SJTUyh in #251
- [docs] 20260415 pre-release docs update by @SJTUyh in #252
- remove transformers version limit by @wenba0 in #261
- [Bug Fix] Fixed import/recognition issues with postprocess-related modules when used as packages, preventing abnormal behavior caused by missing package initialization files. by @GaoHuaZhang in #263
- [bugfix] Add postprocessor to extract last option for MMMU datasets by @SijieFu in #238
- 【Feature】Support Vbench third party and license by @GaoHuaZhang in #152
- Feature/refcoco/+/g benchmark support by @zhongzhouTan-coder in #201
- [Bugfix]Fix the tau2 bench metrics display in pass^k situation by @SJTUyh in #272
- support mathvision by @wenba0 in #264
- [feature] support aime26 dataset by @yejj710 in #274
- 【Feature】【Part1】AISBench support the VBench 1.0 Video Quality Evaluation Pipeline. by @GaoHuaZhang in #273
- 【Feature】【Part2】Add VBench eval examples and doc by @GaoHuaZhang in #270
- 【Feature】 add verified_mini dataset mapping, example config, and guide updates by @GaoHuaZhang in #271
- 【feature】Support RealworldQA dataset by @wanlongze in #268
- 【UT】Add UT test cases for realworldqa by @wanlongze in #287
- add UT & doc for mathvision by @wenba0 in #288
- Test/refcoco by @zhongzhouTan-coder in #276
- [docs] add docs for aime26 by @yejj710 in #289
- 【doc】Add realworldqa documentation by @wanlongze in #290
- Docs/refcoco by @zhongzhouTan-coder in #277
- 【doc】Add Agentic Coding evaluation scheme design by @junemoon-happy in #292
- 【doc】fix Error in user YAML: () by @junemoon-happy in #293
- modify mmmu by @wenba0 in #265
- fix mm input_tokens=0 when server return prompt_tokens by @wenba0 in #299
- [docs]Add mini dataset docs by @SJTUyh in #302
- [bugfix]remove MMMU abcd postprocess, maybe E by @wenba0 in #303
- feat(datasets): add HLE dataset by @ivanbao9783 in #301
- [bugfix] Fix multilingual mini config by @SJTUyh in #305
- [Docs][Bug]Change the way to install mini-swe-agent by @SJTUyh in #308
- docs: fix incorrect path agent_examples → agent_example in tau2_bench docs by @zhongzhouTan-coder with @Copilot in #312
- [UT]: add UT coverage for swebench by @wanlongze in #316
- [UT]: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math) by @wanlongze in #315
- [feature][for merge][part1] Support terminal-bench-2 by @SJTUyh in #318
- [feature][for merge][part0] Support terminal-bench-2 by @SJTUyh in #319
- [feature][for merge][part2] Support terminal-bench-2 by @SJTUyh in #320
- [feature][for merge][part3] Support terminal-bench-2 by @SJTUyh in #321
- [UT]: add UT coverage for MMMU dataset by @wanlongze in #325
- [UT]: add UT coverage for MMMU-Pro and MMStar datasets by @wanlongze in #326
- support PreTrainedTokenizerFast for dsv32 by @wenba0 in #330
- feat(dataset): add swe-bench-pro dataset by @ivanbao9783 in #333
- feat(docs): Add swe-bench-pro dataset documentation by @ivanbao9783 in #334
- feat(UT): add swe-bench_pro UT tests by @ivanbao9783 in #335
- feat(UT): add swe-bench_pro UT tests - eval by @ivanbao9783 in #336
- feat(UT): add swe-bench_pro UT tasks - utils by @ivanbao9783 in #338
- feat(UT): add swe-bench_pro UT tests - infer by @ivanbao9783 in #337
- [sub PR for merge] Add OVERVIEW.md for dockerfile by @SJTUyh in #340
- [sub PR for merge] Add dockerfile and build script by @SJTUyh in #339
- [CI/docs] AISBench Benchmark upload to pypi by @SJTUyh in #344
- [feature] Support multi-arch docker images of AISBench/benchmark by @SJTUyh in #332
- Add logging for exceptions and label for Docker container by @zhongzhouTan-coder in #351
- fix(doc): Remove duplicated text from the swebp dataset Readme by @ivanbao9783 in #355
- fix(doc): Update the task name in the hle dataset Readme by @ivanbao9783...
v3.1-20260415-master
🌟 Release Note
📦 Docker Images For This Release
| name | arch | python version | offline resources | image size | tar.gz size |
|---|---|---|---|---|---|
| v3.1-20260415-master_aarch64_py_310 | aarch64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260415-master_aarch64_py_310.tar.gz | 2.77 GB | 846 MB |
| v3.1-20260415-master_x86_64_py_310 | x86_64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260415-master_x86_64_py_310.tar.gz | 3.04 GB | 926 MB |
👉 Click Here For Docker Images Usage Guidance
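The offline archive links in the table above share one pattern: a fixed prefix, then `ais_bench_benchmark_image_`, the image name, and `.tar.gz`. The sketch below reproduces that pattern, assuming it is stable across releases. Only the 20260415 and 20260330 rows on this page confirm it. `image_name` and `offline_image_url` are hypothetical helpers, not an AISBench API.

```python
# URL prefix copied from the "offline resources" column on this page.
BASE_URL = "https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github"


def image_name(release: str, arch: str, python_tag: str = "py_310") -> str:
    """Compose an image name such as "v3.1-20260415-master_aarch64_py_310"."""
    return f"{release}_{arch}_{python_tag}"


def offline_image_url(release: str, arch: str, python_tag: str = "py_310") -> str:
    """Build the assumed download URL of the offline image archive (hypothetical helper)."""
    name = image_name(release, arch, python_tag)
    return f"{BASE_URL}/ais_bench_benchmark_image_{name}.tar.gz"


if __name__ == "__main__":
    for arch in ("aarch64", "x86_64"):
        print(offline_image_url("v3.1-20260415-master", arch))
```

The later multi-arch images (for example `v3.1-20260630-master-ubuntu22.04-py310`) use a different naming scheme, so this helper should not be assumed to cover them.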
📄 Document For This Release
https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.1-20260415-master/
What's Changed
- [bugfix] Let trust_remote_code take effect by @SJTUyh in #226
- Update textvqa.py by @1037husterljx in #230
- 【Feature】 Adapt util/worker config typing and defaults for custom cfg by @GaoHuaZhang in #239
- 【Feature】add SWE-bench eval task, dataset loader, and summarizer integration by @GaoHuaZhang in #240
- 【Feature】Support SWE-Bench benchmark pipeline and Mini SWE Agent integration by @GaoHuaZhang in #241
- 【Feature】add SWE-bench example configs and bilingual user guide by @GaoHuaZhang in #191
- [Bugfix] support vita model by @Huangzjun in #237
- [feature][for merge][part0]tau2 bench code by @SJTUyh in #249
- [feature][for merge][part1]tau2 bench docs by @SJTUyh in #250
- [feature][for merge][part2] tau2 bench test cases by @SJTUyh in #251
- [docs] 20260415 pre-release docs update by @SJTUyh in #252
New Contributors
- @1037husterljx made their first contribution in #230
- @Huangzjun made their first contribution in #237
Full Changelog: v3.1-20260330-master...v3.1-20260415-master
😄 Thanks for using AISBench/benchmark !
v3.1-20260330-master
🌟 Release Note
📦 Docker Images For This Release
| name | arch | python version | offline resources | image size | tar.gz size |
|---|---|---|---|---|---|
| v3.1-20260330-master_aarch64_py_310 | aarch64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260330-master_aarch64_py_310.tar.gz | 2.77 GB | 850 MB |
| v3.1-20260330-master_x86_64_py_310 | x86_64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260330-master_x86_64_py_310.tar.gz | 3.04 GB | 950 MB |
👉 Click Here For Docker Images Usage Guidance
📄 Document For This Release
https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.1-20260330-master/
😄 Thanks for using AISBench/benchmark !
v3.0-20251219-master (pre-release)
🌟 Release Note
📦 Docker Images For This Release
👉 Click Here For Docker Images Usage Guidance
📄 Document For This Release
https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.0-20251219-master/