Releases: AISBench/benchmark

v3.1-20260630-master

@SJTUyh released this 30 Jun 11:40
1e7cb37

🌟 Release Note

👉 Click Here For Details

📦 Docker Images For This Release

| name | python version | offline resources | image size | offline resource size |
| --- | --- | --- | --- | --- |
| v3.1-20260630-master-openeuler22.03-py310 | 3.10 | aarch64 | 3.01 GB | 822 MB |
| v3.1-20260630-master-openeuler22.03-py310 | 3.10 | x86_64 | 3.17 GB | 902 MB |
| v3.1-20260630-master-ubuntu22.04-py310 | 3.10 | aarch64 | 2.38 GB | 653 MB |
| v3.1-20260630-master-ubuntu22.04-py310 | 3.10 | x86_64 | 2.59 GB | 723 MB |
| v3.1-20260630-master-openeuler24.03-py311 | 3.11 | aarch64 | 2.75 GB | 759 MB |
| v3.1-20260630-master-openeuler24.03-py311 | 3.11 | x86_64 | 2.95 GB | 849 MB |
| v3.1-20260630-master-ubuntu24.04-py312 | 3.12 | aarch64 | 2.37 GB | 701 MB |
| v3.1-20260630-master-ubuntu24.04-py312 | 3.12 | x86_64 | 2.75 GB | 777 MB |
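
The image names listed above appear to follow the pattern `<release tag>-<OS><OS version>-py<Python version>`. The sketch below only composes such a name and guesses the host architecture. The helper names (`image_name`, `host_arch`) are hypothetical, and no registry or download URL is assumed, since the release notes do not state one here.

```python
import platform
from typing import Optional

# Release tag taken from the heading of this release.
RELEASE = "v3.1-20260630-master"

# (OS, Python tag) pairs copied from the image list above.
VARIANTS = {
    ("openeuler22.03", "py310"),
    ("ubuntu22.04", "py310"),
    ("openeuler24.03", "py311"),
    ("ubuntu24.04", "py312"),
}

# Common platform.machine() spellings mapped to the two published architectures.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def image_name(os_name: str, py_tag: str) -> str:
    """Compose an image name; reject combinations not listed for this release."""
    if (os_name, py_tag) not in VARIANTS:
        raise ValueError(f"no image listed for {os_name} with {py_tag}")
    return f"{RELEASE}-{os_name}-{py_tag}"


def host_arch() -> Optional[str]:
    """Return 'x86_64' or 'aarch64' for this host, or None if it is neither."""
    return ARCH_ALIASES.get(platform.machine().lower())
```

For example, `image_name("ubuntu24.04", "py312")` yields the last name in the list, and `host_arch()` indicates which of the two offline resource packages would match the current machine.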

👉 Click Here For Docker Images Usage Guidance

🔍 Install This Version From PyPI

pip3 install ais_bench_benchmark==3.1.20260630
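
After installing, it can be useful to confirm that the environment really holds this release. The following is a minimal sketch using only the standard library; it assumes the distribution name is the one used in the `pip3` command above, and the helper names (`installed_version`, `matches_release`) are illustrative rather than part of AISBench.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution name and version copied from the pip3 command above.
DIST_NAME = "ais_bench_benchmark"
EXPECTED = "3.1.20260630"


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_release(found: Optional[str], expected: str = EXPECTED) -> bool:
    """True only when a version was found and it equals the expected release."""
    return found == expected


if __name__ == "__main__":
    found = installed_version()
    if matches_release(found):
        print(f"{DIST_NAME} {found} is installed")
    else:
        print(f"expected {DIST_NAME}=={EXPECTED}, found {found}")
```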

📄 Document For This Release

https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.1-20260415-master/

What's Changed

  • [bugfix] Let trust_remote_code take effect by @SJTUyh in #226
  • Update textvqa.py by @1037husterljx in #230
  • 【Feature】 Adapt util/worker config typing and defaults for custom cfg by @GaoHuaZhang in #239
  • 【Feature】add SWE-bench eval task, dataset loader, and summarizer integration by @GaoHuaZhang in #240
  • 【Feature】Support SWE-Bench benchmark pipeline and Mini SWE Agent integration by @GaoHuaZhang in #241
  • 【Feature】add SWE-bench example configs and bilingual user guide by @GaoHuaZhang in #191
  • [Bugfix] support vita model by @Huangzjun in #237
  • [feature][for merge][part0]tau2 bench code by @SJTUyh in #249
  • [feature][for merge][part1]tau2 bench docs by @SJTUyh in #250
  • [feature][for merge][part2] tau2 bench test cases by @SJTUyh in #251
  • [docs] 20260415 pre-release docs update by @SJTUyh in #252
  • remove transformers version limit by @wenba0 in #261
  • [Bug Fix] Fixed import/recognition issues with postprocess-related modules when used as packages, preventing abnormal behavior caused by missing package initialization files. by @GaoHuaZhang in #263
  • [bugfix] Add postprocessor to extract last option for MMMU datasets by @SijieFu in #238
  • 【Feature】Support Vbench third party and license by @GaoHuaZhang in #152
  • Feature/refcoco/+/g benchmark support by @zhongzhouTan-coder in #201
  • [Bugfix]Fix the tau2 bench metrics display in pass^k situation by @SJTUyh in #272
  • support mathvision by @wenba0 in #264
  • [feature] support aime26 dataset by @yejj710 in #274
  • 【Feature】【Part1】AISBench support the VBench 1.0 Video Quality Evaluation Pipeline. by @GaoHuaZhang in #273
  • 【Feature】【Part2】Add VBench eval examples and doc by @GaoHuaZhang in #270
  • 【Feature】 add verified_mini dataset mapping, example config, and guide updates by @GaoHuaZhang in #271
  • 【feature】Support RealworldQA dataset by @wanlongze in #268
  • 【UT】Add realworldqa-related UT test cases by @wanlongze in #287
  • add UT & doc for mathvision by @wenba0 in #288
  • Test/refcoco by @zhongzhouTan-coder in #276
  • [docs] add docs for aime26 by @yejj710 in #289
  • 【doc】Add realworldqa documentation by @wanlongze in #290
  • Docs/refcoco by @zhongzhouTan-coder in #277
  • 【doc】Add Agentic Coding evaluation scheme design by @junemoon-happy in #292
  • 【doc】fix Error in user YAML: () by @junemoon-happy in #293
  • modify mmmu by @wenba0 in #265
  • fix mm input_tokens=0 when server return prompt_tokens by @wenba0 in #299
  • [docs]Add mini dataset docs by @SJTUyh in #302
  • [bugfix]remove MMMU abcd postprocess, maybe E by @wenba0 in #303
  • feat(datasets): add HLE dataset by @ivanbao9783 in #301
  • [bugfix] Fix multilingual mini config by @SJTUyh in #305
  • [Docs][Bug]Change the way to install mini-swe-agent by @SJTUyh in #308
  • docs: fix incorrect path agent_examples → agent_example in tau2_bench docs by @zhongzhouTan-coder with @Copilot in #312
  • [UT]: add UT coverage for swebench by @wanlongze in #316
  • [UT]: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math) by @wanlongze in #315
  • [feature][for merge][part1] Support terminal-bench-2 by @SJTUyh in #318
  • [feature][for merge][part0] Support terminal-bench-2 by @SJTUyh in #319
  • [feature][for merge][part2] Support terminal-bench-2 by @SJTUyh in #320
  • [feature][for merge][part3] Support terminal-bench-2 by @SJTUyh in #321
  • [UT]: add UT coverage for MMMU dataset by @wanlongze in #325
  • [UT]: add UT coverage for MMMU-Pro and MMStar datasets by @wanlongze in #326
  • support PreTrainedTokenizerFast for dsv32 by @wenba0 in #330
  • feat(dataset): add swe-bench-pro dataset by @ivanbao9783 in #333
  • feat(docs): Add swe-bench-pro dataset documentation by @ivanbao9783 in #334
  • feat(UT): add swe-bench_pro UT tests by @ivanbao9783 in #335
  • feat(UT): add swe-bench_pro UT tests - eval by @ivanbao9783 in #336
  • feat(UT): add swe-bench_pro UT tasks - utils by @ivanbao9783 in #338
  • feat(UT): add swe-bench_pro UT tests - infer by @ivanbao9783 in #337
  • [sub PR for merge] Add OVERVIEW.md for dockerfile by @SJTUyh in #340
  • [sub PR for merge] Add dockerfile and build script by @SJTUyh in #339
  • [CI/docs] AISBench Benchmark upload to pypi by @SJTUyh in #344
  • [feature] Support multi-arch docker images of AISBench/benchmark by @SJTUyh in #332
  • Add logging for exceptions and label for Docker container by @zhongzhouTan-coder in #351
  • fix(doc): Remove duplicated text in the swebp dataset Readme by @ivanbao9783 in #355
  • fix(doc): Update the task name in the hle dataset Readme by @ivanbao9783...

v3.1-20260415-master

Pre-release

@SJTUyh released this 15 Apr 03:56
9c023bc

🌟 Release Note

👉 Click Here For Details

📦 Docker Images For This Release

| name | arch | python version | offline resources | image size | tar.gz size |
| --- | --- | --- | --- | --- | --- |
| v3.1-20260415-master_aarch64_py_310 | aarch64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260415-master_aarch64_py_310.tar.gz | 2.77 GB | 846 MB |
| v3.1-20260415-master_x86_64_py_310 | x86_64 | 3.10 | https://aisbench.obs.cn-north-4.myhuaweicloud.com/images/benchmark/github/ais_bench_benchmark_image_v3.1-20260415-master_x86_64_py_310.tar.gz | 3.04 GB | 926 MB |

👉 Click Here For Docker Images Usage Guidance

📄 Document For This Release

https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/v3.1-20260415-master/

What's Changed

  • [bugfix] Let trust_remote_code take effect by @SJTUyh in #226
  • Update textvqa.py by @1037husterljx in #230
  • 【Feature】 Adapt util/worker config typing and defaults for custom cfg by @GaoHuaZhang in #239
  • 【Feature】add SWE-bench eval task, dataset loader, and summarizer integration by @GaoHuaZhang in #240
  • 【Feature】Support SWE-Bench benchmark pipeline and Mini SWE Agent integration by @GaoHuaZhang in #241
  • 【Feature】add SWE-bench example configs and bilingual user guide by @GaoHuaZhang in #191
  • [Bugfix] support vita model by @Huangzjun in #237
  • [feature][for merge][part0]tau2 bench code by @SJTUyh in #249
  • [feature][for merge][part1]tau2 bench docs by @SJTUyh in #250
  • [feature][for merge][part2] tau2 bench test cases by @SJTUyh in #251
  • [docs] 20260415 pre-release docs update by @SJTUyh in #252

Full Changelog: v3.1-20260330-master...v3.1-20260415-master

😄 Thanks for using AISBench/benchmark!

v3.1-20260330-master

@SJTUyh released this 31 Mar 02:18
cbe9c2f

v3.0-20251219-master (pre-release)

Pre-release

@SJTUyh released this 19 Dec 09:37
469e8df