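Operating system and version
openEuler 22.04.5 LTS
Python environment where the tool is installed
Python environment inside a Docker container
Python version
3.11
AISBench tool version
3.1.20260327
AISBench command executed
ais_bench --models vllm_api_general_chat --datasets mmstar_gen --debug
Contents of the model configuration file or custom configuration file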
from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="",
        model="qwen3-vl-32b",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="0.0.0.0",
        host_port=8000,
        url="",
        max_out_len=16000,
        batch_size=128,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0.0,  # greedy decoding
            ignore_eos=True,  # vLLM keeps generating after the EOS token
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]
Expected behavior
Accuracy results for the qwen3-vl-32b-instruct model on the mmstar dataset.
Actual behavior
| dataset | version | metric | mode | vllm-api-general-chat |
|---|---|---|---|---|
| mmstar | d9b8ec | Overall | gen | 0.00 |
| mmstar | d9b8ec | coarse perception | gen | 0.00 |
| mmstar | d9b8ec | fine-grained perception | gen | 0.00 |
| mmstar | d9b8ec | instance reasoning | gen | 0.00 |
| mmstar | d9b8ec | logical reasoning | gen | 0.00 |
| mmstar | d9b8ec | math | gen | 0.00 |
| mmstar | d9b8ec | science & technology | gen | 0.00 |
Pre-submission checks