[Bug] mmstar dataset accuracy test results are all 0 #254

Description

@huiying-zhu

Operating system and version

openeuler22.04.5 LTS

Python environment where the tool is installed

Python environment inside a Docker container

Python version

3.11

AISBench tool version

3.1.20260327

AISBench command executed

ais_bench --models vllm_api_general_chat --datasets mmstar_gen --debug

Contents of the model configuration file or custom configuration file

from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="",
        model="qwen3-vl-32b",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="0.0.0.0",
        host_port=8000,
        url="",
        max_out_len=16000,
        batch_size=128,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0.0,
            ignore_eos=True,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]

Expected behavior

The run should produce accuracy results for the qwen3-vl-32b-instruct model on the mmstar dataset.

Actual behavior

dataset  version  metric                    mode  vllm-api-general-chat
mmstar   d9b8ec   Overall                   gen   0.00
mmstar   d9b8ec   coarse perception         gen   0.00
mmstar   d9b8ec   fine-grained perception   gen   0.00
mmstar   d9b8ec   instance reasoning        gen   0.00
mmstar   d9b8ec   logical reasoning         gen   0.00
mmstar   d9b8ec   math                      gen   0.00
mmstar   d9b8ec   science & technology      gen   0.00

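Pre-submission checks

  • I have read the Quick Start guide in the main documentation, and it did not solve the problem
  • I have searched the FAQ and found no duplicate of this problem
  • I have searched the existing issues and found no duplicate of this problem
  • I have updated to the latest version, and the problem persists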
