[Bug] MultiTurnGenInferencer `infer_every` mode does not correctly accumulate conversation history

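### Operating system and version

openEuler

### Python environment used to install the tool

Python virtual environment created with anaconda/miniconda

### Python version

3.11

### AISBench tool version

v3.1-20260611-master-3-gf8e3e76

### AISBench command executed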

 ais_bench --config-dir ./results/test1 --models model_config --datasets dataset_config --mode perf --summarizer default_perf --work-dir ./results/test1 --num-warmups 0 --debug

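### Model configuration file or custom configuration file contents

In `infer_every` mode, `MultiTurnGenInferencer` fails to accumulate the model-generated answers into the conversation history of subsequent rounds. As a result, every round is sent without the model's earlier answers, which defeats the purpose of multi-turn conversation testing.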

  ---

  ### Environment

  - ais_bench version: v3.1-20260611-master
  - Python version: 3.11

  ---

  ### Steps to reproduce

  1. Prepare a multi-turn conversation dataset (JSONL format):
     ```json
     {"question": ["Question 1", "Question 2", "Question 3"], "answer": ["", "", ""], "prefix": "system prompt..."}
     ```

  2. Configure `MultiTurnGenInferencer` with `infer_mode='every'`:
     ```python
     infer_cfg=dict(
         prompt_template=dict(
             type=MultiTurnPrefixPromptTemplate,
             template=dict(
                 begin=[dict(role='SYSTEM', prompt='{prefix}')],
                 round=[
                     dict(role='HUMAN', prompt='{question}'),
                     dict(role='BOT', prompt='{answer}'),
                 ]
             )
         ),
         inferencer=dict(
             type=MultiTurnGenInferencer,
             infer_mode='every',
         )
     )
     ```

  3. Run inference and check the `input_tokens` field in `_details.jsonl`.
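The dataset from step 1 can be produced with a short script. This is only a minimal sketch: the field names (`question`, `answer`, `prefix`) come from the example record above, while the helper names and the file name `multiturn.jsonl` are hypothetical and not part of AISBench.

```python
import json
import tempfile
from pathlib import Path


def write_multiturn_dataset(path, records):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_multiturn_dataset(path):
    """Read the JSONL file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {
        "question": ["Question 1", "Question 2", "Question 3"],
        "answer": ["", "", ""],  # left empty so the model's answers are used
        "prefix": "system prompt...",
    }
]

# Round-trip through a temporary directory (hypothetical file name).
dataset_path = Path(tempfile.mkdtemp()) / "multiturn.jsonl"
write_multiturn_dataset(dataset_path, records)
loaded = load_multiturn_dataset(dataset_path)
```

Each record keeps `question` and `answer` the same length, so every HUMAN turn has a matching BOT slot in the prompt template.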

  ### Root cause analysis

  The `infer_every` method at lines 120-157 of `ais_bench/benchmark/openicl/icl_inferencer/icl_multiturn_inferencer.py`:

  ```python
  async def infer_every(self, data: dict, session: aiohttp.ClientSession):
      # ...
      bot_indices = [i for i, item in enumerate(chat) if item['role'] == 'BOT']
      turn_id = 0
      for i in bot_indices:
          # The problem is here: chat[:i] is a list slice, which creates a new copy.
          # Even if chat[i]["prompt"] is modified afterwards, the already-sliced chat[:i] is not affected.
          history = await asyncio.to_thread(self.model.parse_template,
              PromptList(left_prompt + chat[:i] + right_prompt), mode="gen")

          await self.model.generate(history, max_out_len, output, session=session, **data)

          if output.success:
              # This modifies chat[i], but history was already built from a copy of chat[:i].
              chat[i]["prompt"] = output.content
              # ...
  ```

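  Problem: `chat[:i]` on line 140 creates a new list copy. Modifying `chat[i]["prompt"]` on line 149 does not affect the `chat[:i]` copy that was already created. As a result, the BOT answers in `history` are still empty strings in every loop iteration.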

  ---
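  ### Suggested fix

  Maintain an accumulated conversation history list and append the generated BOT answer to it after each round of inference: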

  ```python
  # Pseudocode (requires `import copy`)
  accumulated_history = PromptList(left_prompt)
  turn_id = 0
  for i in bot_indices:
      # Build the full history for the current round from accumulated_history
      current_history = accumulated_history + chat[len(accumulated_history) - len(left_prompt):i] + right_prompt

      await self.model.generate(current_history, ...)

      if output.success:
          # Append this round's HUMAN + BOT turns to the accumulated history
          bot_response = copy.deepcopy(chat[i])
          bot_response["prompt"] = output.content
          accumulated_history.append(chat[i-1])  # HUMAN
          accumulated_history.append(bot_response)  # BOT with generated answer
  ```
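The accumulation idea can be tried outside AISBench with a stub in place of the model. The sketch below is not the project's implementation: `fake_generate` and `infer_every_accumulating` are hypothetical names, plain lists of dicts stand in for `PromptList`, and the failure branch (`output.success` being false) is left out.

```python
import asyncio
import copy


async def fake_generate(history):
    """Hypothetical stand-in for the model call: answers the latest HUMAN turn."""
    last_question = [m for m in history if m["role"] == "HUMAN"][-1]["prompt"]
    return f"answer to {last_question}"


async def infer_every_accumulating(left_prompt, chat, right_prompt):
    """Send every BOT turn with all previously generated answers in its history."""
    accumulated = list(left_prompt)
    consumed = 0  # number of chat items already copied into `accumulated`
    sent_histories = []
    bot_indices = [i for i, item in enumerate(chat) if item["role"] == "BOT"]
    for i in bot_indices:
        # Turns between the previous BOT slot and this one (normally one HUMAN turn).
        pending = chat[consumed:i]
        current_history = accumulated + pending + list(right_prompt)
        sent_histories.append([m["prompt"] for m in current_history])

        content = await fake_generate(current_history)

        # Store the answer on a copy so the original chat template stays untouched.
        bot_response = copy.deepcopy(chat[i])
        bot_response["prompt"] = content
        accumulated.extend(pending)
        accumulated.append(bot_response)
        consumed = i + 1
    return sent_histories


left_prompt = [{"role": "SYSTEM", "prompt": "prefix"}]
chat = []
for q in ["q1", "q2", "q3"]:
    chat.append({"role": "HUMAN", "prompt": q})
    chat.append({"role": "BOT", "prompt": ""})

histories = asyncio.run(infer_every_accumulating(left_prompt, chat, []))
for round_id, sent in enumerate(histories):
    print(round_id, sent)
```

Tracking `consumed` explicitly avoids deriving the slice start from list lengths, which only works when HUMAN and BOT turns strictly alternate.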


### Expected behavior

  - Round 0: input_tokens = prefix + q1
  - Round 1: input_tokens = prefix + q1 + a1 + q2 (includes the previous round's answer)
  - Round 2: input_tokens = prefix + q1 + a1 + q2 + a2 + q3 (includes all previous answers)

### Actual behavior

  - Round 0: input_tokens = prefix + q1
  - Round 1: input_tokens = prefix + q1 + q2 (a1 is missing)
  - Round 2: input_tokens = prefix + q1 + q2 + q3 (a1 and a2 are missing)
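To tell the two behaviors apart from the `input_tokens` values, the per-round totals can be worked out ahead of time. The sketch below uses made-up token counts and ignores any tokens added by the chat template or role markers, which is a simplifying assumption; `input_tokens_per_round` is a hypothetical helper, not an AISBench function.

```python
def input_tokens_per_round(prefix_len, question_lens, answer_lens, include_answers):
    """Return the prompt length sent in each round.

    Template and role-marker tokens are ignored (simplifying assumption).
    """
    totals = []
    running = prefix_len
    for round_id, q_len in enumerate(question_lens):
        running += q_len
        totals.append(running)
        if include_answers:
            # The generated answer becomes part of the history for later rounds.
            running += answer_lens[round_id]
    return totals


# Hypothetical token counts, not measured values.
prefix_len = 100
question_lens = [10, 12, 8]
answer_lens = [50, 40, 30]

expected = input_tokens_per_round(prefix_len, question_lens, answer_lens, True)
actual = input_tokens_per_round(prefix_len, question_lens, answer_lens, False)
print(expected)  # [110, 172, 220]
print(actual)    # [110, 122, 130]
```

Under these assumptions, the gap between the two lists in each round equals the total length of the answers generated in earlier rounds, so a gap of zero after round 0 points to the behavior reported here.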

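### Pre-submission checklist

- [x] I have read the quick start guide in the home page documentation and it did not resolve the problem
- [x] I have searched the FAQ and found no duplicate
- [x] I have searched the existing issues and found no duplicate
- [x] I have updated to the latest version and the problem persists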
