【Feature】Supports rerunning specified use cases on SWE dataset #331

Open
yejj710 wants to merge 1 commit into AISBench:master from yejj710:swe_plus1

@yejj710 yejj710 commented Jun 8, 2026

Collaborator

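🔍 Motivation

Support listing specific cases, one per line, in a txt file for SWE evaluation, so that cases that did not generate a patch can easily be rerun.

📝 Modification

Regular-expression matching is inconvenient for some case-specific scenarios; specifying the cases directly is more straightforward and convenient.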
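
The contrast can be illustrated with a minimal sketch. It assumes each instance is a dict carrying an `instance_id` key; the helper names `filter_by_regex` and `filter_by_ids` and the sample records are hypothetical and are not taken from this PR's code.

```python
import re

# Hypothetical stand-ins for SWE-bench records; only `instance_id` matters here.
instances = [
    {"instance_id": "django__django-1"},
    {"instance_id": "django__django-12"},
    {"instance_id": "sympy__sympy-7"},
]


def filter_by_regex(items, pattern):
    """Keep items whose instance_id matches the regex (search semantics)."""
    rx = re.compile(pattern)
    return [it for it in items if rx.search(it["instance_id"])]


def filter_by_ids(items, wanted):
    """Keep items whose instance_id is in the explicit set of ids."""
    return [it for it in items if it["instance_id"] in wanted]


# Selecting two unrelated cases by regex needs anchoring and alternation ...
by_regex = filter_by_regex(instances, r"^(django__django-1|sympy__sympy-7)$")
# ... while an explicit id list states the same selection directly.
by_ids = filter_by_ids(instances, {"django__django-1", "sympy__sympy-7"})
```

Without the `^...$` anchors the regex would also pick up `django__django-12`, which is the kind of accidental match an explicit id list avoids.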

✅ Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • The CLA has been signed, and all committers in this PR have signed the CLA.

👥 Collaboration Info

  • Suggested Reviewers: @xxx
  • Relevant Module Owners: @xxx
  • Other Collaboration Notes:

🌟 Useful CI Commands

| Command | Introduction |
| --- | --- |
| /gemini review | Has Gemini perform a code review of the pull request in its current state. |
| /gemini summary | Has Gemini provide a summary of the pull request in its current state. |
| /gemini help | Displays a list of available Gemini commands. |
| /readthedocs build | Triggers a Read the Docs documentation build for the pull request in its current state. |

@yejj710 yejj710 temporarily deployed to smoke-test-approval June 8, 2026 12:48 — with GitHub Actions Inactive

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the ability to filter SWE-Bench dataset instances using a list of instance IDs loaded from a text file. It updates the SWEBenchDataset class to parse this file and apply the filters, updates several example configuration files, and adds unit tests. The reviewer suggests removing the strict .txt file extension requirement to support other plain text formats, and adding validation to raise an error if the provided instance IDs file is empty. They also recommend updating the unit tests to align with these changes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +55 to +79
    def _load_instance_ids_file(self, instance_ids_file: str) -> set[str]:
        path = Path(instance_ids_file).expanduser()
        if not path.is_file():
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"SWE-Bench instance ids file does not exist: {instance_ids_file!r}",
            )
        if path.suffix.lower() != ".txt":
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"SWE-Bench instance ids file must be a .txt file: {instance_ids_file!r}",
            )

        try:
            instance_ids = {
                line.strip()
                for line in path.read_text(encoding="utf-8").splitlines()
                if line.strip()
            }
        except OSError as e:
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"Failed to read SWE-Bench instance ids file {instance_ids_file!r}: {e}",
            )
        return instance_ids

Contributor

medium

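  1. Remove the unnecessary .txt suffix restriction: requiring instance_ids_file to end in .txt is unnecessary. Users may use .log, .csv, .list, or extensionless files. As long as the file is plain text with one ID per line, it should be readable. Removing this restriction improves generality and user experience.
  2. Add an empty-file check: if the user supplies an empty file, the current code returns an empty set, set(), so the filtered dataset is empty and the run ends silently without any error message. After reading the file, raise FileOperationError if the parsed instance_ids is empty, so the user is told explicitly that the file content is invalid.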
    def _load_instance_ids_file(self, instance_ids_file: str) -> set[str]:
        path = Path(instance_ids_file).expanduser()
        if not path.is_file():
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"SWE-Bench instance ids file does not exist: {instance_ids_file!r}",
            )

        try:
            instance_ids = {
                line.strip()
                for line in path.read_text(encoding="utf-8").splitlines()
                if line.strip()
            }
        except OSError as e:
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"Failed to read SWE-Bench instance ids file {instance_ids_file!r}: {e}",
            )

        if not instance_ids:
            raise FileOperationError(
                SWEB_CODES.LOCAL_PATH_RESOLVE_FAILED,
                f"SWE-Bench instance ids file is empty or contains no valid ids: {instance_ids_file!r}",
            )
        return instance_ids
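
The behavior of the suggested version can be checked in isolation with a sketch like the one below. It is not the project's code: to stay self-contained it is a module-level function named `load_instance_ids` (a hypothetical name) and raises a plain `ValueError` in place of the project's `FileOperationError` and `SWEB_CODES`.

```python
import tempfile
from pathlib import Path


def load_instance_ids(instance_ids_file: str) -> set[str]:
    """Read one instance id per line, skipping blank lines and trimming whitespace."""
    path = Path(instance_ids_file).expanduser()
    if not path.is_file():
        # Stand-in for the project's FileOperationError (assumption).
        raise ValueError(f"ids file does not exist: {instance_ids_file!r}")
    ids = {
        line.strip()
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    }
    if not ids:
        # The empty-file check proposed in the review comment.
        raise ValueError(f"ids file contains no valid ids: {instance_ids_file!r}")
    return ids


with tempfile.TemporaryDirectory() as tmp:
    # A non-.txt suffix is accepted in this variant.
    ids_file = Path(tmp) / "ids.list"
    ids_file.write_text("django__django-1\n\n  sympy__sympy-7  \n", encoding="utf-8")
    loaded = load_instance_ids(str(ids_file))

    # A whitespace-only file is rejected instead of yielding an empty set.
    empty_file = Path(tmp) / "empty.txt"
    empty_file.write_text(" \n\n", encoding="utf-8")
    try:
        load_instance_ids(str(empty_file))
        empty_rejected = False
    except ValueError:
        empty_rejected = True
```

Returning a set also deduplicates ids that are listed more than once, which matches the set comprehension in the code under review.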

Collaborator Author

This constraint was added to avoid the uncertainty that an overly permissive check could introduce.

Comment on lines +24 to +30
    def test_load_instance_ids_file_requires_txt_suffix(self):
        with tempfile.TemporaryDirectory() as temp_dir:
            ids_file = Path(temp_dir) / "ids.csv"
            ids_file.write_text("django__django-1\n", encoding="utf-8")

            with self.assertRaises(FileOperationError):
                self.dataset._load_instance_ids_file(str(ids_file))

Contributor

medium

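Since the .txt suffix restriction has been removed, the existing test_load_instance_ids_file_requires_txt_suffix test case no longer applies. Consider replacing it with a test case that expects FileOperationError for an empty file, so that the newly added empty-file check is fully tested.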

Suggested change
-    def test_load_instance_ids_file_requires_txt_suffix(self):
-        with tempfile.TemporaryDirectory() as temp_dir:
-            ids_file = Path(temp_dir) / "ids.csv"
-            ids_file.write_text("django__django-1\n", encoding="utf-8")
-            with self.assertRaises(FileOperationError):
-                self.dataset._load_instance_ids_file(str(ids_file))
+    def test_load_instance_ids_file_empty_raises_error(self):
+        with tempfile.TemporaryDirectory() as temp_dir:
+            ids_file = Path(temp_dir) / "ids.txt"
+            ids_file.write_text(" \n\n \n", encoding="utf-8")
+            with self.assertRaises(FileOperationError):
+                self.dataset._load_instance_ids_file(str(ids_file))
