【Feature】Supports rerunning specified use cases on SWE dataset #331
Open
yejj710 wants to merge 1 commit into AISBench:master from yejj710:swe_plus1
The same hunk appears five times in the diff:

@@ -29,6 +29,7 @@
         split="test",
         step_limit=STEP_LIMIT,
         filter_spec="",
+        instance_ids_file="",
         shuffle=False,
     ),
 ]
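One way to prepare a file for the new `instance_ids_file` option when rerunning failed cases is to dump the IDs that need another pass into a text file, one per line. This is only an illustrative sketch: `write_rerun_ids` and the `results` mapping are hypothetical names that are not part of this PR; the one-ID-per-line `.txt` layout follows the tests added in this PR.

```python
from pathlib import Path
from typing import Dict


def write_rerun_ids(results: Dict[str, bool], path: str) -> int:
    """Write the IDs of unresolved instances to `path`, one per line.

    `results` maps instance_id -> resolved flag (a hypothetical shape,
    not something this PR defines). Returns the number of IDs written.
    """
    # Sort so the file is stable across runs.
    failed = sorted(iid for iid, resolved in results.items() if not resolved)
    Path(path).write_text("".join(f"{iid}\n" for iid in failed), encoding="utf-8")
    return len(failed)
```

The resulting path would then be supplied in the dataset config, for example `instance_ids_file="rerun_ids.txt"`.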
@@ -0,0 +1,49 @@
+import tempfile
+import unittest
+from pathlib import Path
+from unittest import mock
+
+from ais_bench.benchmark.datasets.swebench import SWEBenchDataset
+from ais_bench.benchmark.utils.logging.exceptions import FileOperationError
+
+
+class TestSWEBenchDataset(unittest.TestCase):
+    def setUp(self):
+        self.dataset = object.__new__(SWEBenchDataset)
+        self.dataset.logger = mock.MagicMock()
+
+    def test_load_instance_ids_file(self):
+        with tempfile.TemporaryDirectory() as temp_dir:
+            ids_file = Path(temp_dir) / "ids.txt"
+            ids_file.write_text("django__django-1\n\nsympy__sympy-2\nsympy__sympy-2\n", encoding="utf-8")
+
+            instance_ids = self.dataset._load_instance_ids_file(str(ids_file))
+
+        self.assertEqual(instance_ids, {"django__django-1", "sympy__sympy-2"})
+
+    def test_load_instance_ids_file_requires_txt_suffix(self):
+        with tempfile.TemporaryDirectory() as temp_dir:
+            ids_file = Path(temp_dir) / "ids.csv"
+            ids_file.write_text("django__django-1\n", encoding="utf-8")
+
+            with self.assertRaises(FileOperationError):
+                self.dataset._load_instance_ids_file(str(ids_file))
Comment on lines +24 to +30 (Contributor, with a suggested change): "Since the removal of …"
+
+    def test_filter_instances_by_filter_spec_and_instance_ids(self):
+        instances = [
+            {"instance_id": "django__django-1"},
+            {"instance_id": "django__django-2"},
+            {"instance_id": "sympy__sympy-1"},
+        ]
+
+        filtered = self.dataset.filter_instances(
+            instances,
+            filter_spec=r"^django__",
+            instance_ids={"django__django-2", "sympy__sympy-1"},
+        )
+
+        self.assertEqual(filtered, [{"instance_id": "django__django-2"}])
+
+
+if __name__ == "__main__":
+    unittest.main()
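The behaviour these tests pin down can be sketched roughly as follows. This is not the PR's implementation, which is not shown on this page: the standalone functions, the local `FileOperationError` stand-in, and the choice of `re.match` for `filter_spec` are all assumptions made so the example is self-contained. The sketch shows a loader that accepts only `.txt` files, skips blank lines, and de-duplicates IDs, plus a filter that keeps only instances passing both the regex and the ID set.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional, Set


class FileOperationError(Exception):
    """Local stand-in for ais_bench's FileOperationError (assumption)."""


def load_instance_ids_file(path: str) -> Set[str]:
    """Read one instance ID per line from a .txt file into a set."""
    file_path = Path(path)
    if file_path.suffix != ".txt":
        raise FileOperationError(f"instance_ids_file must be a .txt file: {path}")
    lines = file_path.read_text(encoding="utf-8").splitlines()
    # Blank lines are skipped; the set removes duplicate IDs.
    return {line.strip() for line in lines if line.strip()}


def filter_instances(
    instances: List[Dict[str, str]],
    filter_spec: str = "",
    instance_ids: Optional[Set[str]] = None,
) -> List[Dict[str, str]]:
    """Keep instances that satisfy the regex AND are in the ID set."""
    if filter_spec:
        pattern = re.compile(filter_spec)
        instances = [i for i in instances if pattern.match(i["instance_id"])]
    if instance_ids is not None:
        instances = [i for i in instances if i["instance_id"] in instance_ids]
    return list(instances)
```

Applying the two filters in sequence gives an intersection, which is why the test expects only `django__django-2` to survive.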
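`.txt` suffix restriction: Requiring `instance_ids_file` to have a `.txt` suffix is unnecessary. Users may use `.log`, `.csv`, `.list`, or files with no suffix at all. As long as the file is plain text with one ID per line, it should be readable. Removing this restriction would improve generality and user experience.

… `set()`, so the subsequently filtered dataset is empty and the run ends silently without any error message. After reading the file, if the parsed `instance_ids` is empty, a `FileOperationError` should be raised to tell the user clearly that the file content is invalid.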
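The reviewer's two suggestions could look roughly like the sketch below. It is a hypothetical variant, not code from the PR: the function name `load_instance_ids_any_suffix` and the local `FileOperationError` stand-in are assumptions made to keep the example self-contained. It accepts any plain-text file regardless of suffix and fails loudly when no IDs are parsed.

```python
from pathlib import Path
from typing import Set


class FileOperationError(Exception):
    """Local stand-in for ais_bench's FileOperationError (assumption)."""


def load_instance_ids_any_suffix(path: str) -> Set[str]:
    """Read one instance ID per line from any plain-text file.

    Raises FileOperationError when the file yields no IDs, so an empty
    or whitespace-only file cannot silently produce an empty dataset.
    """
    text = Path(path).read_text(encoding="utf-8")
    ids = {line.strip() for line in text.splitlines() if line.strip()}
    if not ids:
        raise FileOperationError(f"No instance IDs found in {path}")
    return ids
```

The trade-off is the one debated in this thread: dropping the suffix check is more flexible, while the explicit empty-result check keeps a misconfigured file from going unnoticed.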
This constraint was added to avoid the unpredictable problems that overly permissive input handling can cause.