Open
37 commits
9c4ef63
feat: add token ids
sdevare-nv Nov 16, 2025
d48bba8
feat: add eval_output_dir
sdevare-nv Nov 16, 2025
2efc3e1
feat: add selected_id
sdevare-nv Nov 16, 2025
9f1a06c
feat: add instance_dict
sdevare-nv Nov 17, 2025
3745b1c
feat: add cookie
sdevare-nv Nov 17, 2025
533eb9f
feat: add provider_specific_fields to final dict
sdevare-nv Nov 17, 2025
08e8ede
fix: token ids for non tool call
sdevare-nv Nov 18, 2025
c72f49f
feat: instance dict path
sdevare-nv Nov 18, 2025
3aa8e44
fix: typo
sdevare-nv Nov 18, 2025
3bfa44e
fix: loading jsonl
sdevare-nv Nov 18, 2025
83b2fb0
feat: remove valid sample check
sdevare-nv Nov 18, 2025
6fdad88
feat: context length
sdevare-nv Nov 18, 2025
abf5f81
feat: nv-internal
sdevare-nv Nov 30, 2025
9f5f711
feat: r2e support
sdevare-nv Dec 2, 2025
e796aed
feat: internal support
sdevare-nv Dec 8, 2025
220c02e
feat: internal env fix
sdevare-nv Dec 8, 2025
9c63cc1
feat: add config file path
sdevare-nv Dec 10, 2025
65759f7
feat: reduce timeout
sdevare-nv Dec 10, 2025
ea8a053
feat: action runtime error return to model
sdevare-nv Dec 15, 2025
932d65f
feat: kill process
sdevare-nv Dec 16, 2025
ce013b8
feat: send only last message
sdevare-nv Jan 5, 2026
5be7ab2
feat: increase local runtime
sdevare-nv Jan 9, 2026
e1e4629
feat: increase port finding attempts
sdevare-nv Jan 9, 2026
0656b96
feat: init combine commands
sdevare-nv Jan 9, 2026
52b7805
feat: remove verison control
sdevare-nv Jan 9, 2026
8e67695
feat: add version control to run infer
sdevare-nv Jan 9, 2026
38386b0
feat: speedup server spinup
sdevare-nv Jan 12, 2026
3016535
feat: add print
sdevare-nv Jan 12, 2026
1f0f772
feat: reduce server start time
sdevare-nv Jan 13, 2026
74ab9c4
feat: cap max timeout
sdevare-nv Jan 13, 2026
94851f5
feat: blocking check
sdevare-nv Jan 14, 2026
7af1058
feat: fix run_infer logic
sdevare-nv Jan 16, 2026
e4bef47
feat: add action latency
sdevare-nv Jan 21, 2026
8acdde3
feat: add kill blocklist
sdevare-nv Jan 22, 2026
f12c3fb
feat: add atomic write for traj json
sdevare-nv Jan 22, 2026
dfd04f4
feat: add tmux memory monitor
sdevare-nv Jan 23, 2026
d6bd12e
feat: add validation failure action
sdevare-nv Feb 3, 2026
9 changes: 5 additions & 4 deletions evaluation/benchmarks/swe_bench/prompts/swe_default.j2
@@ -1,17 +1,18 @@
 <uploaded_files>
-/workspace/{{ workspace_dir_name }}
+{{ workspace_path }}
 </uploaded_files>

-I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
+{% set language = instance.repo_language | default('python') %}
+I've uploaded a {{ language }} code repository in the directory {{ workspace_path }}. Consider the following issue description:

 <issue_description>
 {{ instance.problem_statement }}
 </issue_description>

 Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
 I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
-Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
-Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
+Also the development environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
+Your task is to make the minimal changes to non-test files in the {{ workspace_path }} directory to ensure the <issue_description> is satisfied.

 Follow these phases to resolve the issue:
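Note on the `swe_default.j2` change: the new `{% set language = ... | default('python') %}` line falls back to `python` when the instance carries no `repo_language`. The following is a minimal sketch of that behaviour, not the code path this PR uses to render prompts. It assumes the `jinja2` package is installed; the abbreviated template string, the `render_intro` helper, and the `/workspace/example_repo` path are illustrative stand-ins rather than names from the repository.

```python
from jinja2 import Template

# Abbreviated stand-in for the opening lines of swe_default.j2 after this change
# (hypothetical excerpt; the real template is longer).
TEMPLATE = (
    "{% set language = instance.repo_language | default('python') %}"
    "I've uploaded a {{ language }} code repository "
    "in the directory {{ workspace_path }}."
)


def render_intro(instance: dict, workspace_path: str) -> str:
    """Render the intro sentence for one benchmark instance (illustrative helper)."""
    return Template(TEMPLATE).render(instance=instance, workspace_path=workspace_path)


# An instance that declares its language uses that value.
with_language = render_intro({"repo_language": "go"}, "/workspace/example_repo")

# An instance without the key falls back to the filter's default.
without_language = render_intro({}, "/workspace/example_repo")

print(with_language)
print(without_language)
```

One thing worth checking in review: Jinja2's `default` filter only applies to undefined values unless its second argument is true, so an instance with `repo_language` present but set to `None` or an empty string would not get the `python` fallback. If such instances can occur, `default('python', true)` may be the safer form.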

2 changes: 1 addition & 1 deletion evaluation/benchmarks/swe_bench/prompts/swe_gpt4.j2
@@ -4,7 +4,7 @@ Your thinking should be thorough and so it's fine if it's very long. You can thi

 You MUST iterate and keep going until the problem is solved.

-You already have everything you need to solve this problem in the /workspace/{{ workspace_dir_name }} folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
+You already have everything you need to solve this problem in the {{ workspace_path }} folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.

 Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct.
 NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
6 changes: 3 additions & 3 deletions evaluation/benchmarks/swe_bench/prompts/swt.j2
@@ -1,7 +1,7 @@
 <uploaded_files>
-/workspace/{{ workspace_dir_name }}
+{{ workspace_path }}
 </uploaded_files>
-I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
+I've uploaded a python code repository in the directory {{ workspace_path }}. Consider the following issue description:

 <issue_description>
 {{ instance.problem_statement }}
@@ -10,7 +10,7 @@ I've uploaded a python code repository in the directory {{ workspace_dir_name }}

 Can you help me implement the necessary changes to the repository to test whether the issue in <issue_description> was resolved?
 I will take care of all changes to any of the non-test files. This means you DON'T have to modify the actual logic and ONLY have to update test logic and tests!
-Your task is to make the minimal changes to tests files in the /workspace directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass when the issue will be resolved.
+Your task is to make the minimal changes to tests files in the {{ workspace_path }} directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass when the issue will be resolved.
 Follow these steps to reproduce the issue:
 1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.
 2. Create a script `reproduction.py` to reproduce the error and execute it with `python reproduction.py` using the BashTool, to confirm the error