fix(stability): close mkstemp fd, add request timeouts, fix mutable default arg#2367

Open
jclee941 wants to merge 2 commits into The-PR-Agent:main from jclee941:fix/stability-mkstemp-fd-and-timeouts

Conversation


@jclee941 jclee941 commented May 3, 2026

Summary

Five small stability fixes across six files, found during a static audit of pr_agent/. No behavior changes for callers; all 302 unit tests pass on this branch (pytest tests/unittest).

Fixes

1. File descriptor leak in git_providers/utils.py

apply_repo_settings() calls tempfile.mkstemp() and writes to the returned fd but never closes it before os.remove(). Under sustained load (every PR creates a new fd) this can exhaust file descriptors. Wrap the write in try/finally: os.close(fd).
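The fix pattern, sketched with a hypothetical settings payload (the real code writes repo settings fetched for the PR):

```python
import os
import tempfile

# Hypothetical stand-in for the repo settings the real function writes.
settings_bytes = b"[config]\nmodel = 'demo'\n"

fd, path = tempfile.mkstemp(suffix=".toml")
try:
    os.write(fd, settings_bytes)
finally:
    os.close(fd)  # without this, every call leaks one file descriptor

# ... the settings file would be consumed here ...
os.remove(path)
```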

2. Unsafe open() in servers/bitbucket_app.py

handle_manifest() reads atlassian-connect.json with open(...).read() — the file handle is only freed by GC. Switch to with open(...) as f.
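A minimal sketch of the change, using a throwaway file in place of atlassian-connect.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in for atlassian-connect.json.
path = os.path.join(tempfile.gettempdir(), "atlassian-connect-demo.json")
with open(path, "w") as f:
    json.dump({"key": "demo-app"}, f)

# The with-statement closes the handle deterministically, even if
# json.load() raises; open(...).read() leaves the close to the GC.
with open(path) as f:
    manifest = json.load(f)

os.remove(path)
```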

3. Mutable default argument in algo/token_handler.py

TokenHandler.__init__(..., vars: dict = {}) — the empty dict is shared across all instances. While the current code never mutates vars, this is a latent footgun. Switch to vars: dict = None + if vars is None: vars = {}.
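The footgun and the fix in miniature (class names are illustrative, not the real TokenHandler):

```python
class Bad:
    def __init__(self, vars: dict = {}):   # one dict, created at def-time
        self.vars = vars                   # ...shared by every instance

class Good:
    def __init__(self, vars: dict = None):
        if vars is None:
            vars = {}                      # fresh dict per instance
        self.vars = vars

a, b = Bad(), Bad()
a.vars["x"] = 1
print(b.vars)   # {'x': 1} -- mutation leaked into an unrelated instance

c, d = Good(), Good()
c.vars["x"] = 1
print(d.vars)   # {} -- isolated
```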

4. Missing request timeouts (4 sites)

  • algo/utils.py — requests.get(RATE_LIMIT_URL, ...) (×2, called from a retry loop)
  • servers/bitbucket_app.py — Bitbucket commits API
  • servers/github_polling.py — GitHub PR comments fetch (was inside an async function)
  • git_providers/gerrit_provider.py — Gerrit patch upload POST

All four were susceptible to indefinite hangs on socket/proxy issues. Added timeout=10 (rate-limit) / timeout=30 (others).

5. Bare except: blocks (×3)

algo/utils.py and servers/bitbucket_app.py had except: clauses that swallowed every error — including KeyboardInterrupt and SystemExit — without logging. Replaced with except Exception: + exc_info=True so failures surface in logs.
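The distinction in miniature — SystemExit subclasses BaseException, not Exception, so only the bare clause swallows it:

```python
def bare_handler():
    try:
        raise SystemExit        # e.g. a shutdown request
    except:                     # bare: swallows even SystemExit
        return "swallowed"

def narrow_handler():
    try:
        raise SystemExit
    except Exception:           # SystemExit is not an Exception subclass
        return "swallowed"

print(bare_handler())           # swallowed

try:
    narrow_handler()
    propagated = False
except SystemExit:
    propagated = True           # the shutdown request reaches the caller
print(propagated)               # True
```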

Test

$ pytest tests/unittest -q
302 passed in 3.64s

Files changed

  • pr_agent/algo/token_handler.py
  • pr_agent/algo/utils.py
  • pr_agent/git_providers/gerrit_provider.py
  • pr_agent/git_providers/utils.py
  • pr_agent/servers/bitbucket_app.py
  • pr_agent/servers/github_polling.py

Each commit is independently reviewable:

  • 38f43b2c — fd leak, context manager, timeouts
  • c9135346 — bare except + rate-limit timeout + mutable default arg

Notes

Discovered during a stabilization audit of a private fork. There are ~60 more bare except: / except Exception: pass sites in pr_agent/ and ~10 string-concat-in-loops; happy to send follow-up PRs if useful.

jclee added 2 commits May 3, 2026 14:49
…anager

- pr_agent/git_providers/utils.py: close mkstemp fd before remove (prevents
  fd leak under load when applying repo settings).
- pr_agent/servers/bitbucket_app.py: open atlassian-connect.json with context
  manager; add timeout=30 on bitbucket commits API; replace bare except.
- pr_agent/servers/github_polling.py: add timeout=30 on GitHub PR comments
  fetch (was hanging connection risk).
- pr_agent/git_providers/gerrit_provider.py: add timeout=30 on patch upload
  POST.

Identified during full-project stabilization audit.
- pr_agent/algo/token_handler.py: TokenHandler.__init__ used vars: dict = {}
  as default, which is a shared mutable across instances. Switch to
  None sentinel + assignment inside the function.
- pr_agent/algo/utils.py: get_rate_limit_status / validate_and_await_rate_limit
  used bare except: that swallowed all errors silently and called
  requests.get with no timeout. Use except Exception: + exc_info logging
  and timeout=10s on both rate-limit GET calls.

Found during full-project stabilization audit.
@github-actions github-actions Bot added the bug label May 3, 2026
@qodo-free-for-open-source-projects

Review Summary by Qodo

Fix stability issues: fd leak, timeouts, bare excepts, mutable defaults

🐞 Bug fix


Walkthroughs

Description
• Close file descriptor leak in mkstemp() call before file removal
• Add 10-30 second timeouts to four unprotected HTTP requests
• Replace bare except: blocks with except Exception: for proper error handling
• Fix mutable default argument in TokenHandler.__init__() using None sentinel
• Use context manager for file operations to ensure proper resource cleanup
Diagram
flowchart LR
  A["Resource Leaks"] --> B["Close mkstemp fd"]
  A --> C["Use context manager"]
  D["Network Hangs"] --> E["Add request timeouts"]
  F["Error Handling"] --> G["Replace bare except"]
  H["Mutable State"] --> I["Fix default args"]
  B --> J["Stability Improvements"]
  C --> J
  E --> J
  G --> J
  I --> J


File Changes

1. pr_agent/algo/token_handler.py 🐞 Bug fix +3/-1

Fix mutable default argument in TokenHandler

• Changed vars: dict = {} default parameter to vars: dict = None
• Added None check with assignment inside __init__() to prevent mutable default argument footgun
• Ensures each instance gets its own dictionary instead of sharing across instances


2. pr_agent/algo/utils.py 🐞 Bug fix +6/-5

Add timeouts and fix error handling in rate limit functions

• Added timeout=10 to both requests.get() calls in get_rate_limit_status()
• Replaced bare except: with except Exception: in both rate-limit functions
• Added exc_info=True logging parameter to surface failures in logs
• Added warning log message when rate limit check fails before retry


3. pr_agent/git_providers/gerrit_provider.py 🐞 Bug fix +2/-1

Add timeout to Gerrit patch upload POST request

• Added timeout=30 parameter to requests.post() call for patch upload
• Prevents indefinite hangs on socket or proxy issues during Gerrit patch uploads


4. pr_agent/git_providers/utils.py 🐞 Bug fix +4/-1

Close mkstemp file descriptor before removal

• Wrapped os.write(fd, repo_settings) in try/finally block
• Added os.close(fd) in finally clause to ensure file descriptor is closed
• Prevents file descriptor leak under sustained load when applying repo settings


5. pr_agent/servers/bitbucket_app.py 🐞 Bug fix +5/-4

Fix file handling, add timeout, improve error handling

• Changed open().read() to context manager with open() as f: pattern
• Added timeout=30 to requests.get() call for Bitbucket commits API
• Replaced bare except: with except Exception: in manifest handler
• Added exc_info=True logging to capture exception details


6. pr_agent/servers/github_polling.py 🐞 Bug fix +1/-1

Add timeout to GitHub PR comments fetch request

• Added timeout=30 parameter to requests.get() call for GitHub PR comments fetch
• Prevents indefinite hangs when fetching previous comments in PR validation




@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 3, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1)



Action required

1. Rate-limit retry not executed 🐞 Bug ☼ Reliability
Description
In get_rate_limit_status(), the initial requests.get() happens before the try/except, so
timeouts/connection failures bypass the intended "retry once" logic and warning log. With the new
timeout=10, this becomes a likely runtime failure mode (and upstream callers either crash or skip
rate-limit validation depending on which wrapper calls it).
Code

pr_agent/algo/utils.py[R1207-1217]

+    response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
    try:
        rate_limit_info = response.json()
        if rate_limit_info.get('message') == 'Rate limiting is not enabled.':  # for github enterprise
            return {'resources': {}}
        response.raise_for_status()  # Check for HTTP errors
-    except:  # retry
+    except Exception:  # retry
+        get_logger().warning("Rate limit check failed, retrying once", exc_info=True)
        time.sleep(0.1)
-        response = requests.get(RATE_LIMIT_URL, headers=HEADERS)
+        response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
        return response.json()
Evidence
Python only catches exceptions raised inside the try block; since requests.get() is executed on the
line before try:, a Timeout/ConnectionError will skip the warning+retry code entirely. Separately,
validate_rate_limit_github() treats exceptions as success (returns True), so failures can silently
disable rate-limit protection for callers that use that helper.

pr_agent/algo/utils.py[1198-1218]
pr_agent/algo/utils.py[1221-1237]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`get_rate_limit_status()` intends to retry once and log on failure, but `requests.get()` is executed before the `try:` so request-layer exceptions (timeout/connection) are not caught and the retry/log never run.

### Issue Context
This regression becomes more visible after adding `timeout=10` because timeouts will now be raised rather than hanging.

### Fix Focus Areas
- pr_agent/algo/utils.py[1198-1218]

### Suggested change
- Move the initial `requests.get(...)` into the `try:` block, or add an outer `try/except requests.RequestException` around both GET calls.
- On request exceptions, log once (with `exc_info=True`), sleep, retry once, and if the retry fails either:
 - re-raise a clear exception, or
 - return a safe sentinel value and let callers handle it (but don’t silently treat it as “rate limit OK” unless that is explicitly desired).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
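The reviewer's point — that an exception raised before the `try:` bypasses the handler — can be reproduced in isolation (hypothetical `fake_get` stands in for `requests.get`):

```python
import time

def fake_get(url):
    # Hypothetical stand-in for requests.get() that always times out.
    raise TimeoutError("simulated network timeout")

def get_status_buggy():
    response = fake_get("RATE_LIMIT_URL")   # raised BEFORE the try:
    try:
        return response
    except Exception:                       # never reached for that call
        time.sleep(0.1)
        return fake_get("RATE_LIMIT_URL")

def get_status_fixed():
    try:
        response = fake_get("RATE_LIMIT_URL")
        return response
    except Exception:
        # retry path is now reachable; a real fix would log and re-issue the GET
        return "retried"

try:
    get_status_buggy()
    retry_ran = True
except TimeoutError:
    retry_ran = False   # the exception escaped; retry logic was bypassed
print(retry_ran)            # False
print(get_status_fixed())   # retried
```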



Remediation recommended

2. Hardcoded requests timeouts 📘 Rule violation ⚙ Maintainability
Description
The new requests.get()/requests.post() calls hard-code timeout values (10/30) in source,
making runtime behavior non-configurable via Dynaconf overrides. This can hinder tuning across
environments and violates the project's configuration-override requirement.
Code

pr_agent/algo/utils.py[R1207-1216]

+    response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
    try:
        rate_limit_info = response.json()
        if rate_limit_info.get('message') == 'Rate limiting is not enabled.':  # for github enterprise
            return {'resources': {}}
        response.raise_for_status()  # Check for HTTP errors
-    except:  # retry
+    except Exception:  # retry
+        get_logger().warning("Rate limit check failed, retrying once", exc_info=True)
        time.sleep(0.1)
-        response = requests.get(RATE_LIMIT_URL, headers=HEADERS)
+        response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
Evidence
PR Compliance ID 6 requires runtime-tunable behavior to be configurable via .pr_agent.toml /
pr_agent/settings/*.toml rather than hard-coded in Python. This PR introduces hard-coded HTTP
timeout values in multiple modified call sites.

AGENTS.md
pr_agent/algo/utils.py[1207-1216]
pr_agent/git_providers/gerrit_provider.py[165-170]
pr_agent/servers/bitbucket_app.py[101-107]
pr_agent/servers/github_polling.py[116-118]
pr_agent/settings/configuration.toml[5-30]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Hard-coded HTTP timeout values were introduced in Python source (`timeout=10` / `timeout=30`). Per compliance, runtime-tunable behavior should be configurable via Dynaconf (`.pr_agent.toml` / `pr_agent/settings/*.toml`) with sensible defaults.

## Issue Context
These timeout values may need to vary by environment (local dev vs CI vs enterprise proxies). Centralizing them in settings preserves flexibility and reduces future code churn.

## Fix Focus Areas
- pr_agent/settings/configuration.toml[5-30]
- pr_agent/algo/utils.py[1207-1216]
- pr_agent/git_providers/gerrit_provider.py[165-170]
- pr_agent/servers/bitbucket_app.py[101-107]
- pr_agent/servers/github_polling.py[116-118]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Blocking request in FastAPI 🐞 Bug ➹ Performance
Description
_validate_time_from_last_commit_to_pr_update() is async but calls synchronous requests.get(),
blocking the FastAPI event loop thread during network I/O. With timeout=30, a single webhook can
stall the server coroutine for up to 30 seconds, reducing overall request throughput.
Code

pr_agent/servers/bitbucket_app.py[105]

+        response = requests.get(commits_api, headers=headers, timeout=30)
Evidence
The Bitbucket webhook path calls an async def validator, but the modified line uses
requests.get(...) (sync) inside that coroutine, which blocks until completion/timeout.

pr_agent/servers/bitbucket_app.py[87-138]
pr_agent/servers/bitbucket_app.py[140-159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A synchronous `requests.get()` is used inside an `async def` in the Bitbucket FastAPI app, blocking the event loop during the HTTP request.

### Issue Context
This validator is awaited from the webhook handling flow, so blocking here reduces concurrency for all in-flight webhook requests.

### Fix Focus Areas
- pr_agent/servers/bitbucket_app.py[87-138]

### Suggested change
- Use `aiohttp` for the commits API call, e.g. `async with aiohttp.ClientSession() as session: ... await session.get(..., timeout=ClientTimeout(total=30))`.
- Optionally reuse a single ClientSession stored on app state to avoid creating a new session per webhook.
- Keep existing status-code checks and JSON parsing, but convert parsing to `await resp.json()` and handle non-200 responses similarly.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
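One event-loop-friendly alternative, if taking on the aiohttp migration is undesirable, is asyncio.to_thread; a sketch with a stand-in for the blocking call:

```python
import asyncio
import time

def blocking_fetch():
    # Stand-in for the synchronous requests.get() call.
    time.sleep(0.2)
    return "ok"

async def validator():
    # to_thread runs the blocking call in a worker thread, keeping the
    # event loop free; switching to aiohttp (as suggested) is the other option.
    return await asyncio.to_thread(blocking_fetch)

async def main():
    start = time.monotonic()
    results = await asyncio.gather(*(validator() for _ in range(5)))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
print(results)         # ['ok', 'ok', 'ok', 'ok', 'ok']
print(elapsed < 0.8)   # True -- concurrent, not 5 x 0.2 s serialized
```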



