fix(stability): close mkstemp fd, add request timeouts, fix mutable default arg#2367

Open
jclee941 wants to merge 2 commits into The-PR-Agent:main from jclee941:fix/stability-mkstemp-fd-and-timeouts

Conversation


@jclee941 jclee941 commented May 3, 2026

Summary

Five small stability fixes across six files, found during a static audit of pr_agent/. No behavior changes for callers; all 302 unit tests pass on this branch (pytest tests/unittest).

Fixes

1. File descriptor leak in git_providers/utils.py

apply_repo_settings() calls tempfile.mkstemp() and writes to the returned fd but never closes it before os.remove(). Under sustained load (every PR creates a new fd) this can exhaust file descriptors. Wrap the write in try/finally: os.close(fd).
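The fix pattern, sketched with a hypothetical settings payload (the real code writes repo settings fetched for the PR):

```python
import os
import tempfile

# Hypothetical stand-in for the repo settings the real function writes.
settings_bytes = b"[config]\nmodel = 'demo'\n"

fd, path = tempfile.mkstemp(suffix=".toml")
try:
    os.write(fd, settings_bytes)
finally:
    os.close(fd)  # without this, every call leaks one file descriptor

# ... the settings file would be consumed here ...
os.remove(path)
```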

2. Unsafe open() in servers/bitbucket_app.py

handle_manifest() reads atlassian-connect.json with open(...).read() — the file handle is only freed by GC. Switch to with open(...) as f.
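A minimal sketch of the change, using a throwaway file in place of atlassian-connect.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in for atlassian-connect.json.
path = os.path.join(tempfile.gettempdir(), "atlassian-connect-demo.json")
with open(path, "w") as f:
    json.dump({"key": "demo-app"}, f)

# The with-statement closes the handle deterministically, even if
# json.load() raises; open(...).read() leaves the close to the GC.
with open(path) as f:
    manifest = json.load(f)

os.remove(path)
```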

3. Mutable default argument in algo/token_handler.py

TokenHandler.__init__(..., vars: dict = {}) — the empty dict is shared across all instances. While the current code never mutates vars, this is a latent footgun. Switch to vars: dict = None + if vars is None: vars = {}.
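The footgun and the fix in miniature (class names are illustrative, not the real TokenHandler):

```python
class Bad:
    def __init__(self, vars: dict = {}):   # one dict, created at def-time
        self.vars = vars                   # ...shared by every instance

class Good:
    def __init__(self, vars: dict = None):
        if vars is None:
            vars = {}                      # fresh dict per instance
        self.vars = vars

a, b = Bad(), Bad()
a.vars["x"] = 1
print(b.vars)   # {'x': 1} -- mutation leaked into an unrelated instance

c, d = Good(), Good()
c.vars["x"] = 1
print(d.vars)   # {} -- isolated
```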

4. Missing request timeouts (4 sites)

  • algo/utils.py — requests.get(RATE_LIMIT_URL, ...) (×2, called from a retry loop)
  • servers/bitbucket_app.py — Bitbucket commits API
  • servers/github_polling.py — GitHub PR comments fetch (was inside an async function)
  • git_providers/gerrit_provider.py — Gerrit patch upload POST

All four were susceptible to indefinite hangs on socket/proxy issues. Added timeout=10 (rate-limit) / timeout=30 (others).

5. Bare except: blocks (×3)

algo/utils.py and servers/bitbucket_app.py had except: clauses that swallowed every error — including KeyboardInterrupt and SystemExit — without logging. Replaced with except Exception: + exc_info=True so failures surface in logs.
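The distinction in miniature — SystemExit subclasses BaseException, not Exception, so only the bare clause swallows it:

```python
def bare_handler():
    try:
        raise SystemExit        # e.g. a shutdown request
    except:                     # bare: swallows even SystemExit
        return "swallowed"

def narrow_handler():
    try:
        raise SystemExit
    except Exception:           # SystemExit is not an Exception subclass
        return "swallowed"

print(bare_handler())           # swallowed

try:
    narrow_handler()
    propagated = False
except SystemExit:
    propagated = True           # the shutdown request reaches the caller
print(propagated)               # True
```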

Test

$ pytest tests/unittest -q
302 passed in 3.64s

Files changed

  • pr_agent/algo/token_handler.py
  • pr_agent/algo/utils.py
  • pr_agent/git_providers/gerrit_provider.py
  • pr_agent/git_providers/utils.py
  • pr_agent/servers/bitbucket_app.py
  • pr_agent/servers/github_polling.py

Each commit is independently reviewable:

  • 38f43b2c — fd leak, context manager, timeouts
  • c9135346 — bare except + rate-limit timeout + mutable default arg

Notes

Discovered during a stabilization audit of a private fork. There are ~60 more bare except: / except Exception: pass sites in pr_agent/ and ~10 string-concat-in-loops; happy to send follow-up PRs if useful.

jclee added 2 commits May 3, 2026 14:49
…anager

- pr_agent/git_providers/utils.py: close mkstemp fd before remove (prevents
  fd leak under load when applying repo settings).
- pr_agent/servers/bitbucket_app.py: open atlassian-connect.json with context
  manager; add timeout=30 on bitbucket commits API; replace bare except.
- pr_agent/servers/github_polling.py: add timeout=30 on GitHub PR comments
  fetch (was hanging connection risk).
- pr_agent/git_providers/gerrit_provider.py: add timeout=30 on patch upload
  POST.

Identified during full-project stabilization audit.
- pr_agent/algo/token_handler.py: TokenHandler.__init__ used vars: dict = {}
  as default, which is a shared mutable across instances. Switch to
  None sentinel + assignment inside the function.
- pr_agent/algo/utils.py: get_rate_limit_status / validate_and_await_rate_limit
  used bare except: that swallowed all errors silently and called
  requests.get with no timeout. Use except Exception: + exc_info logging
  and timeout=10s on both rate-limit GET calls.

Found during full-project stabilization audit.
@github-actions github-actions Bot added the bug label May 3, 2026
@qodo-free-for-open-source-projects

Review Summary by Qodo

Fix stability issues: fd leak, timeouts, bare excepts, mutable defaults

🐞 Bug fix


Walkthroughs

Description
• Close file descriptor leak in mkstemp() call before file removal
• Add 10-30 second timeouts to four unprotected HTTP requests
• Replace bare except: blocks with except Exception: for proper error handling
• Fix mutable default argument in TokenHandler.__init__() using None sentinel
• Use context manager for file operations to ensure proper resource cleanup
Diagram
flowchart LR
  A["Resource Leaks"] --> B["Close mkstemp fd"]
  A --> C["Use context manager"]
  D["Network Hangs"] --> E["Add request timeouts"]
  F["Error Handling"] --> G["Replace bare except"]
  H["Mutable State"] --> I["Fix default args"]
  B --> J["Stability Improvements"]
  C --> J
  E --> J
  G --> J
  I --> J


File Changes

1. pr_agent/algo/token_handler.py 🐞 Bug fix +3/-1

Fix mutable default argument in TokenHandler

• Changed vars: dict = {} default parameter to vars: dict = None
• Added None check with assignment inside __init__() to prevent mutable default argument footgun
• Ensures each instance gets its own dictionary instead of sharing across instances


2. pr_agent/algo/utils.py 🐞 Bug fix +6/-5

Add timeouts and fix error handling in rate limit functions

• Added timeout=10 to both requests.get() calls in get_rate_limit_status()
• Replaced bare except: with except Exception: in both rate-limit functions
• Added exc_info=True logging parameter to surface failures in logs
• Added warning log message when rate limit check fails before retry


3. pr_agent/git_providers/gerrit_provider.py 🐞 Bug fix +2/-1

Add timeout to Gerrit patch upload POST request

• Added timeout=30 parameter to requests.post() call for patch upload
• Prevents indefinite hangs on socket or proxy issues during Gerrit patch uploads


4. pr_agent/git_providers/utils.py 🐞 Bug fix +4/-1

Close mkstemp file descriptor before removal

• Wrapped os.write(fd, repo_settings) in try/finally block
• Added os.close(fd) in finally clause to ensure file descriptor is closed
• Prevents file descriptor leak under sustained load when applying repo settings


5. pr_agent/servers/bitbucket_app.py 🐞 Bug fix +5/-4

Fix file handling, add timeout, improve error handling

• Changed open().read() to context manager with open() as f: pattern
• Added timeout=30 to requests.get() call for Bitbucket commits API
• Replaced bare except: with except Exception: in manifest handler
• Added exc_info=True logging to capture exception details


6. pr_agent/servers/github_polling.py 🐞 Bug fix +1/-1

Add timeout to GitHub PR comments fetch request

• Added timeout=30 parameter to requests.get() call for GitHub PR comments fetch
• Prevents indefinite hangs when fetching previous comments in PR validation




@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 3, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1)



Action required

1. Rate-limit retry not executed 🐞 Bug ☼ Reliability
Description
In get_rate_limit_status(), the initial requests.get() happens before the try/except, so
timeouts/connection failures bypass the intended "retry once" logic and warning log. With the new
timeout=10, this becomes a likely runtime failure mode (and upstream callers either crash or skip
rate-limit validation depending on which wrapper calls it).
Code

pr_agent/algo/utils.py[R1207-1217]

+    response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
    try:
        rate_limit_info = response.json()
        if rate_limit_info.get('message') == 'Rate limiting is not enabled.':  # for github enterprise
            return {'resources': {}}
        response.raise_for_status()  # Check for HTTP errors
-    except:  # retry
+    except Exception:  # retry
+        get_logger().warning("Rate limit check failed, retrying once", exc_info=True)
        time.sleep(0.1)
-        response = requests.get(RATE_LIMIT_URL, headers=HEADERS)
+        response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
        return response.json()
Evidence
Python only catches exceptions raised inside the try block; since requests.get() is executed on the
line before try:, a Timeout/ConnectionError will skip the warning+retry code entirely. Separately,
validate_rate_limit_github() treats exceptions as success (returns True), so failures can silently
disable rate-limit protection for callers that use that helper.

pr_agent/algo/utils.py[1198-1218]
pr_agent/algo/utils.py[1221-1237]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`get_rate_limit_status()` intends to retry once and log on failure, but `requests.get()` is executed before the `try:` so request-layer exceptions (timeout/connection) are not caught and the retry/log never run.

### Issue Context
This regression becomes more visible after adding `timeout=10` because timeouts will now be raised rather than hanging.

### Fix Focus Areas
- pr_agent/algo/utils.py[1198-1218]

### Suggested change
- Move the initial `requests.get(...)` into the `try:` block, or add an outer `try/except requests.RequestException` around both GET calls.
- On request exceptions, log once (with `exc_info=True`), sleep, retry once, and if the retry fails either:
 - re-raise a clear exception, or
 - return a safe sentinel value and let callers handle it (but don’t silently treat it as “rate limit OK” unless that is explicitly desired).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
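The reviewer's point — that an exception raised before the `try:` bypasses the handler — can be reproduced in isolation (hypothetical `fake_get` stands in for `requests.get`):

```python
import time

def fake_get(url):
    # Hypothetical stand-in for requests.get() that always times out.
    raise TimeoutError("simulated network timeout")

def get_status_buggy():
    response = fake_get("RATE_LIMIT_URL")   # raised BEFORE the try:
    try:
        return response
    except Exception:                       # never reached for that call
        time.sleep(0.1)
        return fake_get("RATE_LIMIT_URL")

def get_status_fixed():
    try:
        response = fake_get("RATE_LIMIT_URL")
        return response
    except Exception:
        # retry path is now reachable; a real fix would log and re-issue the GET
        return "retried"

try:
    get_status_buggy()
    retry_ran = True
except TimeoutError:
    retry_ran = False   # the exception escaped; retry logic was bypassed
print(retry_ran)            # False
print(get_status_fixed())   # retried
```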



Remediation recommended

2. Hardcoded requests timeouts 📘 Rule violation ⚙ Maintainability
Description
The new requests.get()/requests.post() calls hard-code timeout values (10/30) in source,
making runtime behavior non-configurable via Dynaconf overrides. This can hinder tuning across
environments and violates the project's configuration-override requirement.
Code

pr_agent/algo/utils.py[R1207-1216]

+    response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
    try:
        rate_limit_info = response.json()
        if rate_limit_info.get('message') == 'Rate limiting is not enabled.':  # for github enterprise
            return {'resources': {}}
        response.raise_for_status()  # Check for HTTP errors
-    except:  # retry
+    except Exception:  # retry
+        get_logger().warning("Rate limit check failed, retrying once", exc_info=True)
        time.sleep(0.1)
-        response = requests.get(RATE_LIMIT_URL, headers=HEADERS)
+        response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10)
Evidence
PR Compliance ID 6 requires runtime-tunable behavior to be configurable via .pr_agent.toml /
pr_agent/settings/*.toml rather than hard-coded in Python. This PR introduces hard-coded HTTP
timeout values in multiple modified call sites.

AGENTS.md
pr_agent/algo/utils.py[1207-1216]
pr_agent/git_providers/gerrit_provider.py[165-170]
pr_agent/servers/bitbucket_app.py[101-107]
pr_agent/servers/github_polling.py[116-118]
pr_agent/settings/configuration.toml[5-30]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Hard-coded HTTP timeout values were introduced in Python source (`timeout=10` / `timeout=30`). Per compliance, runtime-tunable behavior should be configurable via Dynaconf (`.pr_agent.toml` / `pr_agent/settings/*.toml`) with sensible defaults.

## Issue Context
These timeout values may need to vary by environment (local dev vs CI vs enterprise proxies). Centralizing them in settings preserves flexibility and reduces future code churn.

## Fix Focus Areas
- pr_agent/settings/configuration.toml[5-30]
- pr_agent/algo/utils.py[1207-1216]
- pr_agent/git_providers/gerrit_provider.py[165-170]
- pr_agent/servers/bitbucket_app.py[101-107]
- pr_agent/servers/github_polling.py[116-118]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Blocking request in FastAPI 🐞 Bug ➹ Performance
Description
_validate_time_from_last_commit_to_pr_update() is async but calls synchronous requests.get(),
blocking the FastAPI event loop thread during network I/O. With timeout=30, a single webhook can
stall the server coroutine for up to 30 seconds, reducing overall request throughput.
Code

pr_agent/servers/bitbucket_app.py[105]

+        response = requests.get(commits_api, headers=headers, timeout=30)
Evidence
The Bitbucket webhook path calls an async def validator, but the modified line uses
requests.get(...) (sync) inside that coroutine, which blocks until completion/timeout.

pr_agent/servers/bitbucket_app.py[87-138]
pr_agent/servers/bitbucket_app.py[140-159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A synchronous `requests.get()` is used inside an `async def` in the Bitbucket FastAPI app, blocking the event loop during the HTTP request.

### Issue Context
This validator is awaited from the webhook handling flow, so blocking here reduces concurrency for all in-flight webhook requests.

### Fix Focus Areas
- pr_agent/servers/bitbucket_app.py[87-138]

### Suggested change
- Use `aiohttp` for the commits API call, e.g. `async with aiohttp.ClientSession() as session: ... await session.get(..., timeout=ClientTimeout(total=30))`.
- Optionally reuse a single ClientSession stored on app state to avoid creating a new session per webhook.
- Keep existing status-code checks and JSON parsing, but convert parsing to `await resp.json()` and handle non-200 responses similarly.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
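One event-loop-friendly alternative, if taking on the aiohttp migration is undesirable, is asyncio.to_thread; a sketch with a stand-in for the blocking call:

```python
import asyncio
import time

def blocking_fetch():
    # Stand-in for the synchronous requests.get() call.
    time.sleep(0.2)
    return "ok"

async def validator():
    # to_thread runs the blocking call in a worker thread, keeping the
    # event loop free; switching to aiohttp (as suggested) is the other option.
    return await asyncio.to_thread(blocking_fetch)

async def main():
    start = time.monotonic()
    results = await asyncio.gather(*(validator() for _ in range(5)))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
print(results)         # ['ok', 'ok', 'ok', 'ok', 'ok']
print(elapsed < 0.8)   # True -- concurrent, not 5 x 0.2 s serialized
```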



