fix(stability): close mkstemp fd, add request timeouts, fix mutable default arg#2367
fix(stability): close mkstemp fd, add request timeouts, fix mutable default arg#2367jclee941 wants to merge 2 commits intoThe-PR-Agent:mainfrom
Conversation
…anager - pr_agent/git_providers/utils.py: close mkstemp fd before remove (prevents fd leak under load when applying repo settings). - pr_agent/servers/bitbucket_app.py: open atlassian-connect.json with context manager; add timeout=30 on bitbucket commits API; replace bare except. - pr_agent/servers/github_polling.py: add timeout=30 on GitHub PR comments fetch (was hanging connection risk). - pr_agent/git_providers/gerrit_provider.py: add timeout=30 on patch upload POST. Identified during full-project stabilization audit.
- pr_agent/algo/token_handler.py: TokenHandler.__init__ used vars: dict = {}
as default, which is a shared mutable across instances. Switch to
None sentinel + assignment inside the function.
- pr_agent/algo/utils.py: get_rate_limit_status / validate_and_await_rate_limit
used bare except: that swallowed all errors silently and called
requests.get with no timeout. Use except Exception: + exc_info logging
and timeout=10s on both rate-limit GET calls.
Found during full-project stabilization audit.
Review Summary by QodoFix stability issues: fd leak, timeouts, bare excepts, mutable defaults
WalkthroughsDescription• Close file descriptor leak in mkstemp() call before file removal • Add 10-30 second timeouts to four unprotected HTTP requests • Replace bare except: blocks with except Exception: for proper error handling • Fix mutable default argument in TokenHandler.__init__() using None sentinel • Use context manager for file operations to ensure proper resource cleanup Diagramflowchart LR
A["Resource Leaks"] --> B["Close mkstemp fd"]
A --> C["Use context manager"]
D["Network Hangs"] --> E["Add request timeouts"]
F["Error Handling"] --> G["Replace bare except"]
H["Mutable State"] --> I["Fix default args"]
B --> J["Stability Improvements"]
C --> J
E --> J
G --> J
I --> J
File Changes1. pr_agent/algo/token_handler.py
|
Code Review by Qodo
1. Rate-limit retry not executed
|
| response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10) | ||
| try: | ||
| rate_limit_info = response.json() | ||
| if rate_limit_info.get('message') == 'Rate limiting is not enabled.': # for github enterprise | ||
| return {'resources': {}} | ||
| response.raise_for_status() # Check for HTTP errors | ||
| except: # retry | ||
| except Exception: # retry | ||
| get_logger().warning("Rate limit check failed, retrying once", exc_info=True) | ||
| time.sleep(0.1) | ||
| response = requests.get(RATE_LIMIT_URL, headers=HEADERS) | ||
| response = requests.get(RATE_LIMIT_URL, headers=HEADERS, timeout=10) | ||
| return response.json() |
There was a problem hiding this comment.
1. Rate-limit retry not executed 🐞 Bug ☼ Reliability
In get_rate_limit_status(), the initial requests.get() happens before the try/except, so timeouts/connection failures bypass the intended "retry once" logic and warning log. With the new timeout=10, this becomes a likely runtime failure mode (and upstream callers either crash or skip rate-limit validation depending on which wrapper calls it).
Agent Prompt
### Issue description
`get_rate_limit_status()` intends to retry once and log on failure, but `requests.get()` is executed before the `try:` so request-layer exceptions (timeout/connection) are not caught and the retry/log never run.
### Issue Context
This regression becomes more visible after adding `timeout=10` because timeouts will now be raised rather than hanging.
### Fix Focus Areas
- pr_agent/algo/utils.py[1198-1218]
### Suggested change
- Move the initial `requests.get(...)` into the `try:` block, or add an outer `try/except requests.RequestException` around both GET calls.
- On request exceptions, log once (with `exc_info=True`), sleep, retry once, and if the retry fails either:
- re-raise a clear exception, or
- return a safe sentinel value and let callers handle it (but don’t silently treat it as “rate limit OK” unless that is explicitly desired).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Summary
Six small stability fixes found during a static audit of
pr_agent/. No behavior changes for callers; all 302 unit tests pass on this branch (pytest tests/unittest).Fixes
1. File descriptor leak in
git_providers/utils.pyapply_repo_settings()callstempfile.mkstemp()and writes to the returnedfdbut never closes it beforeos.remove(). Under sustained load (every PR creates a new fd) this can exhaust file descriptors. Wrap the write intry/finally: os.close(fd).2. Unsafe
open()inservers/bitbucket_app.pyhandle_manifest()readsatlassian-connect.jsonwithopen(...).read()— the file handle is only freed by GC. Switch towith open(...) as f.3. Mutable default argument in
algo/token_handler.pyTokenHandler.__init__(..., vars: dict = {})— the empty dict is shared across all instances. While the current code never mutatesvars, this is a latent footgun. Switch tovars: dict = None+if vars is None: vars = {}.4. Missing request timeouts (4 sites)
algo/utils.py—requests.get(RATE_LIMIT_URL, ...)(×2, called from a retry loop)servers/bitbucket_app.py— Bitbucket commits APIservers/github_polling.py— GitHub PR comments fetch (was inside an async function)git_providers/gerrit_provider.py— Gerrit patch upload POSTAll four were susceptible to indefinite hangs on socket/proxy issues. Added
timeout=10(rate-limit) /timeout=30(others).5.
bare except:blocks (×3)algo/utils.pyandservers/bitbucket_app.pyhadexcept:clauses that swallowed every error — includingKeyboardInterruptandSystemExit— without logging. Replaced withexcept Exception:+exc_info=Trueso failures surface in logs.Test
$ pytest tests/unittest -q 302 passed in 3.64sFiles changed
pr_agent/algo/token_handler.pypr_agent/algo/utils.pypr_agent/git_providers/gerrit_provider.pypr_agent/git_providers/utils.pypr_agent/servers/bitbucket_app.pypr_agent/servers/github_polling.pyEach commit is independently reviewable:
38f43b2c— fd leak, context manager, timeoutsc9135346— bare except + rate-limit timeout + mutable default argNotes
Discovered during a stabilization audit of a private fork. There are ~60 more
bare except:/except Exception: passsites inpr_agent/and ~10 string-concat-in-loops; happy to send follow-up PRs if useful.