
Tune gunicorn lifecycle for streaming requests #101

Merged
wilsonccccc merged 1 commit into TensorBlock:main from wilsonccccc:codex/gunicorn-streaming-lifecycle
May 16, 2026

@wilsonccccc
Contributor

Summary

  • Move Gunicorn runtime settings into gunicorn_conf.py so lifecycle parameters are explicit and env-overridable.
  • Increase the worker timeout (timeout) and the graceful shutdown timeout (graceful_timeout) to 300s so long provider streams are not cut off.
  • Disable max_requests worker recycling by default to avoid interrupting streaming /v1/chat/completions requests.

Context

Railway HTTP logs showed many /v1/chat/completions requests failing with 502 and "upstream connection closed unexpectedly". The old deployment also logged repeated "Maximum request limit ... Terminating process" events; this kind of worker recycling can terminate a worker while streaming requests are still active.
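To see why max_requests=0 turns recycling off, here is a small sketch modeled on how Gunicorn's base worker, as we understand it, derives its per-worker request limit. The function name worker_request_limit is hypothetical, and the ASGI worker actually used in this deployment may enforce the limit through its own code path.

```python
import random
import sys


def worker_request_limit(max_requests: int, max_requests_jitter: int,
                         rng: random.Random) -> int:
    """Return how many requests a worker serves before it is recycled.

    A positive max_requests gets a random jitter in [0, max_requests_jitter]
    added so workers do not all restart at once; 0 means "never recycle".
    """
    if max_requests > 0:
        return max_requests + rng.randint(0, max_requests_jitter)
    return sys.maxsize  # effectively unlimited: recycling disabled


rng = random.Random(0)
limit = worker_request_limit(1000, 50, rng)  # somewhere in 1000..1050
disabled = worker_request_limit(0, 0, rng) == sys.maxsize
```

Under a positive limit, the worker that hits its count is restarted regardless of whether a long stream is still open on it, which is the failure mode the Context describes.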

Validation

  • python3 -m py_compile gunicorn_conf.py
  • Loaded gunicorn_conf.py locally and confirmed defaults: timeout=300, graceful_timeout=300, keepalive=75, max_requests=0, max_requests_jitter=0.
  • git diff --check HEAD~1..HEAD
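The PR does not say how gunicorn_conf.py was loaded for the defaults check. One way to script that step, sketched here as an assumption, is to execute the file with importlib and compare its module-level settings against the expected values; load_conf and mismatched_defaults are hypothetical helper names.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType

# Defaults reported in the Validation notes above.
EXPECTED_DEFAULTS = {
    "timeout": 300,
    "graceful_timeout": 300,
    "keepalive": 75,
    "max_requests": 0,
    "max_requests_jitter": 0,
}


def load_conf(path: Path) -> ModuleType:
    """Execute a Gunicorn config file as a module without starting Gunicorn."""
    spec = importlib.util.spec_from_file_location("gunicorn_conf", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def mismatched_defaults(module: ModuleType, expected: dict) -> dict:
    """Return {name: (actual, expected)} for every setting that differs."""
    out = {}
    for name, want in expected.items():
        got = getattr(module, name, None)  # None if the setting is missing
        if got != want:
            out[name] = (got, want)
    return out


if __name__ == "__main__":
    # Demo against a throwaway config; point load_conf at the real
    # gunicorn_conf.py (with any override env vars unset) to check it.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "gunicorn_conf.py"
        conf.write_text(
            "timeout = 300\ngraceful_timeout = 300\nkeepalive = 75\n"
            "max_requests = 0\nmax_requests_jitter = 0\n"
        )
        print(mismatched_defaults(load_conf(conf), EXPECTED_DEFAULTS))  # prints {}
```

An empty result means every expected default is present with the expected value; any entry names the setting along with its actual and expected values.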

@wilsonccccc wilsonccccc marked this pull request as ready for review May 16, 2026 17:46
@wilsonccccc wilsonccccc merged commit db6c9f5 into TensorBlock:main May 16, 2026
3 checks passed