Tune gunicorn lifecycle for streaming requests by wilsonccccc · Pull Request #101 · TensorBlock/forge

wilsonccccc · 2026-05-16T17:46:31Z

Summary

Move Gunicorn runtime settings into gunicorn_conf.py so lifecycle parameters are explicit and env-overridable.
Increase request and graceful shutdown timeouts to 300s for long provider streams.
Disable max_requests worker recycling by default to avoid interrupting streaming /v1/chat/completions requests.

Context

Railway HTTP logs showed many 502 responses for /v1/chat/completions with the error "upstream connection closed unexpectedly". The old deployment also logged repeated "Maximum request limit ... Terminating process" events, which indicate workers being recycled, possibly while streaming requests were still active.

Validation

python3 -m py_compile gunicorn_conf.py
Loaded gunicorn_conf.py locally and confirmed defaults: timeout=300, graceful_timeout=300, keepalive=75, max_requests=0, max_requests_jitter=0.
git diff --check HEAD~1..HEAD
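
The "loaded locally and confirmed defaults" step could be scripted roughly as follows. This is a sketch, not the check that was actually run: the find_mismatches helper and EXPECTED_DEFAULTS name are hypothetical, and it assumes the config file exposes its settings as module-level variables. If the file reads overrides from the environment, run it with those variables unset.

```python
import runpy

# Defaults reported in the Validation section of this PR.
EXPECTED_DEFAULTS = {
    "timeout": 300,
    "graceful_timeout": 300,
    "keepalive": 75,
    "max_requests": 0,
    "max_requests_jitter": 0,
}


def find_mismatches(config_path, expected=EXPECTED_DEFAULTS):
    """Execute the config file and return {name: (expected, actual)} for differing settings."""
    namespace = runpy.run_path(config_path)
    return {
        name: (want, namespace.get(name))
        for name, want in expected.items()
        if namespace.get(name) != want
    }
```

An empty result from find_mismatches("gunicorn_conf.py") would mean every listed default matches.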

Tune gunicorn lifecycle for streaming requests

0eb586b

wilsonccccc marked this pull request as ready for review May 16, 2026 17:46

wilsonccccc merged commit db6c9f5 into TensorBlock:main May 16, 2026
3 checks passed

wilsonccccc mentioned this pull request May 16, 2026

Revert misplaced gunicorn lifecycle change #102

Draft

wilsonccccc merged 1 commit into TensorBlock:main from wilsonccccc:codex/gunicorn-streaming-lifecycle
