
Sync main: bulk kanban create tool + FD-leak zombie-shutdown fixes #1

Merged: rephapeng merged 5 commits into main from feat/sync-main-kanban-fd-leak on Jun 24, 2026.



@rephapeng (Owner)

Summary

Brings the integrated work from local main into the fork's main. Combines two independent changes (also proposed upstream as anvie#80 and anvie#81):

feat: kanban_bulk_create_tasks tool

Creates multiple Kanban tasks in one call. This removes a failure mode where the model hallucinated a non-existent bulk-create tool, was rejected by the quality monitor, and then falsely reported a "board/registry error". Supports partial-on-error creation and intra-batch dependencies via depends_on_index.

fix: FD-leak self-shutdown becoming a zombie

  • app.py: teardown_request now also closes the thread-local api_rate_limit / rate_limit SQLite connections (3 FDs each in WAL mode). These leaked per request thread (~180 FDs on api_rate_limit.db), tripping the FD watchdog. Verified flat at 8 handles across 70+ requests.
  • runtime.py: _signal_handler arms an os._exit(0) daemon backstop before sys.exit(0). sys.exit only raises SystemExit in the main thread, and the threaded WSGI server's accept loop running there swallows it, leaving a half-dead process that systemd never restarts.
  • ssh_backend.py: resolve the remote evonic dir against the REMOTE $HOME instead of via os.path.expanduser (local HOME), fixing the original SFTP permission-denied retry loop that leaked sockets.

Testing

  • Both Python modules parse; tools.json is valid JSON
  • Live: restarted the service, hammered 70+ /api/* requests, confirmed FD count stays bounded and queue workers/scheduler recover
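To reproduce the "FD count stays bounded" check, a minimal Linux-only sketch can sample the process's open descriptors from /proc between bursts of requests. The helper name `open_fd_count` is hypothetical, and the project's own FD watchdog (which trips at fd>400) may count differently.

```python
import os


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    The listing may include the transient descriptor os.listdir uses to read
    the directory, so compare counts rather than trusting absolute values.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Sampling this before and after a batch of /api/* requests should give the same number once the leak is fixed; a count that grows with each batch points at a per-request leak.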

When asked to create several tasks at once, the model reaches for a single bulk
call. Without such a tool it hallucinates `kanban_bulk_create_tasks`; the quality
monitor rejects it, and after the 2-correction cap the model often gives up and
falsely reports a "board/registry error server-side" — so the tasks never get
created (observed with the 9B/9C/9D plan).

Provide the real tool. It wraps kanban_create_task.execute per item (so
permission checks and validation are identical), creates partial-on-error
(items that succeed are kept when another item fails), and supports intra-batch
dependencies via depends_on_index (1-based index of an earlier task in the same
batch), so e.g. 9C/9D can depend on 9B before 9B's id exists.
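A minimal sketch of how the per-item loop and the 1-based depends_on_index could fit together, assuming "partial-on-error" means recording the failure and continuing with the remaining items. `bulk_create`, the `create_one` callback (standing in for kanban_create_task.execute), and the `depends_on` field are all hypothetical, not the tool's actual code or schema.

```python
from __future__ import annotations

from typing import Any, Callable


def bulk_create(items: list[dict[str, Any]],
                create_one: Callable[[dict[str, Any]], str]) -> dict[str, list]:
    """Create tasks in order, resolving depends_on_index to real ids."""
    created_ids: list[str | None] = []  # batch position -> new id (None on failure)
    results: dict[str, list] = {"created": [], "errors": []}
    for pos, item in enumerate(items, start=1):
        spec = dict(item)  # don't mutate the caller's dict
        dep = spec.pop("depends_on_index", None)
        try:
            if dep is not None:
                # Only earlier items in the same batch can be referenced.
                if not 1 <= dep < pos:
                    raise ValueError(f"depends_on_index {dep} must name an earlier item")
                dep_id = created_ids[dep - 1]
                if dep_id is None:
                    raise ValueError(f"dependency #{dep} was not created")
                spec["depends_on"] = [dep_id]  # hypothetical field name
            task_id = create_one(spec)  # same permission/validation per item
        except Exception as exc:  # partial-on-error: record it and keep going
            created_ids.append(None)
            results["errors"].append({"index": pos, "error": str(exc)})
            continue
        created_ids.append(task_id)
        results["created"].append({"index": pos, "id": task_id})
    return results
```

For the 9B/9C/9D plan, the 9C and 9D items would both carry `depends_on_index: 1`; a dependency on an item that failed is reported as an error for the dependent item instead of silently dropping the link.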

Three fixes for the recurring "evonic dies on its own" problem ("evonic mati
sendiri"): the FD watchdog SIGTERMs the process at fd>400, and the process
survived as a half-dead zombie:

1. app.py teardown_request: also close the thread-local SQLite connections
   for api_rate_limit and rate_limit. The before_request rate-limit check
   opened a per-thread connection (3 FDs each in WAL mode) on every /api/*
   request; with Flask's thread-per-request these accumulated until GC
   (~180 FDs on api_rate_limit.db alone) — the dominant FD-leak source now
   that the SFTP loop is fixed. Mirrors the existing db.close() pattern.
   Verified flat at 8 handles across 70+ requests (was growing unbounded).

2. runtime._signal_handler: arm a daemon hard-exit backstop before sys.exit.
   sys.exit() only raises SystemExit in the main thread; when SIGTERM lands
   while the threaded WSGI server blocks in its accept loop, the server
   swallows SystemExit — runtime drains but the process keeps serving.
   systemd still sees it as active, so Restart=always never fires. The os._exit
   backstop guarantees the restart after the graceful attempt.

3. ssh_backend: resolve _REMOTE_EVONIC_DIR's ~ against the REMOTE $HOME
   instead of os.path.expanduser (local HOME) — the original SFTP
   permission-denied retry loop that leaked sockets (root-cause fix,
   previously uncommitted).
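A minimal sketch of the backstop described in fix 2, assuming a fixed grace period. `arm_hard_exit_backstop`, `BACKSTOP_GRACE_SECONDS`, and `install_sigterm_handler` are hypothetical names, and the injectable `exit_fn` exists only so the sketch can be tested without killing the interpreter; this is not the actual runtime.py code.

```python
import os
import signal
import sys
import threading
import time

# Hypothetical grace period: long enough for a normal graceful drain.
BACKSTOP_GRACE_SECONDS = 10.0


def arm_hard_exit_backstop(grace_seconds=BACKSTOP_GRACE_SECONDS, exit_fn=os._exit):
    """Start a daemon thread that calls exit_fn(0) after grace_seconds.

    As a daemon thread it dies with the process if the graceful shutdown
    finishes first, so it only fires when the process lingers. In production
    exit_fn is os._exit, which ends the process without running cleanup.
    """
    def _backstop():
        time.sleep(grace_seconds)
        exit_fn(0)

    t = threading.Thread(target=_backstop, name="hard-exit-backstop", daemon=True)
    t.start()
    return t


def _signal_handler(signum, frame):
    # Arm the backstop first: if the SystemExit raised below is swallowed by
    # the server's accept loop, os._exit still ends the process and systemd's
    # Restart=always can fire.
    arm_hard_exit_backstop()
    sys.exit(0)


def install_sigterm_handler():
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, _signal_handler)
```

Arming the thread before `sys.exit` matters because nothing after `sys.exit` in the handler runs once SystemExit is raised.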

Brings the integrated work from local main into the fork:
- feat: kanban_bulk_create_tasks tool (batch task creation)
- fix: FD-leak self-shutdown zombie (rate-limit conn close, SIGTERM
  hard-exit backstop, ssh remote-HOME resolution)
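A minimal sketch of the remote-HOME resolution from fix 3, assuming the remote home directory has already been read from the remote side (for example from the output of `echo $HOME` run over the SSH session). `resolve_remote_dir` is a hypothetical helper and `~/.evonic` is only an illustrative value; the actual _REMOTE_EVONIC_DIR may differ.

```python
import posixpath


def resolve_remote_dir(path: str, remote_home: str) -> str:
    """Expand a leading ~ against the REMOTE home, not the local one.

    os.path.expanduser would substitute the local $HOME, which is the bug
    fix 3 addresses. posixpath is used because the remote side is assumed
    to be a POSIX host regardless of the local OS.
    """
    if path == "~":
        return remote_home
    if path.startswith("~/"):
        return posixpath.join(remote_home, path[2:])
    # Absolute paths, relative paths and ~user forms are left unchanged.
    return path
```

With `remote_home="/home/deploy"`, `resolve_remote_dir("~/.evonic", ...)` yields `/home/deploy/.evonic` no matter whose HOME the local process runs under.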
rephapeng merged commit f5f6ba0 into main on Jun 24, 2026.
