fix: --version/--help flags + startup timing observability for #29 reconnect diagnostics#30
Open
kiki830621 wants to merge 3 commits intomainfrom
Open
fix: --version/--help flags + startup timing observability for #29 reconnect diagnostics#30kiki830621 wants to merge 3 commits intomainfrom
kiki830621 wants to merge 3 commits intomainfrom
Conversation
Diagnostic finding (#29): /mcp reconnect failures correlate with the binary's ≥10s cold-start latency. Health-check callers (Claude Code's reconnect probe, wrapper version checks, /mcp scan) currently pay full TDLib framework load even when they only need the version string. Solution scoped to che-msg: extract argv parsing into a pure CLIBootstrap struct in CheTelegramAllMCPCore, dispatch in main.swift BEFORE any TDLib / MCP server init. --version / -v now exit in 0.008s warm / 0.976s cold, versus ≥10s for the original full-init path. Caveats (per diagnosis): this fixes the health-check entry point, NOT the underlying reconnect timeout. Wrapper-side fast-path (skip GitHub API curl when version matches) and Claude Code reconnect timeout review remain as sister concerns out of this repo's scope. Changes: - New: Sources/CheTelegramAllMCPCore/CLIBootstrap.swift — CLIAction enum (.showVersion / .showHelp / .runServer) + parse(_:) pure function + version constant + helpText (mentions stdio JSON-RPC primary mode + CHE_TELEGRAM_LOG_STARTUP env var for upcoming S3) - Modified: Sources/CheTelegramAllMCP/main.swift — switch on CLIBootstrap.parse(CommandLine.arguments) before existing server flow - New: Tests/CheTelegramAllMCPTests/CLIBootstrapTests.swift — 10 unit tests covering long/short flag forms, no-args / unknown-args fallthrough, empty argv defensiveness, and helpText content invariants (version constant + stdio mention + env var name) Out of scope (will revisit in S3): startup timing log to stderr.
When CHE_TELEGRAM_LOG_STARTUP=1 is set, CheTelegramAllMCPServer.init emits four `[startup]` lines to stderr breaking down its cold-start latency: [startup] tdlib_init=N.NNNs # try await TDLibClient() [startup] tools_define=N.NNNs # Self.defineTools() [startup] register_handlers=N.NNNs# await registerHandlers() [startup] total=N.NNNs # whole init() block Default behavior unchanged (env unset → silent). stderr is the correct channel — stdout is reserved for MCP JSON-RPC traffic and any output there would corrupt the protocol. Why this is observability not fix (per #29 diagnosis): the reconnect failure manifests as Claude Code timing out before binary's JSON-RPC `initialize` response arrives (~10s on cold start in production). Without per-phase breakdown we cannot tell if TDLib framework load is the bottleneck, or something earlier (dynamic linker page-in of the 233MB universal binary). This log gives whoever investigates next the data they need. DispatchTime.now().uptimeNanoseconds chosen over Swift 6 ContinuousClock for: - simpler API (UInt64 subtraction vs. Duration.components) - no dependency on Swift 6 standard library refinements - well-defined wraparound semantics via &- (saturating subtract) Helper logStartupDuration is file-scope (not method) to avoid leaking into CheTelegramAllMCPServer's public API surface — it's a private diagnostic detail.
…patcher coverage (#29) verify-29 round (Agent Team + Codex GPT-5.5) found 4 P1 findings, 3 of which are addressed here (1 deferred as user-visible CLI behavior change requiring more deliberation): F1 (pr-test-analyzer P1 + Codex P1) — dispatcher in main.swift was untested. Pure parser unit tests don't catch a typo like swapping print(version) for print(help), or exit(0) → exit(1). Added 3 subprocess tests using Process API + Bundle(for:) path resolution to spawn the built binary and assert exit code + stdout for --version, -v, --help. Tests skip cleanly when binary not built. ~100ms per test, deterministic (does not enter TDLib path). F2 (silent-failure-hunter P1 + Codex P2) — TDLib init throw silenced the [startup] tdlib_init line. The user reaching for CHE_TELEGRAM_LOG_STARTUP=1 is exactly the user diagnosing slow init that may also throw — failure path observability is the higher-stakes case. Wrapped tdlib_init in do/catch that emits tdlib_init_FAILED + total_FAILED before re-throwing. tools_define and register_handlers don't throw so they keep simple guarded form. F4 (pr-test-analyzer P1) — env-gating predicate (== "1" check) was the production safety mechanism but lived inline in init, not unit-testable. Extracted as internal func shouldLogStartup(env:) accepting an env dict explicitly. Added 6 unit tests pinning strict "1" contract: true on "1", false on unset / "" / "0" / "true" / "TRUE". A typo like == "0" would now be caught at PR time, not after merge floods stderr. F3 (silent-failure-hunter P1, Codex did not flag) — DEFERRED. Reviewer proposed adding .unknownFlag(String) enum case to reject typos like --versoin. Disagreement between reviewers (code-reviewer accepted current fallthrough as documented behavior). Deferring to follow-up because: (a) it changes user-visible CLI behavior (existing harness scripts may pass benign --debug flags expecting them to be ignored), (b) requires choice on whether non-flag positionals (paths) should also be rejected. Tracking inline in verify report. Tests: 210 passed / 11 skipped (E2E creds) / 0 failures, up from 201. - 6 new shouldLogStartup unit tests - 3 new subprocess integration tests for --version/-v/--help logStartupDuration visibility changed from private to internal so tests in CheTelegramAllMCPTests target can reach shouldLogStartup helper from the same file (Swift internal == module-wide).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refs #29
Summary
/mcp顯示Failed to reconnect to plugin:che-telegram-mcp:telegram-all的根因是 binary cold start ≥10s,觸發 Claude Code 的 reconnect timeout。本 PR 在che-msg範圍內提供:--version/-vexit 0.008s warm,完全不進 TDLib init,讓 wrapper 與監控可低成本確認 binary 存活CHE_TELEGRAM_LOG_STARTUP=1→[startup] phase=N.NNNs至 stderr,含tdlib_init_FAILED/total_FAILED路徑(verify F2 修復)--help/-h列出 stdio JSON-RPC 為 primary mode + env var 文件Real production leverage:wrapper 安裝後可改用
$BINARY --version替代 GitHub API curl 健檢。Changes
fe0013b--version/-v/--help/-hflags + dispatcher + 10 unit tests3856b3f[startup]timing log inCheTelegramAllMCPServer.initc384f6dFiles(
origin/main..HEAD,4 files / +353 / -7):Sources/CheTelegramAllMCP/main.swift— argv switch dispatcherSources/CheTelegramAllMCPCore/CLIBootstrap.swift(NEW) —CLIActionenum +CLIBootstrapparser +helpText+versionconstantSources/CheTelegramAllMCPCore/Server.swift— env-gated startup timing(do/catch for failure-path),shouldLogStartuptestable predicateTests/CheTelegramAllMCPTests/CLIBootstrapTests.swift(NEW) — 19 tests(10 parser + 6 env predicate + 3 subprocess)Verification
6-AI cross-model verification PASS(3 specialized Claude reviewers via
pr-review-toolkit+ Codex GPT-5.5 xhigh,共 4 cross-checks across 2 model families)。詳見 issue #29 的 Verify Report。c384f6d· 1/4 P1 deferred(F3,reviewer 意見分歧)· 多個 P2/P3 documented as follow-ups--version0.008s warm(原 ≥10s)· env-gated startup log 雙向驗證(env=1 → 4 lines stderr · env unset → silent)Checklist
/idd-close #29after mergeRelated / Follow-ups(documented in verify report,not auto-filed)
--versointypo silent fallthrough(--prefix unknown flag rejection — API design choice)→ che-msg follow-upCLIBootstrap.version↔Server.swift:46↔ wrapperDESIRED_VERSION)psychquant-claude-plugins:wrapper version-match fast-pathpsychquant-claude-plugins:wrapper stderr log file🤖 Generated by
/idd-allPhase 5(PR mode unattended)。Do NOT addCloses #29— IDD discipline requires manual/idd-closeafter merge to enforce checklist gate + closing summary。