Harden service discovery and game-server readiness handling #28
Closed
VG-prog wants to merge 3 commits into
Add the shared protobuf, generated runtime code, events, GUID helpers, auth identity helpers, and configuration contracts used by clustered gateway, registry, group, guild, matchmaking, and sidecar flows. Tests, local tooling, and broad documentation are intentionally excluded from this upstream-focused scope.
Wire service discovery, map readiness, stale-safe health and metrics observers, degraded game-server health handling, gateway-scoped cleanup, and shared GUID allocation support. Registry and health code now distinguish world-loop degraded state from process or transport death while preserving live map ownership.
Keep world-loop degraded game servers registered for ownership and existing lookups, but mark them as non-admitting so new player placement skips them. Clear the drain state on successful health recovery and fall back to healthy all-map nodes when an assigned owner is degraded.
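The placement rule in the last commit message can be sketched roughly as follows. This is only an illustration, not the code in this PR: `GameServer`, its `Admitting` and `AllMaps` fields, and `PickForNewPlayer` are hypothetical names, and returning `nil` when no healthy candidate exists is an assumption of the sketch.

```go
package main

import "fmt"

// GameServer is a hypothetical, simplified registry entry.
type GameServer struct {
	ID        string
	Admitting bool // false while the node is world-loop degraded (draining)
	AllMaps   bool // true if the node can host any map
}

// PickForNewPlayer returns the node that should receive a new player.
// The assigned owner is used when it is admitting; otherwise the first
// admitting all-map node is used as a fallback. Returning nil when nothing
// qualifies is an assumption of this sketch.
func PickForNewPlayer(owner *GameServer, candidates []*GameServer) *GameServer {
	if owner != nil && owner.Admitting {
		return owner
	}
	for _, s := range candidates {
		if s.Admitting && s.AllMaps {
			return s
		}
	}
	return nil
}

func main() {
	owner := &GameServer{ID: "gs-owner", Admitting: false} // degraded, still registered
	fallback := &GameServer{ID: "gs-all", Admitting: true, AllMaps: true}
	all := []*GameServer{owner, fallback}

	if picked := PickForNewPlayer(owner, all); picked != nil {
		fmt.Println("placing new player on", picked.ID)
	}
}
```

Note that the degraded owner stays in the candidate list here: it is skipped for new placement but remains available for ownership lookups.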
Author
Closing this draft because it was opened from a stacked branch while targeting That replacement PR presents the current integration honestly as one review surface. Sorry for the review noise.
Summary
This PR hardens ToCloud9 service discovery for clustered game-server ownership and player placement.
The important distinction is that a temporarily degraded world loop is not the same as process death, but it also should not be treated as a normal target for new players.
If the sidecar health probe returns 503/504 or times out because the world loop did not process the lightweight probe in time, the registry now keeps the game server registered so existing ownership and owner lookups do not disappear. At the same time, it marks that node as non-admitting, so new player placement skips it and prefers healthy candidates. Actual transport or process liveness failures still remove the server and allow reassignment.
What changed
503/504 style health results.
Why this matters
Dropping a live owner during world-loop pressure can split authoritative in-memory state for LFG, battleground, arena, or crossrealm owners. Sending new players to that same overloaded node is also not desirable.
This PR separates those two decisions: degraded nodes stay registered for owner continuity, but they are drained from new placement until they recover. Real process or transport death still removes the node.
Validation
git diff --check origin/master..HEAD
env GOCACHE=/tmp/tc9-go-build GOFLAGS=-buildvcs=false go build ./...
env GOCACHE=/tmp/tc9-go-build GOFLAGS=-buildvcs=false make install