test: add e2e replication tests for multi-region validation #394
Draft
WentingWu666666 wants to merge 5 commits into
Add a new replication test area that validates DocumentDB cross-cluster
replication semantics within a single Kind cluster, following the same
approach CNPG uses for its replication tests.
Changes:
- Add ReplicationLabel to labels.go and allAreaLabels() in suite_test.go
- Add ReplicationReady (10min) and DataSync (3min) timeout operations
- Create replication mixin template for clusterReplication config
- Create test/e2e/tests/replication/ test area with:
  - Suite bootstrap with Ginkgo SynchronizedBeforeSuite/AfterSuite
  - Helpers including ExternalName bridge services that simulate
    cross-cluster DNS resolution within a single cluster
  - Deploy test: deploys primary + replica, verifies CNPG
    ReplicaCluster config, pg_basebackup source, and ExternalClusters
  - Data replication test: validates bulk insert count, content
    fidelity, and update replication via MongoDB wire protocol
All 4 tests pass on a local Kind cluster (~204 seconds).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
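
The two timeout operations named in this commit (ReplicationReady at 10 minutes, DataSync at 3 minutes) can be pictured as a lookup from operation to duration. The following is only a minimal sketch under assumptions: the `Op` type, the `budgets` map, the `For` function, and the one-minute `fallback` are hypothetical names and values chosen for illustration, not the actual layout of the timeouts package in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// Op names a long-running e2e operation (hypothetical type).
type Op string

const (
	ReplicationReady Op = "ReplicationReady"
	DataSync         Op = "DataSync"
)

// budgets holds the durations stated in the commit message.
var budgets = map[Op]time.Duration{
	ReplicationReady: 10 * time.Minute,
	DataSync:         3 * time.Minute,
}

// fallback is an assumed default for unknown operations.
const fallback = time.Minute

// For returns the timeout for op, or fallback if op is not registered.
func For(op Op) time.Duration {
	if d, ok := budgets[op]; ok {
		return d
	}
	return fallback
}

func main() {
	fmt.Println(For(ReplicationReady), For(DataSync))
}
```

A spec would then pass a value such as `For(DataSync)` as the deadline of its polling assertion, so the budget is defined in one place.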
b8ac2e1 to 4387913
Add failover_test.go that validates:
- Pre-failover data seeding and replication to replica
- Promotion via spec.clusterReplication.primary patch
- CNPG cluster role swap (replica→primary, primary→replica)
- Pre-existing data accessibility on the new primary
- New primary accepts writes after promotion
Also adds Failover timeout (10min) to the timeouts package.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
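
The promotion step described in this commit patches `spec.clusterReplication.primary`. As a rough illustration of what such a patch body could look like, the sketch below builds a JSON merge patch with the standard library only. The choice of merge patch, the `promotionPatch` helper, and the cluster name `replica-cluster` are assumptions for illustration; the test in this PR may build and send the patch differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promotionPatch builds a JSON merge patch body that sets
// spec.clusterReplication.primary to newPrimary (hypothetical helper).
func promotionPatch(newPrimary string) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"clusterReplication": map[string]any{
				"primary": newPrimary,
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// "replica-cluster" is a placeholder name, not one taken from the PR.
	body, err := promotionPatch("replica-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because a merge patch touches only the listed field, the rest of the resource spec is left as the operator last saw it.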
Consolidate deploy_replication_test.go into data_replication_test.go to avoid duplicate primary+replica deployments. CNPG config assertions now run in the BeforeAll of the data replication spec, cutting total test time by eliminating a redundant ~2 min setup phase.
Move findCNPGCluster() helper to helpers_test.go so it is shared by both the data replication and failover test files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
Add a 5th failover test case that writes data on the new primary and verifies it replicates to the demoted replica. This confirms the replication pipeline remains functional after a promotion, with data flowing from the new primary to the demoted node.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
6d006e9 to d279473
…ivergence)
Add timeline_divergence_test.go with tests that reproduce three sub-issues from issue documentdb#375:
- Sub-issue 2: promotionToken not cleared after successful promotion (CNPG cluster reports 'Cluster is unrecoverable')
- Second rapid failover: cluster becomes unrecoverable after A→B→A switchback before replication is healthy
- Sub-issue 1: replication broken after rapid back-to-back failover (writes fail with 'Exceeded time limit waiting for primary')
All three tests are designed to FAIL against the current operator, confirming the bugs exist. They will pass once issue documentdb#375 is fixed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
🤖 Auto-triaged by documentdb-triage-tool.
Applied:
Reasoning: component from path globs (test); effort from diff stats (1208+3 LOC, 9 files); LLM: Adds a new e2e test area for multi-region replication validation across multiple files with new labels, timeouts, helpers, and test suites; touches test infrastructure broadly.
If a label is wrong, remove it manually and ping
Summary
Add a new replication test area that validates DocumentDB cross-cluster replication semantics within a single Kind cluster, following the same approach CNPG uses for its replication tests.
Changes
Approach
Since DocumentDB replication is designed for multi-cluster deployments (with service mesh handling DNS), testing within a single cluster requires ExternalName bridge services to CNAME the expected DNS names to actual service FQDNs. This is test-only scaffolding — it does not change production code.
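
To make the bridge idea concrete, here is a minimal sketch of the manifest such a helper might generate. A Kubernetes Service of type `ExternalName` answers DNS lookups for its own name with a CNAME to `spec.externalName`. The struct types below are a hand-rolled subset of the Service schema so the example needs only the standard library; the `bridgeService` function and every name, namespace, and FQDN in `main` are hypothetical and not taken from the PR's helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the Kubernetes Service schema, for illustration only.
type metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type serviceSpec struct {
	Type         string `json:"type"`
	ExternalName string `json:"externalName"`
}

type service struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Metadata   metadata    `json:"metadata"`
	Spec       serviceSpec `json:"spec"`
}

// bridgeService returns an ExternalName Service so that lookups for
// name in namespace resolve via CNAME to targetFQDN.
func bridgeService(name, namespace, targetFQDN string) service {
	return service{
		APIVersion: "v1",
		Kind:       "Service",
		Metadata:   metadata{Name: name, Namespace: namespace},
		Spec:       serviceSpec{Type: "ExternalName", ExternalName: targetFQDN},
	}
}

func main() {
	// All three arguments are placeholder values.
	svc := bridgeService(
		"expected-remote-name",
		"e2e-replication",
		"actual-service.e2e-replication.svc.cluster.local",
	)
	out, err := json.MarshalIndent(svc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON could be applied with `kubectl apply -f -`; a real test helper would more likely create the object through the Kubernetes Go client.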
Testing
All 4 tests pass on a local Kind cluster (~204 seconds):
```bash
cd test/e2e
ginkgo -r --label-filter=replication ./tests/...
```